Data Science newsletter – March 18, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for March 18, 2020


Data Science News

Automatic, wearable-based, in-field eating detection approaches for public health research: a scoping review

npj Digital Medicine, Brooke M. Bell et al.


Dietary intake, eating behaviors, and context are important in chronic disease development, yet our ability to accurately assess these in research settings can be limited by biased traditional self-reporting tools. Objective measurement tools, specifically, wearable sensors, present the opportunity to minimize the major limitations of self-reported eating measures by generating supplementary sensor data that can improve the validity of self-report data in naturalistic settings. This scoping review summarizes the current use of wearable devices/sensors that automatically detect eating-related activity in naturalistic research settings. Five databases were searched in December 2019, and 618 records were retrieved from the literature search. This scoping review included N = 40 studies (from 33 articles) that reported on one or more wearable sensors used to automatically detect eating activity in the field. The majority of studies (N = 26, 65%) used multi-sensor systems (incorporating > 1 wearable sensors), and accelerometers were the most commonly utilized sensor (N = 25, 62.5%). All studies (N = 40, 100.0%) used either self-report or objective ground-truth methods to validate the inferred eating activity detected by the sensor(s). The most frequently reported evaluation metrics were Accuracy (N = 12) and F1-score (N = 10). This scoping review highlights the current state of wearable sensors’ ability to improve upon traditional eating assessment methods by passively detecting eating activity in naturalistic settings, over long periods of time, and with minimal user interaction. A key challenge in this field, wide variation in eating outcome measures and evaluation metrics, demonstrates the need for the development of a standardized form of comparability among sensors/multi-sensor systems and multidisciplinary collaboration.
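The two metrics the review names most often, Accuracy and F1-score, can tell very different stories when eating episodes are rare compared with non-eating time. The sketch below illustrates this with made-up confusion-matrix counts; the counts and function names are assumptions for illustration and are not taken from the reviewed studies.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all time windows that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; true negatives are ignored."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: eating windows are rare, so true negatives dominate.
tp, fp, fn, tn = 30, 20, 20, 930

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

With these invented counts, accuracy looks high because most windows contain no eating, while F1 reflects the weaker detection of the eating windows themselves. This is one reason studies reporting different metrics are hard to compare.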

A fiasco in the making? As the coronavirus pandemic takes hold, we are making decisions without reliable data

STAT, John P.A. Ioannidis


The current coronavirus disease, Covid-19, has been called a once-in-a-century pandemic. But it may also be a once-in-a-century evidence fiasco.

At a time when everyone needs better information, from disease modelers and governments to people quarantined or just social distancing, we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.

Coronavirus will most hurt industries already suffering disruption

CNBC, Alex Sherman


  • Prolonged quarantines may alter consumer behavior in ways that won’t be reversed.
  • The effects of consumer behavior will be determined by the length of at-home stays and will differ from industry to industry.
  • Movie theaters, fitness centers, commercial real estate companies and traditional pay TV are all at-risk industries.

    Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)

    Science, Ruiyun Li et al


    Estimation of the prevalence and contagiousness of undocumented novel coronavirus (SARS-CoV2) infections is critical for understanding the overall prevalence and pandemic potential of this disease. Here we use observations of reported infection within China, in conjunction with mobility data, a networked dynamic metapopulation model and Bayesian inference, to infer critical epidemiological characteristics associated with SARS-CoV2, including the fraction of undocumented infections and their contagiousness. We estimate 86% of all infections were undocumented (95% CI: [82%–90%]) prior to 23 January 2020 travel restrictions. Per person, the transmission rate of undocumented infections was 55% of documented infections ([46%–62%]), yet, due to their greater numbers, undocumented infections were the infection source for 79% of documented cases. These findings explain the rapid geographic spread of SARS-CoV2 and indicate containment of this virus will be particularly challenging.
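As a rough sanity check on the figures in the abstract above, one can weight each group by its size and its relative per-person transmission rate. This is a back-of-the-envelope simplification that assumes every infection mixes identically; it is not the paper's networked metapopulation model, and the variable names are illustrative.

```python
# Values quoted in the abstract.
undocumented_fraction = 0.86   # share of all infections that were undocumented
relative_transmission = 0.55   # undocumented transmission rate relative to documented

# Simplifying assumption: a group's contribution to onward transmission is
# proportional to (group size) x (per-person transmission rate).
undocumented_weight = undocumented_fraction * relative_transmission
documented_weight = (1 - undocumented_fraction) * 1.0

share_from_undocumented = undocumented_weight / (undocumented_weight + documented_weight)
print(f"{share_from_undocumented:.0%}")
```

The naive estimate comes out at roughly 77%, close to the 79% the authors report from their full model, which shows how a lower per-person rate can still dominate transmission when the group is much larger.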

    Coronavirus: seven ways collective intelligence is tackling the pandemic

    The Conversation, Aleks Berditchevskaia and Kathy Peach


    Advances in digital technologies have transformed what can be achieved through collective intelligence in recent years – connecting more of us, augmenting human intelligence with machine intelligence, and helping us to generate new insights from novel sources of data. It is particularly suited to addressing fast-evolving, complex global problems such as disease outbreaks.

    Here are seven ways it is tackling the coronavirus pandemic:

    1) Predicting and modelling outbreaks

    Phones Could Track the Spread of Covid-19. Is It a Good Idea?

    WIRED, Business, Will Knight


    China and South Korea used smartphone apps to monitor people with the disease. But Americans have different views of privacy and data collection.

    NIH clinical trial of investigational vaccine for COVID-19 begins

    National Institutes of Health, News Releases


    A Phase 1 clinical trial evaluating an investigational vaccine designed to protect against coronavirus disease 2019 (COVID-19) has begun at Kaiser Permanente Washington Health Research Institute (KPWHRI) in Seattle. The National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health, is funding the trial. KPWHRI is part of NIAID’s Infectious Diseases Clinical Research Consortium. The open-label trial will enroll 45 healthy adult volunteers ages 18 to 55 years over approximately 6 weeks. The first participant received the investigational vaccine today.

    Microsoft taps Eric Horvitz as first chief scientific officer; Peter Lee takes over Microsoft Research

    GeekWire, Taylor Soper


    Microsoft today named its first chief scientific officer, promoting longtime research leader and technical fellow Eric Horvitz to the newly created role. It also announced Microsoft Healthcare chief Peter Lee as the new head of Microsoft Research.

    Investors Liquidate Everything in Record $137 Billion Cash Haul

    Yahoo Finance, Bloomberg, Cecile Gutscher


    Investors made their biggest dash for cash in history over the week that broke the bull market.

    They channeled $137 billion into cash-like assets and a record $14 billion into government bonds in the five days through March 11, according to Bank of America Corp. research citing EPFR Global data. Gold got its second-biggest inflow ever at $3 billion.

    As the coronavirus outbreak is met with policy impotence from Europe to America, money managers are liquidating en masse. The losers include investment-grade and high-yield bonds and those of developing nations. Even before Black Thursday, investors yanked an unprecedented $34 billion out of these risk assets in total in the week through Wednesday.

    A year without conferences? How the coronavirus pandemic could change research

    Nature, News, Giuliana Viglione


    As scientific meetings are cancelled worldwide, researchers are rethinking how they network — a move that some say is long overdue.

    Speeding up machine learning

    Washington State University, WSU Insider


    One of the biggest challenges in AI is that computers simply can’t do the computations quickly and accurately enough, especially when it comes to understanding fast-changing images.

    Washington State University researchers have recently developed a computer architecture that achieves accuracy similar to that of conventional graphics processing units (GPUs) but works more than 50 times faster. Led by graduate student Biresh Joardar and Partha Pande, Boeing Centennial Chair in Computer Engineering, the researchers presented their work virtually this month at the prestigious Design, Automation and Test in Europe Conference (DATE) in Grenoble, France.

    Mapping Friday’s 30% Drop in NYC Subway Ridership

    Ben Wellington, I Quant NY blog


    Each station saw its own unique drop in ridership, and I figured that those drops would not be evenly distributed. Though many companies moved to telecommuting last week, companies like retail operations and restaurants still need their employees to come in if they are going to remain open. So, those employees can’t simply call in. Given that, I wondered whether the drop in subway ridership was going to be larger in more affluent neighborhoods that have a higher proportion of employees who can telecommute.

    To find out, I made a map of NYC subway stations, where small circles indicate large decreases in the number of entrances at that station, and larger ones indicate small decreases. This is a bit odd, but the idea was to make it clear where there was relatively MORE sustained ridership when compared to other stations. Clicking on a station will give you the name and the % change for that station.

    LinkedIn Petitions CFAA Data Scraping Case to Supreme Court

    Law Street, Tech, Kirsten Errick


    LinkedIn has filed a petition for a writ of certiorari to the Supreme Court of the United States in response to the Ninth Circuit’s decision in its case against hiQ Labs. LinkedIn has asked the Supreme Court to examine “[w]hether a company that deploys anonymous computer ‘bots’ to circumvent technical barriers and harvest millions of individuals’ personal data from computer servers that host public-facing websites – even after the computer servers’ owner has expressly denied permission to access the data – ‘intentionally accesses a computer without authorization’ in violation of the Computer Fraud and Abuse Act.” The Supreme Court had previously granted an extension for LinkedIn to file the petition.

    Pitt Turns Rapid Coronavirus Data Sharing into Sustainable Research Infrastructure

    UPMC & Pitt Health Sciences News Blog


    Wilbert van Panhuis, M.D., Ph.D., was scrolling through Twitter over the winter holiday break when he noticed chatter among infectious disease epidemiologists about a new virus infecting people in Wuhan, China.

    Those tweets spurred a scramble for his team at the University of Pittsburgh to establish a platform for research collaborations and data sharing on what would become the COVID-19 pandemic. Immediately, the researchers started to compile datasets and early research publications into a central COVID-19 repository for the scientific community, and last week, they launched the online portal for COVID-19 modeling research — a clearinghouse for sharing data-driven discoveries about COVID-19.

    W&M approved to offer bachelor’s degree in data science

    William & Mary, News & Media


    The State Council of Higher Education for Virginia has approved William & Mary’s proposal for a new bachelor of science degree in data science.

    A March 9 SCHEV letter to university Provost Peggy Agouris states that the new B.S. in data science degree is approved to begin in the fall 2020 semester.


    MinneWiADS – Women in Analytics and Data Science Conference



    Eden Prairie, MN May 15. “Join us for MinneWiADS, a brand new data science and tech conference bringing together leaders, experts, and interested individuals in analytics, especially those who identify female and non-binary, to share knowledge, make connections, and inspire one another.” [$$]


    COVID-19 Open Research Dataset Challenge (CORD-19)

    “A list of our initial key questions can be found under the Tasks section of this dataset. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics.” … “Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions.” Round 1 submission deadline is April 16.

    Tools & Resources

    Announcing CHIME, A tool for COVID-19 capacity planning

    University of Pennsylvania, Penn Medicine


    As we prepare for the additional demands that the COVID-19 outbreak will place on our hospital system, our operational leaders need up-to-date projections of what additional resources will be required. Informed estimates of how many patients will need hospitalization, ICU beds, and mechanical ventilation over the coming days and weeks will be crucial inputs to readiness responses and mitigation strategies.

    To this end, the Predictive Healthcare team at Penn Medicine has developed a tool that leverages SIR modeling to assist hospitals with capacity planning around COVID-19.
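For readers unfamiliar with SIR modeling, here is a minimal discrete-time sketch of the susceptible-infected-recovered idea. It is not CHIME's code: the function name, the parameter values (`beta`, `gamma`, population size) and the hospitalization rate are all hypothetical and chosen only to show the mechanics.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic SIR model forward one day at a time.

    beta  : average transmissions per infected person per day (assumed)
    gamma : fraction of infected people who recover each day (assumed)
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical inputs, for illustration only.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=120)

# Find the day with the most simultaneous infections.
peak_day, (_, peak_infected, _) = max(enumerate(history), key=lambda item: item[1][1])

hospitalization_rate = 0.05  # hypothetical share of infected needing a bed
beds_needed_at_peak = peak_infected * hospitalization_rate
print(f"peak on day {peak_day}, about {beds_needed_at_peak:,.0f} beds needed")
```

A capacity-planning tool would layer locally calibrated rates for hospitalization, ICU admission and ventilation on top of a curve like this; the single rate used here is only a placeholder.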

    [2002.09405] Learning to Simulate Complex Physics with Graph Networks

    arXiv, Computer Science > Machine Learning; Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia


    Here we present a general framework for learning simulation, and provide a single model implementation that yields state-of-the-art performance across a variety of challenging physical domains, involving fluids, rigid solids, and deformable materials interacting with one another. Our framework—which we term “Graph Network-based Simulators” (GNS)—represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing. Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time. Our model was robust to hyperparameter choices across various evaluation metrics: the main determinants of long-term performance were the number of message-passing steps, and mitigating the accumulation of error by corrupting the training data with noise. Our GNS framework is the most accurate general-purpose learned physics simulator to date, and holds promise for solving a wide range of complex forward and inverse problems.
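To make "particles as nodes" and "message-passing" concrete, the sketch below builds a graph from particle positions and runs one round of message passing. It is a toy under stated assumptions, not the paper's GNS implementation: in the paper the message and update functions are learned, whereas here a hand-written spring-like pull stands in for them, and `radius` and `stiffness` are made-up parameters.

```python
import math


def build_edges(positions, radius):
    """Connect every ordered pair of distinct particles closer than `radius`."""
    edges = []
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j and math.hypot(xi - xj, yi - yj) < radius:
                edges.append((j, i))  # message flows from sender j to receiver i
    return edges


def message_passing_step(positions, edges, stiffness):
    """One round: each receiver sums the messages sent by its neighbours.

    The message here is a fixed spring-like pull toward the sender; a learned
    simulator would replace it with a trained function.
    """
    aggregated = [(0.0, 0.0) for _ in positions]
    for sender, receiver in edges:
        dx = positions[sender][0] - positions[receiver][0]
        dy = positions[sender][1] - positions[receiver][1]
        ax, ay = aggregated[receiver]
        aggregated[receiver] = (ax + stiffness * dx, ay + stiffness * dy)
    return aggregated


# Three hypothetical particles: two close together, one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
edges = build_edges(positions, radius=1.5)
accelerations = message_passing_step(positions, edges, stiffness=0.5)
print(edges)
print(accelerations)
```

Only the two nearby particles exchange messages, so the distant particle receives a zero update. Stacking several such rounds lets information travel further across the graph, which is why the abstract singles out the number of message-passing steps as a main determinant of performance.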

    COVID-19 Open Research Dataset (CORD-19)

    The Allen Institute for Artificial Intelligence, Semantic Scholar


    “In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.”


    Full-time positions outside academia

    Director, IARPA Office of Collections

    Office of the Deputy Director of National Intelligence for Strategy and Engagement (S&E); Bethesda, MD

    Director, IARPA Office of Analysis

    Office of the Deputy Director of National Intelligence for Strategy and Engagement (S&E); Bethesda, MD
