Data Science newsletter – March 10, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for March 10, 2020


Data Science News

My colleagues & I have been studying/researching crisis informatics for over a decade, looking at how people converge online to make sense of crisis events. This field is informed by much older research in the sociology of disaster.

Twitter, Kate Starbird


Crisis events, as they unfold, are often characterized by high uncertainty — about what is happening and what we should do about it, individually and collectively. In these cases, the “facts” of the situation are dynamic, and there are still unknowns. [thread]

Researcher danah boyd on how to protect the census and fix tech

Protocol, Linda Kinstler


Today, boyd is focused on exposing and mitigating the myriad risks that have accompanied the explosion of commercial data and the acceleration of computing power. Most recently, she’s been obsessed with the upcoming census. She released a report on the new and unprecedented stakes of the count, which will implement a novel technical system to protect individual privacy for the first time.

Protocol’s Linda Kinstler spoke with boyd about what it will take to ensure that the census stays anonymous, what the “great reckoning” will look like for tech, and why it is still to come.

The views, perspectives, and experiences of academic researchers with data sharing and reuse: A meta-synthesis

PLOS One; Laure Perrier, Erik Blondal, Heather MacDonald



Funding agencies and research journals are increasingly demanding that researchers share their data in public repositories. Despite these requirements, researchers still withhold data, refuse to share, and deposit data that lacks annotation. We conducted a meta-synthesis to examine the views, perspectives, and experiences of academic researchers on data sharing and reuse of research data.

We searched the published and unpublished literature for studies on data sharing by researchers in academic institutions. Two independent reviewers screened citations and abstracts, then full-text articles. Data abstraction was performed independently by two investigators. The abstracted data was read and reread in order to generate codes. Key concepts were identified and thematic analysis was used for data synthesis.

We reviewed 2005 records and included 45 studies along with 3 companion reports. The studies were published between 2003 and 2018 and most were conducted in North America (60%) or Europe (17%). The four major themes that emerged were data integrity, responsible conduct of research, feasibility of sharing data, and value of sharing data. Researchers lack time, resources, and skills to effectively share their data in public repositories. Data quality is affected by this, along with subjective decisions around what is considered to be worth sharing. Deficits in infrastructure also impede the availability of research data. Incentives for sharing data are lacking.

Researchers lack skills to share data in a manner that is efficient and effective. Improved infrastructure support would allow them to make data available quickly and seamlessly. The lack of incentives for sharing research data with regard to academic appointment, promotion, recognition, and rewards needs to be addressed.

U.S. bans could make Huawei stronger

Axios, Erica Pandey


The U.S.’s blunt policy of walling itself off from Huawei could backfire, making the Chinese telecom giant even stronger in the long term.

Why it matters: The grand decoupling of American and Chinese tech amid trade tensions and cybersecurity concerns, of which Huawei is at the center, is pushing China’s companies to become increasingly self-reliant. Huawei’s progress could position it to take the lead in the global U.S.-China tech race, experts say.

CRISPR Used To Edit Genes Inside A Patient With A Rare Form Of Blindness

NPR, Shots blog, Rob Stein


For the first time, scientists have used the gene-editing technique CRISPR to try to edit a gene while the DNA is still inside a person’s body.

The groundbreaking procedure involved injecting the microscopic gene-editing tool into the eye of a patient blinded by a rare genetic disorder, in hopes of enabling the volunteer to see. They hope to know within weeks whether the approach is working and, if so, to know within two or three months how much vision will be restored.

“We’re really excited about this,” says Dr. Eric Pierce, a professor of ophthalmology at Harvard Medical School and director of the Inherited Retinal Disorders Service at Massachusetts Eye and Ear. Pierce is leading a study that the procedure launched.

All numbers are made up, some are useful

Normcore Tech, Vicki Boykis


What I mean is that all the data that we trust and believe on a daily basis, is only accurate in a specific context, at a specific time, and at a specific level. If you dig deep enough, ultimately all of the data in the world that drives major and minor decisions alike is built on wobbly foundations.

Take, for example, the coronavirus mortality rate. We have no idea what the true number is. I mean we have some ideas of true numbers. But we’re not taking into account: undercounting minor cases that never get tested and never go to the hospital. Undercounting deaths that haven’t happened yet. Undercounting due to political reasons. Undercounting, simply because maybe hospitals are overwhelmed with the number of cases. Overcounting or undercounting recoveries. And much, much, more. As Elea says, human behavior is inconsistent and difficult to measure, and the virus is exactly an example of trying to measure human behavior on a large scale.

How new data can make ecological forecasts as good as weather forecasts

University of Wisconsin, News


When El Nino approaches, driven by warm Pacific Ocean waters, we’ve come to expect both drenching seasonal rains in the southern U.S. and drought in the Amazon. Those opposite extremes have huge effects on society and are increasingly predictable thanks to decades of weather data.

Soon, University of Wisconsin–Madison ecologist Ben Zuckerberg thinks we’ll be able to pull off the same forecasting feat for bird migrations and wildlife populations. That’s because just as those recurring changes in climate have predictable consequences for humans, they also have predictable effects on plants and animals.

For instance, ecological predictions could help us prepare for diseases in crops or population crashes in endangered species. Good forecasting could tell us where conservation measures are needed most in the coming year or decade.

Carnegie Mellon Extends Forecasting of Flu-Like Illness

Carnegie Mellon University, School of Computer Science, Machine Learning Department


The flu season peaked weeks ago and the Delphi Research Group at Carnegie Mellon University normally would plan to halt its national and regional forecasts of influenza-like illness by the end of May.

But this is no ordinary disease season. At the request of the Centers for Disease Control and Prevention, the CMU researchers will continue their forecasts into the summer as the nation braces for the spread of the flu-like COVID-19 illness, caused by a new strain of coronavirus officially known as SARS-CoV-2.

“We likely will see another wave of influenza-like illness in the coming weeks or month,” said Roni Rosenfeld, leader of the Delphi group and head of CMU’s Machine Learning Department. Delphi is one of two Influenza Forecasting Centers of Excellence designated last year by the CDC.

Machine Learning Takes On Antibiotic Resistance

Quanta Magazine, Katherine Harmon Courage


Using computers and machine learning to make sense of mountains of biomedical data is nothing new. But the team at the Massachusetts Institute of Technology, led by James Collins, who studies applications of systems biology to antibiotic resistance, and Regina Barzilay, an artificial intelligence researcher, achieved success by developing a neural network that avoids scientists’ potentially limiting preconceptions about what to look for. Instead, the computer develops its own expertise.

With this discovery platform, which has been made freely available, “you’re going to identify molecules that don’t look like antibiotics you’re used to seeing,” Collins said. “It really shows how you can use the emerging technology of deep learning in an innovative manner to discover new chemistries.”

Biodiversity loss is hurting our ability to combat pandemics

World Economic Forum, John Scott


The frequency of disease outbreaks has been increasing steadily. Between 1980 and 2013 there were 12,012 recorded outbreaks, comprising 44 million individual cases and affecting every country in the world. A number of trends have contributed to this rise, including high levels of global travel, trade and connectivity, and high-density living – but the links to climate change and biodiversity are the most striking.

Deforestation has increased steadily over the past two decades and is linked to 31% of outbreaks such as Ebola, and the Zika and Nipah viruses. Deforestation drives wild animals out of their natural habitats and closer to human populations, creating a greater opportunity for zoonotic diseases – that is, diseases that spread from animals to humans. More broadly, climate change has altered and accelerated the transmission patterns of infectious diseases such as Zika, malaria and dengue fever, and has caused human displacement. Movements of large groups to new locations, often under poor conditions, increase displaced populations’ vulnerability to biological threats such as measles, malaria, diarrheal diseases and acute respiratory infections.

Coronavirus Pushes Job Interviews for College Students Online

Bloomberg World, Janet Lorin


The coronavirus is changing job recruiting at U.S. colleges, as travel bans force some employers to scrap in-person campus interviews for virtual ones and students adjust to new networking norms.

Intel Corp. isn’t participating in external recruiting events. Inc. is shifting some on-site interviews to a virtual format. Accounting giant PwC is converting in-person office visits to videoconferencing and phone calls for about 1,000 undergraduates with offers to work there, mostly in summer jobs.

“Bringing large volumes of students to offices, requiring them to travel — we didn’t feel that was the right thing to do,” said Rod Adams, who oversees PwC’s recruiting in the U.S. and Mexico.

Amazon is secretly working on a cure for the common cold

CNBC, Christina Farr


  • Amazon is working on a cure for the common cold in its highly secretive skunkworks group, “Grand Challenge.”
  • The code name is “Project Gesundheit,” according to several people familiar with the project.

    Amazon starts selling cashierless checkout technology to other retailers

    CNBC, Annie Palmer


    The cashierless technology powers Amazon Go stores, which allow shoppers to enter a store by scanning the Go app and exit without needing to stand in a checkout line. Cameras and sensors track what items they choose and charge them when they leave.

    Equinor, Shell team up on digital transformation projects

    World Oil


    Equinor and Shell have signed a Memorandum of Understanding on digital collaboration to develop solutions and methods together through the exchange of expertise within areas like data science, artificial intelligence and 3D printing.

    The collaboration is expected to entail co-innovation across the whole value chain, such as maintenance, production optimization and supply chain management.

    “We are already collaborating closely in the Open Subsurface Data Universe (OSDU) initiative and see many mutual benefits as both companies have applied cloud-based digital solutions as an approach to our industry’s digital transformation. Such collaborations are increasingly important to strengthen safety, reduce carbon emissions and realize value by applying digital technologies,” says Torbjørn F. Folgerø, chief digital officer in Equinor ASA.

    ZipRecruiter: 4 states account for over 90% of AI jobs

    VentureBeat, Kyle Wiggers


    AI and machine learning jobs are in strong demand around the U.S., but not every region has benefited equally from the boom. That’s according to ZipRecruiter’s Future of Work Report 2020, which is out today. … Four states — California, Washington, New York, and Massachusetts — account for over 90% of jobs requiring AI expertise on ZipRecruiter. AI-related jobs grew 35% in those states from 2017 to 2019, and as of 2019 they had 60% of the combined share of AI jobs and 34% of AI-related jobs in the U.S.


    NewsRank for Social Good Hackathon

    Tow-Knight Center for Entrepreneurial Journalism, Georgia Tech, and Hacks/Hackers.


    Atlanta, GA March 27-28 at Georgia Tech (Tech Square Research Square Banquet Hall). “Teams will prototype a news ranking and recommendation algorithm/service that surfaces articles from existing news and information online from a country of their choice. Information can come from existing news sites, social media platforms (Twitter, YouTube, Reddit… you name it!), or a combination of both.” [registration required]


    Have you responded to the #dataviz naming convention survey?

    I’m working with @katycns, @AndreasBueckle & @Alicia_Arza to explore the names of graphs, charts, and diagrams.

    Call for Papers: International Workshop on Federated Learning at IJCAI 2020

    Yokohama, Japan July 13 (tentative). “In order to explore how the AI research community can adapt to this new regulatory reality, we organize this one-day workshop in conjunction with the 29th International Joint Conference on Artificial Intelligence.” Deadline for submissions is April 26.

    Department of Energy to Provide $40 Million for Artificial Intelligence Research at DOE Scientific User Facilities

    “The U.S. Department of Energy (DOE) announced plans to provide up to $40 million over three years for new research in data, artificial intelligence and machine learning to address the challenges of producing and managing data at DOE scientific user facilities.” Applications are due May 1.

    FOCI ’20 Preliminary Call for Papers

    Boston, MA “The 10th USENIX Workshop on Free and Open Communications on the Internet (FOCI ’20) will take place August 11, 2020, and will be co-located with the 29th USENIX Security Symposium.” Deadline for paper submissions is May 21.

    Tools & Resources

    The 5 Components Towards Building Production-Ready Machine Learning Systems

    Medium, Cracking the Data Science Interview blog, James Le


    To conceptualize this framework, there is a significant paper from Google called ML Test Score — A Rubric for Production Readiness and Technical Debt Reduction — which is an exhaustive framework/checklist from practitioners at Google. It is a follow-up to previous work from Google, such as (1) Hidden Technical Debt in ML Systems, (2) ML: The High-Interest Credit Card of Technical Debt, and (3) Rules of ML: Best Practices for ML Engineering.

    Google launches TensorFlow Quantum, a machine learning framework for training quantum models

    VentureBeat, Khari Johnson


    Google today announced the launch of TensorFlow Quantum, bringing together machine learning and quantum computing initiatives at the company. The framework can construct quantum datasets, prototype hybrid quantum and classic machine learning models, support quantum circuit simulators, and train discriminative and generative quantum models.

    How to Build High-Performing Engineering Teams

    HackerRank Blog, Vivek Ravisankar and Mike Tria


    “This post discusses how to assess developer candidates and build high-performing engineering teams.”

    Fixing academic posters: the #BetterPoster approach

    astrobites blog, Oliver Hall


    Mike Morrison, a psychology PhD student, wants to make poster sessions more efficient. In his opinion, the ‘cardinal sin’ of posters is that they often require somebody to read them for 10 minutes straight, in a time-pressured environment. He identifies three things that posters should embody and breaks them down as follows:

  • Posters should maximise the amount of insight for people attending a poster session.
  • Detailed information should still be present, but not the core of the poster.
  • These design goals should be easy to achieve and accessible to new and old scientists.

    Tenured and tenure track faculty positions

    Associate Professor/Professor/Full Professor Research Position in Data Management and Data Wrangling

    Hasselt University, Data Science Institute; Hasselt, Belgium

    Full-time, non-tenured academic positions

    OHI/O Program Coordinator

    The Ohio State University, Computer Science and Engineering Department; Columbus, OH

    Research Director

    University of Chicago, Urban Crime Labs; Chicago, IL

    Full-time positions outside academia

    Research Engineer, AllenNLP

    The Allen Institute for Artificial Intelligence; Seattle, WA

    Policy Fellow or Policy Counsel (Youth & Education)

    Future of Privacy Forum; Washington, DC
