Data Science newsletter – February 15, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for February 15, 2020


Data Science News

Apple’s Independent Repair Program Is Invasive to Shops and Their Customers, Contract Shows

VICE, Motherboard, Maddie Stone


The contract states independent repair shops must agree to audits and inspections by Apple, even if they leave the program.

Learning from The Man who Solved the Market

Towards Data Science, Paul Tune


The first time I heard of Jim Simons was as a graduate student back in 2007, having come across the famous mathematician’s name when I was perusing differential geometry textbooks out of curiosity. At the time, little did I know that we were on the brink of one of the worst financial crises in recent history, as well as a monumental event in US politics.

Some time later in 2009, I met a finance graduate at a university function in Melbourne who, upon asking what I did, remarked about Renaissance Technologies or RenTech, for short. He started painting a picture of the secretive, unorthodox hedge fund that hired no one with a finance degree; only STEM graduates, preferably in the hard sciences, such as physics and mathematics. It sounded like a cult of modern-day financial alchemists huddled in an office building somewhere in Long Island, New York.

I was fascinated: here was a hedge fund that didn’t operate like most funds, and did all their trading in an automated, dispassionate manner with minimal human intervention. The story really spoke to me because I’ve always had an obsession with predicting the future, whether it’s in trading or otherwise, and building machines that help us to do that more effectively (at that point, I had no idea I would start a career in data science).

Again, the name Jim Simons came up.

The Commons Project: Building a ‘public trust’ for health data sharing

Health Evolution, Tom Sullivan


At the Health Evolution Summit in April, the Commons Project will hold its first CommonHealth trustees and partners meeting, bring a new Android-based app that will be available for public download to enable patients to access health data, and discuss the big-picture, utility-like data sharing service it envisions.

CEO Paul Meyer describes CommonHealth as an open-source, non-profit public service that the Commons Project is building with a broad coalition of public and private partners across the health and technology ecosystems. The goal? Building public infrastructure for the digital era with longevity in mind. Think: 100 years, or the lifecycle of at least a generation of human beings.

Duke’s cancer research and AI health initiative gets boost from $200,000 NCBiotech grant

WRAL TechWire, Elizabeth Witherspoon


Duke University researchers are using a grant from the North Carolina Biotechnology Center to “supercharge” vital medical information.

The Duke scientists say the instrumentation grant from NCBiotech enables them to use cutting-edge digital imaging and artificial intelligence (AI) technologies to advance cancer research, diagnostics and to support a vast new interdisciplinary data science initiative called AI Health.

Coronavirus outbreak puts ‘open science’ under a microscope

Fred Hutchinson Cancer Research Center, Hutch News Stories


In this case, because of the seriousness of the epidemic, virus genomes are being released three to six days after sample collection. “We haven’t seen this before,” said Dr. Trevor Bedford, a computational biologist at Fred Hutchinson Cancer Research Center in Seattle.

He is tracking the virus on Nextstrain, a public website he developed for scientists that provides informative and animated maps of viral evolution. These phylogenetic charts — family trees for viruses — can help guide the public health response as an epidemic unfolds, whether it be avian influenza, Ebola, coronavirus or Zika.

“I’ve been talking about ‘real-time’ phylogenetics for a long time, and now we are there,” he said.

IU Creates Climate Change Data Portal to Model State’s Future Water Supply

Indiana University, News at IU


To promote collaborative research and inform water stakeholders throughout the state, an Indiana University research team has created the Future Water science gateway, a public interactive data portal that shows users a highly detailed model of the Wabash River Basin under a number of different climate scenarios.

The project is an initiative of IU’s Environmental Resilience Institute, part of the Prepared for Environmental Change Grand Challenge initiative, to provide Indiana with accurate predictions that help the state prepare for changes that will affect Hoosiers’ health and livelihoods.

Researchers finally get access to data on Facebook’s role in political discourse

Science, Jeffrey Mervis


It took 20 months longer than planned, and a daunting statistical challenge remains. But Facebook is finally giving researchers access to a trove of data on how its users have shared information—and misinformation—on recent political events around the world.

The data being made available today consist of 38 million URLs relating to civic discourse that were shared publicly on Facebook between January 2017 and July 2019. They reveal such details as whether users considered a linked site to be fake news or hate speech, and whether a link was clicked on or liked. Facebook is also providing demographic information—age, gender, and location—about the people who shared, clicked on, or liked those links, as well as their political affinities.

Northwestern Mutual Data Science Institute projects take aim at big social issues

BizTimes, Lauren Anderson


The new Northwestern Mutual Data Science Institute announced that its first round of research projects will work to understand and address societal issues, including 2020 voter sentiment, equitable housing and the opioid crisis.

The institute launched in the fall as a partnership of Northwestern Mutual, the University of Wisconsin-Milwaukee and Marquette University, and is backed by a $15 million commitment from the Milwaukee-based life insurance company, along with $12 million from each university. The institute is housed at Cream City Labs, an innovation lab in Northwestern Mutual’s downtown headquarters.

Yoshua Bengio’s blog – first words

Yoshua Bengio


I have the impression that collectively we are like children playing with nuclear bombs. What is at stake is the well-being (or misery) of billions of human beings, not to mention an even larger number of other lives on this planet.

That is why I have decided to devote some of my energy to thinking about the climate crisis, because I believe it is one of the most serious threats to humanity and the planet that our generation and coming generations will have to deal with. I intend to talk more later about the specific research efforts which are being undertaken to take advantage of the advances in machine learning to help tackle climate change (in the meantime, check this paper).

But here I want to also point out our individual and collective responsibility — as scientists and scholars — as generators of greenhouse gases through the conferences and workshops we keep organizing all around the planet.

Web scraping is now legal

Towards Data Science, Tom Waterman


In late 2019, the US Court of Appeals denied LinkedIn’s request to prevent HiQ, an analytics company, from scraping its data.

The decision was a historic moment in the data privacy and data regulation era. It showed that any data that is publicly available and not copyrighted is fair game for web crawlers.

But commercial use of scraped data is still limited

The decision does not, however, grant HiQ or other web crawlers the freedom to use data obtained by scraping for unlimited commercial purposes.

Reuters creates prototype automated video match report led by AI sports presenter

PressGazette (UK), Freddy Mayhew


Reuters has created a virtual sports journalist using artificial intelligence technology and video footage of a human presenter.

The AI presenter reads out match summaries in a video, using Reuters photography and reports, that requires no scripting, editing or production.

Reuters claims the prototype, built in partnership with AI-video firm Synthesia, is the world’s first automated presenter-led video sports report.

Google News Initiative and Oxford University agree extension in support for Digital News Project to August 2023

University of Oxford, Reuters Institute


The Google News Initiative and Oxford University have agreed on an extension to the current grant for the Reuters Institute Digital News Report and the related Digital News Project, for a further three years from 2021 to 2023. This builds on the existing agreement with Google to support the project that was originally announced in 2015 with a previous extension announced in 2018.

The new extension to August 2023 will allow the Reuters Institute for the Study of Journalism to further expand the Reuters Institute Digital News Report’s global reach, with the plan of adding six more countries to the 38 markets already covered in the most recent 2019 report, which was made possible by support from Google and 14 other sponsors.

Connecting the dots between climate change and health care

Harvard Gazette


Boston’s medical establishment is coming together to carry a message to area physicians and other health care workers: Climate change plays a role in many of the illnesses they see each day. A Feb. 13 symposium, “The Climate Crisis and Clinical Practice,” at Harvard Medical School (HMS) aims to help them anticipate those health effects to better treat and advise patients, and to discuss relevant issues with them when appropriate. The symposium is supported by HMS, area teaching hospitals, medical associations, and the New England Journal of Medicine. The Gazette spoke with organizer Renee Salas, an emergency physician, HMS assistant professor of emergency medicine, and climate change and health expert, about the need for the gathering.

Skin Monitoring Apps Fail to Detect Melanomas

IEEE Spectrum, Megan Scudellari


Publicly available skin cancer detection apps, such as SkinVision, use AI-based analysis to determine if a new or changing mole is a source of concern or nothing to worry about. Yet according to a new analysis of the scientific evidence behind those apps, there’s a lot to worry about.

In a study published this week in The BMJ, a team of experts evaluated the science behind six skin cancer detection apps and found it sorely lacking. The apps miss melanomas, the most serious form of skin cancer; produce false positives that could lead to removing harmless moles unnecessarily; and are poorly regulated. Users are also not informed of the apps’ limitations.

23andMe, Ancestry, and the decline of consumer DNA tests

Vox Media, Recode, Rani Molla


23andMe and Ancestry are laying off workers as interest in their DNA tests declines.


EDBT: 23rd International Conference on Extending Database Technology

University of Copenhagen


Copenhagen, Denmark March 30-April 2. Coincides with ICDT: 23rd International Conference on Database Theory. [$$$]

Designing The 100 Questions NYC

NYU GovLab, Reaktor


New York, NY March 3, starting at 6 p.m., Reaktor (30 West 21st St). “This event, co-organized by The GovLab and Reaktor as part of NYC Open Data Week 2020 and inspired by The 100 Questions global initiative (see video below), will begin with a panel of leaders from different sectors reflecting on the need for more question-driven open data delivery.” [free, registration required]


The Analytics Summit

Nashville, TN September 21-22. “This two-day Summit alternates between hands-on workshops and speaker presentations, presented by experts from various industries and backgrounds.” Application for speaker proposals is open.

Royal Statistical Society, RSS International Conference

Bournemouth, England September 7-10. “Over 600 statisticians and data scientists gather from all sectors and from over 35 countries to share information and network, attracted by a varied programme of talks and workshops.” Deadline for submissions is April 5.

Call For Papers – ICTIR 2020

Stavanger, Norway September 14-18. “The ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR) provides a forum for the presentation and discussion of research related to the foundational aspects of Information Retrieval (IR).” Deadline for full paper submissions is May 11.

Tools & Resources

Teach Yourself Computer Science (a DIY curriculum)


“TL;DR: Study all nine subjects below, in roughly the presented order, using either the suggested textbook or video lecture series, but ideally both. Aim for 100-200 hours of study of each topic, then revisit favorites throughout your career.”

TyDi QA: A Multilingual Question Answering Benchmark

Google AI Blog, Jonathan Clark


To encourage research on multilingual question-answering, today we are releasing TyDi QA, a question answering corpus covering 11 Typologically Diverse languages. Described in our paper, “TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages”, our corpus is inspired by typological diversity, a notion that different languages express meaning in structurally different ways. Because we selected a set of languages that are typologically distant from each other for this corpus, we expect models performing well on this dataset to generalize across a large number of the languages in the world.

The state of data quality in 2020

O'Reilly Radar, Roger Magoulas and Steve Swoyer


The O’Reilly survey “highlights the increasing attention organizations are giving to data quality and how AI both exacerbates and alleviates data quality issues.”

Google introduces Colab Pro w/ faster GPUs, more memory, and longer runtimes

9to5Google, Abner Li


Colab Pro is an upgrade that provides three primary benefits for $9.99/month:

  • Faster GPUs
  • Longer runtimes
  • More memory

Internships and other temporary positions

Bioinformatics Summer Intern

Broad Institute, Bayer Precision Cardiovascular Lab; Cambridge, MA
