Data Science newsletter – May 20, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 20, 2017


Data Science News

NIH “All of Us” precision medicine program gears up to launch

MobiHealthNews, Gus Venditto


The promise of precision medicine is starting to be realized. The combination of patient medical records and genomic data is proving to be a potent source of data for clinicians as they apply analytical tools in the search for better treatments.

But one factor holds back many new initiatives. The data available for analysis is limited to the records of patients who are participating in any given trial. For precision medicine to reach its true potential, the number of patient datasets will need to grow dramatically.

That’s the challenge that the National Institutes of Health’s All of Us Research Program aims to solve.

Everything Google announced at I/O 2017

Business Insider, Julie Bort and Steve Kovach


“Among the new goodies: updates to Android, Google Home, Google Assistant and YouTube. As well as nifty new services like Google Lens, which uses your smartphone’s camera to identify objects in the real world”

University Data Science News

Harvard professor Steven Hyman argues that biology needs more staff scientists, echoing sentiments we hear in data science about the necessity of finding hard money to support staff data scientists. He writes, “the staff-scientist model is a win for all involved. Complex scientific projects advance more surely and swiftly, and faculty members can address questions that would otherwise be out of reach”. He calls upon funders to recognize the full value of staff scientists and stop seeing research staff as “fat in the system”.

Boston University and University of Central Florida physicists “discovered a way of applying statistical mechanics, a branch of physics, to create more efficient algorithms that can run on traditional computers” or, more presciently, quantum chips (for more on quantum chips, see IBM in company data science news). Existing statistical mechanics models are limited by thermodynamic phase transitions, a problem the new model sidesteps by running in reverse. The paper was published in Nature with a healthy altmetric score for a paper with an abstract most people cannot understand. This type of basic scientific advancement is important for the (commercial) success of quantum computing which, in turn, will allow for the ballyhooed world domination of artificial intelligence and machine learning.

William Isaac and Andi Dixon, two graduate students, put together an accessibly written explanation about why predictive policing is probably biased. For starters, “crime itself is a largely hidden social phenomenon that happens anywhere a person violates a law” which is to say the data we call “crime data” are just the “who got caught” data. The gap between the level of crime we think exists in society and the number of crimes detected is large, especially for small crimes, and explored in detail in sociology and criminology classes. Further, actions that are recorded as crime data include things like 911 calls that are not great measures of crime. Multiple calls can be made for a single crime or no calls could be made or calls could be made about actions that are not crimes (remember when video gamers used to ‘swat’ each other?). If you have to teach an ethics module on data science, this article is a fantastic place to start because it highlights:

1) data scientists need domain knowledge

2) algorithms cannot be less biased than their training data

3) training data often contain bias

4) an ideal scenario isn’t one in which the training data + algorithm work to reproduce the status quo efficiently. In an ideal scenario, data science could help us improve upon the status quo, making policing more just.
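
For anyone building that ethics module, here is a minimal sketch of points 2 through 4. It is not from the Isaac and Dixon article; the two neighborhoods, the rates, and the function name `simulate_recorded_crime` are all hypothetical, chosen only for illustration. Both neighborhoods have the same true crime rate, but one is patrolled more heavily, so more of its offenses land in the “who got caught” data. A naive rule that allocates patrols in proportion to recorded crime then reproduces the existing patrol pattern.

```python
import random


def simulate_recorded_crime(true_rate, detection_rate, population, rng):
    """Count offenses that reach the data set: committed AND detected."""
    recorded = 0
    for _ in range(population):
        committed = rng.random() < true_rate
        detected = rng.random() < detection_rate
        if committed and detected:
            recorded += 1
    return recorded


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: the true crime rate is identical in both hypothetical neighborhoods.
TRUE_RATE = 0.10
POPULATION = 10_000

# Assumption: neighborhood A is patrolled more heavily, so more offenses are detected.
recorded_a = simulate_recorded_crime(TRUE_RATE, 0.60, POPULATION, rng)
recorded_b = simulate_recorded_crime(TRUE_RATE, 0.20, POPULATION, rng)

# Naive "predictive" rule: send patrols where the recorded crime is.
total = recorded_a + recorded_b
patrol_share_a = recorded_a / total
patrol_share_b = recorded_b / total

print(f"Recorded crime: A={recorded_a}, B={recorded_b}")
print(f"Suggested patrol share: A={patrol_share_a:.2f}, B={patrol_share_b:.2f}")
```

The rule never sees the true crime rate, only the recorded counts, so it recommends sending most patrols back to the neighborhood that was already watched most closely.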

The Sloan Foundation just funded the OpenCitations Enhancement project. The OpenCitations Corpus is essentially 15% of the same type of citation information held by Web of Science. The difference is that OCC is a non-profit working to make citation data open. In their own words, they “aim to empower the global community by liberating scholarly citation data from their current commercial shackles.”

The Sloan Foundation also granted $747,000 to California Digital Library, DataCite and DataONE for a project that will establish usage and citation metrics for data.

The University of Tennessee and Oak Ridge National Laboratory are going to start a new doctoral program in data science.

Nature ran a paper that uses social networks to estimate an individual’s socioeconomic status, a technique then used to target ads. The authors write that they can “infer the economic well-being of individuals through a measure of their location and influence in the social network…[of] telecommunications and financial data.” This, my readers, is the story I like least this week. I find it troubling to label someone based on the friends they hang out with because it seems designed to amplify self-selection biases and seal off tightly clustered parts of a network by treating some entire clusters differently from others. I’m actually less grumpy about using that approach to target ads because, as lunch speaker Alice Marwick pointed out, that type of ‘be careful about the crowd you hang out with’ inference could be used for more critical decisions like hiring or extending loans.

Last week we included a bad link in: UC-Berkeley is bridging a known training gap by offering data science short courses to professors and other instructors. Apologies for the error.

Sundar Pichai Sees Google’s Future in the Smartest Cloud

WIRED, Business, Cade Metz


Two days before delivering the keynote at Google I/O, the company’s annual State of the Union address, Sundar Pichai is worried about losing his voice.

Sitting at the coffee table inside his remarkably spartan office at company headquarters, the Google CEO speaks softly, even by his standards. Step by careful step, he explains the major themes that will run through his keynote speech, and at first, they seem less weighty than they should, considering they represent the near-future of the world’s largest internet business. That’s partly because Pichai isn’t feeling too well and partly because those themes aren’t all that different from the themes that drove last year’s speech. Google, he keeps saying, is now an “AI-first company.” But about halfway through this chat, he reveals that the company will unveil a new chip specifically designed to power the new wave of artificial intelligence. However softly he speaks, this news will echo loudly across the tech industry.

Introducing the new Wright Lab, where physics takes on the universe’s biggest questions

Yale University, YaleNews


“The new Wright Lab is advancing the frontiers of fundamental physics,” said physics professor Karsten Heeger, the lab’s director. “It investigates questions about the very small and the very big, from elementary particles to the origins of the universe.”

The lab’s multi-year renovation is only one component of a major expansion of physical science facilities on campus. The effort includes the recent creation of new labs for chemistry, biology, and physics at the Sterling Chemistry Laboratory; the opening of the Yale Quantum Institute; and soon-to-be-completed classrooms and labs for computer science and engineering at the Yale School of Engineering and Applied Science. Construction of a new Yale Biology Building, on the site of the former Gibbs Laboratory, began earlier this year.

Tools & Resources

Why Running an API Directory is Harder than it Looks (And Why API Providers Should Care)

ProgrammableWeb, David Berlind


Neatly concealed in this tale about researching a financial API for inclusion in ProgrammableWeb’s API directory is some prescriptive advice for API providers who want to best optimize their developer experiences for maximum developer discoverability, explorability, and comprehension. Unlike a lot of the other prescriptive content we offer on ProgrammableWeb, this article doesn’t offer advice on what to do. But I think you’ll agree; after following our journey, you’ll have some idea of how to minimize the friction in your own developer messaging (as well as a bit more insight into how we do what we do here at ProgrammableWeb!).
