Data Science newsletter – May 24, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 24, 2017


Data Science News

Government Data Science News

How does the FDA regulate medical software that uses ever-updating machine learning?

The New York City Police Department has serious ethical questions to answer about the accuracy of its facial recognition software, according to a lawsuit filed against it by the Center on Privacy & Technology at Georgetown Law. For starters, being in a facial recognition “line-up” does not require consent and “studies have shown that the underlying algorithms are better at identifying some races over others [blacks].” To make matters worse, “the genesis and extent of this problem remain something of a mystery, because racial bias in facial recognition software is profoundly understudied.”

DJ Patil, former Chief Data Scientist at the White House Office of Science and Technology Policy, details the top three reasons science and medicine will be stymied by Trump’s health care plan. Teaser: genomics and precision medicine rely on data that can also be used to reveal individuals’ pre-existing conditions. Science cannot advance quickly with a health care mandate that allows insurance companies to discriminate based on pre-existing conditions.

Weather data scientists at NOAA predict there will be 2-4 major hurricanes during the coming storm season. In other disaster news, the State of California will launch a limited public earthquake early warning system sometime in 2018.

Tweet of the Week

Twitter, Josh Patterson


Curiosity May Be Vital for Truly Smart AI

MIT Technology Review, Will Knight


A computer algorithm equipped with a form of artificial curiosity can learn to solve tricky problems even when it isn’t immediately clear what actions might help it reach this goal.

Researchers at the University of California, Berkeley, developed an “intrinsic curiosity model” to make their learning algorithm work even when there isn’t a strong feedback signal. The curiosity model developed by this team sees the AI software controlling a virtual agent in a video game seek to maximize its understanding of its environment and especially aspects of that environment that affect it. There have been previous efforts to give AI agents curiosity, but these have tended to work in a more simplistic way.
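As a rough illustration of the idea, one common way to formalize artificial curiosity is to pay the agent a bonus equal to how badly it predicted what happened next, so that surprising transitions are rewarding and familiar ones become dull. The Python sketch below is a toy under that assumption, not the Berkeley team’s model: the one-parameter `LinearForwardModel`, the made-up environment dynamics, and the function names are all hypothetical.

```python
import random


def intrinsic_reward(predicted_next, actual_next):
    """Curiosity bonus: mean squared error of the agent's own prediction."""
    errors = [(p - a) ** 2 for p, a in zip(predicted_next, actual_next)]
    return sum(errors) / len(errors)


class LinearForwardModel:
    """Hypothetical forward model: predicts next state as state + w * action."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.learning_rate = learning_rate

    def predict(self, state, action):
        return [s + self.w * action for s in state]

    def update(self, state, action, actual_next):
        # One gradient step on the squared prediction error with respect to w.
        predicted = self.predict(state, action)
        grad = sum(2 * (p - a) * action for p, a in zip(predicted, actual_next))
        self.w -= self.learning_rate * grad / len(state)


def run_episode(steps=50, seed=0):
    """Return the curiosity bonus at each step of a toy episode."""
    rng = random.Random(seed)
    model = LinearForwardModel()
    state = [0.0, 0.0]
    rewards = []
    for _ in range(steps):
        action = rng.choice([-1.0, 1.0])
        # Assumed toy dynamics: every coordinate moves by 0.5 * action.
        actual_next = [s + 0.5 * action for s in state]
        predicted = model.predict(state, action)
        rewards.append(intrinsic_reward(predicted, actual_next))
        model.update(state, action, actual_next)
        state = actual_next
    return rewards
```

Calling `run_episode()` returns a list of bonuses that shrinks as the model learns the toy dynamics, which is the sense in which curiosity steers an agent toward parts of its environment it does not yet understand.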

Legendary computer scientist James Gosling joining Amazon Web Services

GeekWire, Tom Krazit


The father of Java is joining forces with Amazon Web Services.

James Gosling, until today Chief Software Architect at Liquid Robotics, announced Monday on Facebook that he’s “starting a new Adventure” with the cloud computing juggernaut as a Distinguished Engineer. He didn’t go into detail about the role he’ll be playing at AWS, and neither did a company representative who confirmed Gosling was joining the company.

Facial Recognition Technology Is Both Biased and Understudied

Undark magazine, Rod McCullom


Just how extensive, accurate, and closely monitored is the New York City Police Department’s facial recognition technology? At the moment, no one really knows, but these are the key questions animating a freedom-of-information lawsuit against the NYPD by researchers from the Center on Privacy & Technology at Georgetown Law.

The lawsuit was filed about two weeks ago in response to repeated refusals by the nation’s largest police department to release virtually any documentation related to its use of facial recognition software, which ostensibly helps law enforcement agencies match imagery of suspected criminals (cellphone shots taken by witnesses, for example, or surveillance video, or even police lineups and live video feeds) against existing databases — be it their own, or those of other agencies (passports, for instance, or driver’s licenses).

DJ Khaled surprises statistics graduation

University of California-Berkeley, Berkeley News


Steve Stoute — a cultural force who has built a name in the music industry and as a marketing and brand entrepreneur — brought his savvy to the Zellerbach Playhouse at UC Berkeley Thursday, telling graduating statistics students that “data is the instrument, now go be the artist. Keep pushing the culture forward.”

Then he introduced his surprise for the grads. “Here is your gift today!” he told them, bringing out DJ Khaled. The record producer and DJ took the floor in a pale blue track suit and, after a little encouragement, got the students to put their hands in the air as he performed “All I Do Is Win.”

Open data and digital morphology

Proceedings of the Royal Society of London B: Biological Sciences; Thomas G. Davies et al.


Over the past two decades, the development of methods for visualizing and analysing specimens digitally, in three and even four dimensions, has transformed the study of living and fossil organisms. However, the initial promise that the widespread application of such methods would facilitate access to the underlying digital data has not been fully achieved. The underlying datasets for many published studies are not readily or freely available, introducing a barrier to verification and reproducibility, and the reuse of data. There is no current agreement or policy on the amount and type of data that should be made available alongside studies that use, and in some cases are wholly reliant on, digital morphology. Here, we propose a set of recommendations for minimum standards and additional best practice for three-dimensional digital data publication, and review the issues around data storage, management and accessibility. [full text]

Next-gen computing: Memristor chips that see patterns over pixels

University of Michigan, Michigan News


Inspired by how mammals see, a new “memristor” computer circuit prototype at the University of Michigan has the potential to process complex data, such as images and video, orders of magnitude faster and with much less power than today’s most advanced systems.

Faster image processing could have big implications for autonomous systems such as self-driving cars, says Wei Lu, U-M professor of electrical engineering and computer science. Lu is lead author of a paper on the work published in the current issue of Nature Nanotechnology.

We’re about to cripple the genomic medical era

NewCo Shift, DJ Patil


Over the past two years, as the Chief Data Scientist for the U.S., I’ve had the opportunity to look over the horizon and see what’s coming in advancements to medicine. First off, I couldn’t be more bullish. The cost of genetic testing continues to drop, and testing is increasingly used to address diseases like cancer. We also now have a wide array of new sensors to understand the impact of our environments both around us (e.g., air quality) and inside us (e.g., our microbiome). These, combined with advancements in data science, machine learning, and artificial intelligence (AI), have laid the foundation to revolutionize how we treat disease.

But there’s a catch, and it’s called preexisting conditions. And bringing back preexisting conditions will derail us in three ways if the American Health Care Act (AHCA), also known as Trumpcare, goes forward.

How Women Mentors Make a Difference in Engineering

The Atlantic, Ed Yong


For some women, enrolling in an engineering course is like running a psychological gauntlet. If they dodge overt problems like sexual harassment, sexist jokes, or poor treatment from professors, they often still have to evade subtler obstacles like the implicit tendency to see engineering as a male discipline. It’s no wonder women in the U.S. hold just 13 to 22 percent of the doctorates in engineering, compared to an already-low 33 percent in the sciences as a whole.

Nilanjana Dasgupta, from the University of Massachusetts in Amherst, thinks that mentors—people who can give advice, share experiences, or make social connections—can dismantle the gauntlet, and help young women to find their place in an often hostile field.

A new tool for discovering nanoporous materials



Materials classified as “nanoporous” have structures (or “frameworks”) with pores up to 100 nm in diameter. These include diverse materials used in fields ranging from gas separation and catalysis to medicine (e.g. activated charcoal). The performance of nanoporous materials depends on both their chemical composition and the shape of their pores, but the latter is very difficult to quantify. So far, chemists have relied on visual inspection to see whether two materials have similar pores. EPFL scientists, in the framework of NCCR-MARVEL, have now developed an innovative mathematical method that allows a computer to quantify similarity of pore structures. The method makes it possible to search databases with hundreds of thousands of nanoporous materials to discover new materials with the right pore structure. The work is published in Nature Communications.

Some Social Scientists Are Tired of Asking for Permission

The New York Times, Kate Murphy


If you took Psychology 101 in college, you probably had to enroll in an experiment to fulfill a course requirement or to get extra credit. Students are the usual subjects in social science research — made to play games, fill out questionnaires, look at pictures and otherwise provide data points for their professors’ investigations into human behavior, cognition and perception.

But who gets to decide whether the experimental protocol — what subjects are asked to do and disclose — is appropriate and ethical? That question has been roiling the academic community since the Department of Health and Human Services’s Office for Human Research Protections revised its rules in January.

The revision exempts from oversight studies involving “benign behavioral interventions.” This was welcome news to economists, psychologists and sociologists who have long complained that they need not receive as much scrutiny as, say, a medical researcher.

A Fresh Math Perspective Opens New Possibilities for Computational Chemistry

US Department of Energy, Berkeley Lab


Glow-in-the-dark objects seem magical when you’re a kid—they can brighten up a dark room without the need for electricity, batteries or a light bulb. Then at some point you learn the science behind this phenomenon. Chemical compounds called chromophores become energized, or excited, when they absorb visible light. As they return to their normal state, the stored energy is released as light, which we perceive as a glow. In materials science, researchers rely on a similar phenomenon to study the structures of materials that will eventually be used in chemical catalysis, batteries, solar applications and more.

When a molecule absorbs a photon—the fundamental particle of light—electrons in the molecular system are promoted from a low-energy (ground) state to a higher-energy (excited) state. These responses resonate at specific light frequencies, leaving “spectral fingerprints” that illuminate the atomic and electronic structures of the system being studied.

How to think scientifically about scientists’ proposals for fixing science

Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog


My focus here will not be on the suggestions themselves but rather on what are our reasons for thinking these proposed innovations might be good ideas. The unfortunate paradox is that the very aspects of “junk science” that we so properly criticize—the reliance on indirect, highly variable measurements from nonrepresentative samples, open-ended data analysis, followed up by grandiose conclusions and emphatic policy recommendations drawn from questionable data—all seem to occur when we suggest our own improvements to the system. . . . I will now discuss various suggested solutions to the replication crisis, and the difficulty of using scientific evidence to guess at their effects.

The Little-Known Statistician Who Taught Us to Measure Teachers

The New York Times, The Upshot blog, Kevin Carey


Students enroll in a teacher’s classroom. Nine months later, they take a test. How much did the first event, the teaching, cause the second event, the test scores? Students have vastly different abilities and backgrounds. A great teacher could see lower test scores after being assigned unusually hard-to-teach kids. A mediocre teacher could see higher scores after getting a class of geniuses.

Thirty-five years ago, a statistician, William S. Sanders, offered an answer to that puzzle. It relied, unexpectedly, on statistical methods that were developed to understand animal breeding patterns.
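To make the puzzle concrete, the sketch below compares raw end-of-year averages with average score gains for two invented classes. It is a deliberately simplified gain-score calculation with made-up numbers, not the statistical method Sanders developed; the `mean_gain` and `summarize` helpers and the class data are hypothetical.

```python
from statistics import mean


def mean_gain(prior_scores, final_scores):
    """Average per-student improvement from the prior test to the final test."""
    return mean(f - p for p, f in zip(prior_scores, final_scores))


# Invented scores for illustration only.
CLASSES = {
    "strong teacher, struggling class": {"prior": [40, 45, 50], "final": [60, 66, 69]},
    "mediocre teacher, gifted class": {"prior": [85, 90, 95], "final": [86, 92, 95]},
}


def summarize(classes):
    """Return {label: (mean final score, mean gain)} for each class."""
    return {
        label: (mean(c["final"]), mean_gain(c["prior"], c["final"]))
        for label, c in classes.items()
    }


if __name__ == "__main__":
    for label, (final_avg, gain) in summarize(CLASSES).items():
        print(f"{label}: final average {final_avg}, average gain {gain}")
```

In this toy data the struggling class finishes with the lower average but the larger gain, which is why raw test scores alone can misjudge a teacher.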

AlphaGo’s Ke Jie defeat offers a sobering look at the future of man versus machine

Wired UK, Jamie Fullerton


The feeling that the setting for Ke’s matches – the Future of Go Summit in Wuzhen, a water town near Shanghai – marked an end game for AlphaGo’s gaming exploits would be fuelled further by comments from DeepMind founder Demis Hassabis later. But first, the immense interest in AlphaGo in China was demonstrated when Hassabis walked in front of delegates in Wuzhen’s Internet International Conference and Exhibition Center yesterday morning.

This startup wants to solve the social care crisis with AI

Wired UK, Oliver Franklin-Wallis


London-based social care startup Cera has launched an AI to help carers and patients answer questions – and in future, help spot symptoms of illness.

Toyota Research Institute Explores Blockchain Technology for Development of New Mobility Ecosystem

Toyota USA Newsroom


The Toyota Research Institute (TRI) announced today that it is exploring blockchain and distributed ledger technology (BC/DL) for use in the development of a new mobility ecosystem that could accelerate development of autonomous driving technology. TRI is collaborating with the MIT Media Lab (MIT ML) and other industry partners to foster a digital environment where users – both businesses and consumers – may securely share driving and autonomous vehicle testing data, manage ride-share and car-share transactions and store vehicle usage information that could be used in the setting of insurance rates.

Building a Materials Data Infrastructure: Opening New Pathways to Discovery and Innovation in Science and Engineering

The Minerals, Metals and Materials Society


The availability of increasingly sophisticated experimental and computational tools provides scientists and engineers with new opportunities, but harnessing the vast amounts of data generated from these new approaches presents a challenge. Building a Materials Data Infrastructure identifies and prioritizes these challenges, while also providing actionable recommendations for addressing them.

Unique identifiers for small molecules enable rigorous labeling of their atoms

Nature, Scientific Data; Hesam Dashti, William M. Westler, John L. Markley & Hamid R. Eghbalnia


Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its ‘photo ID’, is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best approach to this requirement is the International Chemical Identifier (InChI). However, the current implementation of InChI fails to provide a complete standard for atom nomenclature, and incorrect use of the InChI standard has resulted in the proliferation of non-unique identifiers. We propose a methodology and associated software tools, named ALATIS, that overcomes these shortcomings. ALATIS is an adaptation of InChI, which operates fully within the InChI convention to provide unique and reproducible molecule and all atom identifiers. ALATIS includes an InChI extension for unique atom labeling of symmetric molecules. ALATIS forms the basis for improving reproducibility and unifying cross-referencing across databases. [full text]

The Internet Won’t Prioritize Quality Without an Intervention

Longreads, Catherine Cusick


In an interview with The New York Times, Twitter co-founder Evan Williams admits to David Streitfeld that he thinks the internet is broken — and apologizes for the role Twitter played in the ascendency of Donald Trump.

Scientists Are Using Artificial Intelligence to Plot the Galaxy

Inverse, Dan Robitzski


A team of scientists from Australia and Denmark trained A.I. to do the work for them. And they found that their deep learning system could predict the ages of red giants with 99% accuracy. So far, the A.I. has classified 7,600 and change red stars, about 5,400 of which had never been analyzed in the past.

By relying on the propensity of deep learning systems for image recognition tasks, the researchers found a way to convert the spectrum data required to calculate a star’s age into an image-like representation that the computer could recognize.

Imagining tomorrow’s university in an era of open science

F1000Research, Adina Howe et al.


As part of a recent workshop entitled “Imagining Tomorrow’s University”, we were asked to visualize the future of universities as research becomes increasingly data- and computation-driven, and identify a set of principles characterizing pertinent opportunities and obstacles presented by this shift. In order to establish a holistic view, we take a multilevel approach and examine the impact of open science on individual scholars as well as on the university as a whole. At the university level, open science presents a double-edged sword: when well executed, open science can accelerate the rate of scientific inquiry across the institution and beyond; however, haphazard or half-hearted efforts are likely to squander valuable resources, diminish university productivity and prestige, and potentially do more harm than good. We present our perspective on the role of open science at the university. [full text]

Meet the Nomad Who’s Exploding the Internet Into Pieces With “Secure Scuttlebutt”

The Atlantic, Ian Bogost


Dominic Tarr is a computer programmer who grew up on a remote farm in New Zealand. Down in the antipodes, isolation is even more isolating. Getting goods, people, and information to and from Australasia for families like Tarr’s has always been difficult. Bad, unreliable internet service is a particular challenge. Australia and New Zealand are first-world countries with third-world latency.

Today, Tarr lives on a sailboat—another Kiwi staple, alongside sheep and distance. Connectivity is worse on the boat than on the farm, and even less reliable. But that’s by design rather than by misfortune. Tarr started living on the boat after burning out at a previous job and discovering that the peripatetic lifestyle suited him. Unreliable and sporadic internet connectivity became an interesting engineering challenge. What if isolation and disconnection could actually be desirable conditions for a computer network?

He built something called Secure Scuttlebutt, or SSB. It’s a decentralized system for sending messages to a specific community, rather than the global internet.

Pittsburgh Welcomed Uber’s Driverless Car Experiment. Not Anymore.

The New York Times, Cecelia Kang


When Uber picked this former Rust Belt town as the inaugural city for its driverless car experiment, Pittsburgh played the consummate host.

“You can either put up red tape or roll out the red carpet,” Bill Peduto, the mayor of Pittsburgh, said in September. “If you want to be a 21st-century laboratory for technology, you put out the carpet.”

Nine months later, Pittsburgh residents and officials say Uber has not lived up to its end of the bargain.

The Fin-ternet of Things: How connectivity is changing the face of finance

World Economic Forum, Techonomy, Jennifer Schenker


Driverless cars that make per-minute toll road payments. Lightbulbs that decide the right energy vendor at a given time and price. Investment decisions made by robo-advisors. And products created in-situ by 3D printers, with royalties paid in real time on whatever intellectual property is needed. This is the coming world of the Internet of Everything, as intelligence and connectivity become distributed more and more widely across our economy. Everything will talk to everything and end-points will make their own decisions, powered by sensor-driven data collection and computing that continually improves, known as machine learning.

To make this a reality, we will need a well-functioning, secure micropayments system. We must reimagine the nature of financial assets, digital assets, rights management, intellectual property and royalties. “Sensor technologies and the collection and exchange of data by the Internet of Things will require a fundamental rethink of financial services,” says Udayan Goyal, a London-based private-equity and venture capital investor focused on financial services.

If they aren’t careful, banks could be relegated in this new world to becoming mere clearing houses. Credit card and insurance companies could see business shrink. So traditional financial services companies are seeking a role for themselves at the nexus of the digital economy. Getting a pole position in the evolving IoT would allow banks to gain new revenue from a range of innovative services.



R. Sommer McCoy, Monthly Music Hackathon


New York, NY June 10, a day celebrating and exploring the intersection of hip hop and technology. [free, registration required]

Library Carpentry Sprint 2017 – Library Guides at UC Berkeley

Software Carpentry


Berkeley, CA Part of the larger Mozilla Global Sprint 2017, drop by anytime on June 1-2 between 9 a.m. and 5 p.m., Berkeley Institute for Data Science, 190 Doe Memorial Library. [free]


Scale by the Bay CFP

San Francisco, CA This is a joint Call for Papers for a three-day, three-track software and data engineering conference sequence to be held November 2017, in San Francisco. Deadline for submissions is May 31.

Application now open! Facebook Grace Hopper Scholarship

This fall we’ll join thousands of remarkable women in technology in Orlando from October 4-6, 2017 for our tenth year at the Grace Hopper Celebration of Women in Computing. Completed applications are due by Thursday, June 15.


Tools & Resources

Measures for Justice data portal

Measures for Justice


“Assessing and comparing the performance of the entire U.S. criminal justice system.”

Visit a New Digital Archive of 2.2 Million Images from the First Hundred Years of Photography



“Europeana Photography opens up Europe’s rich photographic heritage to everyone. Photography is a direct and effective connection between history and contemporary society.”

Data Management Hub

GitHub – kubrik-engineering


The Data Management Hub is a distributed data management platform which fits into researchers’ workflows, enables secure collaboration on sensitive data [4] and empowers dissemination of research outcomes so that data will remain useful for decades [5]. The Data Management Hub will link research data and publications permanently to each other.

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations

GitHub – szilard


“This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality i.e. not very sparse) and no missing data, perhaps the most common problem in business applications.”


Full-time positions outside academia

Bioinformatics Scientist

Personalis; Menlo Park, CA
