Data Science newsletter – May 19, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 19, 2017


Data Science News

Inside Google’s Slow-Mo Virtual Reality Moonshot

Backchannel, Steven Levy


Clay Bavor knows immersive computing is a long-term project. Here’s what he’s doing to make it happen faster.

Using Machine Learning to Explore Neural Network Architecture

Google Research Blog, Quoc Le & Barret Zoph


At Google, we have successfully applied deep learning models to many applications, from image recognition to speech recognition to machine translation. Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large: a typical 10-layer network can have ~10^10 candidate networks! For this reason, the process of designing networks often takes a significant amount of time and experimentation by those with significant machine learning expertise.
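A back-of-the-envelope sketch shows where a figure like ~10^10 can come from. It assumes, which the post does not state, that each of the 10 layers independently picks one of 10 candidate configurations; the function and parameter names are illustrative only.

```python
# Hypothetical illustration: size of a layer-wise architecture search space.
# Assumption (not from the post): every layer independently chooses one of
# `choices_per_layer` configurations.

def search_space_size(num_layers: int, choices_per_layer: int) -> int:
    """Number of distinct networks when each layer picks independently."""
    return choices_per_layer ** num_layers


size = search_space_size(num_layers=10, choices_per_layer=10)
print(f"{size:.0e} candidate networks")  # prints "1e+10 candidate networks"
```

Under this assumption the count grows exponentially with depth, which is why exhaustive manual exploration quickly becomes impractical.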

Miles of Ice Collapsing Into the Sea

The New York Times, Antarctic Dispatches


We went to Antarctica to understand how changes to its vast ice sheet might affect the world. Flowing lines on these maps show how the ice is moving.

Inferring personal economic status from social network location

Nature Communications; Hernán A. Makse et al.


It is commonly believed that patterns of social ties affect individuals’ economic status. Here we translate this concept into an operational definition at the network level, which allows us to infer the economic well-being of individuals through a measure of their location and influence in the social network. We analyse two large-scale sources: telecommunications and financial data of a whole country’s population. Our results show that an individual’s location, measured as the optimal collective influence to the structural integrity of the social network, is highly correlated with personal economic status. The observed social network patterns of influence mimic the patterns of economic inequality. For pragmatic use and validation, we carry out a marketing campaign that shows a threefold increase in response rate by targeting individuals identified by our social network metrics as compared to random targeting. Our strategy can also be useful in maximizing the effects of large-scale economic stimulus policies. [full text]
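As a rough illustration of a collective-influence score, the sketch below assumes the definition commonly attributed to earlier work by Morone and Makse: CI_l(i) = (k_i - 1) multiplied by the sum of (k_j - 1) over all nodes j at shortest-path distance exactly l from i, where k is node degree. The toy graph and the function name are hypothetical, and the abstract above does not spell out the exact measure used in the paper.

```python
from collections import deque

# Sketch under stated assumptions: CI_l(i) = (k_i - 1) * sum_j (k_j - 1),
# where j ranges over nodes at shortest-path distance exactly `radius` from i.

def collective_influence(graph, node, radius=2):
    """Collective influence of `node` in an undirected adjacency-list graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search, stopped at the ball boundary
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    frontier = [j for j, d in dist.items() if d == radius]
    return (len(graph[node]) - 1) * sum(len(graph[j]) - 1 for j in frontier)


# Toy undirected social network (hypothetical data).
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "e"],
    "c": ["a", "f"],
    "d": ["a"],
    "e": ["b", "g"],
    "f": ["c"],
    "g": ["e"],
}
ranking = sorted(graph, key=lambda n: collective_influence(graph, n), reverse=True)
print(ranking[0])  # prints "a"
```

Ranking individuals by such a score is one way a targeting list could be built; the paper's own pipeline may differ.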

City model created from images alone

ETH Zurich


ETH scientists have combined millions of images and videos into a three-dimensional, living model of the city of Zurich. The new technology has many possible applications – for example, it can analyse where and when pedestrians are on the move and parking spaces become free.

Automatically tracking neurons in a moving and deforming brain

PLOS Computational Biology; Andrew Leifer et al.


Advances in optical neuroimaging techniques now allow neural activity to be recorded with cellular resolution in awake and behaving animals. Brain motion in these recordings poses a unique challenge. The location of individual neurons must be tracked in 3D over time to accurately extract single neuron activity traces. Recordings from small invertebrates like C. elegans are especially challenging because they undergo very large brain motion and deformation during animal movement. Here we present an automated computer vision pipeline to reliably track populations of neurons with single neuron resolution in the brain of a freely moving C. elegans undergoing large motion and deformation. 3D volumetric fluorescent images of the animal’s brain are straightened, aligned and registered, and the locations of neurons in the images are found via segmentation. Each neuron is then assigned an identity using a new time-independent machine-learning approach we call Neuron Registration Vector Encoding. In this approach, non-rigid point-set registration is used to match each segmented neuron in each volume with a set of reference volumes taken from throughout the recording. The way each neuron matches with the references defines a feature vector which is clustered to assign an identity to each neuron in each volume. Finally, thin-plate spline interpolation is used to correct errors in segmentation and check consistency of assigned identities. The Neuron Registration Vector Encoding approach proposed here is uniquely well suited for tracking neurons in brains undergoing large deformations. When applied to whole-brain calcium imaging recordings in freely moving C. elegans, this analysis pipeline located 156 neurons for the duration of an 8-minute recording and consistently found more neurons more quickly than manual or semi-automated approaches. [full text]

3-D model of human brain development and disease

National Institutes of Health, NIH Matters


To study prenatal brain development and how such genetic mutations disrupt the process, a team led by Dr. Sergiu Pasca at Stanford University created a miniaturized brain model by directing human stem cells in a dish to become neurons and form 3-D brain organoids or brain-region “spheroids.” The studies were partially funded by NIH’s National Institute of Mental Health (NIMH). Results were published in the May 4, 2017, issue of Nature.

The team coaxed 3-D cell cultures to acquire the characteristics of two specific regions of the forebrain (dorsal and ventral) during early development. The two types of spheroids fused together when placed near each other, simulating the stage of development when neurons begin migrating from deep regions of the forebrain toward the cerebral cortex.

Entrepreneurship: A developing ecosystem of, through and beyond Cornell

Cornell University, Ezra Magazine, Gary E. Frank


Through research, coursework, fellowships, leadership initiatives, business incubators, community outreach, business plan competitions and more, an evolving entrepreneurial ecosystem has emerged at Cornell. Its reach extends throughout the university, from minds and hubs across the Ithaca campus and into the local community to the entrepreneurial dynamo at the heart of the Cornell Tech curriculum and within the innovations created by Weill Cornell Medicine researchers. Entrepreneurial principles are taught and applied not only to encourage self-empowerment and economic development, but also to seek solutions to pressing social concerns.

Make Data Count: Building a System to Support Recognition of Data as a First Class Research Output

UC3, Data Pub blog


The Alfred P. Sloan Foundation has made a 2-year, $747K award to the California Digital Library, DataCite and DataONE to support collection of usage and citation metrics for data objects. Building on pilot work, this award will result in the launch of a new service that will collate and expose data level metrics.

The impact of research has traditionally been measured by citations to journal publications: journal articles are the currency of scholarly research. However, scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, including research data. In order to track and report the reach of research data, methods for collecting metrics on complex research data are needed. In this way, data can receive the same credit and recognition that is assigned to journal articles.

Dima Shlyakhtenko Accepts IPAM Directorship



After a thorough national search, IPAM is pleased to announce that UCLA math professor Dimitri (Dima) Shlyakhtenko will assume the directorship of IPAM on July 1, 2017. Dima replaces Russel Caflisch, who is stepping down as Director after nine years.

Video Imaging Reveals How Immune Cells Sense Danger

UC San Francisco, UCSF News Center


How do T cells, the beat cops of the immune system, detect signs of disease without the benefit of eyes? Like most cells, they explore their surroundings through direct physical contact, but how T cells feel out intruders rapidly and reliably enough to nip infections and other threats in the bud has remained a mystery to researchers.

In a new study, published online May 11, 2017, in Science, UC San Francisco researchers began to address this question by using cutting-edge techniques to capture videos of the surface of living T cells in more detail than ever before.

Automated text analysis tool will help students in large courses develop writing skills

University of Michigan News


A University of Michigan program built on the premise that students enrolled in large courses learn more if they write—as opposed to only taking multiple choice tests to show mastery of content—will expand this fall to add automated text analysis to its digital toolkit.

M-Write helps students develop their conceptual learning and writing skills in large-enrollment gateway courses. It’s another program in the university’s expanding portfolio focused on personalized education.

Long read for the week

The Economist consistently has the best coverage of AI, data science, and the impacts it will have. This week, they write about the GoodAI team trying to teach ethics to robots. Here’s a teaser: “as Rosalind Picard, director of the Affective Computing Group at MIT puts it: ‘The greater the freedom of a machine, the more it will need moral standards.’”

Why this local tech startup is building a team of — meteorologists?

Built In Austin, Kelly Jackson


As a client of Riskpulse, your key contact and support is a bona fide meteorologist.

The Austin startup uses probabilistic intelligence and data visualization to predict weather risks for the supply chain industry, supporting global energy producers, commodity investment firms, food shippers, auto manufacturers and retailers.

Bringing Street Safety to the Next Frontier of Smart Cities

Microsoft NY, Carey Anne Nadeau


Smart city technologies and data science techniques are making incredible and swift leaps forward – from smart sensors that detect smog to analytics that guide efficient water use in times of drought – but in this figurative race to the moon, select cities have been able to get projects off the ground while most others are stranded back on earth.

In a first-of-its-kind partnership between Microsoft and Open Data Nation, we’re tackling this inequity head on, lowering the barriers to entry, and making sure that the benefits of the smart cities movement diffuse to all who may share in and benefit from better, safer, and healthier cities.

We’ve identified a big issue in cities, where a broad-based, adaptable solution can have great impact. In 2016, the number of people who died in a car crash spiked to nearly 18,000, the most since 2008.

Zillow faces lawsuit over ‘Zestimate’ tool that calculates a house’s worth

The Washington Post, Kenneth R. Harney


It was bound to happen: A homeowner has filed suit against online realty giant Zillow, claiming the company’s controversial “Zestimate” tool repeatedly undervalued her house, creating a “tremendous road block” to its sale.

The suit, which may be the first of its kind, was filed in Cook County Circuit Court by a Glenview, Ill., real estate lawyer, Barbara Andersen. The suit alleges that despite Zillow’s denial that Zestimates constitute “appraisals,” the fact that they offer market-value estimates and “are promoted as a tool for potential buyers to use in assessing [the] market value of a given property,” shows that they meet the definition of an appraisal under state law. Not only should Zillow be licensed to perform appraisals before offering such estimates, the suit argues, but it also should obtain “the consent of the homeowner” before posting them online for everyone to see.

Human proteome mapped across the cell

Chemical & Engineering News, Sarah Everts


By following more than 12,000 proteins to 30 subcellular compartments, researchers provide a new tool for tracking protein location


Eighth International Conference on Computational Creativity, ICCC 2017

Association for Computational Creativity


Atlanta, GA ICCC 2017 will take place June 19-23, hosted at Georgia Tech. [$$$]

Privacy + Security Forum

Daniel Solove and Paul Schwartz


Washington, DC October 4-6. [$$$$]

Workshop on Design Issues for a data Anonymization Competition (WODIAC)

PETS 2017


Minneapolis, MN Monday, July 16, part of Privacy Enhancing Technologies Symposium 2017

2017 Government Analytics Forum: Transforming Government in the Cognitive Era



Washington, DC June 1 [registration required]


Visualize the world’s protected areas – new data visualization challenge!

Data for a Cause have teamed up with UN Environment – World Conservation Monitoring Centre to run a challenge in the second half of May (May 17 – May 24) to support their Protected Planet project.

Alternative ways to use your coding skills

During a Software Carpentry discussion session today, an instructor mentioned that she likes to share some personal coding projects with learners. By sharing these projects, she hopes to inspire learners to think about coding beyond research or scientific applications, and about the many kinds of problems these skills can solve. [web form]

Call for Blue Sky Papers at RSS 2017 Conference

Cambridge, MA July 15, part of the 2017 Robotics: Science and Systems Conference. Deadline for submissions is May 31.

Tools & Resources

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell


In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.
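The core idea, curiosity as prediction error in a feature space, can be sketched in a few lines. This is a toy illustration only: encode, forward_model, eta and the hand-picked weights are hypothetical stand-ins, whereas the paper learns its feature space with a self-supervised inverse dynamics model, which is omitted here.

```python
# Minimal sketch of curiosity as forward-model prediction error in feature space.
# All components below are hypothetical stand-ins for learned neural networks.

def encode(state):
    """Toy feature map phi(s) over a 4-value 'image'."""
    return [state[0] + state[1], state[2] - state[3]]


def forward_model(features, action):
    """Toy forward model: predicts next features from current features and action."""
    return [f + 0.5 * action for f in features]


def curiosity_reward(state, action, next_state, eta=1.0):
    """Intrinsic reward: (eta / 2) * squared error between predicted and actual next features."""
    predicted = forward_model(encode(state), action)
    actual = encode(next_state)
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted, actual))


s = [1.0, 2.0, 3.0, 1.0]       # encode(s) == [3.0, 2.0]
s_next = [2.0, 2.0, 4.0, 1.0]  # encode(s_next) == [4.0, 3.0]
print(curiosity_reward(s, action=1.0, next_state=s_next))  # prints 0.25
```

A transition the forward model predicts well earns little reward, while a surprising one earns more, which nudges the agent toward parts of the environment it does not yet understand.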

Kubeface: distributed computing in Python on Google Container Engine

Hammer Lab, Tim O'Donnell


This post introduces a Python library called Kubeface for parallel computing on Google Container Engine using Kubernetes. We developed Kubeface to train the neural networks we publish in the MHCflurry package. There’s nothing specific to neural networks in Kubeface, however, and we’ve used it to run a variety of compute-bound, easily-parallelized tasks.

Using the Dataset API for TensorFlow Input Pipelines

GitHub – tensorflow


“The Dataset API is designed to let you build complex input pipelines from simple, reusable pieces.”

How do I build an API?

O'Reilly Media, Kevin Bowersox


“Join software developer Kevin Bowersox, as he explains important considerations such as documentation and integrity when planning and building an API.” [video, 5:27]

Library Offers Largest Release of Digital Catalog Records in History

Library of Congress


“The Library of Congress announced today that it is making 25 million records in its online catalog available for free bulk download. This is the largest release of digital records in the Library’s history.”

Google is giving a cluster of 1,000 Cloud TPUs to researchers for free

TechCrunch, John Mannes


At the end of Google I/O, the company unveiled a new program to give researchers access to the company’s most advanced machine learning technologies for free. The TensorFlow Research Cloud program, as it will be called, will be application based and open to anyone conducting research, rather than just members of academia.

If accepted, researchers will get access to a cluster of 1,000 Cloud TPUs for training and inference. Each Cloud TPU will offer 180 teraflops of floating-point performance and 64GB of memory. Compute time will be allocated and limited depending on the particular approved project.


Full-time, non-tenured academic positions

Research Assistant in Trade Secrets, Economic Espionage and CyberCrime

Goldsmiths University of London; London, England

Full-time positions outside academia

Data Scientist, DMP & Personalization

adidas; Amsterdam, The Netherlands
