Data Science newsletter – April 28, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for April 28, 2017


Data Science News

Databite 98: Eric Horvitz – live stream

YouTube, Data & Society Research Institute


Artificial intelligence (AI) is at an inflection point and is poised to move into the open world and into our lives in numerous ways that will have numerous influences on people and society. While AI promises to provide great value, along with the aspirations come concerns about inadvertent costs, rough edges, and failures. Concerns include failures of automation in the open world, biased data and algorithms, opacity of reasoning, adversarial attacks on AI systems, and runaway AI. I will discuss short- and longer-term challenges and discuss studies aimed at addressing concerns, including the One Hundred Year Study on AI at Stanford University and the Partnership on AI to Benefit People and Society.

Announcement: Nature journals support the San Francisco Declaration on Research Assessment

Nature, Editorial


Nature Research will this week formally sign up to the principles outlined in the San Francisco Declaration on Research Assessment, commonly known as DORA. Nature Research (the Nature-branded journals, Scientific Reports, Scientific Data and the Nature Partner Journals) has long been editorially aligned with the principles described in DORA, particularly the need to move away from the inappropriate use of the journal impact factor.

The Ancient Mystery of How the Brain Predicts Speech is Solved

Inverse, Sarah Sloat


In a study published Wednesday in PLOS Biology, a team of neuroscientists reveal that they have figured out what mechanisms within the brain are involved when processing speech.

Study co-author Dr. Yuki Kikuchi explains in a statement that the mechanisms for speech in the brain “work like predictive text on your mobile phone, anticipating what you are going to hear next.”

The absurd precision involved in detecting gravitational waves

Jason Kottke, Veritasium


Back in September 2015, the LIGO experiment detected gravitational waves formed 1.3 billion years ago when two black holes merged into one. The physics is pretty straightforward but to get the measurement, scientists had to build one of the most sensitive machines ever built. How sensitive? To get an accurate result, they needed to measure a distance of 4km with an accuracy of 1/10000th the width of a proton. This video from Veritasium looks at how the scientists and engineers accomplished such an amazing feat.
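To get a feel for the scale of that figure, here is a rough back-of-the-envelope sketch. The 4 km distance and the 1/10000th factor come from the text above; the proton width of about 1.7e-15 m (roughly twice the measured charge radius) is an assumption added for illustration, and the function names are hypothetical.

```python
# Rough scale check for the LIGO figures quoted above.
# ASSUMPTION: proton "width" taken as ~1.7e-15 m (about twice the ~0.84 fm
# charge radius); this value is not stated in the text.
ARM_LENGTH_M = 4_000.0      # 4 km, from the text
PROTON_WIDTH_M = 1.7e-15    # assumed value


def required_resolution(proton_width_m: float = PROTON_WIDTH_M) -> float:
    """Length resolution equal to 1/10000th of a proton width, in metres."""
    return proton_width_m / 10_000


def fractional_precision(arm_length_m: float = ARM_LENGTH_M) -> float:
    """That resolution expressed as a fraction of the measured distance."""
    return required_resolution() / arm_length_m


if __name__ == "__main__":
    print(f"resolution: {required_resolution():.1e} m")
    print(f"fraction of measured distance: {fractional_precision():.1e}")
```

Under that assumption the required resolution is on the order of 10^-19 m, or a few parts in 10^23 of the 4 km distance.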

Annotating qualitative social science

Parameters, Colin Elman and Diana Kapiszewski


A vibrant conversation is underway about how research data should be curated, managed, and shared. While these were not initially prominent questions in qualitative social science traditions, more recently, discussions have ensued across a wide range of scholarly contexts.

Video Games Are Better Without Stories

The Atlantic, Ian Bogost


A longstanding dream: Video games will evolve into interactive stories, like the ones that play out fictionally on the Star Trek Holodeck. In this hypothetical future, players could interact with computerized characters as round as those in novels or films, making choices that would influence an ever-evolving plot. It would be like living in a novel, where the player’s actions would have as much of an influence on the story as they might in the real world.

It’s an almost impossible bar to reach, for cultural reasons as much as technical ones. One shortcut is an approach called environmental storytelling. Environmental stories invite players to discover and reconstruct a fixed story from the environment itself. Think of it as the novel wresting the real-time, first-person, 3-D graphics engine from the hands of the shooter game. In Disneyland’s Peter Pan’s Flight, for example, dioramas summarize the plot and setting of the film. In the 2007 game BioShock, recorded messages in an elaborate, Art Deco environment provide context for a story of a utopia’s fall. And in What Remains of Edith Finch, a new game about a girl piecing together a family curse, narration is accomplished through artifacts discovered in an old house.

Sociologists urge use of big data to study human interaction

Stanford University, Stanford News


In a recently published paper, a group of Stanford sociology experts encourage other sociologists and social psychologists to focus on developing online research studies with the help of big data in order to advance the theories of social interaction and structure.

Companies have long used information they gather about their online customers to get insights into performance of their products, a process called A/B testing. Researchers in other fields, such as computer science, have also been taking advantage of the growing amount of data.

But the standard for many experiments on social interactions remains limited to face-to-face laboratory studies, said Paolo Parigi, a lead author of the study, titled “Online Field Experiments: Studying Social Interactions in Context.”

20 Data Science Professors to Know

Online Engineering Programs


Anyone who is interested in a career in data science is encouraged to enroll in a degree program in this field. Luckily, as the demand for these skilled professionals grows, so too has the number of on-campus and online data science programs at all levels. To give prospective students a feel for this nascent field, here are 20 exceptional data science professors at colleges across the country.

Variable habitat conditions drive species covariation in the human microbiota

PLOS Computational Biology; Charles K. Fisher et al.


The human body is inhabited by a vast number of microorganisms comprising the human microbiota. The species composition of the microbiota varies considerably from person-to-person and the relative abundances of some species rise and fall in concert. We introduce a mathematical model where differences in habitat conditions cause most of the variability of the microbiota. A statistical analysis shows that variable habitat conditions are sufficient for explaining the patterns of variation observed across a healthy human population and, as a result, the correlation between the relative abundances of two species reflects how closely related they are rather than how they directly interact with each other.

Shoto’s AI-powered photo aggregator zooms in on the events industry

VentureBeat, Stewart Rogers


Creating a buzz around your event can be hard, even in a social media-driven world. And providing attendees with extra value is often difficult too.

Today, Shoto – the photo aggregator app – has released a major AI-powered update that focuses on solving these exact problems.

The app itself is a simple affair. Shoto was originally intended to let family and friends bring photos and videos, taken on separate devices, into a single album. This new edition takes that foundation and adds the ability for event organizers to create shareable albums that not only pull in content associated with the inevitable event hashtag — from public Twitter and Facebook posts — but also to pull together photos taken at the right time and in the right vicinity, via the Shoto app.

Why Poverty Is Like a Disease – Emerging science is putting the lie to American meritocracy.

Nautilus, Christian H. Cooper


I feel as though I am wired for a permanent state of fight or flight, waiting for the other shoe to drop, or the metaphorical week when I don’t eat. I’ve chosen not to have children, partly because—despite any success—I still don’t feel I have a safety net. I have a huge minimum checking account balance in mind before I would ever consider having children. If you knew me personally, you might get glimpses of stress, self-doubt, anxiety, and depression. And you might hear about Tennessee.

Meet anyone from Tennessee and they will never say they are from “just” Tennessee. They’ll add a prefix: East, West, or Middle. My early life was in East Tennessee, in an Appalachian town called Rockwood. I was the eldest of four children with a household income that couldn’t support one. Every Pentecostal church in the surrounding hillbilly heroin country smelled the same: a sweaty mix of cheap cleaner and even cheaper anointing oil, with just a hint of forsaken hope. One of those forsaken churches was effectively my childhood home, and my school.

Waymo vs. Uber: 8 Things I Learned From Anthony Levandowski Taking the Fifth

IEEE Spectrum, Mark Harris


1. Questions can be just as informative as answers

Although Levandowski’s answers were identical, I learned a lot from Waymo’s questions. It seems Waymo now thinks that Levandowski was deceiving Google almost from the moment it hired him to work on the Street View maps project back in 2007. Google first had concerns when it found out that Levandowski was working with his own startups, 510 Systems and Anthony’s Robots, to build a self-driving car, as first revealed in IEEE Spectrum.

“When Google discovered that you were involved in 510 Systems and Anthony’s Robots, it was concerned about potential conflicts,” said Waymo’s lawyer, David Perlson. “You used confidential information from Google to help develop technology at 510 Systems; correct?” He went on to accuse Levandowski of using Street View code to calibrate 510’s Velodyne lidar, and in the startup’s self-driving car technology.

Cloud Produces Sunny Earnings at Amazon, Microsoft and Alphabet

The New York Times, Steve Lohr


The moneymaking machines at the core of Amazon, Microsoft and Alphabet, the parent company of Google, are notably different. But the respective kings of online retailing, software and internet search should all credit a relatively new line of business for lifting their financial results.

In their quarterly earnings reports on Thursday, the three said cloud computing — through which they rent computing services and online access to software hosted in their data centers — was growing faster than their larger, older businesses.

Rising carbon dioxide levels, ocean acidity may change crucial marine process

Florida State University News


The effects of climate change on Trichodesmium have been studied extensively by scientists in labs across the globe but with widely different results. Some scientists found that increased carbon dioxide in ocean waters caused a decline in nitrogen fixation, while others saw huge increases. Because of the large role these bacteria play in the health of the Earth’s oceans, Kranz and his colleagues sought to resolve the discrepancies.

Some of these discrepancies, they found, are based on the preparation of the water in which these organisms typically grow under laboratory conditions. For example, the researchers found contamination by elements such as ammonia or toxic elements like enhanced copper concentration.

“Any slight differences in the specific ingredients of the water — in this case artificial seawater that scientists prepare — can have a huge effect on the outcome,” Kranz said.

A turbo engine for tracing neurons

Allen Institute for Brain Science


Putting a turbo engine into an old car gives it an entirely new life—suddenly it can go further, faster. That same idea is now being applied to neuroscience, with a software wrapper that can be used on existing neuron tracing algorithms to boost their ability to handle not just big, but enormous sets of data. The wrapper, called UltraTracer, is highlighted this month in Nature Methods.

New statistical methods would let researchers deal with data in better, more robust ways

The Conversation, Rand Wilcox


No matter the field, if a researcher is collecting data of any kind, at some point he is going to have to analyze it. And odds are he’ll turn to statistics to figure out what the data can tell him.

A wide range of disciplines – such as the social sciences, marketing, manufacturing, the pharmaceutical industry and physics – try to make inferences about a large population of individuals or things based on a relatively small sample. But many researchers are using antiquated statistical techniques that have a relatively high probability of steering them wrong. And that’s a problem if it means we’re misunderstanding how well a potential new drug works, or the effects of some treatment on a city’s water supply, for instance.
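The excerpt does not name specific methods. As one familiar illustration of what a robust estimator buys you, the sketch below compares the ordinary mean with a 20% trimmed mean on a small made-up sample containing one outlier. The data, the `trimmed_mean` helper, and the 20% trimming proportion are all assumptions for illustration, not the article's own analysis.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of sorted values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations trimmed from each tail
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Made-up sample: nine ordinary observations and one extreme value.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 90]

if __name__ == "__main__":
    print("mean:", mean(sample))
    print("20% trimmed mean:", trimmed_mean(sample))
```

Here the single extreme value pulls the ordinary mean up to 12.8, while the trimmed mean stays at 4.5, close to the bulk of the data.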

More driverless cars coming to Massachusetts roads?

The Boston Globe, Adam Vaccaro


Three more companies have approached the Baker administration with plans to test self-driving cars on Massachusetts roads, including one considering a shuttle service at a sprawling development in Weymouth.

Delphi, the international auto parts manufacturer, and Optimus Ride, a startup spun out of the Massachusetts Institute of Technology, have each submitted an application for road tests to the Massachusetts Department of Transportation, according to spokeswoman Jacquelyn Goddard.

Meanwhile Paris-based Navya, which is developing driverless shuttle buses, has approached the state with a “preliminary plan” to shuttle passengers around Union Point, a new minicity on the site of the former Naval Air Station in South Weymouth, Goddard said.


2017 InnoVention Finals

NYU Entrepreneurial Institute


New York, NY InnoVention is an NYU startup competition. May 4 at 5 p.m., Galvanize NYC [$]

Save the Date! June 8-9 Workshop on Graduate Training in the Social and Behavioral Sciences

The National Academies of Sciences, Engineering, Medicine; Behavioral and Social Sciences and Education, Board of Science Education


Washington, DC The two-day workshop will help identify educational changes that may be needed in order to better prepare PhD students in the social and behavioral sciences (SBS) for the job market and scientific challenges of the future. [free, registration required]


The AI NexusLab

Applications for the next AI NexusLab cohort close Wednesday, May 3rd. The program concludes with the next Future Labs AI Summit in November.

If you are an AI startup, or you are familiar with any early-stage artificial intelligence startups that you think could benefit from our program, please have them apply at

Important dates – Data Science for Human Capital Management workshop

New Orleans, LA Workshop is November 18. Deadline for submissions is August 7.

NYU Center for Data Science News

AI Summit 2017

YouTube, Future Labs


Highlights from the April 5 event at NYU.

Tools & Resources

Talking to your boss about data

MIT Sloan School of Management, Brian Eastwood


One of the biggest challenges in data analytics is presenting results in a way that’s meaningful to people who aren’t data scientists. As MIT Sloan Master of Business Analytics student Souhail Halaby pointed out, there’s a model that shows that the winner of the Super Bowl can predict the next year’s stock market performance.

While this is interesting, it’s little more than the butterfly effect, Halaby said. “You need human intuition to determine whether the correlation is important,” he said.

Halaby, and more than a dozen other students who enrolled in the new master’s program, learned how to apply a human touch to mathematical models in a five-day class offered during the Sloan Innovation Period last October.

Reconciliation · OpenRefine/OpenRefine Wiki

GitHub – OpenRefine


You can use OpenRefine to perform reconciliation of names in your data against any database that exposes a web service following this Reconciliation Service API specification. One such database is Wikidata, and in this document we will use it as our example service.
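For readers curious what a reconciliation request looks like outside OpenRefine, the sketch below builds the form-encoded body for a batch of names. It assumes the batch form of the Reconciliation Service API, in which a `queries` parameter carries a JSON object of keyed queries. The helper name and the sample names are hypothetical, no endpoint URL is assumed, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_body(names, type_id=None, limit=3):
    """Encode a batch of names as a form body with a JSON `queries` parameter.

    Each name becomes one query keyed q0, q1, ... Optional `type_id`
    restricts candidates to one type in the target database.
    """
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            query["type"] = type_id
        queries[f"q{i}"] = query
    return urlencode({"queries": json.dumps(queries)})


if __name__ == "__main__":
    # "Q5" is the Wikidata item for "human"; the names are illustrative.
    print(build_reconciliation_body(["Ada Lovelace", "Alan Turing"], type_id="Q5"))
```

The resulting string would be POSTed to whichever reconciliation endpoint you configure. The service is expected to answer with candidate matches keyed by the same `q0`, `q1` identifiers.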

Responsive scrollytelling best practices

The Pudding, Russell Goldenberg


I recently wrote an article about how to implement scrollytelling with six different libraries. In this post, we will discuss all things mobile; from planning, to choices, to lessons learned. Let’s get the elephant in the room out of the way first.

When do you start on mobile?

Advanced Text Analysis with SpaCy and Scikit-Learn

GitHub – JonathanReeve


This notebook was originally prepared for the workshop Advanced Text Analysis with SpaCy and Scikit-Learn, presented as part of NYCDH Week 2017. Here, we try out features of the SpaCy library for natural language processing. We also do some statistical analysis using the scikit-learn library.

Machine learning on mobile: on the device or in the cloud?

Machinethink, Matthijs Hollemans


So you’ve decided it’s time to add some of this hot new machine learning or deep learning stuff into your app… Great! But what are your options?

Not Location, not Location, not Location

Medium, Towards Data Science, Thierry Elias


For this week’s project, Kaggle is on the menu, specifically the Two Sigma Connect: Rental Listing Inquiries challenge. I had recently attempted to build a data set about real estate sales in NYC; unfortunately, the captcha barriers put up by most real estate websites made that effort unfruitful.

This competition is co-hosted by Two Sigma and RentHop (a portfolio company of Two Sigma Ventures, which is a division of Two Sigma Investments) to encourage creativity in using real world data to solve everyday problems.


Full-time positions outside academia

Community Engagement and Graphic Design Specialist

City of Seattle; Seattle, WA
