Data Science newsletter – March 3, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for March 3, 2017


Data Science News

Data Visualization of the Week

Twitter, Rainmaker1973


Tweet of the Week

Twitter, Such_Politics


How millions of kids are being shaped by know-it-all voice assistants

The Washington Post, Michael S. Rosenwald


As millions of American families buy robotic voice assistants to turn off lights, order pizzas and fetch movie times, children are eagerly co-opting the gadgets to settle dinner table disputes, answer homework questions and entertain friends at sleepover parties.

More state investment needed to meet demand for engineers

The Seattle Times, Opinion, Michael B. Bragg


More kids are graduating from high school with the passion and the qualifications to pursue higher education in STEM fields. But we don’t have the capacity in our universities to provide these students with the education required to qualify for a bow wave of engineering jobs.

What’s new at AI2? Allen Institute wins honors, and hints at more to come

GeekWire, Alan Boyle


Etzioni said the institute is sharpening its focus on the moonshot challenges that artificial intelligence can address. “You can think of us as Seattle’s version of Google X, the nonprofit version,” he said.

Netflix uses AI in its new codec to compress video scene by scene

Quartz, Joon Ian Wong


Annoying pauses in your streaming movies are going to become less common, thanks to a new trick Netflix is rolling out. It’s using artificial intelligence techniques to analyze each shot in a video and compress it without affecting the image quality, thus reducing the amount of data it uses. The new encoding method is aimed at the growing contingent of viewers in emerging economies who watch video on phones and tablets.

Should artificial intelligence be used in science publishing?

Public Radio International, Science Friday


Increasingly, publishers are experimenting with using artificial intelligence in the peer review process for scientific papers. In a recent op-ed for Wired, one editor described how computer programs can handle tasks like suggesting reviewers for a paper, checking an author’s conflicts of interest and sending decision letters. [audio, 4:30]

AI Scientists Gather to Plot Doomsday Scenarios (and Solutions)

Bloomberg Technology, Dina Bass


Artificial intelligence boosters predict a brave new world of flying cars and cancer cures. Detractors worry about a future where humans are enslaved to an evil race of robot overlords. Veteran AI scientist Eric Horvitz and Doomsday Clock guru Lawrence Krauss, seeking a middle ground, gathered a group of experts in the Arizona desert to discuss the worst that could possibly happen — and how to stop it.

Their workshop took place last weekend at Arizona State University with funding from Tesla Inc. co-founder Elon Musk and Skype co-founder Jaan Tallinn. Officially dubbed “Envisioning and Addressing Adverse AI Outcomes,” it was a kind of AI doomsday games that organized some 40 scientists, cyber-security experts and policy wonks into groups of attackers — the red team — and defenders — blue team — playing out AI-gone-very-wrong scenarios, ranging from stock-market manipulation to global warfare.

Machine Learning Is Bringing the Cosmos Into Focus

The Atlantic, Adrienne LaFrance


Early astronomers didn’t just gaze up each night to produce detailed sketches of celestial bodies. They also tracked the movement of those bodies across the sky over time. They developed an understanding of Earth’s movement as a result. But to do that, they had to collect loads of data.

It makes sense, then, that computers would be such useful tools in modern astronomy. Computers help us program rocket launches and develop models for flight missions, but they also analyze deep wells of information from distant worlds. Ever larger telescopes have illuminated more about the depths of the universe than the earliest astronomers ever could have dreamed.

Predicting Marketing Campaign Response Using Amazon Machine Learning

Gigaom, Chris Mohritz


The formula for a successful marketing campaign can’t be pinned down to a single factor. In fact, it could take a mountain of data to figure out the best strategy.

That’s where A.I. — or more specifically, machine learning — comes in.

Making math more Lego-like

Harvard Gazette


The Harvard trio of Arthur Jaffe, the Landon T. Clay Professor of Mathematics and Theoretical Science, postdoctoral fellow Zhengwei Liu, and researcher Alex Wozniakowski has developed a 3-D picture-language for mathematics with potential as a tool across a range of topics, from pure math to physics.

Though not the first pictorial language of mathematics, the new one, called quon, holds promise for being able to transmit not only complex concepts, but also vast amounts of detail in relatively simple images.

How Our Hotel Used Data to Make Our Laundry Service Glamorous

Harvard Business Review, Ana Brant of Dorchester Collection


To figure out how to act on the laundry issue cropping up in our data, we had to start by better understanding what our guests really wanted from us (beyond clean, undamaged clothes being returned on time). By diving into the data, we were able to dramatically reduce customer dissatisfaction, increase customer loyalty, and develop new, differentiating service offerings.

Scientists have long been afraid of engaging in ‘advocacy.’ A new study says it may not hurt them

The Washington Post, Chris Mooney


In the new study published in the journal Environmental Communication, George Mason University’s John Kotcher and colleagues from George Mason and the University of Wisconsin at Madison probe the consequences of scientist advocacy using a representative online sample of 1,235 Americans. The study’s respondents were presented with six experimentally varied examples of a supposed Facebook post by a climate scientist named “Dr. Dave Wilson,” who in the post flags a recent media interview he has done and, elaborating on it, takes a variety of stances, representing increasingly sharp forms of advocacy.

University Data Science News

Timnit Gebru, working in Fei-Fei Li’s lab at Stanford, along with colleagues elsewhere, used data from Google’s Street View cameras to replicate the demographic information in the US Census. They were able to get remarkably decent estimates based on the make and model of cars in a short period of time. Let’s not get itchy to do away with the Census or the American Community Survey. The survey methods for the Census and ACS have better validity, are much more accurate, and provide a robust set of data going back decades.

The eScience Institute at the University of Washington-Seattle is using part of a $1m commitment from Microsoft to fund its Data Science for Social Good summer internship program. The new funding supports a broader Cascadia Urban Analytics Cooperative in which UDub is joining the University of British Columbia to work on challenges facing the Cascadia region using data science.

MIT’s Tech Review named reinforcement learning and botnets of things to its list of 10 Breakthrough Technologies in 2017.

John D. Halamka, CIO and dean for technology at Harvard Medical School, has a colorful list of highlights from HIMSS. Wearables are in “the trough of disillusionment” and AI/ML is “at the peak of the hype curve”. He’s smart enough to follow that last observation with, “we need to be measured about our expectations….computers do not ‘think’.”

Elsewhere in health data, cybersecurity expert Bruce Schneier talks about how your medical data are currently being used.

We sadly announced the passing of Swedish statistician and data visualization expert Hans Rosling several weeks ago. Hear five visualization designers pay tribute to his work on Data Stories.

Andre Esteva at Stanford is using convolutional neural networks to diagnose skin cancer.

One to watch: Nextstrain has won an Open Science Prize for its intelligent data aggregation tool that combines viral outbreak and pathogen genomic data to aid epidemiology.

Statistics is so hot at Harvard that faculty are straining to keep up with the influx of new concentrators. They plan to hire as many as five new faculty this year alone.

Jennifer Widom, a professor of computer science and of electrical engineering, has been named dean of the School of Engineering at Stanford.

Penn State University is using NSF funds to install a high-performance CyberLAMP cluster that combines CPUs, NVIDIA GPUs, and Intel’s Xeon Phi processors to serve physics and materials science. The University of Illinois at Urbana-Champaign also announced a supercomputer installation this week. Called ROGER, it is the first CyberGIS supercomputer for geospatial data processing.

Cameron Neylon and colleagues argue that the reason academics cannot resolve the status of preprints is that they “tend to conflate the history of the manuscript and the status granted it by different communities”. They offer a framework for disambiguating the “state” and the “standing” of the preprint.

Oxford will be home to the new interdisciplinary Rosalind Franklin Research Institute, which has received a commitment of £100m from the British government and key industry partners to focus on new instrumentation for the physical sciences.

The Dean of Engineering at the University of Washington called for more state funding to support growing student demand for STEM degrees and provide Seattle’s growing tech sector with a sufficient number of talented employees.

Arizona State University hosted top AI researchers in an event where they were tasked with presenting the worst possible outcomes they could imagine for AI applications. Much like the world of design where endless options are often explored through tinkering rather than scientific rigor, the “AI doomsday” symposium tested the limits of AI via imagination. In design fields, the imaginative process is productive, not preventative. It seems dangerous to port the strategy from one context to another.

Gary King on Big Data Analysis

audioBoom, Social Science Bites podcast


When looking at big data, says computational social scientist Gary King, “The data itself isn’t likely to be particularly useful; the question is whether you can make it useful.” In this Social Science Bites podcast, he explains more about the importance of data analysis. [audio, 26:00]


Women Data Scientists + Columbia Lecturer Host NYC Coding ‘Sprint’

Columbia University, Data Science Institute


New York, NY The NYC chapter of Women in Machine Learning and Data Science (WiMLDS) has partnered with Columbia Data Science Institute lecturer Andreas Mueller to host a ‘sprint’ on March 4 for volunteer programmers to contribute to Python’s data-science toolkit, scikit-learn. [$]


Survey on Generative Music

“At the IT University of Copenhagen we’re trying to validate an experimental music improvisation technique.” … “It would be great if you could help us gather some data! It shouldn’t take more than ten minutes.”

PEARC17 | Practice & Experience in Advanced Research Computing

New Orleans, LA Conference is July 9–13. Deadline for submissions is March 13.

Data Science Game

Registration is now open!
Register your team before Sunday, April 9.


Tools & Resources

Ozlo releases a suite of APIs to power your next conversational AI

TechCrunch, John Mannes


“Building on its promise to give the entrenched a run for their money, conversational AI startup Ozlo is making its meticulously crafted knowledge layer available for purchase.”

NEON’s data API

GitHub – NEONInc


The API documentation is hosted at

ARM’s soft launch for machine learning library

EE Times, Peter Clarke


ARM is introducing a free library of popular machine learning and computer vision routines that has been optimized to run on ARM CPUs and GPUs.

usql: usql is a universal command-line interface for SQL databases

GitHub – knq


usql provides a universal command line interface for the following databases: PostgreSQL, MySQL, Oracle, SQLite, and Microsoft SQL Server.

[1703.00426] HolStep: A Machine Learning Dataset for Higher-order Logic Theorem Proving

arXiv, Computer Science > Artificial Intelligence; Cezary Kaliszyk, François Chollet, Christian Szegedy


Large computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based on Higher-Order Logic (HOL) proofs, for the purpose of developing new machine learning-based theorem-proving strategies. We make this dataset publicly available under the BSD license. We propose various machine learning tasks that can be performed on this dataset, and discuss their significance for theorem proving. We also benchmark a set of simple baseline machine learning models suited for the tasks (including logistic regression, convolutional neural networks and recurrent neural networks). The results of our baseline models show the promise of applying machine learning to HOL theorem proving.
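
The abstract lists logistic regression among its baseline models. As a rough illustration of what that kind of baseline can look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the bag-of-characters features, the function names (`featurize`, `train_logreg`, `predict`), and the toy strings are all assumptions made up for this example, and none of the data comes from HolStep.

```python
import math
from collections import Counter


def featurize(step, vocab):
    """Bag-of-characters counts for one proof-step string (assumed feature scheme)."""
    counts = Counter(step)
    return [counts.get(ch, 0) for ch in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(useful) minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(step, vocab, w, b):
    """Estimated probability that a step belongs to the 'useful' class."""
    x = featurize(step, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up strings standing in for proof steps; NOT taken from HolStep.
# Label 1 = "useful step", 0 = "not useful" (hypothetical labels).
toy_steps = ["|- a = a", "|- b = b", "|- c = c", "!x. P x", "!y. Q y", "!z. R z"]
toy_labels = [1, 1, 1, 0, 0, 0]

vocab = sorted(set("".join(toy_steps)))
X = [featurize(s, vocab) for s in toy_steps]
w, b = train_logreg(X, toy_labels)

print(predict("|- d = d", vocab, w, b))  # unseen step shaped like the positives
print(predict("!w. S w", vocab, w, b))   # unseen step shaped like the negatives
```

A real baseline would train on the published dataset and would likely use a richer tokenization than single characters; the sketch only shows the shape of the classification task.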


Internships and other temporary positions

Gearbox Vibration Data Analysis and Mining Internship

NASA Glenn Research Center; Cleveland, OH

Data Science for Space Communication Internship

NASA Glenn Research Center; Cleveland, OH

Data Analytics and Visualization for Facility Test Data

NASA Glenn Research Center; Cleveland, OH

Data For Impact Fellowship

SocialCops; New Delhi, India

NumFOCUS Google Summer of Code 2017

NumFOCUS; Austin, TX


Full-time positions outside academia

Data Science Lead

Battelle; Boulder, CO

Visiting Postdoctoral Fellowships “Algorithmed Public Spheres”

Hans Bredow Institute; Hamburg, Germany
