Data Science newsletter – June 2, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 2, 2017


Data Science News

Tweet of the Week

Twitter, Eliel Cruz-Lopez


From Startup to Scaleup – Sam Altman and Reid Hoffman at The Scaleup Offsite

Y Combinator


Sam Altman sits down with Reid Hoffman to talk about the key changes founders should think about as they transition from a startup to a scaleup. [video, 41:02]

LIGO detects merging black holes for third time

MIT News


The collision of a pair of colossal, stellar-mass black holes has made itself heard, nearly 3 billion light years away, through a cosmic microphone on Earth.

On Jan. 4, the Laser Interferometer Gravitational-wave Observatory (LIGO) picked up a barely perceptible signal that scientists quickly determined to be a gravitational wave, a ripple of energy passing through the curvature of spacetime. The event, published today in Physical Review Letters, marks the third direct detection of a gravitational wave.

The WHO’s new boss is tasked with an impossible job

Vox, Julia Belluz


Tedros Adhanom Ghebreyesus inherits a fraught organization. Throughout the recent Ebola epidemic, the agency was universally admonished for being too slow to respond. And one of the top reasons cited for the delay is that the WHO is underfunded, understaffed, and underpowered. It’s also losing relevance as other global health players, like the Bill and Melinda Gates Foundation, have become more prominent.

Seeing Soil Moisture from the Sky

Eos, Jian Peng


It’s a hidden part of the hydrological cycle but moisture in the soil plays an important role in many processes and influences the climate. Soil moisture can be measured by instruments in the field or by remote sensing techniques. A recent review article in Reviews of Geophysics discussed soil moisture data obtained from satellite observations and the methods for “downscaling” the information so that it can be applied in particular contexts. The editors asked one of the authors some questions about how techniques are advancing in this field.

Facebook, Google Urge Congress to Reform NSA Surveillance

Government Technology, Giuseppe Macri


A group of Silicon Valley tech giants are urging Congress to reform National Security Agency authority that empowers the agency to potentially spy on millions of Americans incidentally while surveilling foreign targets.

Facebook, Google, Microsoft, and Twitter are among the 30 tech companies, trade groups, and lobbyists asking Congress to reform Section 702 of the 2008 Foreign Intelligence Surveillance Act (FISA) Amendments Act — authority the agency uses to tap the physical infrastructure of the internet, such as undersea fiber cables, to surveil the content of foreigners’ emails, instant messages, and other communications as they exit and enter the U.S.

The machine learning paradox

O'Reilly Radar, Mike Loukides


Nothing says machine learning can’t outperform humans, but it’s important to realize perfect machine learning doesn’t, and won’t, exist.

Microsoft releases version 2.0 of its deep learning toolkit

TechCrunch, Frederic Lardinois


Microsoft today launched version 2.0 of what is now called the Microsoft Cognitive Toolkit. This open-source toolkit, which was previously known as CNTK, is Microsoft’s competitor to similar tools like TensorFlow, Caffe and Torch, and, while the first version was able to challenge many of its competitors in terms of speed, this second version puts an emphasis on usability (by adding support for Python and the popular Keras neural networking library, for example) and future extensibility, while still maintaining — and improving — its speed.

Government Data Science News

NIH reimbursement rates for indirect research costs could drop to a 10% flat rate (from around 50%) if Trump’s budget proposal becomes law. Congress is unlikely to approve this drastic cut, but a 2014 rule change appears to allow the administration to act unilaterally. If the proposal passes, universities with large endowments may be able to fund what the government will not, but many universities and researchers will be left scrambling to keep the refrigerators running, the lights on, and the mouths (of lab animals) fed.

Elsewhere at the NIH, a project is underway to determine whether anonymous peer review might be able to root out racism against black grant applicants. Run by third-party evaluator Social Solutions International, “the study will compare 400 black and white applicants, matched by research topic, gender, degree, type of institution, and original score. An additional 400 white applicants will be chosen at random.” Concern abounds that it is difficult to fully anonymize research proposals in small fields and that scientists will “not rest easy” when it comes to approving proposals until they know who they are reviewing.

Giant federal government contracts for data services are available: $553 million with NOAA and $785 million for U.S. Army Intelligence Center of Excellence.

Measures for Justice is a non-profit established to gather and disseminate data that will reveal “the numbers and facts behind a covert criminal justice system often criticized for its lack of transparency.” The group already has six states’ worth of data, which it had to collect county by county. It has also received millions in funding support from Google and the Chan Zuckerberg Initiative.

If you’re still scratching your head trying to figure out how Trump won enough votes to take the Electoral College and move into the White House, you might want to read this new summary of the three main reasons voting projections were wrong. What is not at all in doubt: Hillary Clinton won the popular vote by a sizable margin.

City planners in Des Moines, Iowa, plan to hire an intern to map the region’s 600+ miles of bicycle trails with a custom, lime-green “data bike” outfitted with a 360-degree camera and a bump-detecting vibration measurement app.

Imagine a job with responsibility for worldwide health. Now imagine it also means running a seriously underfunded bureaucracy that is a punching bag for critics. That’s what Tedros Adhanom Ghebreyesus walks into as the new director of the World Health Organization.

Canada’s federal and provincial governments have invested hundreds of millions to retain top AI talent, refusing to lose them to Google, Facebook, Amazon, and other US-based tech companies. Unfortunately, while Canada has excellent academic institutions, it lacks powerhouse tech companies of its own and may be unable to keep top AI researchers in the country. Losing them may not be great for Canada, but it may be better for the field to avoid clustering all top talent in a few schools and companies.

Yisong Yue’s answer to If you were designing an undergraduate ML course, what would the curriculum encompass? What books would you use?



I would choose A Course in Machine Learning by Hal Daumé III. The topics covered in that book strongly intersect with the topics covered in my Machine Learning & Data Mining course. These topics offer a great mix of fundamental concepts and practically relevant tips.

interdisciplinary inference meetings

David Hogg, Hogg's Research blog


Justin Alsing (Flatiron) organized an interdisciplinary meeting at Flatiron across astrophysics, biology, and computing, to discuss topics of mutual interest in inference or inverse problems. Most of the meeting was spent with us going around the room describing what kinds of problems we work on so as to find commonalities.

Don’t Expect Tech to Care About Your Problems

Bloomberg View, Cathy O'Neil


The entrepreneurs of Silicon Valley are undoubtedly finding many ways to make the world a better place — with tunnels, flying cars, interplanetary travel. Yet I can’t help noticing a growing divide between the problems people have and the problems tech companies are willing or able to solve.

This divide has two sources. The first is related to inequality: When people live in an elite bubble, they don’t experience the everyday frictions of normal life. So they focus on issues that would barely register for the rest of us workaday saps. That’s how you get Juicero, or for that matter Theranos, which was supposed to solve the problem of people who don’t like getting blood drawn. Even as someone who regularly faints at the doctor’s office, and whose veins are tricky to find, I don’t think that rises to the level of a real problem.

The Code for Facial Identity in the Primate Brain

Cell; Le Chang, Doris Y. Tsao


Primates recognize complex objects such as faces with remarkable speed and reliability. Here, we reveal the brain’s code for facial identity. Experiments in macaques demonstrate an extraordinarily simple transformation between faces and responses of cells in face patches. By formatting faces as points in a high-dimensional linear space, we discovered that each face cell’s firing rate is proportional to the projection of an incoming face stimulus onto a single axis in this space, allowing a face cell ensemble to encode the location of any face in the space. Using this code, we could precisely decode faces from neural population responses and predict neural firing rates to faces. Furthermore, this code disavows the long-standing assumption that face cells encode specific facial identities, confirmed by engineering faces with drastically different appearance that elicited identical responses in single face cells. Our work suggests that other objects could be encoded by analogous metric coordinate systems.
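The axis model described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' code or data: it assumes a 2-D "face space", two hypothetical cells with made-up axes, unit gain, and zero baseline. It shows a firing rate that is linear in the projection onto a cell's axis, a face decoded from the ensemble, and two different faces that drive one cell identically.

```python
# Toy sketch of an axis code. All numbers are illustrative assumptions,
# not values from the paper.

def firing_rate(face, axis, gain=1.0, baseline=0.0):
    """Rate is linear in the projection of the face vector onto the cell's axis."""
    projection = sum(f * a for f, a in zip(face, axis))
    return gain * projection + baseline

def decode_face(rates, axes):
    """Recover a 2-D face from two cells' rates by solving a 2x2 linear system.

    Assumes unit gain, zero baseline, and non-parallel axes.
    """
    (a, b), (c, d) = axes
    det = a * d - b * c
    if det == 0:
        raise ValueError("axes must not be parallel")
    r1, r2 = rates
    return ((d * r1 - b * r2) / det, (a * r2 - c * r1) / det)

axes = [(1.0, 0.0), (1.0, 2.0)]   # hypothetical preferred axes of two cells
face = (3.0, -1.0)                # a face as a point in the toy face space

rates = [firing_rate(face, ax) for ax in axes]
decoded = decode_face(rates, axes)

# A face that differs only along the direction orthogonal to cell 0's axis
# produces the same response in cell 0, despite being a different face.
other_face = (3.0, 5.0)
same_response = firing_rate(face, axes[0]) == firing_rate(other_face, axes[0])
```

With more cells than dimensions, the same idea would become a least-squares fit rather than an exact solve.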

The state of AI adoption in media

Digiday, Max Willens


The duopoly has gobbled up artificial intelligence startups, and it’s hungry for more.

Research released earlier this month by CB Insights found it acquired 34 AI startups in the first quarter of 2017, which more than doubled the total acquired during the same period last year and easily surpassed the previous high of 28, which came in the third quarter of last year. Last week, Axios reported that Google has launched an AI-specific venture fund, one that’s overseen by engineers rather than venture capitalists, to hand out investments up to $10 million.

Penn Study Links Heart Rate to Gender Gap in Criminal Offending

University of Pennsylvania, Penn News


In the field of criminology, it is well established that men engage in more crime than women. Now, a new study from the University of Pennsylvania, published in the journal Criminology, addresses the incomplete understanding of why males are more criminal than females by examining gender differences in biological functioning and behavior. It is the first study to demonstrate that men’s lower resting heart rate partly explains the higher rate of criminal offending.

Olivia Choy, who graduated this month with a Ph.D. from Penn’s Department of Criminology in the School of Arts & Sciences and is joining Nanyang Technological University as an assistant professor in July, conducted the research in Penn Integrates Knowledge Professor Adrian Raine’s lab.

How an Icon of Evolution Lost Its Flight

The Atlantic, Ed Yong


Leonid Kruglyak wanted to know why this bird couldn’t take to the skies. Specifically, as a geneticist, he wanted to know what genetic changes had grounded it. When he got back to his lab, he reached out to a research team that had collected blood samples from 223 flightless cormorants—almost a quarter of the total endangered population. He and his own team used these samples to sequence the cormorant’s genome, then compared its DNA to that of three other cormorant species, looking for mutations that are unique to the flightless one, and that are likely to alter its genes in important ways.

Artificial Intelligence for Everyone: A Next-Generation Customer Experience

MIT Technology Review, Jason Pontin


Organizations today are challenged with mining, analyzing and putting to good use the vast amounts of data now flooding in from multiple checkpoints. How to utilize that data to get closer to internal and external customers? Jason Pontin sits down with Rob DeSisto, Chief Value Officer for Salesforce, to discuss the Salesforce Einstein AI initiative and how it is enabling organizations to glean better value from customer data, without needing a staff full of data scientists—it’s the democratization of AI. [video, 18:01]

Here’s How to Track the Smartphone Apps That Are Tracking You

Fast Company, Glenn Fleishman


“When applications ask for permissions, that is not really done in a manageable way,” says Franziska Roesner, an assistant professor in computer science and engineering at the University of Washington, who researches computer security and privacy. “iOS doesn’t know necessarily whether it’s reasonable for an application to use your location, and that’s why they ask the user,” she says.

Apple has to rely on a developer’s disclosure as to what’s being done with that location data. Some of Roesner’s work tries to match up an app’s purpose and interface elements with the kind of permission being asked, to make sure a request isn’t misused.

Higher ed stepping in to fill cybersecurity gaps



In recent weeks, a pervasive ransomware attack affected systems throughout the world, causing chaos in National Health Service hospitals in Great Britain and continuing to cripple hundreds of thousands of computers.

The increase in cyberattacks has led to an increase in demand for qualified cybersecurity employees in government and private industry. Colleges and universities throughout the country are responding by offering degrees, certificates and tutorials in the burgeoning field. Texas A&M University at College Station recently instituted a minor degree program, while other institutions continue to find increased support and interest in cybersecurity degrees.

Data Visualization of the Week

Twitter, Science Friday



Boulder, CO Challenge Session

West Big Data Innovation Hub


Boulder, CO, and online June 5-6 as part of the West Hub Annual All Hands Meeting open to all. The Challenge will be mentioned during the Livestream June 5, 9am-12:30pm MT [free]

Design for America Leadership Studio 2017

Design for America


Evanston, IL August 3-7 at Northwestern University. This year, DFA teams will explore: “How can we expand accessibility in urban areas?” [$$]

Computation+Journalism Symposium

Northwestern University


Evanston, IL October 13-14 at Northwestern University [$$]


Apply to Become a Carpentry Instructor Trainer!

The Carpentry community is growing! This month we welcomed ten new Instructor Trainers to our community. Now we are looking for the next group of new Trainers. Deadline for applications is June 14.

Tools & Resources


Sakshi Pratap, Anand Sainath, John Stasko


“To help analysts and researchers better explore, analyze, and make sense of such document collections, we have developed the visual analytics system, Jigsaw. The original version of Jigsaw, available at the link above, runs as a desktop Java application. To reach a broader audience and leverage advances in web technologies and tools, we have developed a web-based version of the system that we call WebJigsaw.”

R Interface to Keras

RStudio


“Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.”

Don’t use deep learning your data isn’t that big

Simply Statistics, Jeff Leek


“When your dataset isn’t that big, doing something simpler is often both more interpretable and it works just as well due to potential overfitting. To test this idea I’m going to do an experiment on the digits data. I’m going to build a model just to predict one versus zero. I’m going to do that using logistic regression and I’m going to use a deep neural network.”

Correcting bias in meta-analyses: What not to do (meta-showdown Part 1)

Felix Schönbrodt's blog


“Publication bias and p-hacking can dramatically inflate effect size estimates in meta-analyses. Many methods have been proposed to correct for such bias and to estimate the underlying true effect. In a large simulation study, we found out which methods do not work well under which conditions, and give recommendations what not to use.”

Computational Publishing

GitHub – odewahn


This is a draft for an article I’m working on for O’Reilly Media about the Jupyter Publishing landscape. If you have any thoughts you’d care to share, please post an issue at odewahn/computational-publishing.

JupyterLab: The evolution of the Jupyter web interface

O'Reilly Radar, Nicole Tache


“Project Jupyter co-founder Brian Granger on the JupyterLab project, its potential role in scientific and tech communities, and the expanding role of notebooks.”

Beautifully Animate Points with WebGL and regl

Peter Beshai


“I’ve got a fever, and the only prescription is more points dancing across my screen. In a previous blog post, I covered how to animate thousands of points using HTML5 canvas and d3, but that approach doesn’t scale too well beyond 10,000 points. Perhaps you can push it to 20,000 if you don’t mind dropping a frame here or there, but if you need that buttery smoothness, you’re going to want to switch from the 2D canvas to using WebGL. And who doesn’t need buttery smoothness?”

Rust: A Scala Engineer’s Perspective

Lloyd Chan, BeachApe blog


The first anniversary of my first line of Rust code is coming up, and it’s getting on for 5 years since I wrote my first line of Scala code. I thought it would be a good idea to summarise my Scala-tinted perspective of The Rust Experience™, one year on.

IPUMS International – Harmonized International Census Data For Social Science and Health Research



IPUMS-International is dedicated to collecting and distributing census data from around the world. The project goals are to collect and preserve data and documentation, harmonize data, and disseminate the harmonized data free of charge.



Postdoctoral Fellow in Star Formation, Exoplanets, or Stellar Astrophysics

University of Texas; Austin, TX

Full-time positions outside academia

Research Scientist

Google Brain; Cambridge, MA
