Data Science newsletter – August 23, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 23, 2018


Data Science News

No more excuses for non-reproducible methods

Nature, World View, Lenny Teytelman


… Now should be springtime for methods sharing. Mobile-friendly, web-based technologies are maturing just as the need to improve reproducibility has gained widespread attention. A new era of more-efficient, more-confident science is ours to lose.

My own obsession with sharing methods started with an all-too-common frustration. I spent the first year and a half of my postdoc working out a protocol for single-cell microscopy. Assiduous tinkering showed that subtle changes to sample preparation were crucial. Instead of 1 microlitre of a chemical, I needed 5. Instead of a 15-minute incubation, I needed an hour. Alas, the general technique had already been published, so I got no credit for the work. Anyone else using the published recipe would have been either getting misleading results, or sharing my frustration at having had to waste time discovering the necessary adjustments for themselves — hence my enthusiasm for a central place to update protocols and share tips.

ASU receives multimillion grant from NSF to create a national biorepository

Arizona State University, ASU Now


Ecosystems and natural resources are rapidly changing across our nation and around the world. Now, more than ever before, measuring the causes and effects of environmental change is taking on greater importance.

The National Ecological Observatory Network (NEON) has named Arizona State University as its primary location to house a national biorepository for the next 30 years. Hundreds of thousands of biological samples collected over the next three decades from 81 field sites across the U.S., including Alaska, Hawaii and Puerto Rico, will be curated by ASU’s Biodiversity Knowledge Integration Center (BioKIC) and Natural History Collections and be made available to the greater scientific community.

NEON is the first-ever, continental-scale ecological observation facility sponsored by the National Science Foundation and managed by Battelle Memorial Institute.

Can data reveal the saddest number one song ever?

BBC Culture, Miriam Quick


Data journalist Miriam Quick put Spotify’s new algorithm to the test, analysing over 1000 tracks to find the saddest pop songs to top the charts. The results were surprising.

Why Universities Need ‘Public Interest Technology’ Courses

WIRED, Ideas, Susan Crawford


In traditional academic settings, every public-policy-minded technologist feels a little lonely, fruitlessly attempting to cross-register into courses that might provide a few snippets of legal and policy skills or allow them to apply their coding abilities to a real, live public problem. Landing internships and fellowships can feel like guerrilla warfare to these scrappy students. This has to change—and now a consortium of foundations and academics is looking for solutions.

Last month, a hand-picked group of university presidents and provosts from across the country, plus a few university faculty members, met for two days at an estate-turned-conference center on Long Island to catalyze the intentional creation of a new academic field aimed at addressing precisely this gap in interdisciplinary opportunities. This new area, “public interest technology,” is still being defined; it encompasses designing public policy and laws with an awareness of how technology actually works, as well as ensuring that technology is being used to serve public values of fairness and equity. It means consciously thinking about the welfare of society in general, rather than the incentives of a single company.

Machine learning could offer chemists fast, accurate calculations

Chemical & Engineering News, Sam Lemonick


Existing computers can calculate the exact properties of only the smallest molecules thanks to the mathematical complexity of quantum mechanics. So chemists have invented methods, including force fields, density functional theory (DFT), and the coupled cluster single double triple technique (CCSD(T)), to approximate values like molecular energies and forces. Users can get quick answers or accurate ones with these methods. For example, CCSD(T) is accurate but slow, compared with force fields.

Some researchers think machine learning could offer a better way. At the American Chemical Society national meeting in Boston on Tuesday, Adrian Roitberg of the University of Florida described a method that can achieve the accuracy of CCSD(T) in the computational time of force fields. He, along with Florida colleague Justin S. Smith and Olexandr Isayev of the University of North Carolina, Chapel Hill, call it Accurate NeurAl networK engINe for Molecular Energies (ANAKIN-ME).

During a session in the Division of Computers in Chemistry, Roitberg said the third version of the method, which the team calls ANI-1ccx, can predict the forces and energy of a molecule with only the positions of its atoms and their atomic number. The algorithm treats each element separately, then produces a summed prediction of the forces and energy in the molecule as a whole.
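The "treat each element separately, then sum" idea can be illustrated with a toy sketch. Everything named here is an assumption made for illustration: the distance-based descriptor, the per-element linear functions in ELEMENT_MODELS (stand-ins for trained neural networks), and the finite-difference forces. None of it is taken from the ANI-1ccx code itself.

```python
import math
from typing import Callable, Dict, List, Tuple

# An atom is (atomic number, (x, y, z)); these are the only inputs the model sees.
Atom = Tuple[int, Tuple[float, float, float]]


def local_descriptor(i: int, atoms: List[Atom], cutoff: float = 5.0) -> float:
    """Toy description of atom i's surroundings: a sum of decaying
    distance terms over neighbours within the cutoff (hypothetical)."""
    _, pos_i = atoms[i]
    total = 0.0
    for j, (_, pos_j) in enumerate(atoms):
        if j == i:
            continue
        r = math.dist(pos_i, pos_j)
        if r < cutoff:
            total += math.exp(-r)
    return total


# Hypothetical per-element models keyed by atomic number. In a real system
# each entry would be a trained neural network, not a linear function.
ELEMENT_MODELS: Dict[int, Callable[[float], float]] = {
    1: lambda g: -0.5 - 0.1 * g,   # hydrogen
    8: lambda g: -75.0 - 0.2 * g,  # oxygen
}


def total_energy(atoms: List[Atom]) -> float:
    """Sum the per-atom contributions, each computed by its element's model."""
    return sum(
        ELEMENT_MODELS[z](local_descriptor(i, atoms))
        for i, (z, _) in enumerate(atoms)
    )


def forces(atoms: List[Atom], h: float = 1e-4) -> List[Tuple[float, ...]]:
    """Force on each atom as the negative gradient of the total energy,
    approximated here with central finite differences."""
    result = []
    for i, (z, pos) in enumerate(atoms):
        force = []
        for axis in range(3):
            plus, minus = list(pos), list(pos)
            plus[axis] += h
            minus[axis] -= h
            e_plus = total_energy(atoms[:i] + [(z, tuple(plus))] + atoms[i + 1:])
            e_minus = total_energy(atoms[:i] + [(z, tuple(minus))] + atoms[i + 1:])
            force.append(-(e_plus - e_minus) / (2 * h))
        result.append(tuple(force))
    return result


# Rough water-like geometry, purely for demonstration.
water: List[Atom] = [
    (8, (0.0, 0.0, 0.0)),
    (1, (0.96, 0.0, 0.0)),
    (1, (-0.24, 0.93, 0.0)),
]
```

Because each atom's contribution depends only on its element and its surroundings, the summed energy does not change when the atoms are listed in a different order, and the same per-element models can be reused for molecules of any size.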

Post No Evil

WNYC, Radiolab


Back in 2008 Facebook began writing a document. It was a constitution of sorts, laying out what could and what couldn’t be posted on the site. Back then, the rules were simple, outlawing nudity and gore. Today, they’re anything but.

How do you define hate speech? Where’s the line between a joke and an attack? How much butt is too much butt? Facebook has answered these questions. And from these answers they’ve written a rulebook that all 2.2 billion of us are expected to follow. Today, we explore that rulebook. We dive into its details and untangle its logic. All the while wondering what does this mean for the future of free speech? [audio, 1:08:43]

How USC’s Person Data Integration Project Went Enterprise-Wide

Oracle, Stanley Wu


At a high level, the Person Entity (PE) Project is essentially a database of everything applicable to a Person. This includes every kind of person – pre-applicant, applicant, admit, student, alum, donor, faculty, staff, etc. It aims to be a centralized, high-integrity database, encompassing data from multiple systems, that can supply data to every department at USC. Such complete, high-quality data encompassing every Trojan’s entire academic and professional life at USC can then motivate powerful decision-making in recruiting, admissions, financial aid, advancement, advising, and many other domains.

As one of the original members of the team that started the PE as a skunk works project, I have seen some of the challenges of transforming a small project to an enterprise-backed service.

Meliou, Brun Receive Grant to Improve Software Fairness

University of Massachusetts Amherst, College of Information and Computer Sciences


UMass Amherst College of Information and Computer Sciences researchers Alexandra Meliou and Yuriy Brun have received a four-year, $1.05 million grant from the National Science Foundation to study how software systems can exhibit bias and how software engineers can develop fairer, more equitable systems.

Meliou says, “Software makes decisions in what products we are led to buy, who gets a loan, self-driving car actions that may lead to property damage or human injury, medical diagnoses and treatment, and every stage of the criminal justice system including arraignment and sentencing that determine who goes to jail and who is set free.”

Amazon expands its Alexa Fund Fellowship to a total of 18 universities, up from 4 last year

TechCrunch, Sarah Perez


Amazon announced this morning a further investment in voice technology with an expansion of its Alexa Fund Fellowship to 14 new universities, up from only 4 in 2017, the Fellowship’s first year. The program, which is designed to support conversational A.I. and speech science research, is now coming to 18 total universities, including new additions, MIT, Dartmouth and Cambridge.

Under the banner of the Alexa Fund Fellowship are two programs: The Alexa Graduate Fellowship, focused on fostering education by PhD and post-doctoral students on topics like machine learning, speech science, and conversational A.I.; as well as the Alexa Innovation Fellowship, which is aimed at helping entrepreneurship center faculty serve as voice experts on campus, Amazon says.

Only 10 of the universities are receiving the 2018-19 Graduate Fellowship – a decision that’s made based on their research interests, planned coursework, and conversational A.I. curriculum, says Amazon.

Downloads of Julia Programming Language Surge in China

Julia Computing


Less than two weeks after Julia 1.0 was released, China has for the first time become the number one country for downloads of the Julia programming language. This highlights the broad international appeal of Julia for artificial intelligence, machine learning, numerical and scientific computing.

Julia is a free and open source computer programming language that delivers the speed of C++ and Java together with the high-level productivity, simplicity and ease of use of Python and R.

How the U.S. Has Failed to Protect the 2018 Election–and Four Ways to Protect 2020

Lawfare, Alex Stamos


In the swirl of news this week, it would be easy to miss recent announcements from two of America’s largest and most influential technology companies that have implications for our democracy as a whole. First, on Tuesday morning, Microsoft revealed that it had detected continued attempts at spear-phishing by APT 28/Fancy Bear, the hacking group tied to Russia’s Main Intelligence Directorate (known as the GRU). Later that day, my friends and former colleagues at Facebook unveiled details on more than 600 accounts that were being used by Russian and Iranian groups to distort the information environment worldwide.

The revelations are evidence that Russia has not been deterred and that Iran is following in its footsteps. This underlines a sobering reality: America’s adversaries believe that it is still both safe and effective to attack U.S. democracy using American technologies and the freedoms we cherish.

And why wouldn’t they believe that? In some ways, the United States has broadcast to the world that it doesn’t take these issues seriously and that any perpetrators of information warfare against the West will get, at most, a slap on the wrist. While this failure has left the U.S. unprepared to protect the 2018 elections, there is still a chance to defend American democracy in 2020.

Estimating the Gender Ratio of AI Researchers Around the World

Medium, Element AI, Yoan Mantha and Simon Hudson


Anyone in the industry or going to prominent Artificial Intelligence conferences can tell you that a gender imbalance exists, but we felt more rigorous research was important to drive the conversation forward and accelerate correcting this imbalance. As a follow-up to The Global AI Talent Pool Report released in February, we worked with WIRED, which was also interested in looking more deeply at the state of diversity in the AI expert talent pool. For the article, we worked in collaboration with Tom Simonite of WIRED to delve further into the research by adding the dimension of gender and country to the original report’s data.

In our study, we focused on the 4000 researchers who have been published at the leading conferences NIPS, ICML, or ICLR (see the second half of this post for our methodology).

Can Facebook, or Anybody, Solve the Internet’s Misinformation Problem?

The New York Times, Farhad Manjoo


In theory, Facebook’s announcement on Tuesday that it had discovered and shut down a wide-ranging Iranian misinformation campaign should make you feel better.

The social network was slow to recognize such campaigns as threats before the 2016 presidential election, and it surely deserves some credit for what Mark Zuckerberg, its chief executive, described as the company’s shift from “reactive to proactive detection” of coordinated propaganda operations.

But it was not just Iran, and not just Facebook.

To Build Trust In Artificial Intelligence, IBM Wants Developers To Prove Their Algorithms Are Fair

Futurism, Dan Robitzski


Scientists from IBM have a new safeguard that they say will make artificial intelligence more safe, transparent, fair, and effective. They propose that, right before developers start selling an algorithm, they should publish a Supplier’s Declaration of Conformity (SDoC). As a report or user manual, the SDoC would show how well the algorithm performed at standardized tests of performance, fairness and risk factors, and safety measures. And they should make it available to anyone who’s interested.

In a research paper published Wednesday, the IBM scientists argue that this kind of transparency could help build public trust and reassure prospective clients that a particular algorithm will do what it’s supposed to without screwing anyone over based on biased training data. If a particular algorithm does seem likely to screw someone over, the client (and even interested citizens) would ideally be able to tell from the test results and choose not to put it to use.

Project will build AI models to explain, predict and influence the social world

University of Chicago, UChicago News


Data-driven models are increasingly used to simulate and make predictions about complex systems, from online shopping preferences and the performance of the stock market to the spread of disease and political unrest. But while powerful methods in machine learning and computational social science improve at predicting the future, they often lack the ability to explain why those results occur, rendering these models less helpful for shaping interventions and policy.

Social MIND, or Social Machine Intelligence for Novel Discovery, aims to reorient these models to emphasize prediction, explanation and intervention. With a $2 million grant from the Defense Advanced Research Projects Agency (DARPA) as part of its Ground Truth program, the collaboration between researchers at the University of Chicago and the Massachusetts Institute of Technology will build a “model of models” that combines computational approaches and pits them against each other to reveal the underlying factors driving social systems, as well as potential points of intervention.


PyTorch Developer Conference 2018


San Francisco, CA October 2. [save the date, application required]

HackUMass VI

Hack UMass Tech Team


Amherst, MA October 12-14. [application required]

EMNLP 2018



Brussels, Belgium October 31-November 4. “SIGDAT, the Association for Computational Linguistics (ACL) special interest group on linguistic data and corpus-based approaches to NLP, invites you to participate in the 2018 Conference on Empirical Methods in Natural Language Processing.” [$$$$]

Data Natives The Conference

Data Natives


Berlin, Germany November 22-23. “The meeting point bringing together a global community of data-driven pioneers to explore the technologies that are shaping our world.” [$$$]

NLP and Text as Data Speaker Series

NYU Center for Data Science


New York, NY September 13 starting at 4:30 p.m., NYU Center for Data Science (60 Fifth Avenue, 7th floor common area). Speaker: Erin Hegel. [free]


Space Apps Challenge 2018 Host Application

“Apply to host a 2018 International Space Apps Challenge Event on 19th to 21st of October, 2018. A lead or host is a person who organizes the event in your locality.”

Request for Applications

“The Chan Zuckerberg Initiative seeks to support up to 10 Imaging Scientists who will work at the interface of biology, microscopy hardware, and imaging software at imaging centers across the United States. ‘Imaging Scientists’ might be engineers, physicists, mathematicians, computer scientists, or biologists who have focused on technology development in either microscopy or data analysis fields.” Deadline to apply is October 3.

Tools & Resources

Edge Computing at Chick-fil-A

Medium, Chick-fil-A Tech Blog; Chick-fil-A IOT/Edge team


Our goal: simplify the restaurant experience for Owner/Operators and their teams and optimize for great, hot, tasty, food served quickly and with a personal touch, all while increasing the capacity of our existing footprint.

Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

As a simple example, imagine a forecasting model that attempts to predict how many Waffle Fries (or replace with your favorite Chick-fil-A product) should be cooked over every minute of the day. The forecast is created by an analytics process running in the cloud that uses transaction-level sales data from many restaurants. This forecast can most certainly be produced with a little work. Unfortunately, it is not accurate enough to actually drive food production.


Full-time, non-tenured academic positions

Clinical Assistant Professor

New York University, Department of Psychology; New York, NY

Program Administrator

New York University, Center for Data Science; New York, NY

Project Lead, Data Integrity

University of Southern California, USC Annenberg School for Communication and Journalism; Los Angeles, CA

Tenured and tenure track faculty positions

Computational Social Science Professor

New York University Abu Dhabi, Social Science Division; Abu Dhabi, United Arab Emirates

Full-time positions outside academia

Senior Project Manager

The Engine Room; Accra, Ghana, or Remote
