Data Science newsletter – January 18, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for January 18, 2017


Data Science News

The tech of PIXAR part 1: Piper – daring to be different

fxguide, Mike Seymour


Every shot in Piper is composed of millions of grains of sand, each of them around 5,000 polygons. No bumps or displacements were used for the grains, just procedurally generated Houdini models. Displacement was the first option for creating the sand, but after many tests, the displacement detail required to convey the sand close-ups was approximately 36 times the normal displacement-map resolution used in production, which made displacement inefficient and made sand instancing a viable solution.

Is AI Sexist?

Foreign Policy, Erika Hayasaki


In the not-so-distant future, artificial intelligence will be smarter than humans. But as the technology develops, absorbing cultural norms from its creators and the internet, it will also be more racist, sexist, and unfriendly to women.

An increasing role for libraries in research information management

OCLC Research, Hanging Together blog


It’s no secret that the research ecosystem has been experiencing rapid change in recent years, driven by complex political, technological, and network influences. One component of this complicated environment is the adoption of research information management (RIM) practices by research institutions, and particularly the increasing involvement of libraries in this development.

Research information management is the aggregation, curation, and utilization of information about research. Research universities, research funders, and individual researchers are increasingly looking for aggregated, interconnected research information to better understand the relationships, outputs, and impact of research efforts, as well as to increase research visibility.

Influential Computing Researchers and Practitioners Announce Steps to Prevent Algorithmic Bias



Algorithms, the sets of instructions computers employ to carry out tasks, influence almost every aspect of society. The explosive growth of data collection, coupled with increasingly sophisticated algorithms, has resulted in a significant increase in automated decision-making, as well as a greater reliance on algorithms in human decision-making. Industry forecasters believe software programs incorporating automated decision-making will only increase in the coming years as artificial intelligence becomes more mainstream. One of the major challenges of this emerging reality is to ensure that algorithms do not reinforce harmful and/or unfair biases.

A Search Engine for Programming Language Syntax Is a Pretty Good Idea

VICE, Motherboard, Michael Byrne


The current search engine for programming language syntax is Google. Knowing how to search for information is a key part of knowing how to program at all. You can know all of the algorithms and a half-dozen programming languages inside out, but you will nonetheless find yourself searching for how to do something at some point, whether it's some brand-new or super-obscure functionality or how to translate a feature from one programming language to another.

In other words, knowing how to program has a lot to do with knowing how to access information—an acute awareness of how and when to learn.

This learning might occur in hyperdrive if you're the sort of programmer who is either obsessively learning new things just for the sake of it—which is a whole lot of programmers—and/or has to learn new things to apply them to a new project or task. For a recent project, for example, I needed to use a machine learning framework that's implemented in a somewhat obscure language called Lua, which is like a super-lightweight version of Python. I watched a couple of videos, but mostly I was inferring syntax from other Lua code and Googling things like “Lua for loop break.”

The Real Impact of Artificial Intelligence

World Economic Forum


Artificial Intelligence panel discussion from World Economic Forum Annual Meeting 2017 [video, 57:07]

Bait and switch: UCLA study finds fish fraud runs rampant

UCLA Newsroom


Next time you go out for sushi in Los Angeles, don’t bother ordering halibut. Chances are it’s not halibut at all.

A new study from researchers at UCLA and Loyola Marymount University checked the DNA of fish ordered at 26 Los Angeles sushi restaurants from 2012 through 2015, and found that 47 percent of sushi was mislabeled. The good news is that sushi represented as tuna was almost always tuna. Salmon was mislabeled only about one in 10 times. But out of 43 orders of halibut and 32 orders of red snapper, DNA tests showed the researchers were always served a different kind of fish. A one-year sampling of high-end grocery stores found similar mislabeling rates, suggesting the bait-and-switch may occur earlier in the supply chain than the point of sale to consumers.

This forecasting tool aims to keep ships and blue whales from colliding

The San Diego Union-Tribune, Deborah Sullivan Brennan


The tool allows researchers to post online maps showing likely “hot spots” for blue whales that will help ship captains avoid collisions with the animals.

“We can both see where they go and when they go,” said Elliott Hazen, a research ecologist with the National Marine Fisheries Service, who developed the forecasting program. “We can take their movements and combine that with remotely sensed oceanographic data, to find out not only where they go, but also some of the oceanographic conditions that trigger that.”

How big data can improve cities and save lives: UW and Seattle brainstorm solutions on education, homelessness, and transportation

GeekWire, Taylor Soper


Using data and analytics to improve the bottom line is now common practice in the technology and business worlds. But how can city governments use those same techniques and new sources of information to improve the lives of their constituents?

That is the theme of a two-day workshop hosted by the City of Seattle, the University of Washington, and MetroLab, a Washington D.C.-based city-university collaboration that launched as part of the White House’s Smart Cities Initiative in September 2015.

The event, called “Big Data and Human Services” and sponsored by Microsoft, Amazon, and Comcast, gathered folks in Seattle from the public and private sectors.

AI Revolution

Carnegie Mellon Today


Microsoft turns to Carnegie Mellon alumnus Harry Shum to lead more than 5,000 researchers in its artificial intelligence initiative.

This AI Proof Reader Will Change How We Write

Medium, Gold and Black Games


Toronto-based startup Atomic Reach thinks that it’s got the answer to this problem with its Atomic AI platform, which launches today. Here’s how it works.

At the core, you’ve got an artificial neural network that’s been painstakingly built to understand 23 distinct measures of language and structure. This artificial neural network is continuously growing, and as it consumes more data, it becomes more precise.

It boasts over three million articles in its database. This is growing constantly, and is analyzed on a regular basis.

Rebuilding legitimacy in a post-truth age

Medium, Duncan Watts and David Rothschild


The current state of public and political discourse is in disarray. Politicians lie with impunity. Traditional news organizations amplify fact-free assertions, while outright fake news stories circulate on social media. Public trust in the media, science, and expert opinion has fallen, while segregation into like-minded communities has risen. Millions of citizens blame their economic circumstances on caricatures like “elites” rather than on specific economic forces and policies. Enormously complex subjects like mitigating climate change, or balancing economic growth and inequality, are reduced to slogans. Magical thinking (e.g., that millions of manufacturing jobs can be created by imposing trade restrictions; that everyone can have affordable, high quality, healthcare without individual mandates or subsidies; that vaccines cause autism) has proliferated.

The result has been called a post-truth age, in which evidence, scientific understanding, or even just logical consistency has become increasingly irrelevant to political argumentation. Indeed, a flagrant disregard for consistency and evidence may even be interpreted as a demonstration of power: the power to create one’s own reality.

New technology enables 5-dimensional imaging in live animals and humans

University of Southern California, Dornsife College of Letters, Arts and Sciences


A computer algorithm for analyzing time-lapse biological images could make it easier for scientists and clinicians to find and track multiple molecules in living organisms. The technique is faster, less expensive and more accurate than current methods — and it even works with cell phone images.

Computer Models Help Fight Climate Change

Communications of the ACM, News, Gregory Goth


Given the vast number of variables and uncertainties in trying to predict how much sea levels may rise, how and where to best deploy marshes in coastal defenses, and how best to help those marshes thrive, some of the world’s leading climate researchers are turning to flexible computer modeling to help them prioritize where marshes fit into the larger picture.

“No one is suggesting we rely exclusively on salt marshes to mitigate the effects of climate change,” said Mark Schuerch, a post-doctoral researcher and coastal geographer at the University of Cambridge, U.K., specializing in salt marsh modeling. “It has to be seen as a complementary measure, which reduces the pressure on the hard infrastructure.”

Government Data Science News

Harvard University’s Gary King, Jennifer Pan (now at Stanford), and Molly Roberts (now at UC-San Diego) updated their report on China and the Internet, finding that the Chinese government “fabricates and posts about 448 million social media comments a year”. Talk about fake news.

DARPA released a call for proposals to develop a “secure messaging and transaction platform” that ensures protected communication no matter where the sender and receiver are.

Leif-Anders Thorsrud, a senior researcher at the Central Bank of Norway, used a macroeconomic prediction tool fed with ~500,000 news articles to improve the bank’s economic predictions by 10 percent.

Nancy Potok has been named Chief Statistician of the United States, a promotion from her position as deputy director and chief operating officer of the US Census Bureau.

Preprint supporters in biology are petitioning NIH to allow the inclusion of preprints in grants. More than 600 scientists have signed the petition.


Webinar: An Exclusive Peek “Under the Hood” of Enthought Training and the Pandas Mastery Workshop

Online January 26 at 11 a.m. CT [registration required]

WECode, Harvard University

Cambridge, MA February 4-5 [$$$]

TNU – Brain Connectivity Workshop 2017

Zurich, Switzerland May 17-19 [$$$]

The Great Game AI Survey!

The inaugural artificial intelligence (AI) in games survey, aimed at assessing knowledge of AI in games across the industry. Results to be published in a month’s time via @brainybeard.

2017 International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation

Washington, DC July 5-8 at George Washington University. Deadline for paper abstract submissions is Wednesday, February 22.

Advances in complex systems

Lake Como School of Advanced Studies – July 3-7. Deadline to apply is Wednesday, March 1.

CfP: Studying User Perceptions and Experiences with Algorithms

Montreal, Quebec, Canada Workshop is part of the 11th International Conference on Web and Social Media. Deadline for proposals is Saturday, March 4.

Tools & Resources

[1701.03980] DyNet: The Dynamic Neural Network Toolkit

arXiv, Statistics > Machine Learning; Graham Neubig et al.


We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet’s dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet’s speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit.
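The static-versus-dynamic distinction in the abstract can be illustrated with a stdlib-only Python toy (this is not DyNet’s actual API, just a sketch of the idea): under dynamic declaration, the computation graph is rebuilt by ordinary procedural code for every example, so its shape can vary with the input.

```python
# Toy sketch of "dynamic declaration" (hypothetical classes, not DyNet's API):
# running procedural code implicitly builds a fresh computation graph per
# example, so graph depth follows the input length.

class Node:
    """One node in a freshly constructed computation graph."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents

def add(a, b):
    """Combine two nodes; the new node records its parents (the graph edges)."""
    return Node(a.value + b.value, parents=(a, b))

def build_graph(xs):
    """Construct a new graph for this example; structure depends on len(xs)."""
    node = Node(xs[0])
    for x in xs[1:]:
        node = add(node, Node(x))   # the graph grows as the code executes
    return node

def depth(node):
    """Depth of the graph rooted at node (leaves have depth 1)."""
    return 1 + max((depth(p) for p in node.parents), default=0)

# Two inputs of different length yield two differently shaped graphs,
# something a static-declaration toolkit must express up front.
g1 = build_graph([1, 2, 3])
g2 = build_graph([1, 2, 3, 4, 5])
print(g1.value, depth(g1))  # prints: 6 3
print(g2.value, depth(g2))  # prints: 15 5
```

Because a new graph like this is built for every training example, construction overhead must be tiny, which is why the paper emphasizes DyNet’s optimized C++ backend and lightweight graph representation.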


GitHub – dmlc


“This repository aims at providing a high performing and flexible deep learning platform, by prototyping a pure NumPy interface above MXNet backend.”


Full-time positions outside academia

Senior Java Developer

Lilt; Munich, Germany

DevOps Lead

Allen Institute for Artificial Intelligence; Seattle, WA

Internships and other temporary positions

IBM Social Good Fellowship

IBM Research; Yorktown Heights, NY
