Data Science newsletter – October 5, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for October 5, 2016

 
 
Data Science News



Tweet of the Week

Twitter, Christine Zhang


from September 30, 2016


Headline:


Think You Understand Why Ideas Go Viral? Big Data May Change Your Mind – From tweets to scientific discoveries, human behavior is surprisingly predictable.

KelloggInsight podcast


from October 03, 2016

Kellogg Insight talked with two researchers who are starting to find answers by analyzing huge amounts of data. Microsoft’s Duncan Watts explains why we should stop worrying about a tipping point, and Kellogg Professor Dashun Wang discusses how human behavior is more predictable than you might think. [audio, 16:47]


Headline:


[1610.00388] Learning to Translate in Real-time with Neural Machine Translation

arXiv, Computer Science > Computation and Language; Jiatao Gu, Graham Neubig, Kyunghyun Cho, Victor O.K. Li


from October 03, 2016

Translating in real-time, a.k.a. simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.


Headline:


How Los Alamos Is Learning to Track Disease Outbreaks Around the World

MIT Technology Review, arXiv


from September 28, 2016

Ashlynn Daughton and colleagues at Los Alamos National Laboratory in New Mexico have come up with a new method for describing disease that is designed to bring this disparate field together and gain international traction. Their new system of classification is called the Anthology of Biosurveillance Diseases, and they have set up an online database to support it.


Headline:


Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics

arXiv, Statistics > Applications; Alexander Franks, Alexander D'Amour, Daniel Cervone, Luke Bornn


from September 30, 2016

In sports, there is a constant effort to improve metrics which assess player ability, but there has been almost no effort to quantify and compare existing metrics. Any individual making a management, coaching, or gambling decision is quickly overwhelmed with hundreds of statistics. We address this problem by proposing a set of “meta-metrics” which can be used to identify the metrics that provide the most unique, reliable, and useful information for decision-makers. Specifically, we develop methods to evaluate metrics based on three criteria: 1) stability: does the metric measure the same thing over time? 2) discrimination: does the metric differentiate between players? and 3) independence: does the metric provide new information? Our methods are easy to implement and widely applicable so they should be of interest to the broader sports community. We demonstrate our methods in analyses of both NBA and NHL metrics. Our results indicate the most reliable metrics and highlight how they should be used by sports analysts. The meta-metrics also provide useful insights about how to best construct new metrics which provide independent and reliable information about athletes.

Roundup:

Startup memoirs

Drew Conway, the man who created the now-famous Venn diagram that defined data science, recounts the day his startup Alluvium was born. (Yes, there is a Venn diagram in the story.)

And on the eve of a possible purchase by Spotify, the head of Data Science at SoundCloud talks about SoundCloud’s early data science decisions.

Farhad Manjoo explains how MailChimp grew without taking venture capital or “spending like a debt-fueled rocket ship careening out of control.”

Avi Freedman, the first person to offer anything like internet service to Philadelphia, offers excellent pointers on how to establish the right infrastructure to succeed as a start-up.


Headline:


[1610.00527] Video Pixel Networks

arXiv, Computer Science > Computer Vision and Pattern Recognition; Google DeepMind


from October 03, 2016

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.


Headline:


Fujitsu Memory Tech Speeds Up Deep-Learning AI

IEEE Spectrum


from October 04, 2016

By creating an efficiency shortcut in the calculations performed by neural networks, Fujitsu researchers reduced the amount of internal GPU memory used by 40 percent. Their solution allows for a larger and potentially more capable neural network to run on a single GPU.

“To the best of our knowledge, we are the first to propose this type of solution,” says Yasumoto Tomita, research manager of the Next-Generation Computer Systems Project at Fujitsu Laboratories Ltd.


Headline:


Which Birth Control Is Best for You? Here’s the Answer

Iodine blog


from September 29, 2016

We asked thousands of women to share their experience. What have they tried, and what did they think of it? Did they find a method effective for them? How much of a hassle did they think a method involved? And what was their bottom line: Was the method worth it, all things considered, or not worth it?

In the end, we surveyed more than 5,000 people on their experience, and the resulting data is the cornerstone of our new Birth Control tool. The Worth It score is the first big purple circle that pops out on the 17 different methods in our database. (Note that we don’t have data on every method out there, so we still need your help! If you’ve tried any kinds of birth control, share your experience now!)


Headline:


Deepwater Horizon oil spill impacted bluefin tuna spawning habitat in Gulf of Mexico, Stanford and NOAA researchers find

Stanford News


from September 30, 2016

The scientists directly mapped the preferred spawning habitat of the Atlantic bluefin tuna by drawing from a 16-year data set of electronic tagging data from 66 fish, which provided information such as the animals’ locations, temperatures and unique diving patterns after up to a year of being tracked on sojourns of thousands of miles. They then overlaid this data with satellite observations of the oil spill’s reach, to map the potential impact.

“It took us many years to establish and perfect the techniques of putting a satellite tag, essentially a small computer, on giant bluefin tuna, many over 1,000 pounds in this study, and figure out exactly where and when they potentially spawn in the Gulf of Mexico,” said study co-author Barbara A. Block, a professor of marine sciences at Stanford. Block led the effort to discern exactly where and when the bluefin spawn.


Headline:


Scientists want to mimic the human brain. And they’ve made a breakthrough

World Economic Forum


from October 04, 2016

Forget supercomputers or the latest technological innovation: one of the most impressive machines in the world is right there in your head. While computers might be great at carrying out relatively simple steps at super-fast speeds, the human brain is still much more sophisticated when it comes to advanced tasks like pattern recognition and creative thinking.

That’s why scientists have for a long time been attempting to imitate the human brain. And they’ve just made a breakthrough. We spoke with Angeliki Pantazi, who worked on the project with a team of scientists from IBM Research – Zurich.


Headline:


So. Algorithms Are Designing Chairs Now

WIRED, Design


from October 03, 2016

The Elbo chair is an unusual piece of furniture. Not for its looks (though the legs, back, and arms bear an uncanny resemblance to bones) but for how it came to be.

Arthur Harsuvanakit and Brittany Presten of Autodesk’s generative design lab created the chair, but they didn’t design it. Yes, they wanted the Elbo to reference the Danish mid-century modern style. And they wanted the seat 18 inches off the floor, and capable of supporting 300 pounds. But they let algorithms decide everything else.


Headline:


How Space Scientists Turn Exoplanets Into Places We Can ‘See’

WIRED, Science


from September 12, 2016

Scientists don’t actually know what any exoplanets look like. But imagining how they might be—in a standing-right-there sense—is central to scientists’ perceptions of and interest in them, according to Lisa Messeri, a space anthropologist at the University of Virginia. And people felt so passionate about Proxima b (which is not Earth-like) because its proximity makes it more real. It is a place people could imagine being.

Messeri studies how scientists effect the transformation from random planet to real place. In her new book, Placing Outer Space, she maps that mental shift among scientists at the Mars Desert Research Station, at a Silicon Valley NASA center, at a mountaintop observatory in Chile, and in an MIT exoplanet group.


Headline:


Above the Clouds: 5 Years of ‘Data’ at SoundCloud (Part 1)

Medium, Silicon Allee


from August 11, 2016

SoundCloud’s data team is entering a new, exciting phase. This is a story about our journey.

SoundCloud is where the music culture of ‘now’ is defined. Every day, millions of people express themselves, listen, create, discover, and connect around music, in ways that define themselves and sometimes change their lives forever.

 
Events



The Alchemy of Algorithms: A Talk by Glen de Vries



New York, NY Thursday, October 13, at 6 p.m., NYU Courant Institute, Warren Weaver Hall, Room 109 (251 Mercer Street) [free]

Lessons in Netprov: Collaborative Writing in the Digital Age



Berkeley, CA Friday, 14 October 2016 4:00 pm, at UC-Berkeley D-Lab (350 Barrows Hall) [free]

First RLadies meetup!



Los Angeles, CA Saturday, October 15, at 1 p.m., Botany Building Room 325 (618 Charles E Young Dr S) [free]

Data Science Camp Silicon Valley



Santa Clara, CA Saturday, October 29, starting at 8:15 a.m., Intel (2200 Mission College Blvd). [$$]
 
NYU Center for Data Science News



Headline:


Network Effects Explained: PageRank and Preferential Attachment

Medium, Bruno Goncalves


from October 04, 2016

When Google uses the value of PageRank to sort the results it displays in response to our queries, it implicitly influences future results: If, say, I’m researching sources to link from a blog post I’m writing, I will likely link to the relevant result that I find first. In this way, result ranking in Google (or any other search engine) becomes a self-reinforcing prophecy: the more people link to my page, the higher its PageRank, and the higher its PageRank the more links it attracts as a result.

This process is perhaps one of the most interdisciplinary models in existence, having been introduced, independently, by the British Statistician Udny Yule in 1925, the American Sociologist Herbert Simon in 1955, the British Scientometrician Derek Price in 1976 and the Hungarian Physicists Albert-László Barabási and Réka Albert in 1999. It is commonly referred to by names as diverse as the “Yule Process”, “Matthew effect”, “the rich get richer”, “Preferential Attachment”, and the “Barabási-Albert” model, among many others.
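
The rich-get-richer mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the linked post: the function name preferential_attachment, the two-node starting network, and the choice of one link per new node are all assumptions made for the example.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a network one node at a time (hypothetical helper).

    Each new node links to one existing node, chosen with probability
    proportional to that node's current number of links (its degree).
    """
    rng = random.Random(seed)
    # Assumed starting point: a single link between nodes 0 and 1.
    degree = Counter({0: 1, 1: 1})
    # Every node appears in this list once per link it touches, so a
    # uniform draw from the list is a degree-proportional draw.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[new_node] += 1
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree

degree = preferential_attachment(10_000)
print("Highest-degree nodes:", degree.most_common(5))
print("Nodes with a single link:", sum(1 for d in degree.values() if d == 1))
```

In a typical run, a handful of early nodes end up with degrees far above the rest, while most nodes keep only the single link they arrived with.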


Headline:


Invest in Technology With Social Benefits

NYTimes.com, Room for Debate, Arun Sundararajan


from October 04, 2016

Fears of widespread automation and an imminent “world without work” have risen as advances in digital technology herald the emergence of increasingly sophisticated artificial intelligence. A distinct air of technological determinism – that the technology by itself dictates its broader economic and societal impacts – surrounds these dire predictions, while ignoring the political and organizational factors that often shape the nature and pace of social change. Society can amplify automation’s benefits while limiting its harm.

 
Tools & Resources



“Bayesian Methods for Hackers”: An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python 😉

GitHub – CamDavidsonPilon


from October 03, 2016

“The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.”


Scaling Machine Learning – Part 1: Naive approach

Cake Solutions, Martin Zapletal


from October 01, 2016

“In this series of posts I will discuss the evolution of machine learning algorithms with regards to scaling and performance. We will start with a naive implementation and progress to more advanced solutions finally reaching state of the art implementations, similar to what companies like Google, Netflix and others use for their data pipelines, recommendation systems or machine learning.”


Beginner’s Guide to Apache Flink – 12 Key Terms, Explained

KDnuggets, Andres Vivanco


from October 04, 2016

We review 12 core Apache Flink concepts, to better understand what it does and how it works, including streaming engine terminology.


Data Management Plan for Moore Investigator in Data Driven Discovery Grant

Ethan White


from October 04, 2016

“This Data Management Plan (DMP) was created for Ethan White’s Moore Investigator in Data Driven Discovery award. It describes the management and sharing of all data and code associated with Gordon and Betty Moore Foundation grant GBMF4563 (White 2014). This includes raw data collected as part of the proposal, data compilations, and software. Research associated with this award is related to data-intensive approaches to studying ecological systems and the development of software for automating the cleaning, restructuring, and integration of heterogenous data sources.”


Write research documents online, together.

Authorea


from October 05, 2016

“Authorea is a new kind of research editor. Write, cite, collaborate, host, and publish all in one place.”


Crowdsorcery: A Proposal for an Open-Source Toolkit Integrating Best Lessons from Industry & Academia

HumanComputation.org, MLEASE


from October 04, 2016

Want to collect data using the crowd but afraid of poor quality results? Unsure how to best design your task to ensure success? Want to use all the power of modern machine learning algorithms for quality control, but without having to understand all that math? Want to solve a complex task but unsure how to effectively piece multiple tasks together in a single automated workflow? Want to make your workflow faster, cheaper, and more accurate using the latest and greatest optimization techniques from databases and decision theory, but without having to develop those techniques yourself? Like the programmatic power of Mechanical Turk micro-tasks and the high-level expertise found on UpWork? Want to combine the best of both worlds in a seamless workflow?


An Introduction to Machine Learning in Julia

Julia Computing, Abhijith C and Ranjan Anantharaman


from September 28, 2016

Machine learning is now pervasive in every field of inquiry and has led to breakthroughs in various fields from medical diagnoses to online advertising. Practical machine learning is quite computationally intensive, whether it involves millions of repetitions of simple mathematical methods such as Euclidean distance or more intricate optimizers or backpropagation algorithms. Such computationally intensive techniques need a fast and expressive language – one that enables scientists to write simple, readable code that performs well. Enter Julia.

 
Careers


Tenured and tenure track faculty positions

Assoc. or Full Professor, Communication, Media and Learning Technologies Design



Teachers College, Columbia University; New York, NY

Postdocs

Postdoctoral Fellowship Opportunities in Regenerative Medicine



Medicine by Design (MbD) and the Ontario Institute for Regenerative Medicine; Toronto, ON, Canada
