NYU Data Science newsletter – September 9, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 9, 2016


Data Science News

NASA Launches the Osiris-Rex Spacecraft to Asteroid Bennu

The New York Times

from September 08, 2016

The Osiris-Rex spacecraft was launched Thursday evening, beginning a seven-year mission to touch an asteroid and return a sample to Earth. [data graphic]

Who “owns” your data?

University of California System, Data Pub blog

from September 08, 2016

Determining ownership (and whether there’s even anything to own) can be frustratingly complicated — and, even when obvious, ownership only determines some of what can be done with data. Other things like policies, contracts, and laws may dictate certain terms in circumstances where ownership isn’t relevant — or even augment or overrule an owner where it is. To avoid an unpleasant surprise about what you can or can’t do with your data, you’ll want to plan ahead and think beyond the simple question of ownership.

What I learned from Deep Learning Summer School 2016

LinkedIn, Hamid Palangi

from August 20, 2016

Two weeks ago I attended the deep learning summer school at Montreal organized by Yoshua Bengio and Aaron Courville. Below is a summary of what I learned. It starts from basic concepts and continues with more advanced topics.

1. Essence of regularization

Extreme metagenomics using nanopore DNA sequencing: a field report from Svalbard, 78°N

bioRxiv; Arwyn Edwards, Aliyah R Debbonaire, Birgit Sattler, Luis AJ Mur, Andrew J Hodson

from September 07, 2016

In the field of observation, chance favours only the prepared mind (Pasteur). Impressive developments in genomics have led microbiology to its third “Golden Age”. However, conventional metagenomics strategies necessitate retrograde transfer of samples from extreme or remote environments for later analysis, rendering the powerful insights gained retrospective in nature, striking a contrast with Pasteur’s dictum. Here we implement a highly portable USB-based nanopore DNA sequencing platform coupled with field-adapted environmental DNA extraction, rapid sequence library generation and off-line analyses of metagenome profiles to characterize the microbiota of a High Arctic glacier. The profile of the microbial communities produced by this approach are coherent with those from conventional amplicon and shotgun metagenome sequencing of glacier environments, and prompt testable hypotheses regarding microbial community structure and function on the basis of data generated and analysed while in the field.

WaveNet: A Generative Model for Raw Audio

Google DeepMind blog

from September 08, 2016

This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.

We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.

Plenary Panel: Is Deep Learning the New 42?

YouTube, KDD2016 video

from September 01, 2016

The history of deep learning goes back more than five decades but in the marketplace of ideas its perceived value went through booms and busts. We are no doubt at an all time high: in the last couple of years we witnessed extraordinary advances in vision, speech recognition, game playing, translation, and so on, all powered by deep networks. At the same time companies such as Amazon, Apple, Facebook, Google, and Microsoft are making huge bets on deep learning research and infrastructure, ML competitions are dominated by deep learning approaches, open source deep learning software is proliferating, and the popular press both cheerleads the progress and raises the dark specter of unintended consequences.

So is deep learning the answer to everything?

Smart Cities Are More Pragmatism than Pure Invention, Singapore CIO Says

Government Technology

from September 08, 2016

As part of its Smart Nation initiative, a program meant to slingshot state technologies forward, Singapore’s aspirations include one of the most sprawling sensor and camera networks ever deployed in a city. The sensors will enhance how the city monitors vehicle traffic, weather patterns and pedestrian movement, and even act as litter detectors to ensure cleanliness. What’s more, the data will be streamed into an architecturally accurate map called Virtual Singapore. The city map will offer officials real-time intelligence during emergency situations and data about infrastructure for construction projects.

Reproducible Science: Cancer Researchers Embrace Containers in the Cloud

Google Research Blog, Nicole Deflaux and Jonathan Bingham

from September 06, 2016

Today we hear from the principal investigators of the ICGC-TCGA DREAM Somatic Mutation Calling Challenges about how they are encouraging cancer researchers to make use of Docker and Google Cloud Platform to gain a deeper understanding of the complex genetic mutations that occur in cancer, while doing so in a reproducible way.

Announcing the First Annual Global PhD Fellowship Summit and the 2016 Google PhD Fellows

Google Research Blog, Michael Rennaker

from September 07, 2016

Recently, Google PhD Fellows from around the globe converged on our Mountain View campus for the first annual Global PhD Fellowship Summit. The students heard talks from researchers like Jeff Dean, Françoise Beaufays, Peter Norvig, Maya Gupta and Amin Vahdat, and got a glimpse into some of the state-of-the-art research pursued across Google. [plus list of Google PhD Fellowship awardees]

Part 1: Deep Learning and Long-Term Investing

Euclidean Technologies, John Alberg and Michael Seckler

from September 06, 2016

Seventy-five years ago, Benjamin Graham – the father of security analysis – wrote that in the short run the market behaves like a voting machine, but over the long run it more closely resembles a weighing machine. Graham’s point was that fear, greed, and other emotions (the voting machine) can drive short-term market fluctuations which in turn cause disconnects between the price and true value of a company’s shares. Over long periods of time, however, the weighing machine kicks in as a company’s fundamentals ultimately cause the value and market price of its shares to converge.

Traditionally, investors have performed long-term fundamental analysis by studying the income statements, balance sheets, and other publicly available information about a company’s operations. Then, they use this information in the context of the company’s market value to make an informed decision about its prospects as a long-term investment.

The automation of this process, systematic value investing, has become possible with the emergence of high-quality data on company fundamentals and the ever-increasing computational power available to researchers.

Will A.I. Harm Us? Better to Ask How We’ll Reckon With Our Hybrid Nature

Nautilus, David Krakauer

from September 06, 2016

At what point did we create an artificial intelligence? Was it when we first chiseled on rocks the memory of our debts? Was it that point when we enhanced reasoning by exploring possibilities in the arena of a game? Or when we solved a problem of inference beyond our merely fleshy ability to calculate? The dream of a fully autonomous artificial intelligence, stuff of infinite science-fiction prognostication, has blinded us to the incremental nature of artificial intelligence. The deep intellectual and ethical question facing our species is not how we’ll prevent an artificial superintelligence from harming us, but how we will reckon with our hybrid nature.

Events

NYU Stern | $300K Entrepreneurs Challenge

New York, NY NYU entrepreneurs and their founding team members are strongly encouraged to attend the Competition Kick-Off. — Wednesday, September 14, starting at 5 p.m., Tisch Hall – Paulson Auditorium.

How Data Science Is Making On-Demand Grocery Delivery Profitable at Instacart

Berkeley, CA Fifteen years ago, Webvan spectacularly failed to bring grocery delivery online. Speculation has been high that the current wave of on-demand grocery delivery startups will meet similar fates. Jeremy Stanley explains why this time the story will be different—and how data science is the key. Innovations in mobile applications have paved the way, but significant investments in algorithms to optimize efficiency will drive positive unit economics. — Friday, September 16, starting at 1:10 p.m., 190 Doe Library.

Text as Data Speaker Series

New York, NY David Bamman (UC-Berkeley) — Beyond Bags of Words: Linguistic Structure in the Analysis of Text as Data — Thursday, September 22, 4-5:30 p.m. in room 217, 19 West 4th St.

Mini Conference – “Real Estate Tech Unbundled” with Marc Holliday and Amol Sarva

New York, NY Historically, the dynamics of the Real Estate industry had been very stable, with legendary dynastic families and well established public REITs dominating the marketplace. However, we are now seeing a massive tech disruption that is challenging longstanding RE models. — Thursday, September 29, starting at 6:30 p.m., Convene Conference Center (@45th btwn Park & Lex, Lobby Level).

2016 Workshop on Visualization for the Digital Humanities

Baltimore, MD Part of IEEE VIS 2016. Monday, October 24

Deadlines

Help a colleague: 5 minute online study of visual attributes in networks

deadline: Survey

“The goal of this study is to understand which visual representations are most prominent in network visualizations.”

Deadline Thursday, 15 September 2016.

SoCal ML Symposium

deadline: Conference

Pasadena, CA “The Southern California Machine Learning Symposium brings together students and faculty to promote machine learning in the southern California region. The workshop serves as a forum for researchers from a variety of fields working on machine learning to share and discuss their latest findings.” — Friday, November 18, at Caltech.

Deadline for submissions is Tuesday, October 4.

R/GA Marketing Tech Venture Studio

deadline: Contest/Award

The R/GA Marketing Tech Venture Studio with Interpublic is designed for emerging companies revolutionizing the future of marketing and digital products by leveraging artificial intelligence and machine learning.

Deadline to apply is Monday, 24 October 2016.

NYU Center for Data Science News

Summer’s notable departures and arrivals at NYU Center for Data Science

Foster Provost stepped down from the Moore-Sloan steering committee and handed over the CDS Interim Directorship to Claudio Silva.
Roy Lowrance, CDS Managing Director, moved into industry.
Dan Cervone, Moore-Sloan Fellow, took a job with the LA Dodgers starting in October.
Pablo Barberá, a Moore-Sloan Fellow, started his tenure-track position at USC in July.
Andrea Rooy-Jones joined Moore-Sloan in a research and outreach capacity.
Michael Gill joined us as a Moore-Sloan Fellow.
Our top-notch administrator David Clark started an MBA program this fall; we welcome Kathryn Angeles in his position.

Master’s in Data Science Open House

NYU Center for Data Science

from September 27, 2016

New York, NY If you would like to learn more about the MSDS program and are in the NYC area, please plan to attend our Open House listed below. The Open House provides an overview of the program and gives you a chance to interact with faculty, students, and staff of CDS. Next Open House: Tuesday, September 27, at NYU Center for Data Science (60 5th Avenue, 7th Floor).

Tools & Resources

BAyesian Model-Building Interface (BAMBI) in Python.

GitHub – bambinos

from September 08, 2016

Bambi is a high-level Bayesian model-building interface written in Python. It’s built on top of the PyMC3 probabilistic programming framework, and is designed to make it extremely easy to fit mixed-effects models common in social sciences settings using a Bayesian approach.

Data Programming: Creating Large Training Sets, Quickly

arXiv, Statistics > Machine Learning; Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

from May 25, 2016

Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users provide a set of labeling functions, which are programs that heuristically label large subsets of data points, albeit noisily. By viewing these labeling functions as implicitly describing a generative model for this noise, we show that we can recover the parameters of this model to “denoise” the training set. Then, we show how to modify a discriminative loss function to make it noise-aware. We demonstrate our method over a range of discriminative models including logistic regression and LSTMs. We establish theoretically that we can recover the parameters of these generative models in a handful of settings. Experimentally, on the 2014 TAC-KBP relation extraction challenge, we show that data programming would have obtained a winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a supervised LSTM baseline (and into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way to create machine learning models for non-experts.
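To make the idea of labeling functions concrete, here is a minimal sketch in Python. It is not the authors' method: the three labeling functions, the spouse-relation task, and the example sentences are hypothetical, and the noisy votes are combined with a plain majority vote instead of the generative denoising model the abstract describes.

```python
from typing import Callable, List, Tuple

# Label convention (an assumption for this sketch): +1 positive, -1 negative, 0 abstain.
POS, NEG, ABSTAIN = 1, -1, 0


def lf_married(sentence: str) -> int:
    """Hypothetical heuristic: the word 'married' suggests a spouse relation."""
    return POS if "married" in sentence.lower() else ABSTAIN


def lf_spouse_word(sentence: str) -> int:
    """Hypothetical heuristic: 'wife' or 'husband' suggests a spouse relation."""
    s = sentence.lower()
    return POS if "wife" in s or "husband" in s else ABSTAIN


def lf_sibling(sentence: str) -> int:
    """Hypothetical heuristic: sibling words suggest a non-spouse relation."""
    s = sentence.lower()
    return NEG if "brother" in s or "sister" in s else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [lf_married, lf_spouse_word, lf_sibling]


def majority_vote(sentence: str, lfs: List[Callable[[str], int]] = LABELING_FUNCTIONS) -> int:
    """Combine noisy votes by summing them; ties and all-abstain cases return ABSTAIN.

    This is a simplified stand-in for the generative model in the paper.
    """
    total = sum(lf(sentence) for lf in lfs)
    if total > 0:
        return POS
    if total < 0:
        return NEG
    return ABSTAIN


def label_corpus(sentences: List[str]) -> List[Tuple[str, int]]:
    """Attach a (possibly noisy) label to every sentence in an unlabeled corpus."""
    return [(s, majority_vote(s)) for s in sentences]


if __name__ == "__main__":
    corpus = [
        "Alice married Bob, her husband.",
        "Carol is the sister of Dave.",
        "Eve went to the store.",
    ]
    for sentence, label in label_corpus(corpus):
        print(label, sentence)
```

Each function labels only the sentences its heuristic covers and abstains elsewhere, which is why many cheap, overlapping functions can cover a large corpus quickly.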

Attention and Augmented Recurrent Neural Networks

Google Brain, Chris Olah and Shan Carter

from September 08, 2016

Recurrent neural networks are one of the staples of deep learning, allowing neural networks to work with sequences of data like text, audio and video. They can be used to boil a sequence down into a high-level understanding, to annotate sequences, and even to generate new sequences from scratch!

How to Pick the Fastest Line at the Supermarket

The New York Times

from September 07, 2016

… Get behind a shopper who has a full cart

That may seem counterintuitive, but data tell a different story, said Dan Meyer, a former high school math teacher who is the chief academic officer at Desmos, where he explores the future of math, technology and learning.
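The arithmetic behind that advice can be sketched in a few lines of Python. The two timing constants below are illustrative assumptions, not figures from the excerpt above: each shopper is given a fixed overhead (greeting, paying, clearing the lane) plus a per-item scanning time.

```python
from typing import List

# Illustrative assumptions, not taken from the excerpt above.
FIXED_SECONDS_PER_SHOPPER = 41.0  # greeting, payment, clearing the lane
SECONDS_PER_ITEM = 3.0            # time to scan one item


def line_wait_seconds(item_counts: List[int]) -> float:
    """Total time to process every shopper ahead of you in one line."""
    return sum(FIXED_SECONDS_PER_SHOPPER + SECONDS_PER_ITEM * n for n in item_counts)


if __name__ == "__main__":
    one_full_cart = line_wait_seconds([100])                # one shopper, 100 items
    four_small_baskets = line_wait_seconds([20, 20, 20, 20])  # four shoppers, 20 items each
    print(f"one full cart:      {one_full_cart / 60:.1f} min")
    print(f"four small baskets: {four_small_baskets / 60:.1f} min")
```

Under these assumptions the fixed overhead is paid once per shopper, so a line of several small baskets can take longer than a single full cart even though it holds fewer items in total.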

Careers

Full-time, non-tenured academic positions

Research Analyst (accountability) – apply by Sep 22

Data & Society; New York, NY

Full-time positions outside academia

Senior Data Scientist

Hired; San Francisco, CA

