NYU Data Science newsletter – May 9, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 9, 2016


 
Data Science News



Reproducibility Symposium – Theory and Practice

Kyle Cranmer, Theory and Practice blog


from May 03, 2016

… Often in the discussions around open and reproducible science we focus on three main types of research products:

  • the paper: the traditional narrative document, often the primary product; a DOI is standard
  • the data: increasingly appreciated as a first-class citizen of the scientific record, with data repositories providing DOIs
  • the code: recognized as a research product, but the last to be integrated into the literature system and given a DOI

But even with access to all three of these products, reproducing the results or evaluating the methodology can be difficult because software often has dependencies and it can take a lot of effort to prepare the environment. … I really think we should elevate the environment to be the fourth pillar of computational reproducibility:

  • the environment: a critical and under-appreciated ingredient for computational reproducibility
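One concrete way to ship the environment alongside the paper, data, and code is a machine-readable environment file. As an illustrative sketch only (the file name, package list, and version pins below are hypothetical, not from Cranmer's post), a conda environment file might look like:

```yaml
# environment.yml -- illustrative sketch; package names and pins are hypothetical
name: paper-analysis
channels:
  - conda-forge
dependencies:
  - python=3.5
  - numpy=1.11
  - scipy=0.17
```

A reader can then rebuild the environment with `conda env create -f environment.yml` rather than reverse-engineering dependencies from the code.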

    Expanding Tropics Push High Altitude Clouds Towards Poles

    NASA


    from May 05, 2016

    A new NASA analysis of 30 years of satellite data suggests that a previously observed trend of high altitude clouds in the mid-latitudes shifting toward the poles is caused primarily by the expansion of the tropics.

    Clouds are among the most important mediators of heat reaching Earth’s surface. Where clouds are absent, darker surfaces like the ocean or vegetated land absorb heat, but where clouds occur their white tops reflect incoming sunlight away, which can cause a cooling effect on Earth’s surface. Where and how the distribution of cloud patterns change strongly affects Earth’s climate. Understanding the underlying causes of cloud migration will allow researchers to better predict how they may affect Earth’s climate in the future.

     

    CleanStat Uses Data To Track the Cleanliness of Los Angeles’ Streets

    CityLab, Eillie Anzilott


    from May 06, 2016

    As of April, every street in Los Angeles has a number. It falls on a scale of 1 to 3, and it denotes cleanliness: 61 percent of streets were given a 1 and determined to be clean, 35 percent were a 2 (somewhat tidy), and 4 percent were a 3—flat-out dirty.

    These rankings, commissioned as part of L.A.’s $9.1 million Clean Streets Initiative, have a pretty clear message for Mayor Eric Garcetti: “In L.A., we just haven’t been doing a very good job,” he tells CityLab. The city has long faced problems with trash. While it may seem in okay shape now, cleanliness-wise, with 96 percent of the streets in tolerable-to-good condition, LA Independent reported that the remaining 4 percent is still significant: “It equates to 376 miles, or running a marathon each day for two weeks and seeing only streets filled with trash.”

     

    Eulogy for The New York Times R&D Lab — Medium

    Medium, David Riordan


    from May 06, 2016

    The New York Times R&D Lab, a groundbreaking department of applied creative technology that helped one of the great institutions of journalism see how it could thrive amidst a changing media ecosystem, died Monday in New York City. It was eleven years old.

    And while this isn’t an obituary, it is a eulogy for a small group within a tremendous organization that I believe had an outsize impact on how The New York Times has evolved during the past seven years. I’ve always thought that without the NY Times R&D Lab, the labs of other cultural institutions in New York simply wouldn’t exist.

     

    Workshop on Human Centred Machine Learning at CHI 2016

    CHI 2016


    from May 08, 2016

    Machine learning is one of the most important and successful techniques in contemporary computer science, with applications ranging from medical research to the arts, as well as considerable recent interest in its use for interaction design. It is often conceived in a very impersonal way, with algorithms working autonomously on passively collected data. However, this viewpoint hides considerable human work of tuning the algorithms, gathering the data, and even deciding what should be modeled in the first place.

    Examining machine learning from a human-centered perspective includes explicitly recognising this human work, as well as reframing machine learning workflows based on situated human working practices, and exploring the co-adaptation of humans and systems. A human-centered understanding of machine learning in human context can lead not only to more usable machine learning tools, but to new ways of framing learning computationally. This workshop will bring together researchers to discuss these issues and suggest future research questions aimed at creating a human-centered approach to machine learning.

     

    Our MSR guest @erichorvitz arrived just in time for me to test our new Terminator glasses, er, @HoloLens

    Twitter, Mor Naaman


    from May 03, 2016

     

    Inside Palantir, Silicon Valley’s Most Secretive Company

    BuzzFeed News, William Alden


    from May 06, 2016

    A cache of internal documents shows that despite growing revenue, Palantir has lost top-tier clients, is struggling to stem staff departures, and isn’t collecting most of the money it touts in high-value deals.

    Also:

  • The Real Lesson for Data Science That is Demonstrated by Palantir’s Struggles (May 11, Simply Statistics, Roger Peng)

    Building AI Is Hard—So Facebook Is Building AI That Builds AI

    WIRED, Business


    from May 06, 2016

    … “It’s almost like being the coach rather than the player,” says Demis Hassabis, co-founder of DeepMind, the Google outfit behind the history-making AI that beat the world’s best Go player. “You’re coaxing these things, rather than directly telling them what to do.”

    That’s why many of these companies are now trying to automate this trial and error—or at least part of it. If you automate some of the heavy lifting, the thinking goes, you can more rapidly push the latest machine learning into the hands of rank-and-file engineers—and you can give the top minds more time to focus on bigger ideas and tougher problems. This, in turn, will accelerate the progress of AI inside the Internet apps and services that you and I use every day.

     

    Announcing the 2016 Data Stories Winners

    Science


    from May 06, 2016

    Science has completed its Data Stories video contest, and we’re pleased to present the winners of this inaugural contest. Researchers submitted 90-second videos portraying captivating visual stories based on scientific data. There were no restrictions on the type of data used, but it had to be free and publicly available. The intent of this contest was to encourage and recognize compelling visual representations of scientific data. Many entries took a creative approach to conveying complex information in video format within the restricted 90-second timeframe.

     

    The Rise of Intelligent Machines – Part 1

    Rolling Stone Australia, Jeff Goodell


    from May 04, 2016

    “Welcome to robot nursery school,” Pieter Abbeel says as he opens the door to the Robot Learning Lab on the seventh floor of a sleek new building on the northern edge of the UC-Berkeley campus. The lab is chaotic: bikes leaning against the wall, a dozen or so grad students in disorganised cubicles, whiteboards covered with indecipherable equations. Abbeel, 38, is a thin, wiry guy, dressed in jeans and a stretched-out T-shirt. He moved to the U.S. from Belgium in 2000 to get a Ph.D. in computer science at Stanford and is now one of the world’s foremost experts in understanding the challenge of teaching robots to think intelligently. But first, he has to teach them to “think” at all. “That’s why we call this nursery school,” he jokes.

    He introduces me to Brett, a six-foot-tall humanoid robot made by Willow Garage, a high-profile Silicon Valley robotics manufacturer that is now out of business. The lab acquired the robot several years ago to experiment with. Brett, which stands for “Berkeley robot for the elimination of tedious tasks”, is a friendly-looking creature with a big, flat head and widely spaced cameras for eyes, a chunky torso, two arms with grippers for hands and wheels for feet. At the moment, Brett is off-duty and stands in the centre of the lab with the mysterious, quiet grace of an unplugged robot. On the floor nearby is a box of toys that Abbeel and the students teach Brett to play with: a wooden hammer, a plastic toy airplane, some giant Lego blocks.

    Brett is only one of many robots in the lab. In another cubicle, a nameless 45-centimetre-tall robot hangs from a sling on the back of a chair. Down in the basement is an industrial robot that plays in the equivalent of a robot sandbox for hours every day, just to see what it can teach itself. Across the street in another Berkeley lab, a surgical robot is learning how to stitch up human flesh, while a graduate student teaches drones to pilot themselves intelligently around objects.

    “We don’t want to have drones crashing into things and falling out of the sky,” Abbeel says. “We’re trying to teach them to see.”

     
    Events



    Mike Kuniavsky at The Interval: Our Future in Algorithm Farming



    The software running much of our world today, from consumer apps to industrial infrastructure, is increasingly built on systems that learn and try to predict the future. These systems are increasingly sophisticated and profoundly different from any technology we have lived with before.

    In the first of a new series of talks from the team at PARC, the venerable Palo Alto research lab, UX designer / author Mike Kuniavsky will talk about what we—the intended beneficiaries of these products and services—will do while the algorithms that are supposed to understand us are on what is likely a slow learning curve.

    San Francisco, CA Tuesday, May 17, starting at 6:30 p.m., The Interval at Long Now
    (Fort Mason Center)

     

    iidata Student-Run Conference in Data Science at University of California-Davis



    iidata is a one-day, collegiate-level Data Science convention aimed at educating students in the new, thrilling field of data science. We welcome all students, regardless of background knowledge, so long as you have a mindset to never stop learning. The convention will consist of guest speakers, workshops, and competitions.

    Davis, CA Saturday, May 21, UC Davis Conference Center

     

    Artificial Intelligence: Law and Policy



    The University of Washington School of Law is delighted to announce a public workshop on the law and policy of artificial intelligence, co-hosted by the White House and UW’s Tech Policy Lab. The event places leading artificial intelligence experts from academia and industry in conversation with government officials interested in developing a wise and effective policy framework for this increasingly important technology. The event is free and open to the public but requires registration.

    Seattle, WA Tuesday, May 24, at the University of Washington School of Law

     

    Artificial Intelligence For Social Good workshop



    Organizers: The White House and the Computing Community Consortium

    Washington, DC Tuesday, June 7, at The Willard Intercontinental Hotel

     

    SafArtInt 2016 – Workshop on Safety and Control for Artificial Intelligence



    The Public Workshop on Safety and Control for Artificial Intelligence (SAF|ART|INT) is a jointly-sponsored event of the White House Office of Science and Technology Policy (OSTP) and Carnegie Mellon University. The workshop will explore the potential future of AI and AI applications, the emerging technical means for constructing safe and secure systems, how safety might be assured, and how we can make progress on the challenges of safety and control for AI.

    Pittsburgh, PA Tuesday, June 28, at Carnegie Mellon University

     

    AI Now symposium



    The White House and New York University’s Information Law Institute, with support from Google Open Research and Microsoft Research, will host a major public symposium to address the near-term impacts of AI technologies across social and economic systems.

    The focus will be the challenges of the next 5-10 years, specifically addressing five themes: social inequality, labor, financial markets, healthcare, and ethics. Leaders from industry, academia, and civil society will share ideas for technical design, research and policy directions.

    Thursday, July 7, at Skirball Center, NYU

     
    Deadlines



    Data ethics case studies


    The Council for Big Data, Ethics, and Society has released three case studies (with more on the way) and has set a deadline of June 1, 2016, for any new submissions to its call for cases.

     
    Tools & Resources



    Feature Selection @ ASU



    from March 09, 2016

    scikit-feature is an open-source feature selection repository in Python developed at Arizona State University. It is built upon the widely used machine learning package scikit-learn and the scientific computing packages NumPy and SciPy. scikit-feature contains around 40 popular feature selection algorithms, from traditional methods to structural and streaming feature selection algorithms. It serves as a platform for feature selection application, research, and comparative study: it collects widely used algorithms from the feature selection literature and makes it convenient for researchers and practitioners to run empirical evaluations when developing new feature selection algorithms.
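    The description above doesn't show scikit-feature's own API, but the filter-style criteria the library collects are easy to illustrate. The sketch below implements one classic example, the Fisher score, in plain Python (a generic illustration, not code from the repository): features whose class-conditional means are well separated relative to their within-class variance score higher.

```python
# Fisher score: a classic filter-style feature selection criterion.
# For each feature j: score_j = between-class scatter / within-class scatter.
def fisher_score(X, y):
    """X: list of samples (each a list of feature values); y: class labels.
    Returns one score per feature; higher means more discriminative."""
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        mean_all = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mu - mean_all) ** 2
            within += len(vals) * var
        scores.append(between / within if within else float("inf"))
    return scores

# Toy data: feature 0 separates the two classes cleanly; feature 1 is noise.
X = [[1.0, 5.0], [1.1, 2.0], [0.9, 7.0],   # class 0
     [3.0, 4.0], [3.2, 1.0], [2.8, 6.0]]   # class 1
y = [0, 0, 0, 1, 1, 1]
print(fisher_score(X, y))  # feature 0 scores far higher (~60 vs ~0.06)
```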

     

    Writing with the machine

    Robin Sloan


    from May 05, 2016

    Building this felt like playing with Lego, except instead of plastic bricks, I was snapping together conveniently-packaged blocks of human intellect and effort.

    One block: a recurrent neural network, fruit of the deep learning boom, able to model and generate sequences of characters with spooky verisimilitude. Snap!

    Another block: a powerfully extensible text editor. Snap!

    Together: responsive, inline “autocomplete” powered by an RNN trained on a corpus of old sci-fi stories.
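    As a rough illustration of the autocomplete mechanic, the sketch below uses a character-level Markov chain as a lightweight stand-in for Sloan's RNN (the RNN models far longer context, but the interface idea is the same); the toy corpus and prompt are made up:

```python
from collections import Counter, defaultdict

def train(corpus, order=3):
    """Count which character follows each `order`-length context."""
    model = defaultdict(Counter)
    for i in range(len(corpus) - order):
        model[corpus[i:i + order]][corpus[i + order]] += 1
    return model

def complete(model, prompt, n=20, order=3):
    """Extend a prompt the way an inline autocomplete would."""
    out = prompt
    for _ in range(n):
        counts = model.get(out[-order:])
        if not counts:
            break
        out += counts.most_common(1)[0][0]  # greedy: most frequent next char
    return out

corpus = "the ship drifted through the dark. the ship hummed softly. "
model = train(corpus)
print(complete(model, "the s"))  # continues the prompt as "the ship ..."
```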

     

    How to Install GPU Tensorflow – Ubuntu 14.04

    YouTube, Justin Francis


    from April 15, 2016

    This is my video tutorial on how to install GPU enabled Tensorflow on Ubuntu 14.04 LTS with both Python 2 & Python 3.

     

    Number plate recognition with Tensorflow

    Matthew Earl, Matt's ramblings blog


    from May 06, 2016

    Over the past few weeks I’ve been dabbling with deep learning, in particular convolutional neural networks. One standout paper from recent times is Google’s Multi-digit Number Recognition from Street View. This paper describes a system for extracting house numbers from street view imagery using a single end-to-end neural network. The authors then go on to explain how the same network can be applied to breaking Google’s own CAPTCHA system with human-level accuracy.

    In order to get some hands-on experience with implementing neural networks I decided I’d design a system to solve a similar problem: Automated number plate recognition (automated license plate recognition if you’re in the US).
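    The core operation behind the convolutional networks the post builds on is a 2-D convolution: slide a small kernel over the image and take a dot product at each position. A minimal pure-Python sketch of that operation (an illustration, not the post's TensorFlow code):

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution of a nested-list image and kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the image patch at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A vertical-edge kernel applied to an image with a hard edge down the middle:
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

A convolutional layer applies many such kernels, with the kernel weights learned rather than hand-chosen; the strong responses in the middle column show how a kernel localises a feature, here an edge.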

     
