NYU Data Science newsletter – September 27, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 27, 2016

Data Science News

Headline:

Microsoft’s Internet Business Gets a New Kind of Processor

WIRED, Business

from September 25, 2016

It was December 2012, and Doug Burger was standing in front of Steve Ballmer, trying to predict the future.

Ballmer, the big, bald, boisterous CEO of Microsoft, sat in the lecture room on the ground floor of Building 99, home base for the company’s blue-sky R&D lab just outside Seattle. The tables curved around the outside of the room in a U-shape, and Ballmer was surrounded by his top lieutenants, his laptop open. Burger, a computer chip researcher who had joined the company four years earlier, was pitching a new idea to the execs. He called it Project Catapult.

The tech world, Burger explained, was moving into a new orbit. In the future, a few giant Internet companies would operate a few giant Internet services so complex and so different from what came before that these companies would have to build a whole new architecture to run them. They would create not just the software driving these services, but the hardware, including servers and networking gear. Project Catapult would equip all of Microsoft’s servers—millions of them—with specialized chips that the company could reprogram for particular tasks.

Headline:

You Too Can Become a Machine Learning Rock Star! No PhD Necessary.

Medium, Backchannel, Steven Levy

from September 26, 2016

But what if you could get the benefits of AI without having to hire those hard-to-find and expensive-to-woo talents? What if smart software could lower the bar? Could you get deep learning with a shallower talent pool?

A startup called Bonsai and an emerging class of companies with the same idea say yes. Brace yourself for the democratization of AI. It’s a movement that might eventually include millions of people and, some say, billions.

Headline:

Stop-and-frisk did not work “incredibly well”

Medium, Sharad Goel

from September 26, 2016

This past week Donald Trump called for the broad use of stop-and-frisk by police departments across the country. Or maybe just in Chicago; his position seemed to evolve. Either way, Mr. Trump claimed the tactic had worked “incredibly well” in New York City.

In reality, it was a racially discriminatory policy in which officers regularly stopped individuals with little legal basis. It undermined the trust and confidence that minority residents placed in their police department, and there is scant evidence that it reduced crime. That’s why the NYPD began retreating from these tactics even before a federal judge ordered them to do so. And that’s why we should not return to these antagonistic stop-and-frisk policies now.

Headline:

Progress in AI, through collaborative research

IBM Blog Research, Guru Banavar

from September 20, 2016

As a researcher, I know that collaborating with leading minds around the world is the key to fulfilling the true potential of cognitive computing. And that’s why IBM is forming the Cognitive Horizons Network (CHN), a network of the world’s leading universities committed to working with IBM to accelerate the development of core technologies needed to advance the promise of cognitive computing.

We announced the CHN today at our 5th annual IBM Research Cognitive Colloquium, held this year at the T.J. Watson Research Center in Yorktown Heights, New York. Both the Colloquium and the CHN have brought together hundreds of leaders in the field to work toward creating a shared vision of cognitive computing and stimulating meaningful discussions on research directions and anticipated breakthroughs.

Headline:

The Data Processing Inequality

Medium, Adam Kelleher

from September 26, 2016

If you look at the Wikipedia article for the data processing inequality, it’s really just a stub (as of the time this article was published). The inequality is given, but there is little context. The data processing inequality is fundamental to data science, machine learning, and social science.

Lately, I’ve been blogging almost exclusively about causality. I’m about to go deep into causal inference, but needed to lay one more piece of groundwork before I do. There is a deep problem with how we encode the “real world” into a form that a computer can understand. The implications go far beyond limiting the predictive power of a machine learning model. Our representations of data can limit our ability to infer causal relationships. To understand this fully, you need to understand the data processing inequality.
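
For reference, the inequality in its standard textbook form (stated here as background, not quoted from the linked article): if three random variables form a Markov chain, so that Z depends on X only through Y, then processing Y into Z cannot increase the information available about X. The symbols X, Y, Z and the mutual information I are the usual notation and are assumptions of this note.

```latex
X \to Y \to Z \quad \Longrightarrow \quad I(X;Y) \ge I(X;Z)
```

Read in the excerpt's terms, X is the “real world,” Y is the raw data, and Z is the encoded representation handed to a model.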

Headline:

Stealing an AI Algorithm and Its Underlying Data Is a ‘High-School Level Exercise’

Communications of the ACM, NextGov

from September 26, 2016

Cornell Tech researchers have demonstrated the ability to remotely reverse-engineer machine-learning algorithms, essentially stealing artificial intelligence (AI) products and using them for free, by accessing an application programming interface (API).

In addition, after the algorithm has been copied, it can be coerced into producing examples of the potentially proprietary data on which it was trained.

Headline:

Microsoft CEO Satya Nadella Discusses Artificial Intelligence’s Impact

Fortune, Jonathan Vanian

from September 26, 2016

Microsoft’s overarching goal is to “democratize A.I.,” which Nadella explained has something to do with analyzing the mountains of data produced by consumers and businesses and then presenting the findings to people who have far less free time than they used to have.

Headline:

A Rare Tour Of Microsoft’s Hyperscale Datacenters

The Next Platform, Timothy Prickett Morgan

from September 26, 2016

If you want to study how datacenter design has changed over the past two decades, a good place to visit is Quincy, Washington. There are five different datacenter operators in this small farming community of around 7,000 people, including Microsoft, Yahoo, Intuit, Sabey Data Centers, and Vantage Data Centers, and they have located there thanks to the proximity of Quincy to hydroelectric power generated from the Columbia River and the relatively cool and arid climate, which can be used to great advantage to keep servers, storage, and switches cool.

All of the datacenter operators are pretty secretive about their glass houses, but every once in a while, just to prove how smart they are about infrastructure, one of them opens up the doors to let selected people inside. Ahead of the launch of Windows Server at its Ignite conference, Microsoft invited The Next Platform to visit its Quincy facilities for a history lesson of sorts in datacenter design, demonstrating how Microsoft has innovated and become one of the biggest of the hyperscalers in the world, rivaling Google and Amazon Web Services – companies that are its main competition in the public cloud business.

Headline:

Bay Area Deep Learning School Day 1 at CEMEX auditorium, Stanford – YouTube

YouTube, Shubhabrata Sengupta

from September 24, 2016

Day 1 of Bay Area Deep Learning School featuring speakers Hugo Larochelle, Andrej Karpathy, Richard Socher, Sherry Moore, Ruslan Salakhutdinov and Andrew Ng.

Headline:

Google’s Internet-Beaming Balloon Gets a New Pilot: AI

WIRED, Business

from September 23, 2016

Launching balloons into the stratosphere is a usual thing for the Google X lab—or just X, as it’s now called after spinning off from Google and nestling under the new umbrella called Alphabet. X is home to Project Loon, an effort to beam the Internet from the stratosphere down to people here on Earth. The hope is that these balloons can fly over areas of the globe where the Internet is otherwise unavailable and stay there long enough to provide people with a reliable connection. But there’s a problem: balloons tend to float away.

That’s why it’s so impressive that the company managed to keep a balloon in Peruvian airspace for over three months. And it’s doubly impressive when you consider that the navigation system can only move these balloons up and down—not forward and back or side to side. They move like hot-air balloons—avoiding the weather or catching it at the right time, rather than pushing right through it—and that’s because a more complex navigation system would be too heavy and too expensive for the task at hand. Rather than navigate Peruvian air space with some sort of jet propulsion system, the Loon team turned to artificial intelligence.

Events

CodeNeuro – Neuroscience + Data Science

San Francisco, CA Friday-Saturday, 14-15 October 2016 [free]

Tools & Resources

When and why log5 doesn’t work

Sabermetric Research, Phil Birnbaum

from September 22, 2016

It turns out that the log5 formula makes a certain assumption about the sport, an assumption that makes the log5 formula work out perfectly. That assumption is: that the set of score differentials follows a logistic distribution.

What’s the logistic distribution? It’s a lot like the normal distribution, a bell-shaped curve. They can be so similar in shape that I’d have trouble telling them apart by eye. But, the logistic distribution has fatter tails relative to the “bell.”
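
A minimal sketch of the log5 formula itself, for readers who have not seen it. The function names are hypothetical, and the inputs are assumed to be each team's winning percentage against an average (.500) opponent. The second function shows the same quantity written as a logistic function of the difference in log-odds, which is one way to see where the logistic assumption enters; it is an illustration, not the linked post's derivation.

```python
import math


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B under the log5 formula.

    p_a and p_b are winning percentages strictly between 0 and 1.
    """
    numerator = p_a * (1.0 - p_b)
    denominator = numerator + p_b * (1.0 - p_a)
    return numerator / denominator


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def log5_via_logit(p_a: float, p_b: float) -> float:
    """Same value as log5, computed as a logistic function of the log-odds gap."""
    gap = logit(p_a) - logit(p_b)
    return 1.0 / (1.0 + math.exp(-gap))


if __name__ == "__main__":
    # A .600 team against a .400 team: 0.36 / 0.52
    print(round(log5(0.6, 0.4), 4))  # 0.6923
```

Against a .500 opponent the formula returns the team's own winning percentage, which is a quick sanity check on any implementation.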

Practical tutorials and labs for TensorFlow used by Nvidia, FFN, CNN, RNN, Kaggle, AE

GitHub – alrojo

from September 25, 2016

Learn TensorFlow from scratch by examples and visualizations with interactive Jupyter notebooks. Learn to compete in the Kaggle leaf detection challenge!

All exercises are designed to be run from a CPU on a laptop, but can be accelerated with GPU resources.

How do Convolutional Neural Networks work?

Brandon Rohrer, Data Science and Robots Blog

from August 18, 2016

Nine times out of ten, when you hear about deep learning breaking a new technological barrier, Convolutional Neural Networks are involved. Also called CNNs or ConvNets, these are the workhorse of the deep neural network field. They have learned to sort images into categories even better than humans in some cases. If there’s one method out there that justifies the hype, it is CNNs.

What’s especially cool about them is that they are easy to understand, at least when you break them down into their basic parts. I’ll walk you through it. There’s a video that talks through these images in greater detail. If at any point you get a bit lost, just click on an image and you’ll jump to that part of the video.
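
One of those basic parts is the convolution step, which can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions: the function name, the tiny image, and the kernel are hypothetical, and the linked post may normalize or present the step differently. A small kernel slides over the image, and at each position the overlapping values are multiplied and summed.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding and stride 1.

    Each output value is the sum of elementwise products between the
    kernel and the image patch beneath it (cross-correlation, which is
    what deep learning libraries usually call convolution).
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output: Matrix = []
    for i in range(out_h):
        row: List[float] = []
        for j in range(out_w):
            total = 0.0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


if __name__ == "__main__":
    # A bright region on the left, dark on the right (a vertical edge).
    image = [[1, 1, 0],
             [1, 1, 0],
             [1, 1, 0]]
    # Hypothetical kernel that responds where brightness drops left to right.
    kernel = [[1, -1],
              [1, -1]]
    print(convolve2d_valid(image, kernel))  # [[0.0, 2.0], [0.0, 2.0]]
```

The output is zero over the flat region and large where the edge sits, which is the sense in which a kernel acts as a feature detector.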

AWS ElasticSearch Setup

CTOvision.com, Adam Gerhart

from September 26, 2016

This is the first of a two-part post on getting Amazon’s version of ElasticSearch set up in AWS. We go over the basics of setting up an AWS ES cluster and then tackle supplying the cluster with data via Logstash in our next post.

Incremental, Iterative Data Processing with Timely Dataflow

Communications of the ACM

from October 01, 2016

We describe the timely dataflow model for distributed computation and its implementation in the Naiad system. The model supports stateful iterative and incremental computations. It enables both low-latency stream processing and high-throughput batch processing, using a new approach to coordination that combines asynchronous and fine-grained synchronous execution. We describe two of the programming frameworks built on Naiad: GraphLINQ for parallel graph processing, and differential dataflow for nested iterative and incremental computations. We show that a general-purpose system can achieve performance that matches, and sometimes exceeds, that of specialized systems.

Careers

Internships and other temporary positions

Visiting Researchers, The Alan Turing Institute

The Alan Turing Institute; British Library, London, UK

Tenured and tenure track faculty positions

Assistant Professor; Data Exploration

Emory University: Arts and Sciences: Math/Computer Science; Atlanta, GA

Postdocs

Postdoc; Predictive analytics in higher education

School of Information, University of Michigan; Ann Arbor, MI

Full-time, non-tenured academic positions

Research Data Manager; Rhode Island Innovate Policy Lab

Brown University, Computing and Information Services; Providence, RI
