NYU Data Science newsletter – March 1, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 1, 2016


 
Data Science News



Start-up makes sense of transit data

Los Angeles Times


from February 28, 2016

On the Internet, traffic is easily tracked. Google and Facebook have algorithms that know what users are searching for. Online retailers can monitor what shoppers are buying. Newspapers can see in real-time how many readers are viewing an article.

The same hasn’t been true for traffic in the physical world, where data-gathering has been decidedly more low-tech.

 

Everpub: reusable research, 21st century style

Tim Head, Konrad Hinsen, C. Titus Brown, Kyle Cranmer


from February 28, 2016

The open source/open science community is rapidly converging around a set of technologies that will enable highly reproducible and reusable computer-aided research. These technologies include environments to encode and encapsulate dependencies, cloud compute to execute workflows, collaboration technologies that enable remixing, and text formats that enable comparison and merging.

We believe that the time is right to develop a vertical spike through the problem space, with tools to go from an empty directory to a fully rendered paper with an associated workflow that can be executed, reviewed, and remixed. We will explore a specific vertical integration of the existing tools in a focused way, find points of general technical agreement, and map areas where further work is needed. In the process, we will provide a technical basis for demos and extension. Engagement with a broad community, open discussion, and community brainstorming will build consensus about “solved” problems as well as discovering the hard knots of disagreement. Finally, open community building around this problem will inevitably yield serendipitous long-term interactions.

 

The Promise of Artificial Intelligence Unfolds in Small Steps

The New York Times


from February 28, 2016

… “I think future generations are going to look back on the A.I. revolution and compare its impact to the steam engine or electricity,” said Erik Brynjolfsson, director of the Initiative on the Digital Economy at Massachusetts Institute of Technology’s Sloan School of Management. “But, of course, it is going to take decades for this technology to really come to fruition.”

There are reasons for enthusiasm. Computers continue to get cheaper even as they get more powerful, making it easier than ever to crunch vast amounts of data in an instant. Also, sensors, smartphones and other tech devices are all over the place, feeding more and more information into computers that are learning more and more about us.

 

An update on the “P” in AMP: Clams, Coins, and Careful Cleaning

AMPLab – UC Berkeley, Daniel Haas


from February 29, 2016

Many of the projects in the AMPLab drive systems for large-scale data analysis to ever-increasing sophistication, speed, and scale. No matter how efficient the algorithm or how powerful the system that runs it, however, the human analyst remains a fundamental in-the-loop component of a data processing pipeline. End-to-end data analysis involves frequent iteration through human-mediated processing steps such as formulating queries or machine learning workflows, enriching and cleaning data, and manually examining, visualizing, and evaluating the output of the analysis. In some of our recent work in support of the “P” in AMP, the goal is to make such operations scalable (via approximation, machine learning, and the involvement of crowd workers on platforms like Amazon’s Mechanical Turk) and low-latency (by adapting techniques from the distributed systems literature to human workers).

One such effort focuses on data labeling, a necessary but often slow process that impedes the development of interactive systems for modern data analysis. Despite rising demand for manual data labeling, there is a surprising lack of work addressing its high and unpredictable latency. In our latest paper (which will appear in VLDB 2016), we introduce CLAMShell, a system that speeds up crowds in order to achieve consistently low-latency data labeling.
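
As a rough illustration of the low-latency idea (not CLAMShell's implementation; the system's actual techniques are described in the VLDB paper), the sketch below treats crowd workers like nodes in a distributed system and mitigates stragglers by issuing the same labeling task to several workers and keeping the first answer. The worker simulation and function names are made up for the example.

```python
# Illustrative sketch only: redundant task assignment ("straggler mitigation")
# applied to human labelers. Worker behavior is simulated with random sleeps;
# in a real pipeline these calls would go to a crowd platform instead.
import random
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def simulated_worker(task, worker_id):
    """Stand-in for a crowd worker: latency is highly variable (heavy-tailed)."""
    time.sleep(random.expovariate(1 / 5.0))  # mean 5 seconds
    return {"task": task, "worker": worker_id, "label": random.choice(["cat", "dog"])}

def label_with_redundancy(task, pool, replicas=3):
    """Send `task` to `replicas` workers; keep whichever label comes back first.
    Remaining replicas are simply left to finish in the background."""
    futures = [pool.submit(simulated_worker, task, w) for w in range(replicas)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done)).result()

with ThreadPoolExecutor(max_workers=9) as pool:
    for item in ["img_001", "img_002", "img_003"]:
        print(label_with_redundancy(item, pool))
```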

 

Inside the Artificial Intelligence Revolution: Pt. 1

Rolling Stone, Jeff Goodell


from February 29, 2016

“Welcome to robot nursery school,” Pieter Abbeel says as he opens the door to the Robot Learning Lab on the seventh floor of a sleek new building on the northern edge of the UC-Berkeley campus. The lab is chaotic: bikes leaning against the wall, a dozen or so grad students in disorganized cubicles, whiteboards covered with indecipherable equations. Abbeel, 38, is a thin, wiry guy, dressed in jeans and a stretched-out T-shirt. He moved to the U.S. from Belgium in 2000 to get a Ph.D. in computer science at Stanford and is now one of the world’s foremost experts in understanding the challenge of teaching robots to think intelligently. But first, he has to teach them to “think” at all. “That’s why we call this nursery school,” he jokes.

Also:

  • The Promise of Artificial Intelligence Unfolds in Small Steps (The New York Times, February 28)
  • What counts as artificially intelligent? AI and deep learning, explained (The Verge, February 29)
  • AlphaGo and AI Progress (Miles Brundage, February 27)

    All Talk and No Buttons: The Conversational UI

    A List Apart, Matty Mariansky


    from February 23, 2016

    We’re witnessing an explosion of applications that no longer have a graphical user interface (GUI). They’ve actually been around for a while, but they’ve only recently started spreading into the mainstream. They are called bots, virtual assistants, invisible apps. They can run on Slack, WeChat, Facebook Messenger, plain SMS, or Amazon Echo. They can be entirely driven by artificial intelligence, or there can be a human behind the curtain.

     

    AlphaGo and AI Progress

    Miles Brundage


    from February 27, 2016

    AlphaGo’s victory over Fan Hui has gotten a lot of press attention, and relevant experts in AI and Go have generally agreed that it is a significant milestone. For example, Jon Diamond, President of the British Go Association, called the victory a “large, sudden jump in strength,” and AI researchers Francesca Rossi, Stuart Russell, and Bart Selman called it “important,” “impressive,” and “significant,” respectively.

    How large/sudden and important/impressive/significant was AlphaGo’s victory? Here, I’ll try to at least partially answer this by putting it in a larger context of recent computer Go history, AI progress in general, and technological forecasting. In short, it’s an impressive achievement, but considering it in this larger context should cause us to at least slightly decrease our assessment of its size/suddenness/significance in isolation. Still, it is an enlightening episode in AI history in other ways, and merits some additional commentary/analysis beyond the brief snippets of praise in the news so far. So in addition to comparing the reality to the hype, I’ll try to distill some general lessons from AlphaGo’s first victory about the pace/nature of AI progress and how we should think about its upcoming match against Lee Sedol.

     

    What counts as artificially intelligent? AI and deep learning, explained

    The Verge


    from February 29, 2016

    … What do ‘neural network,’ ‘machine learning,’ and ‘deep learning’ actually mean?

    These are the three terms you’re most likely to have heard lately, and, to be as simple as possible, we can think of them in layers. Neural networks are at the bottom — they’re a type of computer architecture onto which artificial intelligence is built. Machine learning is next — it’s a program you might run on a neural network, training computers to look for certain answers in pots of data; and deep learning is on top — it’s a particular type of machine learning that’s only become popular over the past decade, largely thanks to two new resources: cheap processing power and abundant data (otherwise known as the internet).
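
To make the layering concrete, here is a minimal toy sketch (illustrative only, not from the article): the "neural network" is the wiring of weights and activations, "machine learning" is the training loop that fits those weights to data, and "deep learning" would simply stack more such layers.

```python
# A toy illustration of the layered terms: the *neural network* is the wiring
# (weights + activations), *machine learning* is the training loop that fits
# the weights to data, and *deep learning* just means more stacked layers.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR labels

# The "neural network": one hidden layer of 8 units, sigmoid activations.
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# The "machine learning": gradient descent on a squared-error loss.
for step in range(5000):
    h = sigmoid(X @ W1)          # hidden layer
    out = sigmoid(h @ W2)        # output layer
    err = out - y
    # Backpropagate the error through both layers.
    delta_out = err * out * (1 - out)
    grad_W2 = h.T @ delta_out
    grad_W1 = X.T @ (delta_out @ W2.T * h * (1 - h))
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2

print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))  # should approach [0, 1, 1, 0]
```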

     

    Beyond the hashtags: #Ferguson, #Blacklivesmatter, and the online struggle for offline justice

    American University, Center for Media & Social Impact


    from February 29, 2016

    In 2014, a dedicated activist movement—Black Lives Matter (BLM)—ignited an urgent national conversation about police killings of unarmed Black citizens. Online tools have been anecdotally credited as critical in this effort, but researchers are only beginning to evaluate this claim. This research report examines the movement’s uses of online media in 2014 and 2015. To do so, we analyze three types of data: 40.8 million tweets, over 100,000 web links, and 40 interviews of BLM activists and allies.

     

    Research Partners Invited!

    UC-Berkeley, ENVIRONMENT and SOCIETY: DATA SCIENCES for the 21st CENTURY (DS421)


    from February 29, 2016

    Global environmental change poses critical environmental and societal challenges, and the next generation of students will be part of the solutions. This National Science Foundation Research Traineeship (NRT) in Data Science for the 21st Century prepares graduate students at the University of California, Berkeley with the skills and knowledge needed to evaluate how rapid environmental change impacts human and natural systems, and to develop and evaluate data-driven solutions in public policy, resource management, and environmental design that will mitigate negative effects on human well-being and the natural world. Trainees will research topics such as management of water resources, regional land use, and responses of agricultural systems to economic and climate change, and will develop skills in data visualization, informatics, software development, and science communication.

    In a final semester innovative team-based problem-solving course, trainees will collaborate with an external partner organization to tackle a challenge in global environmental change that includes a significant problem in data analysis and interpretation of impacts and solutions.

     
    Events



    Cornell Tech Health Conference 2016



    Everyone is invited to attend this conference, which explores current trends and expanding new market opportunities in the health sector, many of them made possible by advances in technology. Innovators and leaders from hospitals, research centers, academia, pharmaceutical companies, investors, and incubators will present a fascinating, multifaceted view of emerging trends and of the future of health and healthcare services enabled by technology.

    See speakers below. Join us for this deeply engaging and exciting meeting of the minds!

    A hackathon at Cornell Tech, supported by Johnson & Johnson and Cerner, will immediately follow the conference. Participants will form teams and build over the weekend for a chance to create the next technology-enabled solution in health services. For more information: Health Hackathon

    Friday, March 4, at Cooper Union, starting at 12:30 p.m.

     

    Structure Data 2016



    You’ve heard about the power of data for years; now hear how the world’s leading companies are using it to evolve their businesses for the 21st century. Structure Data will feature interviews with enterprise data decision-makers, as well as the executives driving the world of big data software and the researchers taking technologies like artificial intelligence from the lab to our living rooms.

    Wednesday-Thursday, March 9-10, San Francisco

     
    Tools & Resources



    Four pitfalls of hill climbing

    Chris Said, The File Drawer blog


    from February 28, 2016

    One of the great developments in product design has been the adoption of A/B testing. Instead of just guessing what is best for your customers, you can offer a product variant to a subset of customers and measure how well it works. While undeniably useful, A/B testing is sometimes said to encourage too much “hill climbing”, an incremental and short-sighted style of product development that emphasizes easy and immediate wins.

    Discussion around hill climbing can sometimes get a bit vague, so I thought I would make some animations that describe four distinct pitfalls that can emerge from an overreliance on hill climbing.
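
For a flavor of the most familiar pitfall, here is a minimal sketch (the metric function and step logic are invented for illustration, not taken from the post's animations): greedy, incremental improvements converge to the nearest peak, which can be far below the global optimum.

```python
# Toy illustration of one hill-climbing pitfall: greedy, incremental steps
# (like a sequence of small A/B-test wins) converge to the nearest peak,
# which may be far below the global optimum. The metric function is made up.
import numpy as np

def metric(x):
    # Two peaks: a modest local one near x=1 and a much better one near x=6.
    return np.exp(-(x - 1) ** 2) + 3 * np.exp(-((x - 6) ** 2) / 2)

def hill_climb(x, step=0.1, iters=200):
    for _ in range(iters):
        candidates = [x - step, x, x + step]   # small "A/B" variants
        x = max(candidates, key=metric)        # ship whichever variant wins
    return x

print(hill_climb(0.0))   # climbs to ~1.0, the nearby local peak, and stops
print(hill_climb(4.0))   # only a start near the bigger hill reaches ~6.0
```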

     

    How to Code and Understand DeepMind’s Neural Stack Machine

    Andrew Trask, i am trask blog


    from February 25, 2016

    Summary: I learn best with toy code that I can play with. This tutorial teaches DeepMind’s Neural Stack machine via a very simple toy example, a short python implementation. I will also explain my thought process along the way for reading and implementing research papers from scratch, which I hope you will find useful.
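
For context, the model in question is the continuous stack from Grefenstette et al.'s "Learning to Transduce with Unbounded Memory," the paper the post implements. The compressed numpy sketch below shows only the core strength-update and soft-read equations; it is not the post's implementation, and the variable names are my own.

```python
# Compressed numpy sketch of the continuous ("neural") stack: V holds pushed
# vectors, s holds their strengths in [0, 1], d is this timestep's push
# strength and u its pop strength.
import numpy as np

def neural_stack_step(V, s, v, d, u):
    """One timestep: pop with strength u, push vector v with strength d,
    and return the new (V, s) plus the soft read of the stack top."""
    V = np.vstack([V, v]) if V.size else v[None, :]   # append the new value
    new_s = []
    for i in range(len(s)):
        # Strength sitting above slot i that the pop "eats through" first.
        above = sum(s[i + 1:])
        new_s.append(max(0.0, s[i] - max(0.0, u - above)))
    s = np.array(new_s + [d])                          # pushed slot gets strength d

    # Soft read: take at most total weight 1.0 from the top of the stack down.
    r = np.zeros_like(v)
    for i in range(len(s)):
        above = sum(s[i + 1:])
        r += min(s[i], max(0.0, 1.0 - above)) * V[i]
    return V, s, r

V, s = np.empty((0, 3)), np.array([])
V, s, r = neural_stack_step(V, s, np.array([1.0, 0.0, 0.0]), d=0.8, u=0.0)
V, s, r = neural_stack_step(V, s, np.array([0.0, 1.0, 0.0]), d=0.5, u=0.1)
print(s, r)   # the read blends the two pushed vectors, weighted toward the top
```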

     

    TensorFlow for Poets

    Pete Warden's blog


    from February 28, 2016

    I feel very lucky to be a part of building TensorFlow, because it’s a great opportunity to bring the power of deep learning to a mass audience. I look around and see so many applications that could benefit from the technology by understanding the images, speech, or text their users enter. The frustrating part is that deep learning is still seen as a very hard topic for product engineers to grasp. That’s true at the cutting edge of research, but otherwise it’s mostly a holdover from the early days. There’s already a lot of great documentation on the TensorFlow site, but to demonstrate how easy it can be for general software engineers to pick up, I’m going to present a walk-through that takes you from a clean OS X laptop all the way to classifying your own categories of images. You’ll find written instructions in this post, along with a screencast showing exactly what I’m doing.
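
The core idea behind the walkthrough is transfer learning: keep a pretrained image network as a fixed feature extractor and retrain only a small classifier on its "bottleneck" features for your own categories. The sketch below is a conceptual stand-in (random vectors in place of Inception's features, scikit-learn in place of the tutorial's TensorFlow retraining script), not the tutorial's actual commands.

```python
# Conceptual sketch of what the walkthrough does under the hood: the big
# pretrained network stays frozen, and only a small classifier is fit on its
# "bottleneck" features for your own categories. The features here are random
# stand-ins; in the real tutorial they come from the pretrained Inception model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, bottleneck_dim = 200, 2048          # Inception bottlenecks are 2048-d

# Pretend bottleneck features for two of your own categories, e.g. daisies vs. roses.
daisies = rng.normal(loc=0.0, size=(n_per_class, bottleneck_dim))
roses = rng.normal(loc=0.3, size=(n_per_class, bottleneck_dim))
X = np.vstack([daisies, roses])
y = np.array([0] * n_per_class + [1] * n_per_class)

# "Retraining" = fitting only this last, small layer on top of frozen features.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```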

     

    Launching the Luna Rating System

    Medium, Tom Silver


    from February 25, 2016

    … The Turing Test was meant to be a thought experiment, not a practical yardstick for AI research. But its principles have withstood the test of time for good reason. First, the Test acknowledges that intelligence cannot be evaluated without an intelligent, human evaluator. Second, the choice of natural language as the medium for the Test is critical. Natural language is widely believed to be AI-complete, meaning that a machine capable of natural language will be capable of all other feats of AI. A practical test for artificial intelligence should build on these same foundations.

    Luna Rating System

    As the field moves at an unprecedented pace, it is more important than ever to have an accurate sense of where we are. With this ambition, I introduce the Luna Rating System. Luna, for short.

     
