NYU Data Science newsletter – June 22, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 22, 2016


 
Data Science News



Why Twitter Just Bought an Artificial-Intelligence Start-Up Called Magic Pony

Vanity Fair, The Hive blog


from June 20, 2016

With Twitter’s stock in the gutter and investors clamoring for a turnaround, C.E.O. Jack Dorsey has embarked on an acquisition spree that could give 2013-era Marissa Mayer a run for her money. Fresh off a $70 million investment in audio-streaming service SoundCloud, Twitter announced Monday that it has purchased Magic Pony, a British artificial-intelligence start-up, for a reported $150 million. Magic Pony is the third machine-learning start-up Twitter has purchased in as many years.

Also in Internet and big media:

  • The Inventors of the Internet Are Trying to Build a Truly Permanent Web (June 20, WIRED, Business)
  • Nielsen hopes to bring science to TV casting (June 22, Associated Press)
  • NYC Media Lab ’16 summit (takes place on Thursday, September 22, at Columbia University)

    From Smart City to Quantified Community: A New Approach to Urban Science

    NYU Center for the Humanities


    from June 16, 2016

    Urban planning as a profession shifted radically after World War II. Drawing on the systems engineering and optimization processes the military had developed for radar and missile control, planners attempted to apply complex systems models and new decision-making algorithms to create optimized solutions to dynamic problems.

     

    World’s Fastest Supercomputer Now Has Chinese Chip Technology

    Bloomberg


    from June 20, 2016

    In a threat to U.S. technology dominance, the world’s fastest supercomputer is powered by Chinese-designed semiconductors for the first time. It’s a breakthrough for China’s attempts to reduce dependence on imported technology.

    The Sunway TaihuLight supercomputer, located at the state-funded Chinese Supercomputing Center in Wuxi, Jiangsu province, is more than twice as powerful as the previous winner, according to TOP500, a research organization that compiles the rankings twice a year.

     

    Artificial Intelligence Achieves Near-Human Performance in Diagnosing Breast Cancer

    Beth Israel Deaconess Medical Center


    from June 19, 2016

    A research team from Beth Israel Deaconess Medical Center and Harvard Medical School recently developed artificial intelligence methods aimed at training computers to interpret pathology images, with the long-term goal of building AI-powered systems to make pathologic diagnoses more accurate.

    Also in clinical research:

  • Why Most Clinical Research Is Not Useful (June 21, PLOS Medicine, Essay; John P. A. Ioannidis)
  • Blood-Based Screening for Colon Cancer – A Disruptive Innovation or Simply a Disruption? (June 15, JAMA, Viewpoint; Ravi B. Parikh, MD, MPP and Vinay Prasad, MD, MPH)
  • I’m Feeling Yucky 🙁 Searching for symptoms on Google (June 20, Official Google Blog)
  • How Artificial Intelligence Is Bringing Us Smarter Medicine (June 20, Fast Company)

    The Inventors of the Internet Are Trying to Build a Truly Permanent Web

    WIRED, Business


    from June 20, 2016

    … Says Cerf, “I’m concerned about a coming digital dark ages.”

    That’s why he and some of his fellow inventors of the Internet are joining with a new generation of hackers, archivists, and activists to radically reinvent core technologies that underpin the web. Yes, they want to make the web more secure. They want to make it less vulnerable to censorship. But they also want to make it more resilient to the sands of time.

     

    DARPA to Build “Virtual Data Scientist” Assistants Through A.I.

    Inverse


    from June 18, 2016

    The Defense Advanced Research Projects Agency (DARPA) announced on Friday the launch of Data-Driven Discovery of Models (D3M), which aims to help non-experts bridge what it calls the “data-science expertise gap” by allowing artificial assistants to help people with machine learning. DARPA calls it a “virtual data scientist” assistant.

    This software is doubly important because there’s a lack of data scientists right now and a greater demand than ever for more data-driven solutions. DARPA says experts project 2016 deficits of 140,000 to 190,000 data scientists worldwide, and increasing shortfalls in coming years.

     

    Baidu Is Using Its Own Data to Measure China’s Economy

    Bloomberg


    from June 21, 2016

    Baidu is using its own trove of data to measure China’s economy, devising new gauges that may paint a better picture than the government’s. Bloomberg’s David Ramli reports on “Asia Edge.”

     

    Bringing Precision to the AI Safety Discussion

    Google Research Blog, Chris Olah


    from June 21, 2016

    We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley. … We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

    Meantime, in search and Google:

  • The big search upgrade — and how Amazon could beat Google at its own game (June 21, VentureBeat, Ivan Bercovich)

    Knights Landing Proves Solid Ground for Intel’s Stake in Deep Learning

    The Next Platform


    from June 21, 2016

    Intel has finally opened the first public discussions of its investment in the future of machine learning and deep learning. Some might argue the company is a bit late to the game, with rivals dominating the training market for such workloads, but it had to wait for the official rollout of Knights Landing and extensions to its scalable system framework to make the move official, and meaty enough to capture real share from the few players doing deep learning at scale.

    Also in hardware:

  • “Artificial Synapses” Could Let Supercomputers Mimic the Human Brain (June 20, Scientific American, LiveScience)
  • World’s Fastest Supercomputer Now Has Chinese Chip Technology (June 20, Bloomberg)
  • Barefoot Networks’ New Chips Will Transform the Tech Industry (June 14, WIRED, Business)
  • HPC Spending Outpaces The IT Market, And Will Continue To (June 22, The Next Platform)

    The Conference Scene for Data-Driven Discovery

    Medium, Moore Data, Carly Strasser


    from June 21, 2016

    Do you ever get FOMO when you see a conference hashtag on Twitter? We do. Luckily, we have a way to find out what conferences are most important to our grantees in Data-Driven Discovery at the Moore Foundation: annual reports. As part of our yearly fact-gathering from grantees, we request information about the conferences they have attended in the last year. We collected this information from our 14 DDD Investigators and more than 100 researchers from the Moore-Sloan Data Science Environments.

    More data news from the Foundations who support this newsletter:

  • Galaxy-seeking robots (June 15, Gordon and Betty Moore Foundation)
  • Four foundations announce support for ASAPbio (June 20, ASAPbio)

    Google searches will soon show ‘related medical conditions’ when someone searches for health symptoms

    Wired UK


    from June 21, 2016

    Google is going to start showing illness-related medical details in search results.

    Starting with US users in the “coming days”, the Silicon Valley company will provide a number of possible “related conditions” when a person searches for symptoms they may be suffering.

     

    Forget Doomsday AI—Google Is Worried about Housekeeping Bots Gone Bad

    WIRED, Business


    from June 21, 2016

    Tom Murphy graduated from Carnegie Mellon University with a PhD in computer science. Then he built software that learned to play Nintendo games.

    In some cases, the system works well. Playing Super Mario, for instance, it learns to exploit a bug in the game, stomping on enemy Goombas even when floating below them. It can rack up points by attacking the game with a reckless abandon you and I would never try. But in other cases, it fizzles. It scores fewer points in Tetris than it would by merely placing blocks at random.

     
    Deadlines



    Open Context & Carleton Prize for Archaeological Visualization

    deadline: Friday, December 16

    Increasingly, archaeology data are being made available openly on the web. But what do these data show? How can we interrogate them? How can we visualize them? How can we re-use data visualizations?

    We’d like to know. This is why we have created the Open Context and Carleton University Prize for Archaeological Visualization, and we invite you to build, make, and hack the Open Context data and API for fun and prizes.

    Deadline for submissions is Friday, December 16.

     
    Tools & Resources



    Hello, TensorFlow!

    O'Reilly Radar, Aaron Schumacher


    from June 20, 2016

    How does TensorFlow work? Let’s break it down so we can see and understand every moving part. We’ll explore the data flow graph that defines the computations your data will undergo, how to train models with gradient descent using TensorFlow, and how TensorBoard can visualize your TensorFlow work. The examples here won’t solve industrial machine learning problems, but they’ll help you understand the components underlying everything built with TensorFlow, including whatever you build next!
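    The gradient-descent training loop the article walks through can be sketched in a few lines of plain Python (illustrative only, and my own construction: TensorFlow itself would express this as a data flow graph and derive the gradient automatically):

```python
# Plain-Python sketch of gradient descent: fit a single weight w so that
# y = w * x matches the data. TensorFlow builds the same computation as a
# data flow graph and computes the gradient for you.

def train(xs, ys, lr=0.1, steps=100):
    w = 0.0  # initial weight
    for _ in range(steps):
        # gradient of the mean of 0.5 * (w*x - y)^2 with respect to w
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # gradient descent update
    return w

# Data generated from y = 3x; training recovers w ≈ 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
print(round(train(xs, ys), 3))  # → 3.0
```

    TensorBoard, covered later in the article, visualizes exactly this kind of graph and the loss as it falls over training steps.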

     

    persistent-rnn: Fast Recurrent Networks Library

    GitHub – baidu-research


    from June 20, 2016

    For a GPU, the largest source of on-chip memory is distributed among the individual register files of thousands of threads. For example, the NVIDIA TitanX GPU has 6.3 MB of register file memory, which is enough to store a recurrent layer with approximately 1200 activations. Persistent kernels exploit this register file memory to cache recurrent weights and reuse them over multiple timesteps.

    Avoiding reloading layer weights multiple times makes persistent kernels very efficient at low batch sizes.
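    The sizing claim above is easy to check with back-of-the-envelope arithmetic, assuming (my assumption, not a stated detail) a square fp32 recurrent weight matrix:

```python
# An n x n recurrent weight matrix in fp32 takes 4 * n^2 bytes.
# With ~1200 activations that is just under the TitanX's 6.3 MB
# of register file memory, matching the README's figure.

def weight_bytes(n_activations, bytes_per_weight=4):
    """Size in bytes of a square recurrent weight matrix."""
    return bytes_per_weight * n_activations ** 2

mb = weight_bytes(1200) / 1e6
print(f"{mb:.2f} MB")  # → 5.76 MB, which fits in 6.3 MB of registers
```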

     

    Comma Separated JSON

    kirit.com


    from June 18, 2016

    The problem with JSON is that to produce it you need to build a memory structure of everything you want to dump out, and to parse it you have to build everything in one go back into memory. This is fine for small JSON blobs, but isn’t really ideal when the data consists of many megabytes, or more.

    XML solves this by having event based parsers that allow you to read sub-sections of the structure as they stream past. Kind of great, but who really wants to go back to XML?

    CSV solves this in a different way. By having each line of data pretty much independent we can both generate and parse it one line at a time.
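    The line-per-record idea is easy to see with newline-delimited JSON in Python (a sketch of the streaming concept, not the comma-separated format the post itself proposes):

```python
import json

# One independent JSON object per line: like CSV, each record can be
# generated and parsed on its own, without holding the whole structure
# in memory.

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Produce: serialize each record to its own line.
stream = "\n".join(json.dumps(r) for r in records)

# Parse: each line stands alone, so records can be processed lazily.
parsed = [json.loads(line) for line in stream.splitlines()]
assert parsed == records
```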

     

    How we made a VR data visualization

    Simon Rogers


    from June 20, 2016

    When we made an interactive guide to the UK’s EU referendum, which takes place this Thursday, it seemed like an important opportunity to try out producing our own 360-degree data visualization.

     
    Careers



    Projects Data Coordinator: NSF Arctic Data Center at the National Center for Ecological Analysis and Synthesis (NCEAS)
     

    University of California-Santa Barbara
     

    Job Listings | SciPy 2016
     

    SciPy
     

    Spheryx Solutions – Scientific Software Developer
     

    Spheryx Solutions
     
