NYU Data Science Newsletter – May 28, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 28, 2015


 
Data Science News



Introduction to Neural Machine Translation with GPUs (part 1) | Parallel Forall

NVIDIA, CUDA Zone


from May 27, 2015

Neural machine translation is a recently proposed framework for machine translation based purely on neural networks. This post is the first of a series in which I will explain a simple encoder-decoder model for building a neural machine translation system [Cho et al., 2014; Sutskever et al., 2014; Kalchbrenner and Blunsom, 2013]. In a later post I will describe how an attention mechanism can be incorporated into the simple encoder-decoder model [Bahdanau et al., 2015], leading to the state-of-the-art machine translation model for a number of language pairs including En-Fr, En-De, En-Tr and En-Zh [Gulcehre et al., 2015; Jean et al., 2015]. Furthermore, I will introduce recent work which has applied this framework of neural machine translation to image and video description generation [Xu et al., 2015; Li et al., 2015].
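The encoder-decoder idea the post introduces can be sketched in a few lines: one recurrent network compresses the source sentence into a fixed-length vector, and a second recurrent network generates target tokens conditioned on that vector. The following is a toy NumPy sketch with random, untrained weights and made-up vocabulary sizes, purely to show the data flow; it is not the implementation from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
V_src, V_tgt, D, H = 10, 12, 8, 16  # hypothetical vocab sizes, embedding and hidden dims

# Random (untrained) parameters for a minimal RNN encoder-decoder.
E_src = rng.normal(0, 0.1, (V_src, D))   # source token embeddings
E_tgt = rng.normal(0, 0.1, (V_tgt, D))   # target token embeddings
W_enc = rng.normal(0, 0.1, (H, D + H))   # encoder recurrence
W_dec = rng.normal(0, 0.1, (H, D + H))   # decoder recurrence
W_out = rng.normal(0, 0.1, (V_tgt, H))   # projection to target vocabulary

def encode(src_ids):
    """Compress the whole source sentence into one fixed-length vector."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(W_enc @ np.concatenate([E_src[t], h]))
    return h

def decode(h, max_len=5, bos=0):
    """Greedily emit target tokens, conditioned on the encoder summary."""
    out, y = [], bos
    for _ in range(max_len):
        h = np.tanh(W_dec @ np.concatenate([E_tgt[y], h]))
        y = int(np.argmax(W_out @ h))  # greedy choice over the target vocab
        out.append(y)
    return out

translation = decode(encode([3, 1, 4, 1, 5]))
print(translation)  # five target token ids from the untrained model
```

The attention mechanism mentioned above replaces the single summary vector with a weighted mix of per-word encoder states, recomputed at every decoding step.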

 

Using logs to build a solid data infrastructure (or: why dual writes are a bad idea) | Confluent

Martin Kleppmann, Confluent blog


from May 27, 2015

This is an edited transcript of a talk I gave at the Craft Conference 2015. The video and slides are also available. … Logs are everywhere. I’m not talking about plain-text log files (such as syslog or log4j) – I mean an append-only, totally ordered sequence of records. It’s a very simple structure, but it’s also a bit strange at first if you’re used to normal databases. However, once you learn to think in terms of logs, many problems of making large-scale data systems reliable, scalable and maintainable suddenly become much more tractable. [video, 45:39]
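The structure Kleppmann describes — an append-only, totally ordered sequence of records — can be shown in a few lines. This is a minimal in-memory sketch (class and method names are my own, not from the talk) of how downstream consumers derive state by replaying the log instead of being written to separately:

```python
class Log:
    """A minimal in-memory append-only log: each record gets a strictly
    increasing offset and is never modified in place."""
    def __init__(self):
        self._records = []

    def append(self, record):
        offset = len(self._records)
        self._records.append(record)
        return offset  # total order: offset i always precedes offset i+1

    def read(self, from_offset=0):
        # Consumers replay from any offset to (re)build derived state,
        # e.g. an index or cache, instead of relying on dual writes.
        return list(enumerate(self._records))[from_offset:]

log = Log()
log.append({"set": ("x", 1)})
log.append({"set": ("y", 2)})

# A derived view built purely by replaying the ordered log:
state = {}
for _, rec in log.read():
    k, v = rec["set"]
    state[k] = v
print(state)  # {'x': 1, 'y': 2}
```

Because every consumer sees the same records in the same order, two independently derived views can never disagree the way two separately written stores can.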

 

White House sides with Oracle, tells Supreme Court APIs are copyrightable

Ars Technica


from May 27, 2015

The Justice Department is weighing in on the hot-button intellectual property dispute between Google and Oracle, telling the Supreme Court that APIs are protected by copyright.

 

Exposure to Ideologically Diverse News and Opinion, Future Research

Solomon Messing


from May 24, 2015

Earlier this month, we published an early access version of our paper in ScienceExpress (Bakshy et al. 2015), “Exposure to ideologically diverse news and opinion on Facebook.” The paper constitutes the first attempt to quantify the extent to which ideologically cross-cutting hard news and opinion is shared by friends, appears in algorithmically ranked News Feeds, and is actually consumed (i.e., clicked through to read). … While Science papers are explicitly prohibited from suggesting future directions for research, we would like to shed additional light on our study and raise a few questions that we would be excited to see addressed in future work.

 

I.R.S. Data Breach May Be Sign of More Personalized Schemes

The New York Times


from May 27, 2015

The plot to steal information on 100,000 taxpayers from the Internal Revenue Service and hijack nearly $50 million in refunds not only reveals a previous security breach but hints at a wider fraud that may bedevil Americans in the future.

Some security and tax experts warned that this latest data theft might be a prelude to more targeted schemes aimed at duping taxpayers into handing millions of dollars over to criminals, or at helping thieves circumvent the agency’s security filters next year and beyond.

 

Want better science? Quit hoarding data, genetics researchers say

BetaBoston


from May 27, 2015

… [Andrea Downing] chanced on ClinVar, a research database funded by the National Institutes of Health that acts as a kind of Wikipedia to catalogue scientific research on mutations in genes. It gave her a roadmap for the research associated with her variant, called C16G.

Downing typed in the letters and numbers of her mutation, and the website spit out a list of companies and labs that have studied her variant. Though much of that information was technical, she said, “the things I do understand about it are very empowering. It’s a starting point to answering questions I don’t know.”

When the database first launched, the idea was that the single repository would present a unified picture of a variant, drawing from all available research that was publicly shared by companies and research labs.

 

APIs Are Eating The World

InformationWeek


from May 26, 2015

The ability to deploy IT services rapidly and reliably has transformed the nature of business.

 

Here’s why finance is about to be disrupted – O’Reilly Radar

O'Reilly Radar, Paul Kedrosky


from May 28, 2015

Finance has the three main characteristics of an industry likely to be transformed by technology: 1. It traffics in bits, not atoms. 2. Its services are often delivered remotely. 3. There is little need for human contact.

Unlike other sectors with these characteristics — e.g., media, advertising, and travel services — finance hasn’t been disrupted. Despite huge technological change and a series of financial crises, the league table of the largest financial firms today, both in the U.S. and around the world, remains much the same as it has always been.

 

Why we need to start talking about the user experience of data

Medium, Chris Dowsett


from May 26, 2015

Over the past four years, I’ve been conducting doctoral research into how business leaders use data in their decision-making process. My research focused on the factors that influenced data choices and on how senior business leaders prioritized different data sources in decisions.

I found that many companies were investing in data technologies without also investing in structures and processes to help managers use the growing variety of data.

 
CDS News



‘Deep Learning’ Will Soon Give Us Super-Smart Robots

WIRED, Business


from May 27, 2015

… This week in the scientific journal Nature, [Yann] LeCun—also a professor of computer science at New York University—details the current state of this “deep learning” technology in a paper penned alongside the two other academics most responsible for this movement: University of Toronto professor Geoff Hinton, who’s now at Google, and the University of Montreal’s Yoshua Bengio. The paper details the widespread progress of deep learning in recent years, showing the wider scientific community how this technology is reshaping our internet services—and how it will continue to reshape them in the years to come.

But as LeCun tells WIRED, deep learning will also extend beyond the internet, pushing into devices that can operate here in the physical world—things like robots and self-driving cars.

 

Deep learning

Nature


from May 27, 2015

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
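The abstract's core loop — multiple layers each computing a representation from the previous layer's representation, with backpropagation indicating how each layer's parameters should change — fits in a short example. This is a toy NumPy sketch of a one-hidden-layer network learning XOR, not code from the Nature paper; the learning rate, layer sizes, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy task: learn XOR, which no single-layer model can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass: each layer computes its representation
    # from the representation in the previous layer.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: backpropagation indicates how each layer
    # should change its internal parameters.
    d_out = p - y                          # gradient of cross-entropy loss
    d_h = (d_out @ W2.T) * (1 - h ** 2)    # chain rule through tanh
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0)

print(np.round(p.ravel()))  # typically converges to [0. 1. 1. 0.]
```

The convolutional and recurrent architectures mentioned in the abstract are trained with this same gradient machinery, just with layers specialized for spatial and sequential structure respectively.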

 
