NYU Data Science newsletter – June 7, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 7, 2016


 
Data Science News



The curious case of the data scientist

O'Reilly Media, David Selby


from June 03, 2016

David Selby shares some of the data challenges he’s faced and explains why he’s particularly enthusiastic about the latest technological developments in the field. [video, 11:05]

 

How Smart Data Lakes are Revolutionizing Enterprise Analytics

SmartData Collective, Sean Martin


from June 03, 2016

As the quantity and diversity of relevant data grow within and outside the enterprise, business users and IT struggle to extract maximum value from that data. Fortunately, recent developments in big data technologies have significantly improved contemporary analytics; the most profound of these is the deployment of semantically enhanced Smart Data Lakes.

Defined as centralized repositories, Smart Data Lakes enable organizations to analyze all data assets with a specificity and speed that wasn’t previously available, revolutionizing the scope and focus of analytics. The value derived from this approach improves the analytics process and expedites conventional data preparation.

 

Predicting the Impact of Scientific Concepts Using Full Text Features

Allen Institute, Semantic Scholar; Kathleen McKeown et al.


from June 06, 2016

New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties – including researchers and the general public – focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features, and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.
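The core idea of combining metadata and full-text features can be sketched in a few lines. This is not the paper’s system: the feature names, weights, and concepts below are made up for illustration, and a toy logistic score stands in for the learned model.

```python
import math

# Hypothetical per-concept features (illustrative only, not from the paper):
# metadata = (citing_articles, years_since_first_mention)
# fulltext = (method_sentence_mentions, background_sentence_mentions)
concepts = {
    "concept_a": {"metadata": (40, 2), "fulltext": (15, 3)},
    "concept_b": {"metadata": (2, 5), "fulltext": (1, 7)},
}

# Toy weights chosen by hand; a real system would learn these from data.
WEIGHTS = {"citing": 0.05, "years": -0.1, "method": 0.2, "background": -0.1}

def impact_score(metadata, fulltext):
    """Logistic score in (0, 1) from the combined metadata + full-text features."""
    citing, years = metadata
    method, background = fulltext
    z = (WEIGHTS["citing"] * citing + WEIGHTS["years"] * years
         + WEIGHTS["method"] * method + WEIGHTS["background"] * background)
    return 1.0 / (1.0 + math.exp(-z))

scores = {name: impact_score(f["metadata"], f["fulltext"])
          for name, f in concepts.items()}
```

The point of the combined representation is simply that both feature families feed one predictor, which is what lets the full-text signals (e.g. where in a paper a term appears) sharpen the metadata-only baseline.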

 

Apple iPhone vs. Web Services

Business Insider


from June 05, 2016

This week, Intel CEO Brian Krzanich announced that people are keeping their PCs a lot longer before upgrading: The average has increased from four years to as many as six. …
What’s happening is pretty simple. The hardware and the software running on the device itself have become far less interesting than the web apps and services.

 

[1606.00776] Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

arXiv, Computer Science > Computation and Language; Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville


from June 02, 2016

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics.

Also, in text analysis:

  • Introducing DeepText: Facebook’s text understanding engine (June 01, Facebook Code, Engineering Blog)
  • [1606.00372] Conversational Contextual Cues: The Case of Personalization and History for Response Ranking (June 01, arXiv, Computer Science > Computation and Language; Rami Al-Rfou, Marc Pickett, Javier Snaider, Yun-hsuan Sung, Brian Strope, Ray Kurzweil)
  • Predicting the Impact of Scientific Concepts Using Full Text Features (June 06, Allen Institute, Semantic Scholar; Kathleen McKeown et al.)

    [1606.00372] Conversational Contextual Cues: The Case of Personalization and History for Response Ranking

    arXiv, Computer Science > Computation and Language; Rami Al-Rfou, Marc Pickett, Javier Snaider, Yun-hsuan Sung, Brian Strope, Ray Kurzweil


    from June 01, 2016

    We investigate the task of modeling open-domain, multi-turn, unstructured, multi-participant, conversational dialogue. We specifically study the effect of incorporating different elements of the conversation. Unlike previous efforts, which focused on modeling messages and responses, we extend the modeling to long context and participant’s history. Our system does not rely on handwritten rules or engineered features; instead, we train deep neural networks on a large conversational dataset.

     

    The secrets to future-proofing data science

    SiliconANGLE


    from June 06, 2016

    The one thing that can be said with any certainty about technology is that there is no certainty.

    “I’ve been fond of telling clients that anybody who says they have their future state figured out, I throw out of my office. Whatever we think we know about the process today, in two years it is going to completely change with new options we haven’t thought of yet,” said John Akred, founder and CTO of Silicon Valley Data Science, LLC.

     

    New paper: “Safely interruptible agents”

    Machine Intelligence Research Institute


    from June 01, 2016

    Google DeepMind Research Scientist Laurent Orseau and MIRI Research Associate Stuart Armstrong have written a new paper on error-tolerant agent designs, “Safely interruptible agents.” The paper is forthcoming at the 32nd Conference on Uncertainty in Artificial Intelligence.

     

    Former NASA chief unveils $100 million neural chip maker KnuEdge

    VentureBeat, Dean Takahashi


    from June 06, 2016

    It’s not all that easy to call KnuEdge a startup. Created a decade ago by Daniel Goldin, the former head of the National Aeronautics and Space Administration, KnuEdge is only now coming out of stealth mode. It has already raised $100 million in funding to build a “neural chip” that Goldin says will make data centers more efficient in a hyperscale age.

     

    The Serial Entrepreneur Who Leads Paul Allen’s AI Institute

    Forbes, Peter High


    from June 06, 2016

    Peter High: You are the CEO of the Allen Institute for Artificial Intelligence, whose mission is to contribute to humanity through high impact AI research and engineering. Can you provide your definition for high impact AI research and engineering?

    Oren Etzioni: It starts with Paul Allen, who is a visionary and scientific philanthropist. He won the Carnegie Medal for Philanthropy last year. He has been passionate for decades about AI research and the potential of AI to benefit humanity.

    In January 2014, we were launched as a nonprofit research institute in Seattle. We are now fifty people – about half PhDs and half engineers – and the question that we ask ourselves when we get up in the morning is “What can we do using the techniques?” Ultimately, to me, the computer is just a big pencil. What can we sketch using this pencil that makes a positive difference to society, and advances the state of the art, hopefully in an out-sized way? We are small compared to the teams that Google and Facebook and others have, but we want to punch above our weight class.

     
    Events



    Data Science Summit 2016



    Industry leaders predict that most applications will be powered by machine learning within the next 3-5 years. To keep pace with this changing landscape, business and academic leaders are developing new tools and techniques that maximize the ability to create and leverage powerful machine learning. We have brought together these innovators for the 5th annual Data Science Summit in San Francisco.

    Moore-Sloan Data Science Environment participants: Andreas Mueller (NYU), Magdalena Balazinska (UW-Seattle), Emily Fox (UW-Seattle), Bill Howe (UW-Seattle)

    San Francisco, CA Tuesday-Wednesday, July 12-13 [$$$]

     
    Deadlines



    ACM DEV 2016 – Call for Papers


    DEV 2016 provides an international forum for research in the design, implementation, and evaluation of information and communication technologies (ICTs) for social and economic development.

    Nairobi, Kenya Friday-Monday, November 18-21.

    Deadline for submissions is Friday, June 24.

     

    Mozilla Fellowships for Science


    We’re looking for researchers with a passion for open source and data sharing, already working to shift research practice to be more collaborative, iterative and open. Fellows will spend 10 months starting September 2016 as community catalysts at their institutions, mentoring the next generation of open data practitioners and researchers and building lasting change in the global open science community.

    Deadline for applications is Saturday, July 16.

     

    MacArthur Foundation is offering $100 million to a group that identifies a social problem and can solve it


    The MacArthur Foundation, which has doled out billions of dollars in “genius grants” and to nonprofit organizations working on major social challenges, now wants to give away $100 million to solve a societal problem that might not be on its radar.

    Deadline for submissions is Monday, October 3.

     
    Tools & Resources



    Analytics and the customer journey: 7 best practices

    SAS Voices, Analise Polsky


    from June 03, 2016

    Do you want to send out more timely, relevant offers to customers while they’re browsing online or visiting your store? What if you could reduce the average checkout time from four minutes to two by using your mobile app?

    If you want answers to these questions and others, then it’s time for a journey map.

     

    Toward A Practical Perceptual Video Quality Metric

    The Netflix Tech Blog


    from June 06, 2016

    At Netflix we care about video quality, and we care about measuring video quality accurately at scale. Our method, Video Multimethod Assessment Fusion (VMAF), seeks to reflect the viewer’s perception of our streaming quality. We are open-sourcing this tool and invite the research community to collaborate with us on this important project.

     

    idbr: access the US Census Bureau International Data Base in R

    AriLamstein.com, Kyle Walker


    from June 06, 2016

    The US Census Bureau’s International Data Base (IDB) is one of the best resources on the web for obtaining both historical and future projections of international demographic indicators. I’ve long used the IDB in my teaching, generally using its web interface to download data extracts. However, the Census Bureau also makes the IDB accessible via its API, which makes it much more convenient for programmers to access the data. Earlier this year, I wrote the R package idbr (https://github.com/walkerke/idbr) to help R programmers use the IDB in their projects.
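    For readers who want to hit the API directly rather than through idbr, the main thing to know is the Census Bureau’s response shape: a JSON array whose first row is the header and whose remaining rows are data. The sketch below parses a sample payload of that shape; the field names and values are made up for illustration, and a real IDB query also requires an API key.

```python
import json

# Illustrative payload in the Census API's header-row-first JSON format.
# A real response would come from an IDB endpoint queried with an API key.
sample_response = json.dumps([
    ["NAME", "POP", "time", "FIPS"],
    ["Germany", "80722792", "2016", "GM"],
    ["France", "66836154", "2016", "FR"],
])

def parse_census_json(payload):
    """Convert header-row-first Census JSON into a list of dicts."""
    rows = json.loads(payload)
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

records = parse_census_json(sample_response)
```

    This is essentially the reshaping that client packages like idbr do for you before handing back a tidy data structure.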

     

    Big Data and Social Science: A Practical Guide to Methods and Tools

    CRC Press; Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane


    from June 05, 2016

    The goal of this book is to provide social scientists with an understanding of the key elements of this new science, its value, and the opportunities for doing better work. The goal is also to identify the many ways in which the analytical toolkits possessed by social scientists can be brought to bear to enhance the generalizability of the work done by computer scientists.

    We take a pragmatic approach, drawing on our experience of working with data. Most social scientists set out to solve a real-world social or economic problem: they frame the problem, identify the data, do the analysis, and then draw inferences. At all points, of course, the social scientist needs to consider the ethical ramifications of their work, particularly respecting privacy and confidentiality. The book follows the same structure. We chose a particular problem—the link between research investments and innovation—because that is a major social science policy issue, and one that social scientists have been addressing using big data techniques.

     
    Careers



    Urban Institute – Junior Data Visualization Developer
     

    Urban Institute
     

    PhD Studentship, OpenSystems
     

    University of Barcelona
     
