NYU Data Science newsletter – May 25, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 25, 2016


 
Data Science News



The intelligent app ecosystem (is more than just bots!)

TechCrunch


from May 24, 2016

… Now is an exciting time to be investing in the broader intelligent app ecosystem because several important trends are coming together in application development:

  • The availability of massive computational power and low-cost storage to feed machine learning models,
  • The ease of use with which developers can take advantage of machine learning techniques,
  • The adoption of microservices as a development paradigm for applications, and
  • The proliferation of platforms on which to develop applications, and in particular platforms based on “natural user interfaces” like messaging and voice.

    Data Science: Just One More Way That Donald Trump is Different

    Domino Blog


    from May 23, 2016

    If you’re a data scientist interested in working in politics, don’t bother applying to the Trump campaign — but Hillary has a job for you.

    In a recent AP interview Donald Trump discounted the value of data as a campaign strategy tool. His plan is to win based on the strength of his personality. The presumptive Republican nominee pointed to Obama’s personality, saying “Obama got the votes much more so than his data processing machine, and I think the same is true with me.”

     

    Networking @Scale, May 2016 — Recap

    Facebook Code, Engineering Blog


    from May 24, 2016

    Last year, we held our first Networking @Scale, our invitation-only, one-day technical conference for engineers working on large-scale networking solutions. We received a tremendous amount of interest and positive feedback. So, this year, we decided to go even bigger with a two-day event. We hosted this second Networking @Scale on May 10 and 11 and had speakers from Akamai, AT&T, Comcast, Facebook, Google, the Jet Propulsion Laboratory (JPL), Microsoft, and Netflix. This year’s event reinforced the incredible range of network challenges we all face as a community.

     

    Baidu’s Ambitions in Voice Tech

    Bloomberg


    from May 23, 2016

    Andrew Ng, chief scientist at Baidu, and Bloomberg’s Jack Clark discuss voice-based communications, the challenges of artificial intelligence and the advances in speech recognition. They speak to Bloomberg’s Emily Chang on “Bloomberg West.”

     

    Deep learning enters the beauty industry

    EurekAlert! Science News, Insilico Medicine


    from May 24, 2016

    Insilico Medicine will present its results on applying deep learning to biomarker development and cosmetics applications at the INNOCOS World Beauty Innovation Summit in Vienna, June 9-10. INNOCOS is one of the largest annual events in the beauty industry, bringing together top experts from many areas of research, R&D heads of the cosmetics conglomerates, innovation and strategy professionals, and digital media experts.

    In addition to heading Insilico Medicine, Inc., a big data analytics company focused on applying advanced signaling pathway activation analysis and deep learning methods to biomarker and drug discovery in cancer and age-related diseases, Alex Zhavoronkov, PhD, is the co-founder and principal scientist of Youth Laboratories, a company that applies machine learning methods to evaluating the condition of human skin and general health status from multimodal inputs.

     

    Contextual sensitivity in scientific reproducibility

    Proceedings of the National Academy of Sciences; Jay J. Van Bavel, Peter Mende-Siedlecki, William J. Brady, and Diego A. Reinero


    from May 23, 2016

    Scientific progress requires that findings can be reproduced by other scientists. However, there is widespread debate in psychology (and other fields) about how to interpret failed replications. Many have argued that contextual factors might account for several of these failed replications. We analyzed 100 replication attempts in psychology and found that the extent to which the research topic was likely to be contextually sensitive (varying in time, culture, or location) was associated with replication success. This relationship remained a significant predictor of replication success even after adjusting for characteristics of the original and replication studies that previously had been associated with replication success (e.g., effect size, statistical power). We offer recommendations for psychologists and other scientists interested in reproducibility.

     

    Context matters when replicating experiments, argues study

    Retraction Watch


    from May 23, 2016


    Lead author of the study, Jay Van Bavel, an assistant professor of social psychology at New York University, told us the results suggest that a study’s context matters to the success of its replication, with a “small to moderate effect size.”

     

    Urika-ka-ka-ka! Cray unveils open source big data box

    The Register


    from May 24, 2016

    Supercomputer shop Cray is branching into analytics with tailored high-end systems packing open-source number-crunching software.

    On Tuesday, Cray unveiled the Urika-GX, which ticks some of the best-known technology boxes and is due in the second half of 2016.

     

    Carnegie Mellon’s BrainHub To Host First Neurohackathon

    Carnegie Mellon University, CMU News


    from May 17, 2016

    Carnegie Mellon University’s BrainHub will host its first Neurohackathon, sponsored by Qualcomm, May 24-25. The event is one of the first hackathons to engage computer scientists in using one of the hardest systems to crack: the structure of neural data and the brain.

     

    [1605.07110] Deep Learning without Poor Local Minima

    arXiv, Statistics > Machine Learning; Kenji Kawaguchi


    from May 23, 2016

    In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. For an expected loss function of a deep nonlinear neural network, we prove the following statements under the independence assumption adopted from recent work: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) the property of saddle points differs for shallow networks (with three layers) and deeper networks (with more than three layers). Moreover, we prove that the same four statements hold for deep linear neural networks with any depth, any widths, and no unrealistic assumptions. As a result, we present an instance for which we can answer the following question: how difficult is it, in theory, to directly train a deep model? It is more difficult than the classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima and the property of the saddle points). We note that even though we have advanced the theoretical foundations of deep learning, there is still a gap between theory and practice.
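    The deep linear case the abstract mentions can be written concretely. With illustrative notation (not necessarily the paper's exact symbols), for weight matrices W_1, …, W_H, inputs X, and targets Y, the squared-error loss is:

```latex
L(W_1, \dots, W_H) = \frac{1}{2}\,\bigl\lVert W_H W_{H-1} \cdots W_1 X - Y \bigr\rVert_F^2
```

    Even though the composed map W_H ⋯ W_1 is linear, this loss is non-convex in the weights jointly; the result is that every local minimum of such a loss is nevertheless a global minimum.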

     

    Why AI is finally going mainstream

    O'Reilly Radar, Tim O'Reilly


    from May 24, 2016

    There is a growing—and urgent—need for information on applied AI, as opposed to the kind of research presented at academic conferences. That’s why we’re launching a new event—the O’Reilly Artificial Intelligence Conference—to be held in New York City at the Javits Convention Center, alongside the Strata + Hadoop World conference. This event will focus on the engineering practices and technologies used for building smart software.

    We’re gathering developers interested in artificial intelligence to talk about the latest research and explain how to implement AI in real projects. And we’re gathering business people who want to think about the impact AI will have on their businesses: how they will use it and how it will change their competitive landscape. We’ll discuss the limitations of artificial intelligence and its untapped opportunities.

     

    Unveiling the Hidden Layers of Deep Learning

    Scientific American, Jen Christiansen


    from May 20, 2016

    In a recent Scientific American article entitled “Springtime for AI: The Rise of Deep Learning,” computer scientist Yoshua Bengio explains why complex neural networks are the key to true artificial intelligence as people have long envisioned it. It seems logical that the way to make computers as smart as humans is to program them to behave like human brains. However, given how little we know of how the brain functions, this task seems more than a little daunting. So how does deep learning work?

    This visualization by Jen Christiansen explains the basic structure and function of neural networks.

     
    Events



    Big Data in Biomedicine Conference



    Join researchers and leaders from academia, hospitals, government and industry for two days at Stanford Medicine to spark collaborations, address challenges, and identify actionable steps for using large-scale data analysis and technology to improve human health.

    Palo Alto, CA, Thursday-Friday, May 25-26. Live webcast available.

     
    CDS News



    CDS Fellow Interview: Daniel Fernández

    NYU Center for Data Science


    from May 24, 2016

    What drew you to work on a project focusing on speech impediments?

    I have always been interested in research topics concerning the social integration of underrepresented minorities. I completed my doctoral work at Victoria University of Wellington in New Zealand, and while I was there, I worked at the University’s Disability Services Department, which helped students with social communication disorders or language-based learning disabilities. … We are using crowdsourced experiments to obtain data in the study of speech-rate tasks. An example of a speech-rate task would be a child saying a word that contains an “r” sound. This word is uploaded into the crowdsourced experiment, and a group of non-expert listeners rate whether this particular child has produced the “r” sound correctly or incorrectly.
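    The rating setup described above can be sketched in a few lines. This is a hypothetical illustration of aggregating non-expert judgments by majority vote, not the project's actual analysis pipeline:

```python
def aggregate_ratings(ratings):
    """Majority vote over non-expert listener judgments.

    ratings: list of booleans, True meaning the listener judged the
    sound to be produced correctly. Returns True only if a strict
    majority of listeners agreed it was correct.
    """
    correct = sum(ratings)
    return correct > len(ratings) / 2

# Three listeners, two of whom heard a correct "r" sound:
assert aggregate_ratings([True, True, False]) is True
# Only one of three heard it as correct:
assert aggregate_ratings([False, True, False]) is False
```

    In practice, crowdsourcing studies typically weight raters or model their reliability rather than taking a raw vote, but the majority rule shows the basic shape of turning many noisy judgments into one label.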

     
    Tools & Resources



    Programmers are not different, they need simple UIs.

    Salvatore Sanfilippo, <antirez>


    from May 24, 2016

    … I want to assume that at this point it is tacit, taken for granted, that programmers also have user interfaces, and that such user interfaces are so crucial that they can completely change the perception of a system. Database query languages, library calls, programming languages, Unix command line tools: they all have a user interface part. If you use them daily, to you they are more UIs than anything else.

    So if this is all well known, why am I writing this blog post? Because I want to stress how important the concept of simplicity is, not just in graphical UIs, but also in UIs designed for programmers. The act of iterating again and again to find a simple UI solution is not a form of perfectionism; it’s not futile narcissism.

     

    Computational and Inferential Thinking

    GitBook


    from May 23, 2016

    By Ani Adhikari and John DeNero; Contributions by David Wagner

    This is the textbook for the Foundations of Data Science class at UC Berkeley.

     

    Deploying Elasticsearch At Scale for Social Media Analytics

    Spinn3r blog


    from May 23, 2016

    … We have about 150 database class machines backing our Elasticsearch and Cassandra install.

    All our machines are identical, with 128GB of RAM, two 1TB SSDs, and a single 3.2GHz core. We will probably be doubling down and buying more hardware toward the end of the year and expect to be near 1,000 machines.

     

    Making Sense of Stream Processing

    O'Reilly Media, Martin Kleppmann


    from May 24, 2016

    How can event streams help make your application more scalable, reliable, and maintainable? In this report, O’Reilly author Martin Kleppmann shows you how stream processing can make your data storage and processing systems more flexible and less complex. Structuring data as a stream of events isn’t new, but with the advent of open source projects such as Apache Kafka and Apache Samza, stream processing is finally coming of age.

    Using several case studies, Kleppmann explains how these projects can help you reorient your database architecture around streams and materialized views. The benefits of this approach include better data quality, faster queries through precomputed caches, and real-time user interfaces. Learn how to open up your data for richer analysis and make your applications more scalable and robust in the face of failures.
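    The core idea, that a database table can be treated as a materialized view folded from an append-only log of events, can be sketched without Kafka at all. This is an illustrative toy, not an excerpt from the report:

```python
# An append-only event log, the shape Kafka would store durably.
events = [
    {"type": "deposit",  "account": "a", "amount": 100},
    {"type": "deposit",  "account": "b", "amount": 50},
    {"type": "withdraw", "account": "a", "amount": 30},
]

def materialize(log):
    """Replay the log to build the current-state view.

    The returned dict plays the role of a precomputed cache: queries
    read it directly instead of re-scanning the raw events.
    """
    balances = {}
    for event in log:
        delta = event["amount"] if event["type"] == "deposit" else -event["amount"]
        balances[event["account"]] = balances.get(event["account"], 0) + delta
    return balances

view = materialize(events)
assert view == {"a": 70, "b": 50}
```

    Because the log is the source of truth, a new view (a different aggregation, a search index, a cache) can always be built later by replaying the same events, which is the flexibility argument the report develops around Kafka and Samza.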

     
