NYU Data Science newsletter – May 5, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 5, 2016

GROUP CURATION: N/A

 
Data Science News



Penn Medicine: Using Data To Save Patient Lives

InformationWeek, Elite 100


from May 02, 2016

Penn Signals is a system that uses existing data from electronic health records to perform real-time predictive analysis of heart failure patients. The goal? Penn Medicine wanted to place patients in proper risk groups and assign them to cardiology resources in order to get them the best care and improve their outcomes.
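
The article does not describe Penn Signals' actual model, but the general pattern of EHR-driven risk stratification it describes can be sketched with scikit-learn. The feature names, values, and threshold below are invented for illustration only.

    from sklearn.linear_model import LogisticRegression
    import numpy as np

    # Toy EHR-style features (age, ejection fraction, BNP); values and labels are made up.
    X = np.array([[72, 30, 900],
                  [55, 55, 120],
                  [81, 25, 1500],
                  [60, 50, 200]], dtype=float)
    y = np.array([1, 0, 1, 0])  # 1 = prior adverse heart-failure outcome

    model = LogisticRegression().fit(X, y)
    risk = model.predict_proba([[68, 35, 700]])[0, 1]
    print("high-risk" if risk >= 0.5 else "low-risk", round(risk, 2))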

 

Maana raises $26M from top oil and gas investors for big data analytics platform

GeekWire


from May 03, 2016

The 4-year-old startup aims to help large companies in industries like oil and gas, manufacturing, healthcare, and insurance harness the potential of their internal data with an enterprise-level search engine.

Maana CTO and co-founder Donald Thompson, a former long-time Microsoft employee, told GeekWire that part of his company’s secret sauce lies within its patented “Enterprise Knowledge Graph,” which lets data analysts, business analysts, data scientists, and enterprise architects collaborate in a single, integrated system. The idea is to use machine learning and other technology to bring together siloed data and ultimately help drive better decision-making.

 

Qualcomm’s deep learning SDK will mean more AI on your smartphone

The Verge


from May 02, 2016

The benefits of machine learning continue to trickle down to smartphones and gadgets, and chipmaker Qualcomm wants to help speed up the process. The company is launching a new software development kit for its “machine intelligence platform” Zeroth. This SDK will make it easier for companies to run deep learning programs directly on devices like smartphones and drones — if they’re powered by one of Qualcomm’s chips, of course.

 

[1604.06076] Question Answering via Integer Programming over Semi-Structured Knowledge

arXiv, Computer Science > Artificial Intelligence; Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, Dan Roth


from April 20, 2016

Answering science questions posed in natural language is an important AI challenge. Answering such questions often requires non-trivial inference and knowledge that goes beyond factoid retrieval. Yet, most systems for this task are based on relatively shallow Information Retrieval (IR) and statistical correlation techniques operating on large unstructured corpora. We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts. On a dataset of real, unseen science questions, our system significantly outperforms (+14%) the best previous attempt at structured reasoning for this task, which used Markov Logic Networks (MLNs). It also improves upon a previous ILP formulation by 17.7%. When combined with unstructured inference methods, the ILP system significantly boosts overall performance (+10%). Finally, we show our approach is substantially more robust to a simple answer perturbation compared to statistical correlation methods.
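
As a rough illustration of the ILP idea (not the paper's actual formulation), here is a toy selection problem in Python using the PuLP solver: binary variables pick supporting facts for an answer option, subject to a budget constraint. The fact names and scores are invented.

    import pulp

    # Toy ILP: choose at most two supporting facts that maximize total alignment score.
    # In the real system, scores come from alignments between the question, the answer
    # options, and rows of semi-structured knowledge tables.
    scores = {"fact1": 0.9, "fact2": 0.4, "fact3": 0.7}

    prob = pulp.LpProblem("support_selection", pulp.LpMaximize)
    use = {f: pulp.LpVariable("use_" + f, cat="Binary") for f in scores}

    prob += pulp.lpSum(s * use[f] for f, s in scores.items())  # objective: total score
    prob += pulp.lpSum(use.values()) <= 2                      # budget: at most two facts

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([f for f in scores if use[f].value() == 1])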

 

Preparing for the Future of Artificial Intelligence

whitehouse.gov, Ed Felten


from May 03, 2016

Today, we [the White House] are announcing a new series of workshops and an interagency working group to learn more about the benefits and risks of artificial intelligence.

Also, in federal government announcements:

  • Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights (May 4, whitehouse.gov; Megan Smith, DJ Patil, Cecilia Muñoz)
  • What We Mean When We Talk About EvGen Part II: Building Out a National System for Evidence Generation (May 3, FDA Voice, Rachel E. Sherman)

    Software Update Destroys $286 Million Japanese Satellite

    Hackaday


    from May 02, 2016

    The Japanese X-ray telescope Hitomi has been declared lost after it disintegrated in orbit, torn apart as it spun out of control. The cause is still under investigation, but early analysis points to bad data in a software package pushed shortly after an instrument probe was extended from the rear of the satellite. JAXA, the Japanese space agency, lost $286 million, three years of planned observations, and a possible additional 10 years of science research.

    Hitomi, also known as ASTRO-H, launched successfully on February 17, 2016, but catastrophe struck on March 26, leaving only pieces floating in space. JAXA worked desperately to recover the satellite, not knowing the extent of the failure. On April 28 the agency discontinued those efforts and is now working to determine the reasons for the failure, although it did provide an analysis of the failure sequence at a press conference a few weeks ago.

     

    Can artificial intelligence create the next wonder material?

    Nature News & Comment


    from May 04, 2016

    Some researchers believe that machine-learning techniques can revolutionize how materials science is done.

     

    The Startup Zeitgeist

    Jared Friedman, The Macro blog


    from May 04, 2016

    Since 2008, we [Y Combinator] have received tens of thousands of applications. Collectively, they provide insight into the ideas smart people are working on and how those ideas have changed over time. We’ve never talked about these publicly before.

    But recently, we commissioned Priceonomics (YC W12) and their data studio to analyze eight years’ worth of our anonymized application data. After breaking the applications down into keywords, they calculated the percentage of applicants that mentioned any given term.

    So let’s review the data, starting with a simple example.
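
    The keyword calculation itself is straightforward; here is a minimal sketch, with an invented mini-corpus standing in for the anonymized application data the real analysis used.

        import re

        # Stand-in texts for the anonymized applications; the real analysis covered eight years of data.
        applications = [
            "we are building a slack bot for sales teams",
            "uber for laundry with on-demand pickup",
            "machine learning for medical imaging",
        ]
        terms = ["slack", "uber", "machine learning"]

        def mentions(term, text):
            return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

        for term in terms:
            pct = 100.0 * sum(mentions(term, a) for a in applications) / len(applications)
            print("%-16s %.1f%% of applications" % (term, pct))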

     

    Spark with Tungsten Burns Brighter

    KDnuggets, Paige Roberts


    from May 04, 2016

    Apache Spark is one of the hottest technologies for data science and analytics. A project called Tungsten represents a huge leap forward for Spark, particularly in performance. This article explains how it works and why it improves Spark performance so much.
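
    Tungsten's memory-layout and code-generation work applies to the DataFrame API, so the practical advice is to express work as DataFrame operations rather than hand-written RDD transformations. A minimal PySpark sketch (Spark 1.6-era SQLContext API, invented data):

        from pyspark import SparkContext
        from pyspark.sql import SQLContext, Row

        sc = SparkContext(appName="tungsten_sketch")
        sqlContext = SQLContext(sc)

        # DataFrame aggregations run through Catalyst/Tungsten, avoiding much of the
        # per-record JVM object overhead of an equivalent RDD pipeline.
        rows = [Row(sensor="a", reading=1.2),
                Row(sensor="a", reading=3.4),
                Row(sensor="b", reading=2.0)]
        df = sqlContext.createDataFrame(rows)
        df.groupBy("sensor").avg("reading").show()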

     
    Events



    The big deal about big data: Improving national security and public policy



    A lecture by Dr. Gary King, Albert J. Weatherhead III University Professor and Director of the Institute for Quantitative Social Science, Harvard University

    Washington, DC. Wednesday, May 11, starting at 4 p.m., Senate Dirksen 106

     

    The CityLab Workshop – Colocated with AAAI ICWSM 2016



    This workshop aims to bring together researchers and practitioners to discuss and explore the research challenges and opportunities in applying web, mobile, pervasive, and social computing paradigms to understand cities and, crucially, to engage their citizens in an effort to reclaim urban space and improve their quality of life. Our goal is to create a better understanding of cities and a living lab for studying both technological and social phenomena. The interdisciplinary focus aims to attract and welcome diverse researchers from the social sciences and information systems.

    Tuesday, May 17, in Cologne, Germany

     

    Inaugural Meetup of Chicago Women in Machine Learning & Data Science



    The WiMLDS is a community of women interested in machine learning and data science. We meet to socialize, and to discuss machine learning and data science in an informal setting with the purpose of learning about top-notch research and technology and building a community around women in these fields.

    Thursday, May 19, at Datascope (16 N. Wabash, Suite 1250)

     
    Tools & Resources



    About:Stencila

    Stencila


    from March 05, 2016

    Stencila is a platform for creating documents that are driven by data. At present we have two types of documents:

  • Stencils : like traditional templates for embedding data analysis and presentation code but designed to allow what-you-see-is-what-you-get editing while still maintaining reproducibility (think RMarkdown meets Jade meets Google Docs)
  • Sheets : like traditional spreadsheets but with cells that are expressions in the host language (i.e. R or Python or …) and built from the ground up for transparency, testability and version control while still maintaining accessibility (think R meets Google Sheets meets git)

    The Rise of Data Science Notebooks

    datanami, Dan Osipov


    from May 04, 2016

    Interactive notebooks are experiencing a rise in popularity. How do we know? They’re replacing PowerPoint in presentations, being shared around organizations, and even taking workload away from BI suites (more on that later).

    Even though they’ve become prominent in the past few years, they have a long history. The first notebooks were available in packages like Mathematica and MATLAB, used primarily in academia. More recently they’ve gained traction in the Python community with the IPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the IPython Notebook), R Markdown, Apache Zeppelin, Spark Notebook, Databricks Cloud, and more. There are kernels/backends for multiple languages, such as Python, Julia, Scala, SQL, and others.

    Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one choose a notebook over a favorite IDE or the command line?
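
    The "rerun the notebook" workflow can itself be scripted; a small sketch using the Jupyter nbformat and nbconvert libraries (the notebook filename is a placeholder):

        import nbformat
        from nbconvert.preprocessors import ExecutePreprocessor

        # Re-execute a notebook top to bottom against the current source data.
        nb = nbformat.read("analysis.ipynb", as_version=4)
        ExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})
        nbformat.write(nb, "analysis-rerun.ipynb")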

     

    Parallel Graph Analytics

    Communications of the ACM; Andrew Lenharth, Donald Nguyen, Keshav Pingali


    from May 01, 2016

    Using computers to study graph properties is not new; 40 years ago, in 1976, mathematicians Kenneth Appel and Wolfgang Haken ran a FORTRAN program for 1,200 hours to prove the four-color theorem for planar graphs. Today, we are interested in studying not only the mathematical properties of different kinds of graphs (such as planar graphs) but also in computing and exploiting properties of particular graphs that arise from big datasets. There are three main challenges in performing computations on these graphs: large graph sizes, diversity in graph structure, and the complex patterns of parallelism in graph analytics applications. Most current parallel abstractions and implementations fail to provide solutions for at least one of them.
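
    One common pattern in this space is bulk-synchronous, vertex-parallel computation; a toy PageRank sketch in plain Python illustrates the shape of it (invented four-node graph, Python threads only illustrate the structure; the systems the article discusses use far more careful scheduling and memory layouts):

        from concurrent.futures import ThreadPoolExecutor

        graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # adjacency list (toy data)
        n, d = len(graph), 0.85
        rank = {v: 1.0 / n for v in graph}

        def contributions(v):
            out = graph[v]
            share = rank[v] / len(out) if out else 0.0
            return [(u, share) for u in out]

        for _ in range(20):
            # Each superstep computes per-vertex contributions independently, then merges them.
            with ThreadPoolExecutor() as pool:
                per_vertex = list(pool.map(contributions, graph))
            rank = {v: (1 - d) / n for v in graph}
            for pairs in per_vertex:
                for u, share in pairs:
                    rank[u] += d * share

        print(rank)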

     

    Data Dumps  |  Freebase API (Deprecated)

    Google Developers


    from May 04, 2016

    The Freebase API will be completely shut down on Aug 31, 2016. This page provides access to the last available data dump. [The shutdown was first announced in December 2014.]
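
    The dump is a single gzipped N-Triples file, so it can be streamed without unpacking it first. A small sketch (the filename is an assumption; use whatever the download page serves):

        import gzip
        from itertools import islice

        # Stream the first few triples; each line holds subject, predicate, object, and a trailing ".".
        with gzip.open("freebase-rdf-latest.gz", "rt", encoding="utf-8") as f:
            for line in islice(f, 5):
                subj, pred, rest = line.rstrip("\n").split(None, 2)
                print(subj, pred, rest[:60])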

     

    How to Quantize Neural Networks with TensorFlow

    Pete Warden's blog


    from May 03, 2016

    I’m pleased to say that we’ve been able to release a first version of TensorFlow’s quantized eight-bit support. I was pushing hard to get it in before the Embedded Vision Summit, because it’s especially important for low-power and mobile devices, so it’s exciting to get it out there. All this documentation will also appear on the main TensorFlow site, but since I’ve talked so much here about why eight-bit matters, I wanted to give an overview of what we’ve released in this post too.
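
    The scheme Warden describes represents each tensor with 8-bit codes plus float min/max values for its range. A NumPy sketch of that linear quantize/dequantize round trip (an illustration of the idea, not TensorFlow's actual ops):

        import numpy as np

        def quantize(x):
            lo, hi = float(x.min()), float(x.max())
            scale = (hi - lo) / 255.0 or 1.0          # 256 levels across the observed range
            codes = np.round((x - lo) / scale).astype(np.uint8)
            return codes, lo, scale

        def dequantize(codes, lo, scale):
            return codes.astype(np.float32) * scale + lo

        w = np.random.randn(4, 4).astype(np.float32)
        codes, lo, scale = quantize(w)
        print("max round-trip error:", np.abs(dequantize(codes, lo, scale) - w).max())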

     

    Efficient R programming

    Colin Gillespie and Robin Lovelace


    from May 05, 2016

    This website contains text and code for the forthcoming O’Reilly book: Efficient R programming. Pull requests and general comments are welcome.

     
    Careers



    Clara – Machine Learning Scientist
     

    Clara, Clara Labs
     
