NYU Data Science newsletter – August 16, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 16, 2016

GROUP CURATION: N/A

 
Data Science News



Answer to How does Keras compare to other Deep Learning frameworks like TensorFlow, Theano, or Torch?

Quora, Francois Chollet


from August 15, 2016

First, let’s look at Theano and TensorFlow. These two are very similar to each other, since TensorFlow is re-using many of the key ideas first laid out in Theano. Both are essentially languages for defining abstract, general-purpose computation graphs. They are not quite “deep learning frameworks”, and they can be used for much more than deep learning! Keras, on the other hand, is an actual deep learning framework: a well-designed API that allows you to build deep learning models by clipping together high-level building blocks. And since Keras runs on top of TensorFlow or Theano, there is no performance cost to using Keras compared to using one of these lower-level frameworks.
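As a sketch of the “high-level building blocks” idea, here is a minimal Keras model; the layer sizes, activations, and optimizer below are illustrative choices, not from the answer:

```python
# Minimal Keras sketch: a small classifier assembled from high-level layers.
# The backend (e.g. TensorFlow or Theano) builds and runs the underlying
# computation graph; the user only clips layers together.
from keras import Sequential
from keras.layers import Input, Dense

model = Sequential([
    Input(shape=(784,)),              # e.g. flattened 28x28 images
    Dense(32, activation="relu"),     # hidden block
    Dense(10, activation="softmax"),  # 10-class output
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```

Training then reduces to a single `model.fit(x, y)` call, with the backend handling gradients and execution.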

 

Astronomers Are On A Celestial Treasure Hunt. The Prize? Planet Nine

NPR, Weekend Edition Saturday


from August 13, 2016

“Nobody’s actually seen the new planet. The reason astronomers think it’s out there is the strange behavior of some smallish objects in the Kuiper Belt, a collection of celestial objects orbiting in the outer reaches of the solar system.”

 

Innovation Dialog – How is Data Science going to change your life

Ding Ding TV, Indiana University


from August 12, 2016

Indiana University Professors David Wild and Ying Ding and data science specialist Kyle Stirling discussed how data science is going to change our lives.

Data science has been called “the sexiest job of the 21st Century.” With a projected shortage of 140,000-190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions, data science is a rapidly growing, high-paying field.

 

Tales from ICML: Insights and Takeaways

KDnuggets, Anna Bethke


from August 15, 2016

This year I had the opportunity to attend the International Conference on Machine Learning (ICML) in New York City, and overall, I quite enjoyed the experience. The conference lasted six days including tutorials and workshops and had talks and posters that spanned the full range of machine learning. It was a good conference to go to in order to get a high-level overview of a variety of algorithms, applications and datasets. It also was a way to meet others in the field, and there was certainly no lack of people to meet given the fact that over 3,200 people attended the conference. Presented here are my opinions on what I was able to take away from my time at ICML.

 

UW CSE researchers tap their inner Indiana Jones to unearth the history of web tracking

UW CSE News


from August 15, 2016

Researchers in UW CSE’s Security & Privacy Research Lab turned archaeologists to deliver the first comprehensive study of third-party web tracking based on a new tool, TrackingExcavator, that detects and analyzes third-party tracking behavior. UW CSE Ph.D. student Adam Lerner presented the results of the study, which examines tracking on the most popular online destinations dating back to 1996, at the USENIX Security Conference in Austin, Texas last week.

 

The Future Of Photography: Cutting Edge

Forbes


from August 02, 2016

Researchers at the Computer Vision Lab at Columbia University have developed a bendable lens array and self-charging “eternal” camera.

 

Examining the Government-Constituent Relationship. With data. And trees.

U.S. Department of Commerce, Economics & Statistics Administration


from August 15, 2016

In 2014, Jonathan Auerbach, from Columbia University’s Statistics Department, and Christopher Eshleman, recently graduated from Columbia’s School of International and Public Affairs, set out to analyze the underlying demand patterns generated by a series of storm events which recently hit New York City. They used a combination of data from the City of New York’s open data portal and the US Census Bureau, and worked closely with the City’s Parks Department and NYC311. With this data they produced a series of statistical analyses that helped to identify communities in the City that had a higher propensity to report storm damage.

Why are these findings important? They can help municipal officials and policymakers develop strategies that better target underserved communities – those less likely to report storm damage – who may be underreporting issues not because they aren’t affected, but because they aren’t aware of, or don’t have adequate access to, information on the services the city provides.

Through the Commerce Data Usability Project and in collaboration with the Commerce Data Service, we have presented Auerbach and Eshleman’s work as a two-part R tutorial focused on extracting insights from spatial data utilizing recent developments in Bayesian statistics.

 

Evolutionary Pressures on the Electronic Health Record – Caring for Complexity

Journal of the American Medical Association; Donna M. Zulman, Nigam H. Shah, Abraham Verghese


from August 15, 2016

… the evolution of EHRs has not kept pace with technology widely used to track, synthesize, and visualize information in many other domains of modern life. While clinicians can calculate a patient’s likelihood of future myocardial infarction, risk of osteoporotic fracture, and odds of developing certain cancers, most systems do not integrate these tools in a way that supports tailored treatment decisions based on an individual’s unique characteristics. Similarly, some algorithms (many developed by insurers) can identify patients at high risk for hospitalization,3 but evidence lags when it comes to using predictive analytics to deliver preventive care and services to targeted individuals. Existing EHRs also have yet to seize one of the greatest opportunities of comprehensive record systems—learning from what happened to similar patients and summarizing that experience for the treating physician and the patient.4 For instance, when a 55-year-old woman of Asian heritage presents to her physician with asthma and new-onset moderate hypertension, it would be helpful for an EHR system to find a personalized cohort of patients (based on key similarities or by using population data weighted by specific patient characteristics) to suggest a course of action based on how those patients responded to certain antihypertensive medication classes, thus providing practice-based evidence when randomized trial evidence is lacking. [full text]
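The “personalized cohort” idea the authors describe can be illustrated with a toy nearest-neighbor sketch; the feature set, weights, and outcome field below are hypothetical, not from the article:

```python
# Toy sketch of a "personalized cohort": find the k most similar past
# patients (weighted distance over clinical features) and summarize how
# they responded to a treatment. All names and numbers are hypothetical.

def cohort_response_rate(query, patients, weights, k=3):
    """Fraction of the k nearest patients who responded to treatment."""
    def dist(a, b):
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5
    nearest = sorted(patients, key=lambda p: dist(query, p["features"]))[:k]
    return sum(p["responded"] for p in nearest) / k

# Hypothetical features: (age, asthma flag, systolic blood pressure).
history = [
    {"features": (54, 1, 142), "responded": 1},
    {"features": (57, 1, 138), "responded": 1},
    {"features": (30, 0, 118), "responded": 0},
    {"features": (72, 0, 160), "responded": 0},
]
# The 55-year-old patient with asthma and moderate hypertension:
rate = cohort_response_rate((55, 1, 140), history, weights=(1.0, 50.0, 0.5), k=2)
```

A real system would of course draw on population-scale records and clinically validated similarity measures; the sketch only shows the shape of the computation.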

 

Illumina Would Like You to Sequence More DNA, Please

WIRED, Science


from August 15, 2016

What do you do when you’re so clearly winning? When you’ve crushed your competitors and left them fighting over crumbs? If you’re Illumina, the biotech giant whose name has become synonymous with DNA sequencing machines, you look around and put some of your extra cash in a startup trying to make better wine. Or healthier dairy cows. Or smart tampons.

These are all industries Illumina thinks can benefit from an influx of genetic sequencing, and these are all real startups that have gone through Illumina Accelerator, which nurtures young companies with cash, San Francisco office space, and access to its DNA sequencing machines. Today, Illumina is announcing the two members of the fourth round of its accelerator program: REX, a Kansas City-based animal health company, and the Center of Individualized Diagnostics, a genomics center in Saudi Arabia.

 

We don’t understand AI because we don’t understand intelligence

Engadget, Jessica Conditt


from August 15, 2016

By 2045, belief will also be a machine thing, according to these researchers. We just need to create the most advanced AI possible, and then bam — conscious machines.

This is where they lose me.

 

Building an Accessible Archive for Water Column Sonar Data

Eos


from August 15, 2016

Acoustic surveys by the National Oceanic and Atmospheric Administration (NOAA) National Marine Fisheries Service (NMFS) have collected more than 30 terabytes (TB) of water column sonar data, and they continue to collect more than 10 TB every year. The NOAA National Centers for Environmental Information (NCEI), in collaboration with NMFS and the University of Colorado, have established the Water Column Sonar Data Archive to steward these data, preserving them and making them available for the next generation.

 
Tools & Resources



Communicating data science: Why and (some of the) how to visualize information

Kaggle, No Free Hunch blog


from August 10, 2016

In this blog post, I do as so many have done before me (but with my own spin): introduce you to the why and (some of the) how of communicating information through visualization.

 

Scalable data science with R

O'Reilly Media, Federico Castanedo


from August 10, 2016

“In the particular case of R, data size problems usually arise when the input data do not fit in the RAM of the machine and when data analysis takes a long time because parallelism does not happen automatically. Without making the data smaller (through sampling, for example) this problem can be solved in two different ways:”

  • “Scaling-up vertically”, using a machine with more available RAM.
  • “Scaling-out horizontally”, distributing the data and computation across multiple machines.
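The scaling-out idea can be sketched as splitting the data into chunks and combining partial results from a pool of workers; the sketch below is in Python rather than R, purely as an illustration, and the helper names are ours:

```python
# Illustrative sketch of horizontal scaling: compute a mean over a dataset
# by combining partial (sum, count) results from a pool of worker processes.
import multiprocessing as mp

def chunk_stats(chunk):
    """Partial result for one chunk: (sum, count)."""
    return (sum(chunk), len(chunk))

def parallel_mean(values, n_workers=4):
    # Split the data into roughly one chunk per worker.
    size = max(1, len(values) // n_workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with mp.Pool(n_workers) as pool:
        parts = pool.map(chunk_stats, chunks)  # map step: one worker per chunk
    total = sum(s for s, _ in parts)           # reduce step: combine partials
    count = sum(c for _, c in parts)
    return total / count

if __name__ == "__main__":
    print(parallel_mean(list(range(1_000_000))))
```

In a real deployment each chunk would live on a different machine (e.g. via Spark or Hadoop), but the map/reduce shape of the computation is the same.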

Supplemental materials for my O'Reilly project, the Python Data Science Handbook

GitHub – jakevdp


from August 11, 2016

“See also the free companion project, A Whirlwind Tour of Python.”

 

Careers


Full-time, non-tenured academic positions

Scientific Application Developer (1+ positions), Physics

Princeton University; Princeton, NJ

Call for TA: CCT 490 (Social Data Analytics) · University of Toronto Mississauga

GitHub Gist, alexhanna

Full-time positions outside academia

Data Scientist, ONE

ONE