NYU Data Science newsletter – August 24, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 24, 2015


 
Data Science News



Probably Overthinking It: The Inspection Paradox is Everywhere

Allen Downey, Chance magazine


from August 18, 2015

The inspection paradox is a common source of confusion, an occasional source of error, and an opportunity for clever experimental design. Most people are unaware of it, but like the cue marks that appear in movies to signal reel changes, once you notice it, you can’t stop seeing it.

A common example is the apparent paradox of class sizes. Suppose you ask college students how big their classes are and average the responses. The result might be 56. But if you ask the school for the average class size, they might say 31. It sounds like someone is lying, but they could both be right.
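The gap between the two averages can be reproduced with a few lines of Python. This is a minimal sketch using made-up class sizes (not data from Downey's article), and it assumes every enrolled student reports the size of their class exactly once. The school averages over classes, while the student survey averages over enrollments, so a class of size s is counted s times.

```python
from statistics import mean

def school_average(class_sizes):
    """Average over classes: each class counts once."""
    return mean(class_sizes)

def student_average(class_sizes):
    """Average over student responses: a class of size s is reported s times."""
    total_students = sum(class_sizes)
    return sum(s * s for s in class_sizes) / total_students

# Hypothetical class sizes, chosen only for illustration.
class_sizes = [10, 10, 10, 20, 20, 30, 50, 100]

print(school_average(class_sizes))   # 31.25
print(student_average(class_sizes))  # 58.0
```

Both numbers are correct. They answer different questions, because large classes are oversampled when students are the ones being asked.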

 

How Machine Learning Makes Databases Ready for Big Data

Dataversity


from August 17, 2015

… a completely new, adaptive and machine learning-based approach to database technology is gaining traction. With machine learning, any database can become dynamic and optimal. Unlike classic tree-structures, which are governed by fixed mathematical limits and static behavior, this new database science delivers flexible data structures that adapt to behavior based on observed changes in the data and operational capabilities of hardware resources. Machine learning techniques continuously optimize how that data is organized in memory and on disk according to application workload and resource capabilities.

 

Behind the Scenes of “What’s Really Warming The World?” with the Bloomberg Team (DS#59)

Data Stories


from August 20, 2015

Hi folks! We have Blacki Migliozzi and Eric Roston from Bloomberg on the show to talk about their recent data graphic piece on climate change called “What’s Really Warming The World?”.

The graphic shows, through “scrollytelling”, which factors may influence the world’s temperature according to well-established climate models, and it guides you through a series of questions and visuals to see with your own eyes what does correlate (spoiler: carbon emissions) and what does not. [audio, 44:46]

 

Scholars seek a lingua franca for linguistics research | Santa Fe Institute

Santa Fe Institute


from August 20, 2015

Over time, English has swirled into dialects so different that speakers from the same country cannot always understand each other. Similarly, linguists – as they have catalogued words, spellings, pronunciations, and meanings – have stylized their individual academic databases to suit the needs of their own research.

In an age of computational linguistics, that can be a problem.

 

Big Data Doesn’t Automatically Produce Better Predictions | Dart-Throwing Chimp

Dart-Throwing Chimp blog


from August 20, 2015

… more and better data don’t automatically lead to more accurate predictions. Observation and prediction are interrelated, but the latter does not move in lock step with the former. At least two things can weaken the link between those two steps in the analytical process.

 

Interview with Sherri Rose and Laura Hatfield

simply stats


from August 21, 2015

Laura Hatfield and Sherri Rose are Assistant Professors specializing in biostatistics at Harvard Medical School in the Department of Health Care Policy. Laura received her PhD in Biostatistics from the University of Minnesota and Sherri completed her PhD in Biostatistics at UC Berkeley. They are developing novel statistical methods for health policy problems.

SimplyStats: Do you consider yourselves statisticians, data scientists, machine learners, or something else?

Rose: I’d definitely say a statistician. Even when I’m working on things that fall into the categories of data science or machine learning, there’s underlying statistical theory guiding that process, be it for methods development or applications. Basically, there’s a statistical foundation to everything I do.

 

Chicago’s Data Science for Social Good project trying to predict police misconduct before it happens

WBEZ 91.5 Chicago


from August 21, 2015

Every day there are thousands of interactions between police officers and citizens across the country. While most are uneventful, a small number leave a member of the public disrespected, unprotected, harassed or — in all too many cases seen recently — hurt or even killed.

This summer, fellows with Data Science for Social Good — a program at the University of Chicago that connects data scientists with governments and nonprofits — are working to predict when officers are at risk of misconduct, the goal being to prevent problems before they happen. [audio, 10:03]

 

Newly Published: Common Framework Unifies all Food Webs

NCEAS, University of California Santa Barbara


from August 21, 2015

Ecological food web modeling has been a research focus since 1798, when Thomas Malthus suggested the theory that availability of resources constrains consumer population growth. Since then, thousands of modeling studies have explored the consumer-resource relationship. Yet it was not until recently that an NCEAS Working Group was able to connect the variables from classical consumer-resource models to create a mathematical expression that underlies the structure of all food webs. The NCEAS Parasites and Food Webs Working Group published their findings in Science.

 

Study Uses 311 Complaints to Track When and Where Neighborhood Conflict Emerges

NYU News


from August 21, 2015

Each year, 311 – New York City’s main hub for government information and non-emergency services – receives millions of requests and complaints, including New Yorkers’ gripes about their neighbors.

In a new study from New York University using 311 complaint data, researchers tracked when and where New Yorkers complain about their neighbors making noise, blocking driveways, or drinking in public. They found that these complaints – a defining aspect of urban life – are more likely to occur in areas sandwiched between two homogenous communities, where the boundaries between different ethnic and racial groups aren’t clearly defined.

 

Causation in Context: Brendan Frey’s Vision for Deep Genomics

Bio-IT World


from August 21, 2015

Last month, Brendan Frey got a flurry of media attention as he and his cofounders formally launched Deep Genomics, a company whose mission is to combine deep learning with genomics to model the body.

By the time Frey and I got a chance to discuss the venture, his company-genesis spiel was perfected, rolling off the tongue in one 1,200-word monologue.

But Frey isn’t a pitchman. Frey’s background is in machine learning, specifically computer vision and speech processing. About twelve years ago, Frey started asking questions about genomics, and was dissatisfied with the answers. As he dug into the literature, he found that genome-wide association studies were common, and the methodology struck Frey as particularly “ill-conceived.”

 

Unsupervised learning, attention, and other mysteries – O’Reilly Radar

O'Reilly Radar, David Beyer


from August 20, 2015

David Beyer: Let’s start with your background. What was the evolution of your interest in machine learning, and how did you zero in on your Ph.D. work?

Ilya Sutskever: I started my Ph.D. just before deep learning became a thing. I was working on a number of different projects, mostly centered around neural networks. My understanding of the field crystallized when collaborating with James Martens on the Hessian-free optimizer. At the time, greedy layer-wise training (training one layer at a time) was extremely popular. Working on the Hessian-free optimizer helped me understand that if you just train a very large and deep neural network on a lot of data, you will almost necessarily succeed.

 
Deadlines



ICLR 2016 will be held May 2-4, 2016 in the Caribe Hilton, San Juan, Puerto Rico.


It is well understood that the performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field, and include in it topics such as deep learning and feature learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.

Despite the importance of representation learning to machine learning and to application areas such as vision, speech, audio and NLP, there was no venue for researchers who share a common interest in this topic. The goal of ICLR has been to help fill this void.

The deadline for submitting the title and abstract of conference contributions is Thursday, November 12.

 
