NYU Data Science newsletter – September 1, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 1, 2016

GROUP CURATION: N/A

 
Data Science News



Editorial: At the Crossroads: Lessons and Challenges in Computational Social Science | Interdisciplinary Physics

Frontiers in Physics; Javier Borge-Holthoefer, Yamir Moreno and Taha Yasseri


from August 29, 2016

We are nowadays at a crossroads, at which different approaches converge. We name such crossroads computational social science (CSS): a new discipline that can offer abstracted (simplified, idealized) models and methods (mainly from statistical physics), large storage, algorithms and computational power (computer and data science), and a set of social hypotheses together with a conceptual framework for the results to be interpreted (Social Science) [3–5]. Despite its youth, the field is developing rapidly in terms of contents (articles, books, etc.), but also institutionally—either under the form of labs, institutes, and academic programs; or as consolidated events and scientific gatherings.

This “work-in-progress” spirit is reflected as well in this volume: the call was launched in late 2014 and 10 articles were eventually accepted and published, including reviews—a look behind—, one methods paper, and six original contributions—a look ahead—introducing a broad range of research, from models with a strong analytical flavor to data-driven problems.

 

G.E., the 124-Year-Old Software Start-Up – The New York Times

The New York Times


from August 27, 2016

It may not qualify as a lightning-bolt eureka moment, but Jeffrey R. Immelt, chief executive of General Electric, recalls the June day in 2009 that got him thinking. He was speaking with G.E. scientists about new jet engines they were building, laden with sensors to generate a trove of data from every flight — but to what end?

That data could someday be as valuable as the machinery itself, if not more so. But G.E. couldn’t make use of it.

“We had to be more capable in software,” Mr. Immelt said he decided. Maybe G.E. — a maker of power turbines, jet engines, locomotives and medical-imaging equipment — needed to think of its competitors as Amazon and IBM.

 

Do You Believe in God, or Is That a Software Glitch?

The New York Times, SundayReview, Kate Murphy


from August 27, 2016

… when you divide the brain into bitty bits and make millions of calculations according to a bunch of inferences, there are abundant opportunities for error, particularly when you are relying on software to do much of the work. This was made glaringly apparent back in 2009, when a graduate student conducted an fM.R.I. scan of a dead salmon and found neural activity in its brain when it was shown photographs of humans in social situations. Again, it was a salmon. And it was dead.

This is not to say all fM.R.I. research is hooey. But it does indicate that methods matter even when using whiz-bang technology. In the case of the dead salmon, what was needed was to statistically correct for false positives that arise when you make so many comparisons between voxels.
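The multiple-comparisons problem behind the salmon result is easy to reproduce. The following minimal Python sketch is not the salmon study's analysis. It assumes a hypothetical scan of 100,000 voxels that carry no real signal, so each voxel's p-value is drawn uniformly at random. It then applies the Bonferroni correction (dividing the significance threshold by the number of tests), which is only one of several corrections in use.

```python
import random


def count_false_positives(n_voxels, alpha=0.05, bonferroni=False, seed=0):
    """Count voxels that look 'active' when none of them carry any signal."""
    rng = random.Random(seed)
    # With no true effect, each voxel's p-value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_voxels)]
    # Bonferroni: test each voxel at alpha / n, which keeps the chance of
    # even one false positive across the whole scan at most alpha.
    threshold = alpha / n_voxels if bonferroni else alpha
    return sum(p < threshold for p in p_values)


print(count_false_positives(100_000))                   # roughly 5,000 false "activations"
print(count_false_positives(100_000, bonferroni=True))  # almost always 0
```

Uncorrected, about 5% of pure-noise voxels pass a p < 0.05 test. That is how a dead fish can appear to "respond" to photographs.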

 

Tolerance Used to Predict Infection’s Death Toll

Quanta Magazine, Emily Singer


from August 30, 2016

Scientists have developed new ways to forecast who will bounce back from disease by studying not just the way the immune system fights infections, but how the body repairs itself.

 

How a Japanese cucumber farmer is using deep learning and TensorFlow

Google Cloud Big Data and Machine Learning Blog, Kaz Sato


from August 31, 2016

It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes. … Makoto learned very quickly that sorting cucumbers is as hard and tricky as actually growing them. “Each cucumber has different color, shape, quality and freshness,” Makoto says.

 

EXCLUSIVE: New York City’s biggest buildings cut energy use, greenhouse gas emissions

New York Daily News


from August 24, 2016

Thousands of the city’s biggest buildings have slashed greenhouse gas emissions by 8% and energy use by 6%, a new report found.

The 3,000 buildings measured made the cuts between 2010, when the city started requiring large buildings to report their energy use, and 2013.

“Energy use has been going up in this country since Edison started delivering it to customers in New York,” said Cecil Scheib of the Urban Green Council, one of the authors of the report. “We have something that always got worse, and for the first time in a century it’s getting better, and that’s amazing.”

 

New Yorkers Will Pay $56 A Month To Trim A Minute Off Their Commute

FiveThirtyEight, Carl Bialik


from August 29, 2016

How much would you be willing to pay to shave a minute off your commute? For New Yorkers, the answer appears to be around $56 per month. That’s how much more New Yorkers pay in rent, on average, for a one-bedroom apartment that’s a minute closer by subway to Manhattan’s main business districts.

That finding — based on our analysis of data from the real estate listing service StreetEasy, which provided more than 175,000 rental prices from 2015 listings for New York City homes — puts an approximate value on the old real estate adage about the importance of location, location, location. The higher rents show how much New Yorkers are willing to pay for convenience — and how those who can’t afford to live close to jobs are getting pushed to the edges of the city and must spend more of their time on the subway, leaving less time for leisure and sleep.
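The excerpt does not spell out FiveThirtyEight's model, so the following is only a sketch of the simplest version of such an estimate. It assumes an ordinary least-squares fit of monthly rent on subway minutes to the business districts. The four listings are hypothetical and are built to lie exactly on a line so that the recovered slope is easy to check. A real analysis would also control for factors such as apartment size and neighborhood.

```python
from statistics import mean


def rent_per_minute(commute_minutes, rents):
    """Ordinary least-squares slope of monthly rent on commute time ($ per minute)."""
    mx, my = mean(commute_minutes), mean(rents)
    cov = sum((x - mx) * (y - my) for x, y in zip(commute_minutes, rents))
    var = sum((x - mx) ** 2 for x in commute_minutes)
    return cov / var


# Hypothetical listings: each extra minute of commute lowers rent by $56.
minutes = [10, 20, 30, 40]
rents = [3000 - 56 * m for m in minutes]
print(rent_per_minute(minutes, rents))  # -56.0
```

The slope is negative because rent falls as the commute grows. The "$56 per minute" premium is its magnitude.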

 

Civil rights, tech groups sound the alarm about ‘predictive policing’

The Hill


from August 31, 2016

“Predictive policing tools threaten to provide a misleading and undeserved imprimatur of impartiality for an institution that desperately needs fundamental change,” they said. … The message was led by the Leadership Conference on Civil & Human Rights and signed by organizations including the NAACP, the American Civil Liberties Union and the Electronic Frontier Foundation. It accompanies a report from the tech policy consultancy Upturn on predictive policing practices.

 

[1608.08225] Why does deep and cheap learning work so well?

arXiv, Condensed Matter > Disordered Systems and Neural Networks; Henry W. Lin, Max Tegmark


from August 29, 2016

We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through “cheap learning” with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics. The exceptional simplicity of physics-based functions hinges on properties such as symmetry, locality, compositionality and polynomial log-probability, and we explore how these properties translate into exceptionally simple neural networks approximating both natural phenomena such as images and abstract representations thereof such as drawings. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to renormalization group procedures. Various “no-flattening theorems” show when these efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss – even for linear networks.
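Here is a small illustration of what a "cheap" network can look like. It is our own construction in the spirit of the abstract, not code from the paper. Four neurons with a smooth activation (softplus is assumed here) plus a linear readout approximate the product x*y. Taylor-expanding the activation around zero shows that the even-order terms leave 4σ''(0)λ²xy, so dividing by that factor recovers the product. The error shrinks as the input scale λ does.

```python
import math


def softplus(z):
    """Smooth activation sigma(z) = ln(1 + e^z); its second derivative at 0 is 1/4."""
    return math.log1p(math.exp(z))


def neural_multiply(x, y, lam=1e-2):
    """Approximate x*y with four softplus neurons and a linear readout.

    With a = lam*(x + y) and b = lam*(x - y), the Taylor expansion of
    sigma(a) + sigma(-a) - sigma(b) - sigma(-b) keeps only even powers,
    giving 4 * sigma''(0) * lam**2 * x * y plus an O(lam**4) remainder.
    After dividing out the prefactor, the error is O(lam**2).
    """
    a, b = lam * (x + y), lam * (x - y)
    hidden = softplus(a) + softplus(-a) - softplus(b) - softplus(-b)  # four neurons
    sigma2 = 0.25  # sigma''(0) for softplus
    return hidden / (4 * sigma2 * lam ** 2)  # linear readout


print(neural_multiply(3, -2))  # close to -6.0; a smaller lam tightens the approximation
```

The network size does not depend on the values being multiplied. Only the scale λ trades off accuracy, which fits the abstract's point that physically relevant operations need few parameters.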

 

Inferring urban travel patterns from cellphone data

MIT News


from August 29, 2016

In the latest issue of the Proceedings of the National Academy of Sciences, researchers from MIT and Ford Motor Company describe a new computational system that uses cellphone location data to infer urban mobility patterns. Applying the system to six weeks of data from residents of the Boston area, the researchers were able to quickly assemble the kind of model of urban mobility patterns that typically takes years to build.

The system holds the promise of not only more accurate and timely data about urban mobility but the ability to quickly determine whether particular attempts to address cities’ transportation needs are working.

 

UC Berkeley launches Center for Human-Compatible Artificial Intelligence

University of California-Berkeley, Berkeley News


from August 29, 2016

UC Berkeley artificial intelligence (AI) expert Stuart Russell will lead a new Center for Human-Compatible Artificial Intelligence, launched this week. … The primary focus of the new center is to ensure that AI systems are beneficial to humans, he said.

 
Events



Fall 2016 BIDS Data Science Lecture Series Lineup



Berkeley, CA; University of California – Berkeley, Doe Library; Fridays, 1:10–2:30 pm [free]
 

Data Hackathons: Lessons Learned & Best Practices Workshop



Denver, CO; West Big Data Innovation Hub (WBDIH), RDA, and Federation of Earth Science Information Partners (ESIP); September 15, 2016, 2–5 pm [free]
 

Data: It’s The Real Thing



Atlanta, GA
This conference will provide a holistic perspective on how marketers can utilize data. BBI will be sponsoring the social data presentation, and we’re excited to be a part of this amazing event. Thursday, November 17, at Goizueta Auditorium, Coca-Cola Headquarters.
 

Python in Astronomy 2017



Leiden, Netherlands
The third edition of the Python in Astronomy Conference will be held at the Lorentz Center from 8 to 12 May 2017.
 
Deadlines



Call for Papers: Workshop on Fairness, Accountability, and Transparency in Machine Learning

deadline: Conference

New York, NY
“When algorithms make automated decisions that can affect our lives so profoundly, how do we make sure that their decisions are fair, verifiable, and accountable? This workshop will explore how to integrate these concerns into machine learning and how to address them with computationally rigorous methods.” Friday, November 18, at Columbia University.

Submission Deadline: September 9, 2016

 

Call for Papers – ChASM 2016

deadline: Conference

Bellevue, WA
“The 4th edition of the Workshop on Computational Approaches to Social Modeling (ChASM) will be colocated with SocInfo’16, taking place in Bellevue, WA on Monday-Thursday, November 14-17.”

The deadline for submitting papers is Monday, September 12.

 
CDS News



Orientation for the incoming class of @NYUDataScience students. Amazing class of ~80 selected from 1300 applicants.

Twitter, Claudio Silva


from August 31, 2016

 
Tools & Resources



A Data-Cleaning Tool for Building Better Prediction Models

Columbia University, Data Science Institute


from August 31, 2016

Big data sets are full of dirty data, and these outliers, typos and missing values can produce distorted models that lead to wrong conclusions and bad decisions, be it in health care or finance. With so much at stake, data cleaning should be easier.

That’s the inspiration for software developed by computer scientists at Columbia University and University of California at Berkeley that hands much of the dirty work over to machines. Called ActiveClean, the system analyzes a user’s prediction model to decide which mistakes to edit first, while updating the model as it works. With each pass, users see their model improve.
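The blurb doesn't describe ActiveClean's internals, so the following is only a toy illustration of the general idea, not its algorithm or API. It uses a one-parameter linear model y ≈ w*x. It ranks the records a user has flagged as dirty by the size of their loss gradient under the current model, fixes the worst one, and refits before choosing the next. All names are hypothetical, and `oracle` stands in for the person supplying corrected values.

```python
def fit_slope(xs, ys):
    """Least-squares slope w for the model y ≈ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def clean_by_priority(xs, ys, dirty, oracle):
    """Fix dirty records one at a time, largest loss gradient first.

    The model is refit after every fix, so the next choice reflects the
    improved model. Returns the slope after each pass and the cleaning order.
    """
    ys = list(ys)
    remaining = set(dirty)
    w = fit_slope(xs, ys)
    history, order = [w], []
    while remaining:
        # |d/dw (w*x - y)^2| = |2 * (w*x - y) * x|: the record that pulls
        # hardest on the current fit is cleaned first.
        i = max(remaining, key=lambda j: abs(2 * (w * xs[j] - ys[j]) * xs[j]))
        ys[i] = oracle(i)
        remaining.remove(i)
        order.append(i)
        w = fit_slope(xs, ys)
        history.append(w)
    return history, order


xs = list(range(1, 11))
true_ys = [2 * x for x in xs]
dirty_ys = list(true_ys)
dirty_ys[3], dirty_ys[8] = 80, 180  # hypothetical typos: an extra 0 on 8 and on 18
history, order = clean_by_priority(xs, dirty_ys, dirty={3, 8}, oracle=lambda i: true_ys[i])
print(order)        # [8, 3]
print(history[-1])  # 2.0
```

Cleaning record 8 first moves the slope from about 6.5 to about 2.7 in a single pass. The second fix restores the true value of 2, so the model visibly improves with each pass.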

 

Basic quality assurance and control, and data manipulation in spreadsheets

Data Carpentry, Christie Bahlai and Aleksandra Pawlik


from September 1, 2015

When you have a well-structured data table, you can use several simple techniques within your spreadsheet to ensure the data you enter is free of errors. These approaches include techniques that are implemented prior to entering data (quality assurance) and techniques that are used after entering data to check for errors (quality control).
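The lesson itself works inside the spreadsheet. As a rough analogue of the quality-control step (checks run after data entry), here is a short Python sketch that scans rows exported to CSV for blank required cells, non-numeric entries, and out-of-range values. The column names (`plot_id`, `weight`) and the bounds are hypothetical.

```python
def qc_report(rows, required, ranges):
    """Return (spreadsheet_row, column, problem) tuples for cells that fail basic checks.

    rows: list of dicts, one per data row, with values as strings (as read from a CSV)
    required: columns that must not be blank
    ranges: {column: (low, high)} inclusive bounds for numeric columns
    """
    problems = []
    for row_number, row in enumerate(rows, start=2):  # row 1 holds the headers
        for col in required:
            if row.get(col, "") == "":
                problems.append((row_number, col, "missing"))
        for col, (low, high) in ranges.items():
            value = row.get(col, "")
            if value == "":
                continue  # blanks are skipped here; list the column in `required` to flag them
            try:
                number = float(value)
            except ValueError:
                problems.append((row_number, col, f"not a number: {value!r}"))
                continue
            if not low <= number <= high:
                problems.append((row_number, col, f"out of range: {number}"))
    return problems


rows = [
    {"plot_id": "1", "weight": "42"},
    {"plot_id": "", "weight": "4200"},
    {"plot_id": "3", "weight": "n/a"},
]
# Flags row 3 (blank plot_id, weight 4200) and row 4 (weight "n/a").
print(qc_report(rows, required=["plot_id"], ranges={"weight": (0, 500)}))
```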

 

Adversarial_Video_Generation: A TensorFlow Implementation of “Deep Multi-Scale Video Prediction Beyond Mean Square Error” by Mathieu, Couprie & LeCun.

GitHub – dyelax


from August 21, 2016

This project implements a generative adversarial network to predict future frames of video, as detailed in “Deep Multi-Scale Video Prediction Beyond Mean Square Error” by Mathieu, Couprie & LeCun. Their official code is written in Torch.

Adversarial generation uses two networks – a generator and a discriminator – to improve the sharpness of generated images. Given the past four frames of video, the generator learns to generate accurate predictions for the next frame. Given either a generated or a real-world image, the discriminator learns to correctly classify between generated and real. The two networks “compete,” with the generator attempting to fool the discriminator into classifying its output as real. This forces the generator to create frames that are very similar to what real frames in the domain might look like.
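The competition described above comes down to two opposing objectives. The following sketch is a generic illustration, not code from the dyelax repository. It assumes that the discriminator outputs the probability that a frame is real and uses the standard binary cross-entropy losses. It leaves out the networks themselves, the multi-scale design, and any non-adversarial loss terms the project may combine with these.

```python
import math

EPS = 1e-12  # keeps log() away from 0


def bce(prob, target):
    """Binary cross-entropy of a predicted 'real' probability against a 0/1 label."""
    prob = min(max(prob, EPS), 1 - EPS)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def discriminator_loss(d_real, d_fake):
    """D is rewarded for scoring real frames near 1 and generated frames near 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_adversarial_loss(d_fake):
    """G is rewarded when D scores its generated frames as real (label 1)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# D's scores on a batch of generated frames: a confused D (0.5) gives ln 2.
print(generator_adversarial_loss([0.5]))  # ln 2 ≈ 0.693
```

Each loss falls exactly when the other network is losing. This tension is what pushes the generator toward frames that pass for real ones.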

 

TensorFlow.jl: A Julia wrapper for TensorFlow

GitHub – malmaud


from August 31, 2016

A wrapper around TensorFlow, a popular open source machine learning framework from Google.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor, Biostatistics (2 openings)
 

Twin Cities, MN; School of Public Health, University of Minnesota
 
