NYU Data Science newsletter – November 24, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 24, 2015

GROUP CURATION: N/A

 
Data Science News



Test-Driven Data Analysis

Test-Driven Data Analysis blog


from November 05, 2015

A dozen or so years ago I stumbled across the idea of test-driven development from reading various posts by Tim Bray on his Ongoing blog. It was obvious that this was a significant idea, and I adopted it immediately. It has since become an integral part of the software development processes at Stochastic Solutions, where we develop our own analytical software (Miró and the Artists Suite) and custom solutions for clients. But software development is only part of what we do at the company: the larger part of our work consists of actually doing data analysis for clients. This has a rather different dynamic.

Fast forward to 2012, and a conversation with my long-term collaborator and friend, Patrick Surry, during which he said something to the effect of: "So what about test-driven data analysis?"
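To make the question concrete, here is a minimal sketch of what carrying the test-driven habit over to an analysis step might look like. It is only an illustration under our own assumptions: the function `summarize_spend` and its input are hypothetical, and this is not the Stochastic Solutions approach or any particular library. The idea is to check the analysis against a small hand-verified input with a known answer, plus an invariant that should always hold.

```python
import statistics


def summarize_spend(records):
    """Hypothetical analysis step: mean spend per customer, skipping missing amounts."""
    totals = {}
    for customer, amount in records:
        if amount is None:
            continue  # drop missing values instead of silently treating them as zero
        totals.setdefault(customer, []).append(amount)
    return {customer: statistics.mean(amounts) for customer, amounts in totals.items()}


def test_summarize_spend():
    # A tiny input whose correct result was worked out by hand beforehand.
    records = [("a", 10.0), ("a", 20.0), ("b", 5.0), ("b", None)]
    result = summarize_spend(records)
    assert result == {"a": 15.0, "b": 5.0}
    # An invariant the analysis should satisfy on any valid input.
    assert all(value >= 0 for value in result.values())


test_summarize_spend()
```

As in test-driven development, the expected result is written down before trusting the output, so a later change to the cleaning rules (for example, counting missing amounts as zero) would fail loudly instead of quietly shifting the numbers.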

 

Created by statisticians for statisticians: How R took the world of statistics by storm – Statistics Views

Statistics Views


from November 19, 2015

Any day can turn out to change history. For instance, on May 5, 1976, former Argentine football player Juan Pablo Sorín was born. He played for many teams, such as River Plate, Cruzeiro, Paris Saint-Germain and Barcelona [1]. On that very same Wednesday of 1976, David Bowie held a concert at the Empire Pool in London, just a few years before it changed its name to Wembley Arena [2]. And as a matter of fact, another event took place on that same day: May 5, 1976. It was an event that would trigger one of statistics' most notorious revolutions, one that we are still witnessing to this day.

It was an informal meeting of just five people, who gathered at Murray Hill, New Jersey [3]. They all worked at Bell Labs and met to discuss ideas for designing a system for statistical computing, or perhaps adopting an existing system for their research work. Ideally the system would be used by around 20 statisticians who worked at Bell Labs, or so the team thought. Back in the 1970s, computers were, of course, far different from what we have now and statistical systems were also still in a rudimentary phase. At that time, the statistical analysis group of Bell Labs used mostly a library of Fortran-based software, designed to produce simulations, data analysis and graphs, though these tools were too restricted for their research purposes. They needed something new, but it wasn’t until after a month of meetings that the team decided to go ahead and create a new system.

 

The Information Theory of Life | Quanta Magazine

Quanta Magazine


from November 19, 2015

The polymath Christoph Adami is investigating life’s origins by reimagining living things as self-perpetuating information strings.

 

Linguistic Obfuscation in Fraudulent Science

Journal of Language & Social Psychology; David M. Markowitz and Jeffrey T. Hancock


from November 08, 2015

The rise of scientific fraud has drawn significant attention to research misconduct across disciplines. Documented cases of fraud provide an opportunity to examine whether scientists write differently when reporting on fraudulent research. In an analysis of over two million words, we evaluated 253 publications retracted for fraudulent data and compared the linguistic style of each paper to a corpus of 253 unretracted publications and 62 publications retracted for reasons other than fraud (e.g., ethics violations). Fraudulent papers were written with significantly higher levels of linguistic obfuscation, including lower readability and higher rates of jargon than unretracted and nonfraudulent papers. We also observed a positive association between obfuscation and the number of references per paper, suggesting that fraudulent authors obfuscate their reports to mask their deception by making them more costly to analyze and evaluate. This is the first large-scale analysis of fraudulent papers across authors and disciplines to reveal how changes in writing style are related to fraudulent data reporting.

 

The Last Days Of Marissa Mayer?

Forbes, Miguel Helft


from November 19, 2015

The late October offsite meeting for more than 120 of Yahoo’s top executives at the Park Central Hotel in San Francisco started well enough. The first day jumped from strategy sessions to upcoming product discussions. But on day two, when the topic shifted to employee engagement, and with CEO Marissa Mayer in and out of the room, things went downhill fast. When Bryan Power, Yahoo’s head of HR, glossed over results from a recent employee survey that showed dramatic double-digit drops in metrics like morale and trust in the company’s executive leadership, various vice presidents began venting to one another. Those murmurs of discontent erupted into outright heckling when another session, billed as an opportunity to improve communication, turned into a lecture from Yahoo’s top brass that many found patronizing. Vice presidents started calling out their superiors for “not listening,” “not understanding” and “not being interested in changing.” Some cursed.

“It was the most stressful and acrimonious professional meeting I’ve ever attended,” says one participant.

 

The Algorithm That Creates Diets That Work for You

The Atlantic, Ed Yong


from November 19, 2015

Take a slice of cake and cut it in two. Eat one half, and let a friend scoff the other. Your blood-sugar levels will both spike, but to different degrees depending on your genes, the bacteria in your gut, what you recently ate, how recently or intensely you exercised, and more. The spikes, formally known as “postprandial glycemic responses” or PPGR, are hard to forecast since two people might react very differently to exactly the same food.

But Eran Elinav and Eran Segal from the Weizmann Institute of Science have developed a way of embracing that variability. By comprehensively monitoring the blood sugar, diets, and other traits of 800 people, they built an algorithm that can accurately predict how a person’s blood-sugar levels will spike after eating any given meal.

 

Data Science Skills To Boost Your Salary

InformationWeek, Big Data Analytics blog


from November 17, 2015

Are you a data scientist wondering how your compensation stacks up to your peers? Or are you considering a career shift to data science? Here’s a look at how much you can expect to earn.

 

NeuralTalk and Walk

Vimeo, Kyle McDonald


from November 21, 2015

Andrej Karpathy’s “NeuralTalk” code (github.com/karpathy/neuraltalk2), slightly modified to run from a webcam feed. I recorded this live while walking near the bridge at Damstraat and Oudezijds Voorburgwal in Amsterdam.

All processing is done on my 2013 MacBook Pro with the NVIDIA 750M and only 2GB of GPU memory. I’m walking around with my laptop open pointing it at things, hence the shaky footage and people staring at themselves. The openFrameworks code for streaming the webcam and reading from disk is available at gist.github.com/kylemcdonald/b02edbc33942a85856c8

 

Our Simulator Can Assess Your N.F.L. Team’s Playoff Chances – The New York Times

The New York Times, The Upshot blog, Justin Wolfers


from November 22, 2015

… At the moment, there are 109 games remaining in the season, which, without counting ties, translates into over 649 nonillion possible ways the season could end. (That’s 649 followed by 30 zeros.) Yes, it’s good if your team wins all its remaining games. But then what? … we built an interactive tool to help us understand.
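The "649 nonillion" figure follows from simple counting: if ties are ignored, each of the 109 remaining games has exactly two outcomes, so the number of possible endings is 2 raised to the 109th power. A quick check in Python (the variable names are our own):

```python
# Ignoring ties, each remaining game is either a win or a loss for one side.
remaining_games = 109
outcomes = 2 ** remaining_games

print(outcomes)               # → 649037107316853453566312041152512
print(outcomes // 10 ** 30)   # whole nonillions (10**30) → 649
```

The exact value has 33 digits, which is why it reads as "649 followed by 30 zeros" once rounded. Allowing ties would make it three outcomes per game, and the count would grow to 3 ** 109.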

 

A Planet of Cities

Santa Fe Institute, Christian Science Monitor


from November 17, 2015

… What’s really happening within each city is a massive exchange of information across social and economic networks of people and organizations, all taking place on a complex infrastructural landscape of buildings, roads, pipes, and wires. For the most part there is no maestro; the properties of cities emerge from countless interactions of millions of people, driven by their individual goals and motivations.

In this sense, cities are an outstanding example of complex adaptive systems: collections of individual constituents (people, in this case) that interact in myriad ways, usually mediated by some sort of network.

 
Events



The Secret Science Club: Smashing! Monday, November 23, 8PM, Secret Science Club presents Particle Physicist Kyle Cranmer @ the Bell House, FREE!




Take a spin through the science at the world’s largest and most powerful particle accelerator with physicist Kyle Cranmer—from the discovery of the elusive Higgs boson (one of the fundamental particles that make up the Universe) to the ongoing search for dark matter. After a two-year break, the atom-smashing Large Hadron Collider is now back for Season 2—running at nearly double its previous energy, hurling 600 million protons together every second, and reaping juicy, cosmos-hacking data.

 

HITLAB Innovators Summit



The HITLAB Innovators Summit brings together leaders in public health, medicine, technology, and design to discuss health issues and solutions in a two-day live event. The Summit offers a series of carefully curated talks, panels, and collaboration opportunities to engage the most creative minds in health and technology.

Thursday-Friday, December 3-4, at Columbia University

 

2016 International Conference on Computational Social Science



This 2nd Annual International Conference on Computational Social Science is an interdisciplinary event designed to engage a broad community of researchers – academics, industry experts, open data activists, government agency workers, and think tank analysts – dedicated to advancing social science knowledge through computational methods.

Thursday-Sunday, June 23-26, at Northwestern University, Evanston, IL [$$$]

 
Deadlines



Postdoctoral Scholar – Data Science – Berkeley Institute for Data Science (BIDS)


The Berkeley Institute for Data Science (BIDS) is seeking two postdoctoral scholars to work on the research and development activities funded by the grant titled “Project Jupyter: Computational Narratives as the Engine of Collaborative Data Science”. The Jupyter and IPython Projects are a set of open-source software tools for interactive and exploratory computing, developed at UC Berkeley, Cal Poly, Simula Research Lab, Southampton University and industry partners. These software projects support reproducible and collaborative scientific computing and data science across a wide range of programming languages (Python, Julia, R, etc.). The main application offered by Project Jupyter is the Jupyter Notebook (https://try.jupyter.org), a web-based, interactive computing platform that allows users to perform data cleaning, data analysis, statistical modelling, numerical simulation and data visualization. We have over 2 million users worldwide across a wide array of technical fields.

The next review of applications takes place on Monday, December 14.

 

Alan Turing Fellowships


This is a unique opportunity for early-career researchers to join The Alan Turing Institute. The Alan Turing Institute (ATI) is the UK’s new national data science institute, established to bring together world-leading expertise to provide leadership in the emerging field of data science. The Institute was founded by the universities of Cambridge, Edinburgh, Oxford, UCL and Warwick, together with EPSRC.

Fellowships are available for 3 years with the potential for an additional 2 years of support following interim review. Fellows will pursue research based at the Institute hub in the British Library, London. Fellowships will be awarded to individual candidates and fellows will be employed by a joint venture partner university (Cambridge, Edinburgh, Oxford, UCL or Warwick).

The deadline for submitting an application is Sunday, December 20.

 
