NYU Data Science newsletter – June 19, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 19, 2015


 
Data Science News



Software Development Practices in Academia

Software Carpentry


from June 18, 2015

Derek Groen, Xiaohu Guo, James Grogan, Ulf Schiller, and James Osborne have just submitted a paper to arXiv.org titled “Software development practices in academia: a case study comparison”.

 

Building self-service tools to monitor high-volume time-series data

O'Reilly Radar, Ben Lorica


from June 18, 2015

One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was, to a large extent, prompted by my discussions with engineers and entrepreneurs building monitoring tools for IT operations. In many ways, data centers are perfect laboratories in that they are controlled environments managed by teams willing to instrument devices and software, and monitor fine-grained metrics.

During a recent episode of the O’Reilly Data Show Podcast, I caught up with Phil Liu, co-founder and CTO of SignalFx, an SF Bay Area startup focused on building self-service monitoring tools for time series. We discussed hiring and building teams in the age of cloud computing, building tools for monitoring large numbers of time series, and lessons he’s learned from managing teams at leading technology companies.

 

A Unified Approach to Measurement Error and Missing Data: Overview and Applications

Gary King


from June 18, 2015

Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a comp
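To make the two-step idea concrete, here is a minimal sketch of ordinary multiple imputation, not the generalized measurement-error method the abstract proposes. It assumes a single numeric variable with missing entries marked as None, fills them with draws from a normal distribution fitted to the observed values (a deliberate simplification that ignores parameter uncertainty), and pools the per-dataset estimates with Rubin's rules. All function names are hypothetical and chosen only for illustration.

```python
import random
import statistics


def impute_once(values, rng):
    """Step 1: fill each missing entry (None) with a random draw.

    Assumption: draws come from a normal distribution fitted to the
    observed values. A full MI procedure would also draw the
    distribution's parameters; this sketch skips that.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sigma) for v in values]


def analyze(completed):
    """Step 2: the analysis one would run on complete data.

    Here: the sample mean and the squared standard error of that mean.
    """
    n = len(completed)
    return statistics.mean(completed), statistics.variance(completed) / n


def pool(results):
    """Combine (estimate, variance) pairs with Rubin's rules."""
    m = len(results)
    estimates = [e for e, _ in results]
    within = statistics.mean([v for _, v in results])
    between = statistics.variance(estimates) if m > 1 else 0.0
    total_variance = within + (1 + 1 / m) * between
    return statistics.mean(estimates), total_variance


def multiple_imputation(values, m=20, seed=0):
    """Run both steps m times and pool the results."""
    rng = random.Random(seed)
    results = [analyze(impute_once(values, rng)) for _ in range(m)]
    return pool(results)


if __name__ == "__main__":
    data = [1.0, 2.0, None, 4.0, 5.0, None]
    estimate, variance = multiple_imputation(data)
    print(f"pooled mean: {estimate:.3f}, total variance: {variance:.3f}")
```

The point of the structure is that analyze() knows nothing about missingness: any complete-data estimator could be swapped in at the second step, which is the property the abstract highlights.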

 

Gaming Your Brain

ESPN The Magazine


from June 15, 2015

… Every time you play a game, its maker is, in all probability, recording everything you do and using that information to keep you playing longer and spending more.

These techniques are related to those seen in the gambling industry, where slot machines are designed to create a sense of total immersion in the game, so players continue to hit the button for as long as possible. That’s particularly true for more repetitive freemium games, like Candy Crush, by the design studio King. “These games allow you to zone out,” says Natasha Dow Schull, an MIT professor who has been researching the links between freemium video games and slots. “They all follow this high-hit-frequency/low-volatility model that puts you into a trance where your worries drop away. It’s a mechanism for escape even more than it is a mechanism for excitement.”

 

What we learnt from organizing the first data science camp for kids.

Medium, Varun Aggarwal


from June 18, 2015

Two weeks back, we decided to experiment with teaching data science to kids in grades 6 through 9! We think it is important to introduce students to data-driven thinking early in their lives; also, kids are way more fun than higher-ed students, so it was an easy choice for us to make!

 

Yes, androids do dream of electric sheep

The Guardian


from June 18, 2015

Google sets up feedback loop in its image recognition neural network – which looks for patterns in pictures – creating hallucinatory images of animals, buildings and landscapes which veer from beautiful to terrifying.

 

The inside story of Answers: How six people built the #2 mobile analytics tool in just a few months

Wayne Chang


from June 02, 2015

… When we sat down for coffee, I could feel the passion Jeff had for his small side project — an SDK that would make it easier for app developers to uncover why their apps were crashing.

My first thought was that this problem should’ve already been solved, but the market hadn’t made developers’ lives any easier since they started building mobile apps. Considering nearly 10 percent of all app store reviews mentioned the word “crash”, I told Jeff that this didn’t sound like a side project — this was a startup.

It was then that Crashlytics was born.

 

Scientists fly drones to map sagebrush for wildfire strategy

Associated Press


from June 17, 2015

Scientists have deployed drones over western Idaho to map a little-known landscape as part of an effort to reduce wildfire risks and protect sage grouse and other wildlife across the West.

 

This Is Twitter’s Top Secret Project Lightning

BuzzFeed News


from June 18, 2015

The new tool will give the company a way to show the best of Twitter to both logged-in and logged-out users on a variety of platforms. It will have instantly loading videos and rich images, and will occupy a front-and-center location in an upcoming version of the Twitter app.

 
Events



Leveraging Big Data and Predictive Knowledge to Fight Disease



Drug development is entering an era of precision medicine that is centered on the analysis of massive amounts of data. The ability to integrate, interrogate, model and interpret biological, chemical, pharmacological, genomic and clinical data holistically is key to making more effective and truly personalized medicines to fight disease. Researchers are using innovative technologies and computational techniques to develop predictive knowledge for the identification of promising new treatments, new therapeutic uses for existing molecules, patients who are good candidates for particular clinical trials or treatment protocols, and population signals of adverse drug reactions. This symposium explores the many uses of big data and predictive knowledge to guide drug development and clinical trials.

Tuesday, July 28, at 8:30 a.m., The New York Academy of Sciences

 
CDS News



Who is the most conservative Republican candidate for president?

The Washington Post, Wonkblog


from June 16, 2015

Pablo Barberá analyzes Republican candidates for President in Wonkblog:

The Republican field is crowded, which implies that primary voters have little information about where some of the candidates stand. That is particularly the case this season, with a few relatively unknown contenders who lack legislative experience or a long history of campaign contributions that would allow researchers to precisely identify where they stand on the liberal-to-conservative political dimension.

However, one characteristic all candidates share is that they have active and popular Twitter accounts. And as I showed in an article published earlier this year in the journal Political Analysis — now freely available online as an Editors’ Choice article — it is possible to analyze the candidates’ Twitter networks to compute precise ideological scores and thus identify how conservative or liberal each of them is.

 
