NYU Data Science newsletter – June 18, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 18, 2015

GROUP CURATION: N/A

Data Science News

Link: Visualization Publication Data Collection | eagereyes

Robert Kosara, eagereyes blog

from June 17, 2015

People from Georgia Tech, INRIA, University of Stuttgart, and other institutions have put together a comprehensive dataset of all papers presented at Vis/VisWeek/VIS since 1990. This was first collected for a set of visualizations last year, but has been updated with the 2014 data. They intend to keep it up to date. The dataset contains not just titles, authors, etc., but also unique authors (resolving cases where names are spelled out differently, middle names are used, etc.), and even citations within the dataset. The data is kept in a Google Doc, so anybody can help improve its quality.

[1506.02575] Software Carpentry: get more done in less time

arXiv, Computers and Society

from June 08, 2015

The aim of this study was to investigate whether participants of Software Carpentry (SC) get more done in less time. We asked 24 former participants 32 questions to analyse whether SC gave them the computing skills to accomplish this. Our research shows that time was already saved during the workshop, since it could shorten the process of learning new skills. A majority of participants were able to use these new skills straight away and could thus speed up their day-to-day work.

Facial-recognition talks collapse over privacy issues

BBC News

from June 17, 2015

Privacy campaigners have walked out of talks aimed at creating a code of conduct for companies keen to use facial-recognition technology.

Big Data in the Big Apple – web version

Capital City Foundation

from June 11, 2015

The lessons London can learn from New York’s data-driven approach to smart cities

The Unreasonable Effectiveness of Random Forests — Medium

Medium, Ahmed El Deeb

from June 17, 2015

It’s very common for machine learning practitioners to have favorite algorithms. It’s a bit irrational, since no algorithm strictly dominates in all applications; the performance of ML algorithms varies wildly depending on the application and the dimensionality of the dataset. And even for a given problem and a given dataset, any single model will likely be beaten by an ensemble of diverse models trained by diverse algorithms anyway. But people have favorites nevertheless. Some like SVMs for the elegance of their formulation or the quality of the available implementations, some like decision rules for their simplicity and interpretability, and some are crazy about neural networks for their flexibility.

My favorite out-of-the-box algorithm is (as you might have guessed) the Random Forest, and it’s the second modeling technique I typically try on any given data set (after a linear model).
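For readers new to the technique, here is a minimal from-scratch sketch of a random forest classifier: decision trees grown on bootstrap samples, each considering only a random subset of features at every split, combined by majority vote. It is not El Deeb's code or any library's implementation; the function names, the Gini split criterion, and the defaults (25 trees, depth 5, roughly sqrt(n_features) features per split) are illustrative assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def majority(y):
    return Counter(y).most_common(1)[0][0]

def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority(y)
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, n_sub, rng))

def predict_tree(node, row):
    # Labels must not themselves be tuples, since tuples mark internal nodes.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def random_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Train n_trees trees, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_depth, n_sub, rng))
    return trees

def predict_forest(trees, row):
    """Majority vote over all trees."""
    return majority([predict_tree(t, row) for t in trees])
```

In practice one would use a tuned library implementation such as scikit-learn's `RandomForestClassifier`; the point here is only that the two ingredients (bagging plus per-split feature sampling) fit in a few dozen lines.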

Twitter is joining Google and Facebook in the artificial intelligence arms race

Business Insider

from June 17, 2015

The changing of the guard in Twitter’s executive suite hasn’t put a halt to the company’s M&A activity.

On Wednesday, Twitter announced that it has acquired Whetlab, a startup focused on machine learning technology.

MIT-Singapore design center creates free software tool to analyze cities as spatial networks

MIT News

from June 16, 2015

Network analysis — the mathematical analysis of relationships between elements or actors in a complex system — has become popular among transportation planners and spatial analysts, but its use remains relatively limited among architects and urban designers, whose day-to-day work demands more visioning than analysis.

Now, researchers at the joint MIT-SUTD International Design Center (IDC) have created a free network analysis plugin for Rhinoceros 3-D modeling software, one of the most popular software platforms among architects and urban designers. The new Urban Network Analysis (UNA) plugin enables urban planners and architects to describe spatial patterns of cities using mathematical network analysis methods.
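As a rough illustration of what "network analysis" means here, and explicitly not UNA's own implementation, the Python sketch below treats street intersections as nodes and street segments as weighted edges (lengths assumed to be in meters), then computes two common measures: closeness (the inverse of the mean shortest-path distance to the other nodes) and a simple reach count (how many other nodes lie within a given network radius). The helper names and the toy four-intersection street are hypothetical.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Undirected weighted graph: node -> {neighbor: segment length}."""
    graph = defaultdict(dict)
    for a, b, length in edges:
        graph[a][b] = length
        graph[b][a] = length
    return dict(graph)

def shortest_distances(graph, source):
    """Dijkstra's algorithm: network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def closeness(graph):
    """Inverse mean distance from each node to the other nodes it can reach."""
    scores = {}
    for node in graph:
        dist = shortest_distances(graph, node)
        total = sum(dist.values())  # the node's distance to itself is 0
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

def reach(graph, node, radius):
    """Number of other nodes within `radius` network distance of `node`."""
    dist = shortest_distances(graph, node)
    return sum(1 for n, d in dist.items() if n != node and d <= radius)

# Hypothetical street: four intersections in a row, 100 m apart.
street = build_graph([("A", "B", 100), ("B", "C", 100), ("C", "D", 100)])
```

On this toy street the two middle intersections score highest on closeness, matching the intuition that central locations are, on average, nearer to everything else.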

Google Cloud Platform Blog: A look inside Google’s Data Center Networks

Google Cloud Platform blog

from June 17, 2015

… for the past decade, we have been building our own network hardware and software to connect all of the servers in our datacenters together, powering our distributed computing and storage systems. Now, we have opened up this powerful and transformative infrastructure for use by external developers through Google Cloud Platform.

Today at the 2015 Open Network Summit, we are revealing for the first time the details of five generations of our in-house network technology.

The new Google Trends is a real-time news detection system | Poynter

Poynter, MediaWire

from June 17, 2015

Think of the new Google Trends update as the difference between scanning yesterday’s headlines and scrolling through Twitter.

Users who looked at Google Trends two weeks ago would have seen a snapshot of trending Google searches based on data that was hours old. The new version, announced today, displays the latest search trends using up-to-the-minute data, said Simon Rogers, data editor at Google’s News Lab.
