NYU Data Science newsletter – February 9, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for February 9, 2016


 
Data Science News



At the Forefront of Urban Informatics, Constantine Kontokosta Is Widely Recognized for His Pioneering Work

NYU Tandon School of Engineering


from February 08, 2016

Assistant Professor Constantine Kontokosta, who also serves as the Deputy Director for Academics at the NYU Center for Urban Science and Progress (CUSP), is an expert in the field of Urban Informatics—which focuses on gathering, visualizing, and analyzing data on factors such as traffic volume, noise levels, energy use, and mobility in order to help make cities more efficient, livable, and sustainable. At CUSP and Tandon, he heads the Quantified Community Research Lab—a groundbreaking project that is making Hudson Yards, the 28-acre, 20-million square foot “city-within-a-city” on the west side of Manhattan, the nation’s first fully instrumented and quantified neighborhood lab; he’s since expanded the Quantified Community initiative with projects in Lower Manhattan and the Red Hook section of Brooklyn.

 

Why Big Data Isn’t Paying Off for Companies (Yet)

Fortune, Anne Fisher


from February 05, 2016

Roughly 80% of large companies report they’ve seen an important strategic decision go haywire in the past three years because it was based on “flawed” data. Almost three-quarters (72%) say that delays in getting information to the right people have torpedoed “at least one” major effort in the same period.

There’s more. Just 27% of C-level executives think their company makes “highly effective” use of data, while about a third (32%) say access to mountains of information has actually “made things worse.”

That’s according to a new study from the American Institute of CPAs (AICPA) and its London sister organization, the Chartered Institute of Management Accountants.

 

Author Interview: Kelsey Stilson on Gnarly Rhino Bones

PLOS Paleo Community


from February 05, 2016

… Given massive changes in the rhino body plan over time, it stands to reason that we should see changes in their medical condition, too. If a big animal evolved from a smaller ancestor, you might predict that the big animal could be susceptible to joint problems, because its body plan wasn’t perfectly adapted for the extra weight.

A new study by Kelsey Stilson and colleagues investigates this topic, using a sample of hundreds of bones from six extinct rhino species of various body sizes, as well as modern black rhinoceros. The authors documented anomalies along joint surfaces (e.g., the joints between individual toe bones), and mapped their distribution across the various species. As predicted, larger rhinos generally show more joint issues (although other factors, such as locomotion style and habitat, may be involved too). So, any “positive” of being bigger, or changing locomotion styles, or whatever, has accompanying costs in skeletal wear-and-tear.

 

Defer to the Algorithm

Alex Tabarrok, Marginal REVOLUTION


from February 06, 2016

A BuzzFeed article predicts that Twitter will soon move from a time-ordered feed to an algorithmic feed, one that shows you tweets that it predicts you will like before it shows you lesser-ranked tweets. Naturally, Twitter exploded with outrage that this is the end of Twitter. … It is peculiar that people are more willing to trust their physical lives to an algorithm than their Twitter feed. Is the outrage real, however, or will people soon take the algorithm for granted? How many people complaining about algorithmic Twitter don’t use junk-email filters?

 

In Galaxy Clustering, Mass May Not Be the Only Thing That Matters

Carnegie Mellon University, CMU News


from January 25, 2016

An international team of researchers, including Carnegie Mellon University’s Rachel Mandelbaum, has shown that the relationship between galaxy clusters and their surrounding dark matter halo is more complex than previously thought. The researchers’ findings, published in Physical Review Letters today (Jan. 25), are the first to use observational data to show that, in addition to mass, a galaxy cluster’s formation history plays a role in how it interacts with its environment.

There is a connection between galaxy clusters and their dark matter halos that holds a great deal of information about the universe’s content of dark matter and accelerating expansion due to dark energy. Galaxy clusters are groupings of hundreds to thousands of galaxies bound together by gravity, and are the most massive structures found in the universe. These clusters are embedded in a halo of invisible dark matter. Traditionally, cosmologists have predicted and interpreted clustering by calculating just the masses of the clusters and their halos. However, theoretical studies and cosmological simulations suggested that mass is not the only element at play — something called assembly bias, which takes into account when and how a galaxy cluster formed, also could impact clustering.

 

MIT spin-off nuTonomy secures $3.6M to fund self-driving car technology; delivering more human-like experience

Green Car Congress


from February 02, 2016

nuTonomy Inc., a developer of advanced software for autonomous vehicles, closed a $3.6M seed funding round, with investments from Signal Ventures, Samsung Ventures, Fontinalis Partners and Dr. Steven LaValle. The funding will help support nuTonomy’s continued work in the US as well as in Singapore, where industry and government are jointly piloting autonomous vehicle technology.

 

The Impact of Machine Learning on IT Departments

CTOvision.com


from February 04, 2016

In much the same way businesses have been eager to use big data analytics to improve their operations, many companies have shown a lot of interest in the growing field of machine learning. Unlike some other tech trends that have come and gone, machine learning appears to be more than just a fad. The recent rise in the number of machine learning SaaS solutions launched in 2015 shows that this development has genuine staying power. Machine learning products are on the rise as well, and as organizations become more familiar with the concept, the demand will almost certainly increase. While much has been made of the effects machine learning can have on the world and on businesses in particular, it’s equally important to look at the impact it may have on IT departments. Even for businesses that might not consider themselves tech companies, their IT departments will certainly see some big changes as a result of machine learning, which could easily affect the rest of the business.

 

Scientists say the ways humans describe nature transcend culture and geography

Quartz


from February 04, 2016

Even very different languages seem to organize the world in surprisingly similar ways.

Researchers with the non-profit Santa Fe Institute studied 81 different languages and found that they tend to draw similar connections between objects in nature, despite differences in culture and geography. People connect sea and salt, for example, whether they live on the coast or far inland. As the accompanying video shows, the research could shed new light on the evolution of language and the people who speak it. [video, 2:11]

 
Deadlines



HCOMP 2016 Papers heads-up

The Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016) will be held October 30 – November 3, 2016, Austin, TX, USA.

Full papers will be due around May 15.

 
Tools & Resources



Cleaning the Imagenet Dataset, collected notes

Dave Andersen, Dave's Data blog


from February 07, 2016

As part of my sabbatical at Google, I spent the last month working on processing images from the Imagenet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset using Tensorflow. (Note that I’ve linked to the ’14 dataset because it contains the image blacklist I discuss below, but it has the same classification images as the ’12 dataset).

As is well-known enough that there’s an entire subreddit dedicated to it, cleaning data before feeding it into a machine learning system is both time-consuming and somewhat annoying. Despite being a curated “challenge” dataset, it turns out that ILSVRC’12 needs cleaning as well. Much of this is known already among people who use the dataset, but with the recent explosion in popularity of machine and deep learning, I figured I’d put my collected notes here to save others the time.

Without further ado, the ILSVRC 2014 Data Gotchas…

 

Probabilistic Programming and Bayesian Methods for Hackers

Jupyter Notebook Viewer, Addison-Wesley, Cameron Davidson-Pilon


from December 30, 2015

Welcome to Bayesian Methods for Hackers. The full Github repository is available at github/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers. The other chapters can be found on the project’s homepage. We hope you enjoy the book, and we encourage any contributions!

 

Build your own neural network classifier in R

Weebly, Jun Ma


from February 06, 2016

Image classification is an important field in Computer Vision, not only because so many applications are associated with it, but also because many Computer Vision problems can be effectively reduced to image classification. The state-of-the-art tool in image classification is the Convolutional Neural Network (CNN). In this article, I am going to write a simple Neural Network with 2 fully connected layers. First, I will train it to classify a set of 4-class 2D data and visualize the decision boundary. Second, I am going to train my NN on the famous MNIST data (you can download it here: https://www.kaggle.com/c/digit-recognizer/download/train.csv) and see its performance. The first part is inspired by the CS 231n course offered by Stanford: http://cs231n.github.io/, which is taught in Python.
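
The article itself works in R; as a rough illustration of the first step it describes, here is a minimal Python sketch of a 2-layer fully connected network trained on 4-class 2D data. Everything in it is an assumption made for illustration rather than the author's code: the synthetic data (one Gaussian blob per quadrant), the hidden-layer size, the learning rate, and all function names.

```python
import math
import random

def make_data(n_per_class=25, seed=0):
    """Synthetic 4-class 2D data: one Gaussian blob per quadrant (hypothetical data)."""
    rng = random.Random(seed)
    centers = [(2.0, 2.0), (-2.0, 2.0), (-2.0, -2.0), (2.0, -2.0)]
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            data.append(([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)], label))
    return data

def init_params(n_in=2, n_hidden=8, n_out=4, seed=1):
    """Small random weights and zero biases for both layers."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2

def forward(x, params):
    """Layer 1: linear + ReLU. Layer 2: linear + softmax over the 4 classes."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]

def train(data, params, lr=0.05, epochs=100, seed=2):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    w1, b1, w2, b2 = params
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            hidden, probs = forward(x, params)
            # Gradient of cross-entropy w.r.t. the scores: probs - one_hot(label)
            d_scores = [p - (1.0 if k == label else 0.0)
                        for k, p in enumerate(probs)]
            # Backpropagate through layer 2 and the ReLU before updating w2
            d_hidden = [
                sum(d_scores[k] * w2[k][j] for k in range(len(w2)))
                if hidden[j] > 0 else 0.0
                for j in range(len(hidden))
            ]
            for k in range(len(w2)):
                for j in range(len(hidden)):
                    w2[k][j] -= lr * d_scores[k] * hidden[j]
                b2[k] -= lr * d_scores[k]
            for j in range(len(w1)):
                for i in range(len(x)):
                    w1[j][i] -= lr * d_hidden[j] * x[i]
                b1[j] -= lr * d_hidden[j]
    return params

def predict(x, params):
    """Return the index of the most probable class."""
    _, probs = forward(x, params)
    return probs.index(max(probs))

def accuracy(data, params):
    return sum(predict(x, params) == y for x, y in data) / len(data)

data = make_data()
params = train(data, init_params())
train_accuracy = accuracy(data, params)
print(f"training accuracy: {train_accuracy:.2f}")
```

Plain lists are used instead of a matrix library so the sketch has no dependencies; a vectorized version would be the natural choice for something the size of MNIST.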

 

Create easy animations with ggplot2

GitHub – dgrtwo/gganimate


from February 08, 2016

gganimate wraps the animation package to create animated ggplot2 plots. It can be installed using devtools:

devtools::install_github("dgrtwo/gganimate")

The core of the approach is to treat “frame” (as in, the time point within an animation) as another aesthetic, just like x, y, size, color, and so on. Thus, a variable in your data can be mapped to frame just as others are mapped to x or y.
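
The idea of treating frame as one more mapped variable can be sketched outside R as well. The Python snippet below is only a conceptual illustration, not gganimate's implementation or API: the column names, the `mapping` dictionary, and the `split_into_frames` helper are all hypothetical.

```python
from collections import defaultdict

def split_into_frames(rows, mapping):
    """Group rows by the column mapped to 'frame'.

    Each group holds the (x, y) points that would be drawn in one frame.
    """
    frames = defaultdict(list)
    for row in rows:
        point = (row[mapping["x"]], row[mapping["y"]])
        frames[row[mapping["frame"]]].append(point)
    # Sort by frame value so the frames play in order
    return dict(sorted(frames.items()))

# Hypothetical data: two observations per year
rows = [
    {"year": 2001, "gdp": 1.5, "life": 61.0},
    {"year": 2000, "gdp": 1.0, "life": 60.0},
    {"year": 2000, "gdp": 3.0, "life": 70.0},
    {"year": 2001, "gdp": 3.2, "life": 71.0},
]

# "frame" is mapped to a column exactly like "x" and "y"
mapping = {"x": "gdp", "y": "life", "frame": "year"}

frames = split_into_frames(rows, mapping)
for frame_value, points in frames.items():
    print(frame_value, points)
```

Each key of `frames` corresponds to one time point, and a plotting layer would then render one image per key.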

 
