NYU Data Science newsletter – November 12, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 12, 2015


 
Data Science News



[1511.02222] Deep Kernel Learning

arXiv, Computer Science > Learning


from November 06, 2015

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.
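The core idea — passing inputs through a deep network before applying a base kernel, then learning everything jointly through the GP marginal likelihood — can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: it uses an RBF base kernel as a stand-in for the spectral mixture kernel and omits the local interpolation and Kronecker/Toeplitz structure that give the paper its O(n) scaling; all names and shapes are assumptions.

```python
import numpy as np

def deep_transform(X, W1, b1, W2, b2):
    """Small feedforward net g(x): one tanh hidden layer, linear output."""
    return np.tanh(X @ W1 + b1) @ W2 + b2

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Base kernel on transformed inputs (RBF stand-in for spectral mixture)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def deep_kernel(A, B, params):
    """The core composition: k(x, x') = k_base(g(x), g(x'))."""
    return rbf_kernel(deep_transform(A, *params), deep_transform(B, *params))

def gp_log_marginal_likelihood(X, y, params, noise=0.1):
    """GP evidence, maximized to jointly learn net weights and kernel hyperparameters."""
    n = len(X)
    K = deep_kernel(X, X, params) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Random weights for illustration; in practice they are optimized by
# gradient ascent on the marginal likelihood.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0])
params = (rng.normal(size=(3, 8)), np.zeros(8),
          rng.normal(size=(8, 2)), np.zeros(2))
lml = gp_log_marginal_likelihood(X, y, params)
```

Because the deep kernel is still a valid kernel, it really is a drop-in replacement: any GP code that accepts a covariance function can use `deep_kernel` unchanged.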

 

Visualising Networks Part 1: A Critique

Fabien Goubet, Mathematics and Digital Humanities blog


from November 11, 2015

Thanks to easier access to network analysis tools and growing interest across disciplines in studying the relations that structure datasets, networks have become ubiquitous objects in science, in newspapers, on tech book covers, all over the Web, and as illustrations for anything big data-related (hand in hand with word clouds). Unfortunately, the resort to networks has reached the point where, at a conference, I heard a speaker say:

“Since this is mandatory, here is a network visualisation of these data. Sorry if you cannot see anything in this big hairball.”

 

Tor Says Feds Paid Carnegie Mellon $1M to Help Unmask Users | WIRED

WIRED, Security


from November 11, 2015

Ever since a Carnegie Mellon talk on cracking the anonymity software Tor was abruptly pulled from the schedule of the Black Hat hacker conference last year, the security community has been left to wonder whether the research was silently handed over to law enforcement agencies seeking to uncloak the internet’s anonymous users. Now the non-profit Tor Project itself says that it believes the FBI did use Carnegie Mellon’s attack technique—and paid them handsomely for the privilege.

The Tor Project on Wednesday afternoon sent WIRED a statement from its director Roger Dingledine directly accusing Carnegie Mellon of providing its Tor-breaking research in secret to the FBI in exchange for a payment of “at least $1 million.”

 

How the NFL Serves 32 Teams with a Single Dashboard in the Cloud | Tableau Software

Tableau Software


from November 11, 2015

The National Football League has a huge fan base that pledges allegiance to 32 different teams. In order to serve each team and its fans, the league needs to know what each one wants and needs. In other words, the league has to make sense of data—and lots of it.

The NFL collects data from traditional websites, mobile websites, and mobile apps. To make sense of the numbers, the teams had to sift through multiple spreadsheets and a dizzying array of reports. The league knew there was a better way.

“One of our primary challenges was to visualize this data,” says Allison Brown, a certified Tableau Jedi who works on the digital media team at the NFL.

 

Predictive Analytics is Dominating Retail

Celect


from November 11, 2015

The concept of “optimizing” product assortments in brick and mortar stores isn’t a new one. For years, retail merchandisers, buyers, and planners have been braving a world of spreadsheets with never-ending rows of product classes, styles, prices, etc. Recently, many in the retail and data analytics professions have been discovering the promise of predictive analytics for retail.

 

Why thousands of veterans are donating their DNA to science

Stat


from November 11, 2015

… The [2-story tall] freezer, in fact, is at the heart of one of the most ambitious projects ever undertaken to understand our DNA. The Department of Veterans Affairs is gathering blood from 1 million veterans and sequencing their DNA. At the same time, computer scientists are creating a database that combines those genetic sequences with electronic medical records and other information about veterans’ health.

The ultimate goal of the project, known as the Million Veteran Program, is to uncover clues about disorders ranging from diabetes to post-traumatic stress disorder.

 

Artificial Intelligence for Email Overload: Knowmail’s Israel to Boston Move | BostInno

BostInno


from November 11, 2015

The CEO of Knowmail, an Israeli startup that’s trying to solve email overload with artificial intelligence, has relocated to Boston with plans to grow a local office to as many as 20 people in 2016.

Knowmail is focused on developing AI approaches that improve corporate messaging and help workers prioritize email.

 

Psychologists’ betting market hints at most reliable research findings

Nature News & Comment


from November 09, 2015

When psychologists are asked which findings in their field will pass a replication attempt, their predictions prove little better than flipping a coin. But if they play a betting market on the same question, the aggregate results give a fairly good prediction of which studies will hold up if repeated, a prediction experiment suggests.

 
CDS News



Did Rubio win the last Republican debate? Here’s what we learned from Twitter.

The Washington Post, Monkey Cage blog; Andy Guess, Jonathan Nagler and Joshua Tucker


from November 10, 2015

… closer look at publicly available Twitter data shows that while that view gained traction over the course of the night, post-debate impressions were likely also shaped by the emerging consensus of media analysts the morning after.

The distinction between the spontaneous reactions of debate watchers and the narrative imposed by political analysts is important. Real-time Twitter data allows us to potentially disentangle these effects, and to see how different groups of people responded to the same events on screen. Using tweets about the debate that we collected at the New York University Social Media and Political Participation (SMaPP) lab and merged with our unique data on Twitter users’ ideology, we took a closer look at how people responded to the debate — both as it was happening and after it ended.
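The merge-and-compare step described above — joining each tweet to its author's estimated ideology, then tracking group reactions over the course of the debate — can be sketched with pandas. The column names, scores, and tiny toy tables here are purely hypothetical stand-ins for the SMaPP lab's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical debate tweets: who tweeted, when, and about what.
tweets = pd.DataFrame({
    "user_id": [1, 2, 1, 3],
    "minute": [0, 0, 30, 30],            # minutes since the debate began
    "mentions_rubio": [True, False, True, True],
})

# Hypothetical per-user ideology estimates (negative = left-leaning).
ideology = pd.DataFrame({
    "user_id": [1, 2, 3],
    "score": [-0.8, 0.5, 1.2],
})

# Join each tweet to its author's ideology, then compare groups minute by minute.
merged = tweets.merge(ideology, on="user_id", how="left")
merged["group"] = np.where(merged["score"] > 0, "right", "left")
share = merged.groupby(["minute", "group"])["mentions_rubio"].mean()
```

The resulting series, indexed by (minute, group), is exactly the kind of time-resolved, group-level signal that lets analysts separate in-debate reactions from the next morning's media narrative.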

 
