NYU Data Science newsletter – May 12, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 12, 2016


Data Science News

Introducing Ozlo

Medium, Team Ozlo

from May 08, 2016

Today, I’m excited to introduce you to Ozlo, a faster and better way to find information from your phone. Ozlo is a personal AI (Artificial Intelligence). While he’s still very young, with your help, he will soon grow to become the digital companion people turn to every day for help.

The Internet of Things is a security nightmare, warns EFF

TechCrunch

from May 09, 2016

A panel discussion on finding a balance between security and privacy here at Disrupt New York 2016 touched on various aspects of a complex topic, including strategies for securing customer data and the big risks posed as more types of devices come online.

How can startups best lock down customer data? By not having access to it in the first place, suggested Nate Cardozo, senior staff attorney for digital rights organization the Electronic Frontier Foundation.

Also:

DC Internet of Things Summit on May 17 in Washington DC (AFCEA Washington)

Highlights from PRS2016 workshop

The Netflix Tech Blog

from May 05, 2016

Personalized recommendations and search are the primary ways Netflix members find great content to watch. We’ve written much about how we build them and some of the open challenges. Recently we organized a full-day workshop at Netflix on Personalization, Recommendation and Search (PRS2016), bringing together researchers and practitioners in these three domains. It was a forum to exchange information, challenges and practices, as well as strengthen bridges between these communities. Seven invited speakers from the industry and academia covered a broad range of topics, highlighted below.

The NYPD Was Systematically Ticketing Legally Parked Cars for Millions of Dollars a Year- Open Data Just Put an End to It

I Quant NY

from May 11, 2016

How many spots received 5 or more of these pedestrian ramp tickets in the last 2.5 years? We are talking 1,966 spots that are generating about 1.7 million dollars a year in tickets at parking spots that are mostly legal. Are all 1,966 spots legal? Surely not, but the majority sure are, and many more that have fewer than five tickets are likely legal too.
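
The count behind that question is a group-and-threshold over ticket locations. Here is a minimal Python sketch, assuming each ticket has already been reduced to a location string. The function name `frequent_spots` and the sample addresses are hypothetical and are not taken from the I Quant NY analysis.

```python
from collections import Counter


def frequent_spots(ticket_locations, min_tickets=5):
    """Return {location: ticket_count} for locations with at least min_tickets tickets."""
    counts = Counter(ticket_locations)
    return {spot: n for spot, n in counts.items() if n >= min_tickets}


# Hypothetical sample data: one entry per ticket issued.
sample_tickets = (
    ["123 Example St"] * 6
    + ["45 Sample Ave"] * 5
    + ["9 Placeholder Rd"] * 2
)

repeat_spots = frequent_spots(sample_tickets)
print(len(repeat_spots), "spots with 5 or more tickets")
```

On real data the same threshold would be applied to the city's open parking-ticket records, after filtering to the violation type in question.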

Rochester’s first data science graduates delve into big data

University of Rochester, NewsCenter

from May 10, 2016

This spring, the inaugural crop of students in Rochester’s new data science programs—at the undergraduate and master’s degree levels—are completing their first year of study at the Goergen Institute for Data Science, a program of the Schools of Arts & Sciences. Some are completing their degrees, having finished a year of graduate study or having gotten started early by taking prerequisites for an undergraduate degree. Others are just beginning. But what unites them all is an enthusiasm for discovering ways to make the most of information.

“We want to teach the students the fundamentals—computer science and statistics—and then let them explore some of the applications in different areas. That’s what I think is unique,” says Henry Kautz ’87 (PhD), the Robin and Tim Wentworth Director of the Goergen Institute for Data Science and a professor of computer science. He notes that programs emerging at other schools are often more specialized—dedicated to business analytics or health analytics, for example—but that the Rochester program provides a strong foundation for every area of focus.

Hannaneh Hajishirzi (UW): Learning to Read, Ground, and Reason in Multimodal Text

YouTube, UW AI Research

from May 09, 2016

Web data, news, and textbooks offer informative but unstructured multimodal text. The ability to translate multimodal text into a semantic representation that is amenable to further reasoning is a key step toward taming information overload, one of the fundamental problems in modern AI. A core challenge is to do robust, scalable, context-aware semantic analysis and reasoning on multimodal text. Designing systems that can understand and use multimodal text requires multiple interconnected components: semantic interpretation, multimodal alignment, knowledge acquisition, and reasoning.

Backed by Amazon and Paul Allen, KITT.AI launches first ‘hotword detection’ software toolkit

GeekWire

from May 11, 2016

KITT.AI wants to help developers add voice activation features to almost any device for free.

The Seattle startup today unveiled its first software toolkit called Snowboy, which lets developers add verbal “hotword detection” to devices. It’s the same technology that tech giants like Amazon and Apple use for products like Alexa and Siri, but now KITT.AI is enabling anyone to easily add the functionality to their own hardware.

Processing Digital Research Data

SAA Electronic Records Section, bloggERS!

from May 11, 2016

The University of Illinois at Urbana-Champaign’s library-based Research Data Service will be launching an institutional data repository, the Illinois Data Bank (IDB), in May 2016. The IDB will provide University of Illinois researchers with a repository for research data that will facilitate data sharing and ensure reliable stewardship of published data. The IDB is a web application that transfers deposited datasets into Medusa, the University Library’s digital preservation service for the long-term retention and accessibility of its digital collections. Content is ingested into Medusa via the IDB’s unmediated self-deposit process.

As we conceived of and developed our dataset curation workflow for digital datasets ingested in the IDB, we turned to archivists in the University Archives to gain an understanding of their approach to processing digital materials.

An Entrepreneur Who Didn’t Ditch Hard Copies | Rising Stars

OZY

from May 11, 2016

Kuang Chen was a grad student when he descended on Dar es Salaam in a programmer’s cape, ready to save the day. In Tanzania on summer break from his doctoral program at UC Berkeley and with a plan to make clunky old nonprofits less inefficient, Chen figured he’d write a few SQL queries and be a hero.

But as any idealist could have guessed, that’s not exactly how things went. Chen’s eagerness and competence encountered some of the daily realities of work in the developing world. At his health clinic job, Chen found his colleagues were collecting massive amounts of data on paper as they dispatched community health workers to quiz residents of nearby towns and villages about their health practices. What’s a coder to do? Write an app that makes the whole thing move faster, right?

Wrong. Chen went the other way, in the direction of paperwork. Mounds and mounds of handwritten reports and surveys, some more legible than others, contained useful, even invaluable, data. But inputting all that data by hand was achingly slow and expensive, and rarely done. So that’s where Chen focused, on the chasm between the chicken scratch and the computer; he built a system that bridges them easily. Today, that system is a high-tech startup called Captricity.

Introducing FBLearner Flow: Facebook’s AI backbone

Facebook Code, Engineering Blog

from May 09, 2016

Many of the experiences and interactions people have on Facebook today are made possible with AI. When you log in to Facebook, we use the power of machine learning to provide you with unique, personalized experiences. Machine learning models are part of ranking and personalizing News Feed stories, filtering out offensive content, highlighting trending topics, ranking search results, and much more. There are numerous other experiences on Facebook that could benefit from machine learning models, but until recently it’s been challenging for engineers without a strong machine learning background to take advantage of our ML infrastructure. In late 2014, we set out to redefine machine learning platforms at Facebook from the ground up, and to put state-of-the-art algorithms in AI and ML at the fingertips of every Facebook engineer.

Join the Human-Centred Machine Learning Community

Human Centred Machine Learning at CHI 2016

from May 11, 2016

Please enter your email to receive links to online publication of workshop outcomes (including videos), announcement of journal special issue plans, links to Facebook group, etc.

Also please see http://hcml2016.goldsmithsdigital.com for workshop proceedings and discussion notes.

Amazon open-sources its own deep learning software, DSSTNE

VentureBeat, Jordan Novet

from May 11, 2016

Amazon is not the most active technology company in the realm of open source. Facebook or Google would be better candidates for that honor. But Amazon supplies a reason for this move in a frequently asked questions (FAQ) page included in the repo:

We are releasing DSSTNE as open source software so that the promise of deep learning can extend beyond speech and language understanding and object recognition to other areas such as search and recommendations. We hope that researchers around the world can collaborate to improve it. But more importantly, we hope that it spurs innovation in many more areas.

Also, in corporate Open Source-ing:

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source (May 12, Google Research Blog, Slav Petrov)

Introducing FBLearner Flow: Facebook’s AI backbone (May 09, Facebook Code, Engineering Blog)

Open Source at Bloomberg: Introducing BuckleScript (May 12, Bloomberg)

Events

DATA BY THE BAY

Spanning 150 talks over 5 days on May 16-20, Data By the Bay 2016 is a developer and data scientist conference by data engineers, for data engineers. … Data By the Bay is the first five-day, seven-conference matrix, or Data Grid conference. It consists of vertical tracks, corresponding to application areas such as NLP, IoT, Life Sciences, UX, Business Workflows, Data for Democracy and Government.

San Francisco, CA. Training workshops take place on Sunday, May 15, and the conference starts on Monday, May 16.

Tools & Resources

How HDBSCAN Works

Jupyter Notebook Viewer, lmcinnes

from May 06, 2016

HDBSCAN is a clustering algorithm developed by Campello, Moulavi, and Sander. It extends DBSCAN by converting it into a hierarchical clustering algorithm, and then using a technique to extract a flat clustering based on the stability of clusters. The goal of this notebook is to give you an overview of how the algorithm works and the motivations behind it. In contrast to the HDBSCAN paper I’m going to describe it without reference to DBSCAN. Instead I’m going to explain how I like to think about the algorithm, which aligns more closely with Robust Single Linkage with flat cluster extraction on top of it.
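
For readers who want a concrete handle on this, the Robust Single Linkage view starts from a "mutual reachability" distance that pushes sparse points away from everything else before any linkage is computed. The pure-Python sketch below illustrates only that first step. It is not the notebook's code or the hdbscan library's API. The function names are hypothetical, and the sketch assumes a point's core distance excludes the point itself, a convention that differs between implementations.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (the point itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Pairwise matrix of max(core(a), core(b), d(a, b)), with zeros on the diagonal."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three nearby points and one far-away point (illustrative data only).
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
mreach = mutual_reachability(pts, k=2)
```

Single linkage run on this matrix, rather than on raw distances, yields the hierarchy from which a flat clustering is later extracted.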

swoopyDrag add-on for D3

GitHub – 1wheel

from March 12, 2016

swoopyDrag helps you hand place annotations on d3 graphics. It takes an array of objects representing annotations and turns them into lines and labels. Drag the text and control circles below to update the annotations array.

Building and scaling the Fastly network, part 1: Fighting the FIB

Fastly

from May 11, 2016

This post is the first in a series detailing the evolution of network software at Fastly. We’re unique amongst our peers in that from inception, we’ve always viewed networking as an integral part of our product rather than a cost center. However, we rarely share what we do with the wider networking community, in part because we borrow far more from classic systems theory than contemporary networking practice.

Careers

DataKind – Data Scientist

DataKind

Researcher (or postdoc) job: Climate change biology computation and visualization

University of Washington, Department of Biology, Buckley Lab

Software Engineer, Data Science Systems

Cloudera

Senior Research Fellow in Data Intensive Science

University of Portsmouth, Institute of Cosmology and Gravitation
