Since Trump’s election, scientists have been scrambling to save climate change data sets. And one Michigan graduate student thought the more copies, the better.
The multiplayer online battle arena (MOBA) has become a popular game genre. It has also received increasing attention from the research community because such games provide a wealth of information about human interactions and behaviors. A major challenge is extracting meaningful patterns of activity from this type of data in a way that is also easy to interpret. Here, we propose to exploit tensor decomposition techniques, in particular Non-negative Tensor Factorization, to discover hidden correlated behavioral patterns of play in a popular game: League of Legends. We first collect the entire gaming history of a group of about one thousand players, totaling roughly 100K matches. By applying our methodological framework, we then separate players into groups that exhibit similar features and playing strategies, as well as similar temporal trajectories, i.e., behavioral progressions over the course of their gaming history: this allows us to investigate how players learn and improve their skills.
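The abstract's core technique — Non-negative Tensor Factorization — can be sketched in a few lines of numpy. This is a minimal CP decomposition with multiplicative updates on a toy 3-way tensor (think players × features × time), not the authors' actual pipeline; the function names and the rank are illustrative assumptions.

```python
import numpy as np

def khatri_rao(a, b):
    # Column-wise Khatri-Rao product: (I*J, R) from (I, R) and (J, R).
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])

def ntf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative CP decomposition of a 3-way tensor X (I x J x K)
    via Lee-Seung-style multiplicative updates. Returns factors A, B, C
    such that X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r], all non-negative."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Update each factor in turn, holding the other two fixed.
        Z = khatri_rao(B, C)                       # (J*K, R)
        X1 = X.reshape(I, -1)                      # mode-1 unfolding
        A *= (X1 @ Z) / (A @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, C)                       # (I*K, R)
        X2 = X.transpose(1, 0, 2).reshape(J, -1)   # mode-2 unfolding
        B *= (X2 @ Z) / (B @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, B)                       # (I*J, R)
        X3 = X.transpose(2, 0, 1).reshape(K, -1)   # mode-3 unfolding
        C *= (X3 @ Z) / (C @ (Z.T @ Z) + eps)
    return A, B, C
```

In the paper's setting, the columns of each factor would correspond to latent "behavioral patterns": a player factor groups similar players, and a time factor traces how those patterns evolve over a gaming history.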
I spent two weeks in January hanging out with some awesome scientists who are all passionate about the future of science. I was participating in two professional development events with support from the non-profit organization Data Carpentry, and I’d like to share some of the highlights.
Microsoft is getting very serious about bringing artificial intelligence into the healthcare system, launching a brand new research division and several development projects with provider groups and vendor partners.
The company, better known for its personal computing pursuits, is hoping to use the 2017 HIMSS Conference and Exhibition in Orlando as a springboard for promoting its new activities, including a partnership with the University of Pittsburgh Medical Center (UPMC) focused on using AI to reduce physician burnout, improve productivity, and streamline health IT workflows.
Predicting color is easy: Shine a light with a wavelength of 510 nanometers, and most people will say it looks green. Yet figuring out exactly how a particular molecule will smell is much tougher. Now, 22 teams of computer scientists have unveiled a set of algorithms able to predict the odor of different molecules based on their chemical structure. It remains to be seen how broadly useful such programs will be, but one hope is that such algorithms may help fragrance makers and food producers design new odorants with precisely tailored scents.
This latest smell prediction effort began with a recent study by olfactory researcher Leslie Vosshall and colleagues at The Rockefeller University in New York City, in which 49 volunteers rated the smell of 476 vials of pure odorants. For each one, the volunteers labeled the smell with one of 19 descriptors, including “fish,” “garlic,” “sweet,” or “burnt.” They also rated each odor’s pleasantness and intensity, creating a massive database of more than 1 million data points for all the odorant molecules in their study.
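The prediction task described here is, at its simplest, a regression from molecular structure to perceptual ratings. As a heavily simplified illustration — not any team's actual model — here is a least-squares fit mapping hypothetical molecular descriptor vectors to a single rating such as "sweet"; real entries used richer chemical features and more sophisticated learners.

```python
import numpy as np

def fit_odor_model(descriptors, ratings):
    """Fit a linear map from molecular descriptor vectors to a perceptual
    rating by ordinary least squares. A toy stand-in for the challenge task."""
    X = np.hstack([descriptors, np.ones((len(descriptors), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return w

def predict_odor(descriptors, w):
    X = np.hstack([descriptors, np.ones((len(descriptors), 1))])
    return X @ w
```

With ~1 million human ratings across 476 odorants, even simple models like this give a baseline against which the competing teams' algorithms could be scored.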
“I know it when I see it,” is as true for gentrification as it is for pornography. Usually, it’s when a neighborhood’s property values and demographics are already changing that the worries about displacement set in—rousing housing advocates and community organizers to action. But by that time, it’s often hard to pause, and put in safeguards for the neighborhood’s most vulnerable residents.
But what if there were an early warning system that detects where price appreciation or decline is about to occur? Predictive tools like this have been developed around the country, most notably by researchers in San Francisco. Their value is clear: city leaders and non-profits could pinpoint ahead of time where to preserve existing affordable housing, where to build more, and where to attract business investment. But the tools are often too academic or too obscure, which is why it’s not yet clear how they’re being used by policymakers and planners.
That’s the problem Ken Steif, at the University of Pennsylvania, is working to solve, in partnership with Alan Mallach, from the Center for Community Progress.
I discovered that many of today’s most common languages make it difficult for programmers to protect users’ privacy and security. It’s bad enough that this state of affairs means programmers have lots of opportunities to make privacy-violating errors. Even worse, it means we users have trouble understanding what computer programs are doing with our information – even as we increasingly rely on them in our daily lives.
A few years ago, while preparing to participate in a roundtable conference on finance, Andrew Lo happened upon the website of the American Psychological Association. “Their mission statement focuses on applying their knowledge of psychology for the benefit of society,” says Lo, financial economist, hedge fund manager, and the Charles E. and Susan T. Harris Professor at the MIT Sloan School of Management. “Reflexively, I compared it to the mission statement for the American Finance Association, which simply focuses on what we do and how we do it. It was quite a contrast. And I began to reflect, as a financial economist, on what our true mandate was.”
Lo has spent the better part of his professional life exploring that mandate.
Wall Street is a competition, a Darwinian battle for the almighty dollar. Gordon Gekko said that greed is good, that it captures “the essence of the evolutionary spirit.” A hedge fund hunts for an edge and then maniacally guards it, locking down its trading data and barring its traders from joining the company next door. The big bucks lie in finding market inefficiencies no one else can, succeeding at the expense of others. But Richard Craib wants to change that. He wants to transform Wall Street from a cutthroat competition into a harmonious collaboration.
This morning, the 29-year-old South African technologist and his unorthodox hedge fund, Numerai, started issuing a new digital currency—kind of. Craib’s idea is so weird, so unlike anything else that has preceded it, that naming it becomes an exercise in approximation. Inspired by the same tech that underpins bitcoin, his creation joins a growing wave of what people in the world of crypto-finance call “digital tokens,” internet-based assets that enable the crowdsourcing of everything from venture capital to computing power. Craib hopes his particular token can turn Wall Street into a place where everyone’s on the same team. It’s a strange, complicated, and potentially powerful creation that builds on an already audacious arrangement, a new configuration of technology and money that calls into question the market’s most cherished premise. Greed is still good, but it’s better when people are working together.
Based in San Francisco, Numerai is a hedge fund in which an artificially intelligent system chooses all the trades.
Recently, a competition called ScienceIE challenged teams to create programs that could extract the basic facts out of sentences in scientific papers, and compare those to the basic facts from sentences in other papers. “The broad goal of my project is to help scientists and practitioners gain more knowledge about a research area more quickly,” says Isabelle Augenstein, a post-doctoral AI researcher at University College London, who devised the challenge.
That’s a tiny part of artificial intelligence’s biggest challenge: processing natural human language. Competitors designed programs to tackle three subtasks: reading each paper and identifying its key concepts, organizing key words by type, and identifying relationships between different key phrases.
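The first subtask — identifying a paper's key concepts — can be approximated with a classic unsupervised heuristic: split text at stopwords and punctuation and keep the remaining runs of content words as candidate phrases (the idea behind RAKE-style extractors). This is a hedged sketch of that baseline, not any competitor's system; the stopword list is a small illustrative sample.

```python
import re

# A tiny illustrative stopword list; real extractors use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "is",
             "are", "on", "with", "by", "that", "this", "we", "be", "as"}

def candidate_phrases(text):
    """Return candidate keyphrases: maximal runs of non-stopwords."""
    words = re.findall(r"[a-zA-Z][a-zA-Z-]*", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(" ".join(current))
    return phrases
```

The remaining subtasks — typing each phrase (task, method, material) and linking related phrases — are what make the challenge genuinely hard, and are where the competitors' machine-learned models depart from heuristics like this one.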
Last week, MIT said it had hired Katie Rae, a cofounder of the Boston venture capital firm Project 11 Ventures and the former managing director of the TechStars Boston accelerator program, to lead The Engine.
It’s a big gig that has the potential to spawn new business clusters in Boston, building on the region’s strength in life sciences, hardware and software for corporate use, and robotics.
But it could also be one of the more political posts in Boston, with the need to serve lots of different constituencies, from MIT president Rafael Reif and treasurer Israel Ruiz, to the outside investors who put money into The Engine’s new fund, to a board of directors, an advisory committee, and numerous other committees formed to have a voice in how The Engine operates.
When conservationists put drones to work in field research, they typically function as flying eyes that gather imagery of the habitat and wildlife below. Now, ornithologists from Gettysburg College in Pennsylvania are using drones as flying ears to monitor songbirds in the Appalachian Mountains.
Results of their drone study were published in the peer-reviewed journal The Auk: Ornithological Advances this week. The study concluded that population estimates of songbirds derived from drone-gathered data were about as accurate as those derived from human experts on the ground. The full study, “The feasibility of counting songbirds using unmanned aerial vehicles,” was authored by Gettysburg College environmental studies professor Andy Wilson with two undergraduate students in his lab, Janine Barr and Megan Zagorski.
Brevan Jorgenson’s grandma kept her cool when he took her for a nighttime spin in the Honda Civic he’s modified to drive itself on the highway. A homemade device in place of the rear-view mirror can control the brakes, accelerator, and steering, and it uses a camera to identify road markings and other cars.
“She wasn’t really flabbergasted—I think because she’s seen so much from technology by now,” says Jorgenson, a senior at the University of Nebraska, Omaha. Others are more wary of the system, which he built using plans and software downloaded from the Internet, plus about $700 in parts. Jorgenson says the fact that he closely supervises his homebrew autopilot hasn’t convinced his girlfriend to trust the gadget’s driving. “She’s worried it’s going to crash the car,” he says.
In 2002, Dr. Michael Black was researching how to create statistical models of the human body and preparing to teach a course on computer vision at Brown University. However, before the course began, the Virginia state police contacted Black with hopes of utilizing his research to identify a perpetrator in a robbery and murder case. Black took the opportunity to change his course syllabus to focus on identifying human beings through computer vision techniques and, ultimately, the class’s research helped to confirm the perpetrator’s height.
It also became the basis for his next venture — Body Labs.
New York, NY Presenter: Mark Dredze from Johns Hopkins University on Compositional Models for Information Extraction, February 27, 2 p.m., 60 Fifth Avenue, Room 150. [free]
What makes Tumblr stand apart from other social media platforms is the unique way its users communicate with each other. Each user has their own highly customizable blog where they can post and share content—like articles, images, GIFs, or videos—or re-post content published by another user. Sharing and re-posting content is not only key to how social connections are formed, but also to how trending and popular topics are established, since users must tag each post they publish.
But with over 335 million microblogs, how can Tumblr keep track of which topics are most popular? While handling Tumblr’s massive data set is already a challenge, another part of the problem is interpretability. For example, one user may tag an image of Pikachu as ‘Pokemon’, while another may tag it as ‘Pokemon Go!’
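The interpretability problem described above starts with surface variation in tags. As a minimal sketch (my own illustration, not Tumblr's system), one can at least collapse spelling and punctuation variants before counting; grouping semantically related tags like 'pokemon' and 'pokemon go' would require a real model on top of this.

```python
import re
from collections import Counter

def normalize_tag(tag):
    """Lowercase and strip punctuation so surface variants collapse,
    e.g. 'Pokemon!' and 'pokemon' become the same key."""
    return re.sub(r"[^a-z0-9 ]", "", tag.lower()).strip()

def trending(posts, top_n=3):
    """posts: iterable of tag lists, one per post. Returns the most
    common normalized tags with their counts."""
    counts = Counter(normalize_tag(t) for tags in posts for t in tags)
    return counts.most_common(top_n)
```

Note that 'Pokemon Go!' still normalizes to its own key, 'pokemon go' — exactly the kind of near-duplicate topic that simple counting cannot merge, which is why interpretability at Tumblr's scale is a research problem rather than a bookkeeping one.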
Dan Duncan has won an NSF Doctoral Dissertation Research Improvement Grant. The title of his project is Language Variation and Change in the Geographies of Suburbs.
“Wringing value from an IoT project often requires integrating devices, business systems, and databases. A company that wants to optimize sales and supply chain, for instance, will need to have point-of-sale data, warehouse data, and shipping data.”
“But integration has been cited as one of the top and most costly barriers to adopting IoT analytics.”
“Have you ever tried using word counts to analyze a collection of documents? Lots of important concepts get missed, since they don’t appear as single words (unigrams). For example, the words ‘social’ and ‘security’ don’t fully represent the concept ‘social security.'”
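The excerpt's point — that 'social' and 'security' counted separately miss the concept 'social security' — is exactly what n-gram counting addresses. A minimal stdlib sketch (the function name and tokenization are my illustrative choices, not the article's code):

```python
from collections import Counter

def ngram_counts(docs, n=2):
    """Count word n-grams (default bigrams) across a document collection,
    using naive whitespace tokenization."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts
```

Running this over a corpus surfaces multiword concepts directly: the bigram 'social security' gets its own count instead of being smeared across two unigrams.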
“My colleague Simon and I recently worked together on a machine learning model of gentrification using Census data throughout the U.S. In that piece we discuss the importance of parcel level data.”
“We thought we’d revisit the subject here using higher resolution data and report on our findings by way of data visualization.”
Fashion is a visual medium, so it makes sense for our models of fashion to include visual features. One typical use-case at Lyst is ordering a set of products in accordance with some criteria. Most retailers use human experts to order products manually (so-called ‘merchandising’), but Lyst has so many new products a day that this process must be automated and personalised for each user.
We hypothesize that when users browse products they primarily make visual judgements based on the images rather than on the textual descriptions. If we order the products using only textual features, it will be hard to match user expectations and replicate the manual merchandising process. To do this, we need image features.
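Once image feature vectors exist (in practice typically taken from a pretrained convolutional network), personalised ordering can be as simple as ranking products by cosine similarity to a vector representing the user's visual taste. This is a hedged sketch of that final ranking step only — the feature extraction and the notion of a "user vector" are assumptions, not Lyst's published architecture.

```python
import numpy as np

def rank_by_visual_similarity(product_features, user_vector):
    """Return product indices ordered by cosine similarity between each
    product's image feature vector (rows of product_features) and a
    user taste vector. Highest similarity first."""
    P = product_features / np.linalg.norm(product_features, axis=1, keepdims=True)
    u = user_vector / np.linalg.norm(user_vector)
    scores = P @ u                 # cosine similarity per product
    return np.argsort(-scores)     # descending order
```

The appeal of this design is that the expensive part (computing image features) happens once per product, while per-user ranking is a cheap dot product — which is what makes automating merchandising at catalogue scale feasible.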