Data Science newsletter – May 10, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 10, 2017


 
 
Data Science News



Why UX Design For Machine Learning Matters

Fast Company, Caroline Sinders


Machine learning is going to radically change product design. But what is the future of machine learning? Is it the singularity, flying cars, voiceless commands, or an Alexa that can actually understand you? Before we can even get to that part–the grand futurism part–I want to offer a provocation: Machine learning won’t reach its potential–and may actually cause harm–if it doesn’t develop in tandem with user experience design.


NYU Stern Launches New Focused One-Year MBA Programs: The Tech MBA and Fashion & Luxury MBA

Business Wire, NYU Stern School of Business


New York University Stern School of Business is launching a new category of specialized MBA, optimized for MBA seekers who are firmly committed to business careers either in technology or in fashion and luxury. Stern’s new Tech MBA and Fashion & Luxury MBA will enable students to build applicable knowledge, gain relevant real-world experience and earn their MBA in one year.


Microsoft invests in Agolo, a startup that’s fighting information overload with automated summarizations

VentureBeat, Paul Sawers


Agolo, a New York-based startup that’s setting out to help companies fight information overload through AI-powered summarizations, has closed a $3.5 million seed round of funding co-led by Microsoft’s corporate investment fund, Microsoft Ventures, and CRV, with participation from Point72 Ventures and Franklin Templeton.


NYC’s New Tech to Track Every Homeless Person in the City

WIRED, Business, Issie Lapowsky


Think of StreetSmart as a customer relationship management system for the homeless. Every day in New York, some 400 outreach workers walk the streets checking in on homeless people and collecting information about their health, income, demographics, and history in the shelter system, among other data points. The workers get to know this vulnerable population and build trust in the hope of one day placing them in some type of housing.


80 AI Startups at GTC Show How They’re Changing Industries

NVIDIA Blog, Kimberly Powell


Better medicine with AI. A deep learning stock picker. Self-driving trucks.

That’s just a sample of the groundbreaking work by nearly 80 AI startups at the GPU Technology Conference this week.

Get a close-up look at how they’re changing industries by coming to any of 40-plus talks, checking out the exhibits and even attending an awards gala honoring the winners of our $1.5 million AI startup competition.


Facebook’s new AI aims to destroy the language barrier

Engadget, Rob LeFebvre


Language translation has typically been done by recurrent neural networks (RNN), which process language one word at a time in a linear order, either right-to-left or left-to-right, depending on the language. This CNN-based architecture pays attention to words farther along in a sentence to help understand the meaning from context farther along the string of words, much like humans do. While the older RNN method has been typically fine for end users in regards to speed and accuracy, there’s a functional limit to the tech, one which the parallel processing model of CNNs can address. This is the first time a CNN has outperformed the more traditional RNN techniques. Facebook hopes to use the new methodology to scale its translation efforts to cover “more of the world’s 6,500 languages.”

Now that the popular social network has chosen CNN translation processing architecture, it will be interesting to see what comes next. Fast, accurate language translation might make our world feel a little smaller and more connected without the barrier of language in the way. The impact of this new technology will likely be felt globally, especially across the many Facebook-owned apps that help connect us all, like Messenger, WhatsApp and Instagram.


Drivers of insecticide use

University of California-Santa Barbara, The UCSB Current


A new study by UC Santa Barbara scientists has overcome that obstacle. Using detailed data from roughly 13,000 fields observed from 2005 to 2013 in Kern County, California, Ashley Larsen and Frederik Noack parsed the different effects of landscape characteristics such as crop diversity, field size and cropland extent on insecticide use. Their findings appear in the Proceedings of the National Academy of Sciences.

“If we are to minimize the negative effects of insecticides on human and environmental health, it is critical to understand if and how we can leverage landscape features to reduce insecticide use,” said lead author Larsen, an assistant professor in UCSB’s Bren School of Environmental Science & Management. “Unlocking the different landscape components across multiple crops and spatial scales provides a novel understanding of which characteristics are likely to increase or decrease insecticide use for specific crops.”


Statistics and Data Science Center holds inaugural conference

MIT News


The first annual Statistics and Data Science Center Day (SDSCon) at MIT highlighted a variety of research projects, including efforts to better understand gene editing, climate change, microcredit programs, international trade, and recommendation systems. The common thread of all of these diverse research areas is that researchers can use statistics and data science to learn about and accurately model different systems — leading to insights into how the systems work, as well as the ability to make better-informed decisions and policies.

SDSCon 2017 was hosted by the Statistics and Data Science Center (SDSC), which is part of the MIT Institute for Data, Systems, and Society. The April 21 conference was the first of what will be an annual celebration of statistics and data science, bringing together a growing community at MIT and beyond.


Mayor Turner supports Houston data science institute, urges collaboration

University of Houston, Daily Cougar, C. McRae Peavy


Mayor Sylvester Turner, a University of Houston alumnus, endorsed the building of a collaborative institute for data science on the 300-plus acre plot of land purchased in Houston by the University of Texas System in November 2015.

The data science institute idea was proposed in a UT System task force report that was obtained by the Houston Chronicle in March. According to the article, Houston leaders and UT alumni were enthusiastic about the project.


Anaconda Joins Forces with Leading Companies to Further Innovate Open Data Science

Anaconda, Continuum Analytics


In addition to announcing the formation of the GPU Open Analytics Initiative with H2O and MapD, today, we are pleased to announce an exciting collaboration with NVIDIA, H2O and MapD, with a goal of democratizing machine learning to increase performance gains of data science workloads. Using NVIDIA’s Graphics Processing Unit (GPU) technology, Anaconda is mobilizing the Open Data Science movement by helping teams avoid the data transfer process between Central Processing Units (CPUs) and GPUs and move toward their larger business goals.


X-Ray of the oceans

National Geographic, Flowing Data blog




Using satellite data and spatial models, researchers estimate human influence in the ocean. Darker means more impact.


AI everywhere

TechCrunch, Darrell Etherington


Nvidia’s Jensen Huang is one of those rare CEOs who has, for a remarkably long and storied period, been at the helm of a company he helped create. In 1993, at the age of 30, he co-founded Nvidia and has occupied the top executive spot ever since. What began as a provider of relatively niche graphics processing units (GPUs) with a narrow field of general computing uses has evolved to become, arguably, the bedrock underlying the current AI market explosion.

As Nvidia gears up for its eighth annual GPU Technology Conference (GTC), which happens May 8-11 in San Jose, the company has a lot to celebrate. Its stock price hit record highs this year, its tech was everywhere at CES, and in addition to general AI applications, it’s found a new collection of deep-pocketed partners among automakers looking to usher in autonomous driving using neural networks powered by GPUs.


X-ray imaging and computer modeling help map electric properties of nanomaterials

Argonne National Laboratory


“Our approach reveals a wealth of information about the underlying mechanisms that regulate the transfer of energy in such materials, as well as how stable these materials are under extreme conditions,” said Argonne computational scientist and co-author Subramanian Sankaranarayanan.

“Using experimental data, we make informed models which in turn make predictions at space and time scales that experiments cannot reach,” said Mathew Cherukara, the lead author of the study.


How Close Are We Really To Connecting Human Minds To Artificial Intelligence?

GE Reports, Rajesh P. N. Rao & James Wu


Brain-computer interfacing is a hot topic in the tech world, with Elon Musk’s announcement of his new Neuralink startup. Here, researchers separate what’s science from what’s currently still fiction.


People Lie, But Search Data Tell the Truth

Bloomberg, Peter R. Orszag


Seth Stephens-Davidowitz, a former research assistant of mine, would not strike most people as a revolutionary. Yet in his new book Everybody Lies, he argues persuasively for a mutiny in social science.

The problem should be familiar to anyone who’s followed political polling in the past few years, despite the successful predictions of Emmanuel Macron’s victory in France. Put simply, most people tend to lie on surveys and on social media, too. As a result, when we study people’s responses to surveys or what they say on social media, we come up with a misleading picture.

Rather than disparage surveys and social media posts, Stephens-Davidowitz points to a different way of understanding ourselves. In the ostensible privacy of online searching, he argues, we inadvertently reveal ourselves, and this digital truth serum offers the best way of finding out who we really are.


Why Investors Should Trust Hard Data More

Bloomberg View, Charles Lieberman


There’s always a critical trade-off between accuracy and timeliness. We experience large data revisions in the U.S., partly because our data collection agencies wish to provide the timeliest information, even if they must revise later. The first gross domestic product report for each quarter is released near the very end of the first month of the following quarter, which is fairly prompt. But then, it is revised in the second month and again in the third month. By the second revision, it is old news and not considered very relevant. It is also revised a year later and may be revised again many times, even decades after it is first released, sometimes changing quite significantly our perception of how the economy performed in the past.


Web inventor Tim Berners-Lee on ‘fake news’ and why net neutrality is so important

Business Insider, Welt


Axel Springer CEO Mathias Döpfner sat down with Sir Tim Berners-Lee, the inventor of the World Wide Web, for a far-ranging conversation on the future of the internet, fake news, net neutrality, and the rising tide of censorship in countries across the world.


Government Data Science News

The National Health Service in Britain has been attacked with ransomware, impacting at least 25 medical facilities, including hospitals. Doctors cannot access patient records or, presumably, critical digital diagnostic tools. (Where is the sphygmomanometer? Are nurses comfortable taking blood pressure manually?) The same attack is also impacting Spain’s Telefónica and Russia’s MegaFon phone companies. The attack has hit 11 countries so far. According to the New York Times, “the computers all appeared to be hit with the same ransomware, and similar ransom messages demanding about $300 to unlock their data” stemming from “a vulnerability that was discovered and developed by the National Security Agency”. The Times covered the NSA breach last August, placing tentative blame on Russian intelligence.

An aptly timed investigative report on the impact of 2bn euro cybersecurity spending bluntly states that the funds “are mostly good for one thing: filling the coffers of the security industry…. Only rarely did a project lead to concrete, sellable technology. Most of them stopped at a prototype, a study, a report, or a wiki page.” Priorities may need to be reconsidered.

Closer to home, Sam Biddle at The Intercept reports that highly sensitive documents detailing the hardware design of WindsorGreen, a program likely designed to brute-force crack encrypted data (like passwords), were left on the web, in the clear. No encryption. No password. NYU Tandon looks to be the guilt-stanky partner. The documents were hosted on servers in the Institute for Mathematics and Advanced Supercomputing, though the article stops short of assigning blame. I bet if this had happened at IBM or the NSA, the other two collaborators on the project, someone would have been fired. But this is academia; that will not happen.

Interest rates on new federal student loans will increase quite a bit this year:

Undergrads: 4.45%

Grads: 6.00%

PLUS: 7.00%

The US Census Director, John Thompson, suddenly quit last week, likely due to frustration with significant underfunding of the 2020 census. We will likely undercount groups including immigrants, poor people, and racial/ethnic minorities, which corrupts our ability to even approach ground truth in social science research. As we’ve seen previously in this newsletter, if we can’t rely on the Census, we may be able to approximate with Google image data or satellite data.

New York has a new database for tracking and helping homeless people. Called StreetSmart, it has been somewhat uncomfortably described as “a customer relationship management system for the homeless”. (Aside: Why do we see our customer status as a more rightful entitlement to good treatment than, say, our status as citizens, residents, or workers?) The new system makes it easier to identify unsafe shelters and to provide appropriate services to homeless people who may move from one borough/database to another. The data are collected by humans, not by sensors.

London is hiring its first Chief Digital Officer.


U.S. Census director resigns amid turmoil over funding of 2020 count

The Washington Post, Tara Bahrampour


The director of the U.S. Census Bureau is resigning, leaving the agency leaderless at a time when it faces a crisis over funding for the 2020 decennial count of the U.S. population and beyond.

John H. Thompson, who has served as director since 2013 and worked for the bureau for 27 years before that, will leave June 30, the Commerce Department announced Tuesday.

The news, which surprised census experts, follows an April congressional budget allocation for the census that critics say is woefully inadequate.


RIT a big winner in first-ever New York State Game Development Challenge

WXXI, Innovation Trail


State and local officials were at the Strong Museum of Play Monday to hand out monetary awards to the video game developers of the future.

Rochester Institute of Technology’s Center for Media, Arts, Games, Interaction and Creativity, or MAGIC Center, announced the winners in the student and independent categories in the first New York State Game Development Challenge.

 
Events



Time and Causality in the Sciences 2017

Stevens Institute of Technology


Hoboken, NJ The Causality in the Sciences conference series at Stevens Institute of Technology brings together philosophers and scientists to explore various aspects of causality. June 7-9. [$$$]


Spark Bootcamp

Southern Data Science, LLC


Atlanta, GA September 1-3. Labs will run on Databricks Community Edition, and will focus on Apache Spark functionality, not Databricks enterprise features. [$$$$]

 
Deadlines



The 2017 World Science Festival is now accepting applications for volunteers!

New York, NY Organizers are looking for science pros and students in NYC to volunteer at World Science Fest, May 30-June 4.
 
Tools & Resources



Bayesian Methods for Hackers

CamDavidsonPilon


Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

The choice of PyMC as the probabilistic programming language is two-fold. As of this writing, there is currently no central resource for examples and explanations in the PyMC universe. The official documentation assumes prior knowledge of Bayesian inference and probabilistic programming. We hope this book encourages users at every level to look at PyMC. Secondly, with recent core developments and popularity of the scientific stack in Python, PyMC is likely to become a core component soon enough.


FMA: A Dataset For Music Analysis

Michaël Defferrard


“The dataset is a dump of the Free Music Archive, an interactive library of high-quality, legal audio downloads.”


Empirical Software Engineering using R

Derek M. Jones


This book aims to discuss all of what is currently known about software engineering, based on an analysis of all publicly available software engineering data.

This aim is not as ambitious as it sounds because there is not a great deal of data publicly available. Until recently researchers in software engineering concentrated on producing work that gave readers mathematical orgasms, rather than anything useful to industry based on experimental evidence.


Voyager 2: Augmenting Visual Analysis with Partial View Specifications

Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, Jeffrey Heer


“Visual data analysis involves both open-ended and focused exploration. Manual chart specification tools support question answering, but are often tedious for early-stage exploration where systematic data coverage is needed. Visualization recommenders can encourage broad coverage, but irrelevant suggestions may distract users once they commit to specific questions. We present Voyager 2, a mixed-initiative system that blends manual and automated chart specification to help analysts engage in both open-ended exploration and targeted question answering.”


A novel approach to neural machine translation

Facebook Code, Engineering Blog; Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin


Language translation is important to Facebook’s mission of making the world more open and connected, enabling everyone to consume posts or videos in their preferred language — all at the highest possible accuracy and speed.

Today, the Facebook Artificial Intelligence Research (FAIR) team published research results using a novel convolutional neural network (CNN) approach for language translation that achieves state-of-the-art accuracy at nine times the speed of recurrent neural systems. Additionally, the FAIR sequence modeling toolkit (fairseq) source code and the trained systems are available under an open source license on GitHub so that other researchers can build custom models for translation, text summarization, and other tasks.

 
Careers


Full-time, non-tenured academic positions

Data Visualization Engineer



University of Colorado, School of Medicine; Aurora, CO

Reader



University College London, Department of Security and Crime Science; London, England
Full-time positions outside academia

Economist- DEU



Federal Trade Commission; Washington, DC

Senior User Experience Researcher



Bloomberg, Professional Financial Products group; New York, NY
Internships and other temporary positions

Cyclotron Road Applied Research Fellows



Oak Ridge National Laboratory and Lawrence Berkeley Laboratory; Berkeley, CA
Tenured and tenure track faculty positions

George J. Klir Endowed Professor, Systems Science



Binghamton University; Binghamton, NY
