Data Science newsletter – September 6, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 6, 2017

GROUP CURATION: N/A

 
 
Data Science News



10 Things Everyone Should Know About Machine Learning

LinkedIn, Dan Tunkelang


As someone who often finds himself explaining machine learning to non-experts, I offer the following list as a public service announcement.

1. Machine learning means learning from data; AI is a buzzword.


Facebook Debuts Gatekeeper System to Speed Up Coding Efforts

Fortune, Jonathan Vanian


Facebook’s billions of users may not have noticed any changes. But over the past year, the social network has overhauled its underlying software so that it can debut new features more quickly and save its engineers time.

The fixes, both small and large, help keep the site operating smoothly, even during huge spikes in traffic around major events like Hurricane Harvey. They also help the company add new features like video streaming and messaging without many technical hiccups.

To help with these projects, Facebook used software development techniques typically used by startups a fraction of its size, said Chuck Rossi, who oversees the company’s big software release projects. When he first joined Facebook nearly 10 years ago after stints at Google, VMware, and IBM, he saw how the “crazy kids,” as he put it, built the site and its related software infrastructure unlike anything he’d seen before.


These Scientists Got To See Their Competitors’ Research Through Public Records Requests

BuzzFeed News, Teresa L. Carey and Aylin Woodward


Over the past decade, scientists and universities have used the Freedom of Information Act to obtain thousands of competitors’ grant proposals, and many of the targeted scientists are upset.


The Looming Decline of the Public Research University

Washington Monthly, Jon Marcus


The endowments of the University of Iowa, the University of Wisconsin, the University of Illinois, and Ohio State University, which together enroll nearly 190,000 students, add up to about $11 billion, less than a third of Harvard’s $37.6 billion. Together, Harvard, MIT, and Stanford, which enroll about 50,000 students combined, have more than $73 billion in the bank to help during lean times. They also have robust revenues from high tuitions, wealthy alumni donors, strong credit, and other support to fall back on. Compare that to the public university system in Illinois, where the state has cut its higher education budget so deeply that Moody’s downgraded seven universities, including five to junk bond status.

This ominous reality could widen regional inequality, as brainpower, talent, and jobs leave the Midwest and the Rust Belt—where existing economic decline may have contributed to the decisive shift of voters toward Donald Trump—for places with well-endowed private and better-funded public universities. Already, some midwestern universities have had to spend millions from their battered budgets to hang on to research faculty being lured away by wealthier schools. A handful of faculty have already left, taking with them most if not all of their outside funding.


Hey MTA: Here Are Five Qualities of a Useful Transit Dashboard

TransitCenter


The students’ assignment was to develop a comprehensive performance measurement tool that was accessible and relevant to riders, agency board members, and staff: the sort of dashboard that the MTA can and should publish. They used the MTA’s real-time bus data to underpin the dashboard, but the MTA’s dashboard should report on performance across the agency, including both buses and subways at a bare minimum.

While we may never know what a rider-focused dashboard covering the MTA’s entire portfolio of services would look like, here are five characteristics of a good transit dashboard. Some are drawn from the CUSP work, others from agencies with a stronger commitment to transparency than New York’s MTA:

Measures the right things. Not all metrics are created equal. Excellent dashboards measure and present the right information. They include both operational and customer-centric metrics that can be applied internally to evaluate service, all of which improve upon the MTA’s “wait assessment” metric, which the agency now acknowledges to be flawed. For individual bus routes, the CUSP dashboard presents excess wait time (which we have written about at length), average speed, and “route lateness factor,” an estimate of the average percent delay on the route. In plain English, a route lateness factor of 20 percent means that a trip scheduled to take 15 minutes will take three extra minutes, or 18 minutes total, on average. These metrics capture what matters most to transit riders: compared with the schedule’s overly optimistic estimate (scheduled journey time), how long should riders really expect to wait for and ride the bus, both on average (average journey time) and when planning to arrive on time at least 19 times out of 20 (“planning” journey time)? Busy New Yorkers need to be on time, so these metrics, considered together, tell the story of how transit riders really experience the MTA’s bus system.
Easy to use. The dashboard should be easy to navigate and use for engaged members of the general public, journalists, and the MTA board, building on examples set by MBTA and NYU CUSP. Ideally, the MTA would seek out members of these groups to act as user-testers for its dashboard.
Presents historical context. Performance data is not informative without the historical context needed to judge whether an agency is improving, plateauing or worsening over time. Both the data made available to the public and the dashboard itself should make historical trends clear, particularly year-over-year. The CUSP student dashboard includes six months of data (December 2016-May 2017) in this beta version, but is designed to support longitudinal analysis.
Provides open and accessible data. Make performance data open and downloadable in an easily usable, machine-readable format, not only locked up in PDFs. MBTA does this through its dashboard as well as a publicly accessible API. The MTA has also set a good precedent by internally using the same Bus Time data it publishes openly for developers to build on; now it should provide the data in comprehensible form for the general public.
Links metrics to goals. The best performance dashboards create a clear link between agency goals and specific performance metrics, and mirror the criteria used internally to evaluate staff and departmental performance. Those goals can range from social equity and service reliability to disability access and fiscal responsibility. SFMTA’s performance metrics dashboard looks at key performance indicators and relates them to stated agency goals.
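The route-level metrics above come down to simple arithmetic. The Python sketch below is a minimal illustration, not CUSP's code: it assumes lists of observed trip times and bus headways in minutes, uses the common headway-based formula for average wait (which assumes riders arrive at random), and takes the planning time as a nearest-rank 95th percentile. All function names and sample values are hypothetical.

```python
import math
import statistics


def route_lateness_factor(observed_trip_minutes, scheduled_trip_minutes):
    """Average percent delay versus the schedule (0.20 means trips run 20 percent long)."""
    return statistics.mean(observed_trip_minutes) / scheduled_trip_minutes - 1.0


def planning_journey_time(observed_trip_minutes, on_time_share=0.95):
    """Shortest trip budget that covers at least `on_time_share` of observed trips.

    0.95 corresponds to arriving on time at least 19 times out of 20
    (nearest-rank percentile; other interpolation rules give slightly different values).
    """
    ordered = sorted(observed_trip_minutes)
    rank = math.ceil(on_time_share * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def average_wait(headway_minutes):
    """Expected wait for a rider arriving at a random time: sum(h^2) / (2 * sum(h))."""
    return sum(h * h for h in headway_minutes) / (2 * sum(headway_minutes))


def excess_wait_time(actual_headways, scheduled_headways):
    """Extra waiting caused by irregular service, relative to the timetable."""
    return average_wait(actual_headways) - average_wait(scheduled_headways)


# Hypothetical observations for one route scheduled at 15 minutes.
trips = [16, 17, 18, 18, 19, 20, 17, 18, 19, 18]
print(round(route_lateness_factor(trips, 15), 2))  # 0.2, i.e. 18 minutes on average
print(planning_journey_time(trips))                # 20
# Same three buses per half hour, but bunched: 5/15/10 instead of 10/10/10.
print(round(excess_wait_time([5, 15, 10], [10, 10, 10]), 2))  # 0.83
```

The last line shows why bunching matters to riders: the same number of buses, unevenly spaced, raises the average wait by almost a minute even though the schedule is nominally met.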

The new MTA leadership has the power and know-how to do something about the delays and uncertainty that bedevil New York City’s commuters. But acting requires an understanding of the problems, a willingness to communicate honestly and transparently about those problems, and accountability for the agencies responsible for solving them. The CUSP students have taken an important step to demonstrate how a common understanding of the problems facing NYC bus service can be articulated. The MTA would do well to draw inspiration from these students’ ambitious work, using its own considerable expertise and resources to demonstrate and advance industry best practice in its forthcoming dashboard. Resolving the problems of a regional transit system in a state of “Emergency” must start with a clear understanding of those problems and strong accountability to the public who rely on that system.


How the U.S. Open Is Using Artificial Intelligence to Stay Running Smoothly

Inc.com, Kevin J. Ryan


Watson’s new system, called Cognitive Highlights, measures the volume of the crowd (which is known to be raucous), the commentators’ analysis, and the players’ reactions. It assigns each of those a score from 0 to 1, and then uses those inputs to determine an Overall Excitement score. A point that results in a player yelling and fist pumping, then, will generate a higher score than one in which she pats her racket against her leg. The system can also parse the analysts’ language, so a “solid” shot won’t score as well as an “awesome” one.
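IBM has not said exactly how Cognitive Highlights combines its inputs, so the Python sketch below simply assumes a weighted average of three scores in [0, 1], with a toy word list standing in for the language analysis. The weights, word scores, and names (`PointSignals`, `COMMENTARY_WORDS`, and so on) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical word scores for commentary; Watson's actual language analysis is not public.
COMMENTARY_WORDS = {"solid": 0.4, "great": 0.7, "awesome": 0.9}


@dataclass
class PointSignals:
    crowd: float       # crowd-noise score in [0, 1]
    commentary: float  # commentator-language score in [0, 1]
    player: float      # player-reaction score in [0, 1]


def commentary_score(text):
    """Score of the most enthusiastic known word in the commentary (0.0 if none)."""
    cleaned = (word.strip(".,!?").lower() for word in text.split())
    return max((COMMENTARY_WORDS[w] for w in cleaned if w in COMMENTARY_WORDS), default=0.0)


def overall_excitement(signals, weights=(0.4, 0.3, 0.3)):
    """Assumed combination rule: weighted average of the three component scores."""
    components = (signals.crowd, signals.commentary, signals.player)
    if not all(0.0 <= value <= 1.0 for value in components):
        raise ValueError("component scores must lie in [0, 1]")
    weighted = sum(w * v for w, v in zip(weights, components))
    return weighted / sum(weights)


fist_pump = PointSignals(crowd=0.9, commentary=commentary_score("What an awesome shot!"), player=0.95)
racket_pat = PointSignals(crowd=0.5, commentary=commentary_score("A solid shot."), player=0.2)
print(overall_excitement(fist_pump) > overall_excitement(racket_pat))  # True
```

Normalizing by the sum of the weights keeps the combined score in [0, 1] whatever weights are chosen, matching the 0-to-1 scale the article describes for the inputs.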


This KFC brand store lets you pay for food with a smile

Digital Trends, Stephen Jordan


You won’t have to worry about having your wallet at this KFC store in Hangzhou, China. All you have to do is pay with a smile … literally. Yum China Holdings Inc, the biggest Chinese fast food chain, has created the “Smile to Pay” facial recognition system at a KFC restaurant located in Hangzhou called KPro. No cash, credit cards, or even smartphones are necessary.


Managing the new NIH requirements for clinical trials

UC3, Data Pub blog, John Borghi


If you think your work may be reclassified as a clinical trial, it’s probably worth getting a head start on meeting the new requirements. … If you believe your research may soon be reclassified as a clinical trial, now is probably a good time to take a hard look at how you and your lab handle research data management. The best way to relieve the administrative burden of these new requirements is to plan ahead and ensure that your materials are well organized, your data is securely saved, and your decisions are well documented.


You’re Using Foursquare All the Time. You Just Don’t Know It

Bloomberg BusinessWeek, Jeff Wise


Today the company works with Apple, Airbnb, Samsung, and more than 100,000 other app developers. Foursquare lets Snapchat tailor its photo filters to a user’s location and enables Uber Technologies Inc. customers to enter a venue name as a destination. Privately held Foursquare says its revenue grew 74 percent in 2016 and that it should hit $100 million, and turn a profit, in 2018 or the year after.

Foursquare’s greatest strength remains its uncannily accurate location data. As Swarm users move around in the world, Foursquare’s Pilgrim technology monitors the numerous sensors in their smartphones, then guesses their location based on such signals as the identity and relative strength of available cell and Wi-Fi networks.
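Foursquare has not published how Pilgrim works. As a generic illustration of inferring position from nearby networks, the Python sketch below uses a weighted centroid: each recognized network's known position is weighted by its signal strength, converted from dBm to linear power. The network IDs, coordinates, and the lookup table itself are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical lookup table of networks with known positions (latitude, longitude).
KNOWN_NETWORKS: Dict[str, Tuple[float, float]] = {
    "cafe-wifi": (40.7410, -73.9897),
    "office-wifi": (40.7420, -73.9880),
    "cell-1234": (40.7450, -73.9850),
}


def estimate_location(scan: Dict[str, float]) -> Tuple[float, float]:
    """Weighted-centroid estimate from a scan mapping network ID -> signal strength in dBm."""
    weighted = []
    for network_id, rssi_dbm in scan.items():
        position = KNOWN_NETWORKS.get(network_id)
        if position is None:
            continue  # unknown networks carry no positional information in this sketch
        # dBm is logarithmic; convert to milliwatts so stronger signals dominate.
        weighted.append((position, 10 ** (rssi_dbm / 10)))
    if not weighted:
        raise ValueError("scan contains no known networks")
    total = sum(weight for _, weight in weighted)
    lat = sum(pos[0] * weight for pos, weight in weighted) / total
    lon = sum(pos[1] * weight for pos, weight in weighted) / total
    return lat, lon


# A strong cafe signal and a weak cell signal: the estimate lands next to the cafe.
print(estimate_location({"cafe-wifi": -40.0, "cell-1234": -90.0, "unknown-ap": -60.0}))
```

Averaging raw latitude and longitude is only reasonable over short distances, and real systems combine many more signals; the sketch is meant only to make "relative strength" concrete.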


Big data: How the University of Michigan navigates ethics, unpredictability of data science research

University of Michigan, The Michigan Daily, Alexa St John


Even before this influx of data-driven institutions, researchers at the University had been using large data sets to tackle issues such as disease prediction and to advance computer science research.

Eric Michielssen, the University’s associate vice president for Advanced Research Computing, said there is a “tsunami” of data available to address these issues and more, but he and Athey stressed the importance of a concept called the Four V’s of Big Data.

Beyond the sheer volume of data, Michielssen said, researchers consider velocity, or the ability to receive vast amounts of data at unprecedented speeds. They must also account for variety, the many formats data can arrive in beyond structured spreadsheets, and veracity, the uncertainty and trustworthiness of the data received.


‘Cloud’ research involving Clemson University gets $10 million bump

Clemson University, the Newsstand


Clemson University is part of a team that is receiving $10 million from the National Science Foundation to help build what experts call a “cloud computing system,” bringing the total funding to $20 million since 2014.

The project, CloudLab, includes Clemson’s Kuang-Ching “K.C.” Wang as a co-principal investigator. Wang, a professor in the Holcombe Department of Electrical and Computer Engineering, is collaborating with researchers from the University of Utah and the University of Wisconsin.

The ultimate goal is to advance what often goes by a single buzzword: the cloud.


Universities rush to add data science majors as demand explodes

San Francisco Chronicle, Isha Salian


Across the UC system, campuses are quickly adding data science programs in response to soaring workplace demand. UC San Diego is starting a data science undergraduate major and minor this fall. UC Davis opened a statistical data science track within its statistics major effective this year. And at UC Santa Cruz, a new D3 Research center — short for Data, Discovery and Decisions — pairs students with companies to work on research projects using data science skills.

UC Irvine was the first in the University of California system to create a data science major in fall 2015.

“This is the national conversation at pretty much all of the leading universities,” Culler said.


Elsevier Signs Up to Transparency Guidelines

The Scientist Magazine®, Catherine Offord


The publisher will ask its journal editors to adopt widely accepted standards on transparency and openness in scientific communication.


Nvidia to Play Big Role in ‘Huge’ Wal-Mart Cloud Push, Says Global Equities

Barron's, Tiernan Ray


Wal-Mart is building out its cloud computing network, OneOps, to one-tenth the size of Amazon’s AWS, writes Trip Chowdhry of Global Equities, and Nvidia GPU chips will play a major role, he believes.


White House Announces Picks for NASA Chief and NOAA Deputy Head

Eos, Randy Showstack


The Trump administration has chosen an Oklahoma lawmaker who is a former fighter pilot and air and space museum executive to direct the nation’s space agency. In an announcement this past Friday evening that included dozens of intended nominations, the White House also revealed its choice for the number two position at the National Oceanic and Atmospheric Administration (NOAA).

The administration’s intention to nominate Rep. Jim Bridenstine (R-Okla.) as the next NASA administrator is drawing praise from space industry groups but is also sparking concerns among some senators who worry that a politician in the post could embroil the agency in partisan battles. In addition, some climate experts are vexed by the White House’s 1 September pick because of Bridenstine’s past statements expressing skepticism about human-induced climate change.


Brain researchers in uproar over NIH clinical-trials policy

Nature News & Comment, Sara Reardon


An open letter to the US National Institutes of Health says that classifying human-behaviour studies as clinical trials creates unnecessary red tape.


Two sciences tie the knot

MIT News, School of Engineering


Economics and computer science had always been on friendly terms at MIT. With the growth of cloud computing, e-commerce, machine learning, and online social networks, their relationship grew more serious. Now that these tools and applications have become ubiquitous and gone global, economics and computer science are taking their relationship to the next level.

Starting in the fall of 2017, the two academic departments will offer a joint major — Course 6-14: Computer Science, Economics, and Data Science — because elements of the two fields have become, well, inseparable. The new major aims to prepare students to think at the nexus of economics and computer science, so they can understand and design the kinds of systems that are coming to define modern life. Think Amazon, Uber, eBay, etc.

“This area is super-hot commercially,” says David Autor, the Ford Professor of Economics and associate head of the Department of Economics. “Hiring economists has become really prominent at tech companies because they’re filling market-design positions.”

 
Events



HackNY Hackathon

HackNY


New York, NY Our Fall 2017 hackNY Student Hackathon will take place at NYU’s Courant Institute of Mathematical Sciences on October 14-15, 2017! Applications for the Fall 2017 hackathon have not yet opened. Fill out this typeform to get notified via email when applications do open!


Rock Stars of Machine Learning and Deep Learning

IEEE


Santa Clara, CA September 12. Leaders at the forefront of the latest AI and machine learning developments, from companies including Amazon, Bank of America, and Merrill Lynch, share their insights at Rock Stars of Machine Learning. [$$$]


Healthcare + AI #2

Silicon Valley Artificial Intelligence


Berkeley, CA September 12. “We’re exploring more about how NLP and Linguistic Computing are percolating into the healthcare patient-provider space with ResultCare and Doc.ai.” [$$]


Open Machine Learning Hackathon

Open Machine Learning


Leiden, The Netherlands October 9-13. [free, registration required]


AI World Conference & Expo

AI World


Boston, MA December 11-13. Industry’s Largest Independent AI Business Event [$$$$]

 
Deadlines



ECS Data Sciences Hack Day

National Harbor, MD This ECS Data Sciences Hack Day is the Society’s first foray into building an electrochemical data sciences and open source community from the ground up. Hack Day is October 4. Deadline to apply to participate is September 15.

NAACL HLT 2018, Call for Papers

New Orleans, LA The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies will be held in New Orleans, Louisiana, June 1 to June 6, 2018. Deadline for long paper submissions is December 15.
 
Tools & Resources



How to Recognize Burnout Before You’re Burned Out

The New York Times, Kenneth R. Rosen


… In today’s era of workplace burnout, achieving a simpatico work-life relationship seems practically out of reach. Being tired, ambivalent, stressed, cynical and overextended has become a normal part of a working professional life. The General Social Survey of 2016, a nationwide survey that since 1972 has tracked the attitudes and behaviors of American society, found that 50 percent of respondents are consistently exhausted because of work, compared with 18 percent two decades ago.

Where once the term burnout was applied exclusively to health care workers, police officers, firefighters, paramedics or social workers who deal with trauma and human services — think Graham Greene’s novel “A Burnt-Out Case,” about a doctor in the Belgian Congo, a book that gave rise to the term colloquially — the term has since expanded to workers who are now part of a more connected, hyperactive and overcompensating work force.

But occupational burnout goes beyond needing a simple vacation or a family retreat, and many experts, psychologists and institutions, including the Centers for Disease Control and Prevention, describe long-term, unresolved burnout not as a symptom but as a major health concern. (Though it does not appear in the Diagnostic and Statistical Manual of Mental Disorders, the benchmark for psychological ailments, it does appear in the International Statistical Classification of Diseases and Related Health Problems, a classification used by the World Health Organization.)


Keras for R

RStudio Blog, JJ Allaire


We are excited to announce that the keras package is now available on CRAN. The package provides an R interface to Keras, a high-level neural networks API developed with a focus on enabling fast experimentation.


VoiceLoop

GitHub – facebookresearch


PyTorch implementation of the method described in the paper “Voice Synthesis for in-the-Wild Speakers via a Phonological Loop.”


Letters to a Young Blogger

Pacific Standard, Seth Masket


For academics looking to play a larger public role, publishing online can be a useful counter to some of the downsides of scholarly writing.

 
Careers


Postdocs

Post-Doctoral Associate



New York University, Center for Urban Science and Progress; New York, NY

Postdoctoral Training Fellow



Francis Crick Institute; London, England

PostDoc Position, Numerical Linear Algebra



EPFL; Lausanne, Switzerland

Full-time positions outside academia

Execution Trader



Alpha Architect; Philadelphia, PA

Community Manager



New Knowledge AI; Austin, TX

Senior Research Scientist



Bloomsbury AI; London, England

Developer Advocate (Tidyverse)



Microsoft, R Studio; Boston, MA
