Data Science newsletter – October 12, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 12, 2017

 
 
Data Science News



It Costs Money to Get It Right

Simply Statistics, Roger Peng


… I often argue with people over whether companies like Google, Facebook, and Apple do research. My argument is that for the most part, they do not, because they are not interested in creating new knowledge. They do not make any specific public claims or inferences about the data they’ve gathered and so there isn’t really anything for them to defend. They are interested in taking whatever information they collected and channeling it into products. Yes, all of these companies occasionally publish a paper (I think Apple has a grand total of five), and I would say that those papers represent real research. But I would wager that those papers represent a small fraction of the work going on in those companies.


Go with the flow: the hypnotic beauty of public transport – mapped

The Guardian, Cities, Max Galka


Public transit maps are a testament to the power of visual communication. They reduce the task of navigating hundreds or even thousands of miles of transport routes to simply glancing at an image. However, understanding a city’s transit frequency – how often the trains and buses run along those routes – is not nearly as easy, and typically requires scanning through pages of timetables.

How can cities make their transit frequencies as simple and intuitive to understand as their routes? Columbia University grad student Will Geary offers a solution: TransitFlow, an experimental set of tools for building animated transport frequency maps.

“Transit frequency is hard to visualise using timetables,” says Geary, who built TransitFlow while interning this summer at mapping startup Mapzen. “Timetables provide information about frequency but can be overwhelming, unintuitive and lacking geographic context.”


Universal Paperclips

Jason Kottke


There’s a new meta game by Frank Lantz making the rounds: Universal Paperclips, “in which you play an AI who makes paperclips”. Basically, you click a button to make money and use that money to buy upgrades, which give you more money per click. Rinse, repeat.


ASA receives NSF grant for two-year college data science summit

EurekAlert! Science News, American Statistical Association


Recognizing the growing demand for students and professionals with data science skills as part of the ongoing big data revolution, the National Science Foundation (NSF) awarded the ASA a $50,000 grant to conduct a symposium that will foster the establishment and growth of data science programs at two-year colleges. The symposium is tentatively scheduled for May 10-11, 2018.


Allen School and AWS team up on new NNVM compiler for deep learning frameworks

University of Washington, Allen School News


Tianqi Chen, Thierry Moreau, Haichen Shen, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

A team of researchers at the Allen School and AWS have released a new open compiler for deploying deep learning frameworks across a variety of platforms and devices. The NNVM compiler simplifies the design of new front-end frameworks and back-end hardware by offering the ability to compile front-end workloads directly to hardware back-ends. The new tool is built upon the TVM stack previously developed by the same Allen School researchers in order to bridge the gap between deep learning systems optimized for productivity, and the programming, performance, and efficiency constraints enforced by different types of hardware.

“While deep learning is becoming indispensable for a range of platforms — from mobile phones and datacenter GPUs, to the Internet of Things and specialized accelerators — considerable engineering challenges remain in the deployment of those frameworks,” noted Allen School Ph.D. student Tianqi Chen. “Our TVM framework made it possible for developers to quickly and easily deploy deep learning on a range of systems. With NNVM, we offer a solution that works across all frameworks, including MXNet and model exchange formats such as ONNX and CoreML, with significant performance improvements.”


Special Investigation: How America’s Biggest Bank Paid Its Fine for the 2008 Mortgage Crisis—With Phony Mortgages!

The Nation, David Dayen


You know the old joke: How do you make a killing on Wall Street and never risk a loss? Easy—use other people’s money. Jamie Dimon and his underlings at JPMorgan Chase have perfected this dark art at America’s largest bank, which boasts a balance sheet one-eighth the size of the entire US economy.

After JPMorgan’s deceitful activities in the housing market helped trigger the 2008 financial crash that cost millions of Americans their jobs, homes, and life savings, punishment was in order. Among a vast array of misconduct, JPMorgan engaged in the routine use of “robo-signing,” which allowed bank employees to automatically sign hundreds, even thousands, of foreclosure documents per day without verifying their contents. But in the United States, white-collar criminals rarely go to prison; instead, they negotiate settlements. Thus, on February 9, 2012, US Attorney General Eric Holder announced the National Mortgage Settlement, which fined JPMorgan Chase and four other mega-banks a total of $25 billion.

JPMorgan’s share of the settlement was $5.3 billion, but only $1.1 billion had to be paid in cash; the other $4.2 billion was to come in the form of financial relief for homeowners in danger of losing their homes to foreclosure.


Shafi Goldwasser appointed Director of the Simons Institute for the Theory of Computing

Simons Institute for the Theory of Computing


Turing Award-winning computer scientist Shafi Goldwasser will become the new Director of the Simons Institute for the Theory of Computing at the University of California, Berkeley, on January 1, 2018. The Simons Institute is the world’s leading venue for collaborative research in theoretical computer science. Established on July 1, 2012 with a grant of $60 million from the Simons Foundation, the Institute is housed in Calvin Lab, a dedicated building on the UC Berkeley campus. The Simons Institute brings together the world’s leading researchers in theoretical computer science and related fields, as well as the next generation of outstanding young scholars, to explore deep unsolved problems about the nature and limits of computation.


The US Postal Service Is Working on Self-Driving Mail Trucks

WIRED, Transportation, Aarian Marshall


Neither snow nor rain nor heat nor gloom of night stays these couriers from the swift completion of their appointed rounds—and if the United States Postal Service has its way, the robots won’t stop them, either.

Yes, the agency you know best for bringing you junk mail addressed to whomever lived in your apartment before you has caught robofever. It plans to put semiautonomous mail trucks into service in just seven years, and it seems to think it can pull off a shift away from human driving without shedding mail carrier jobs.

That’s all according to the postal service’s Office of the Inspector General, which oversees the agency and last week released a report on its plans to work autonomy into its 228,000-vehicle fleet. Those plans are already in motion: The post office has partnered with the University of Michigan to build what it’s calling an Autonomous Rural Delivery Vehicle, which it wants to launch on 28,000 rural routes nationwide as early as 2025.


Data From 3.5 Million Employees Shows How Innovation Really Works

Harvard Business Review; Dylan Minor, Paul Brook and Josh Bernoff


[Dylan Minor] has analyzed five years of data from 154 public companies covering over 3.5 million employees that have used an idea management system called Spigit. For the millions of employees of these companies, the idea management system functions a little like Facebook – people can post ideas, get votes, deliver or respond to feedback, and develop the ideas into innovations that make a difference to the company. The innovation teams at these companies use them to track and process all the ideas and whether the company committed to putting them into practice. Some companies use this software for process innovation; others develop new products; others seek efficiencies and cost savings.

Once you put innovation into a system like this, you can track everything. We know how many innovation challenges the companies are running, how many people are suggesting ideas, and how many ideas they suggest. We know how many people are participating in other ways – by voting or making comments, for example. And we also know how many of those ideas get through the endpoint of the challenge, which is where the company’s management determines which ideas to pursue further. We used linear regression to analyze every potential measure the system includes over every 3-month time period when the system was active within the company.


University Data Science News

Stanford had a class session on the theory underlying deep learning. Two hundred students and faculty showed up. There’s no video, but slides are up on GitHub.

In research at Stanford, ecologist Jose Fragoso (who has since moved to the California Academy of Sciences) and biologist Rodolfo Dirzo provided evidence that large animal diversity, in addition to plant life, influences carbon stocks and contributes to climate change mitigation. The coolest part of this project is how they collected data. They have “more than a million records of animal sightings and activity collected by 340 indigenous technicians in the Amazon during more than three years.”



University of Oxford and Tel Aviv University scientists have just completed a map of all the world’s living reptiles. This map has been combined with similar maps for birds, mammals and amphibians to produce what they’re calling an “atlas of life.” The goal of the project is to highlight areas in need of conservation efforts.

Go easy on the undergraduates. A new survey result from the Higher Education Research Institute found that 41% of college freshmen are “frequently overwhelmed” by all they had to do. This is a dramatic increase from 18% in 1985. When they miss class or assignments, reach out to see why.



The Simons Foundation has a large collaborative physics group working on a project called “It from Qubit.” They are looking to find whether quantum bits, or qubits, give rise to the structure of space and gravity. It is led by Patrick Hayden of Stanford University with a team of 16 other principal investigators around the world, adding evidence to the pattern of massive collaboration networks in physics.



Starting next year, the Simons Institute for the Theory of Computing, which is funded by the Simons Foundation, will have a new director: Shafi Goldwasser, who will move from her post at MIT to UC-Berkeley to take the position. She’s a star theoretical computer scientist, so this announcement is further evidence of the Simons Foundation’s ability to attract the best and brightest.



The Simons Foundation is also inviting sabbatical-ready researchers in Astrophysics to apply to undertake projects at the organization’s Flatiron Institute near NYU in Manhattan.

Kevin Chao at Northwestern’s Institute on Complex Systems is using data science to predict earthquakes, a known hard problem. The thinking goes like this: there are slow earthquakes that release energy over a period of hours to months. Changes in these slow earthquakes can be detected and if the data is robust, perhaps these slow earthquakes can be used to predict regular earthquakes.

Columbia University’s Data Science Institute has a new director this year, Jeannette Wing. Wing was formerly the corporate vice president of Microsoft Research and has committed to the tagline “data for good.” That’s a promising moral compass.



UW-Seattle data science ethnographers Britta Fiore-Gartland, Anissa Tanweer, and Meg Drouhard have coined the term “data walk” and explain how they use these walks to open data scientists’ perspectives to the human contexts of data in the environment.

Elsewhere at UW-Seattle, researchers at the Allen School of Computer Science & Engineering released a new NNVM compiler for deep learning frameworks.



And one more UW-Seattle item: the Bill & Melinda Gates Center for Computer Science & Engineering will soon materialize on campus, across the street from the aforementioned Allen School building. The building’s donor list is a who’s who of Seattle’s tech elite and includes top execs from Microsoft, Zillow, and Amazon. It is great to see this kind of money being poured into a state school. (Now, how about some donors for social science to make sure our technology is working for all of us?)

Regina Barzilay from MIT and Stefan Savage from University of California-San Diego are computer scientists who were both awarded MacArthur genius grants. Sweet.



Jason Schultz of NYU Law wrote a quick, thoughtful piece on why standards developed for open software might not work for open AI. (Here I am talking about the concept of open AI, not the company called OpenAI.) He asks, “So what is to be done about it all? While efforts to provide ‘open’ code, algorithms and training data are laudable, the computation, competition, and accountability/audit concerns are unlikely to be answered with standard open source approaches. Instead, we will need new methods both to measure the negative impacts of AI closure and to ‘fork’ alternatives in meaningful technological, economic, and social ways.”

The National Science Foundation has given a $50,000 grant to the American Statistical Association to host a symposium on establishing and growing data science programs at two-year colleges. This impetus to include two-year college students in the data-driven future is critical, but has received little attention thus far. (Please correct me if I am wrong about the claim of “little attention.” I haven’t seen any other efforts, but I may have missed something.)



The National Academy of Science published a report on the state of data science education, finding that it is “still in its infancy”. The report notes that “the main concepts, skills and ethics powering this emerging discipline of data science” haven’t been identified yet. Lots of smart people are working to improve data science education and I’m hopeful that all the concern about human contexts and ethics will find a way into the coursework.


Put Humans at the Center of AI

MIT Technology Review, Will Knight


As the director of Stanford’s AI Lab and now as a chief scientist of Google Cloud, Fei-Fei Li is helping to spur the AI revolution. But it’s a revolution that needs to include more people. She spoke with MIT Technology Review senior editor Will Knight about why everyone benefits if we emphasize the human side of the technology.


Africa leads the way in election technology, but there’s a long way to go

The Conversation, Stephen Chan


Kenya’s recently annulled elections will soon be re-run, but the long-term questions they raised about election management are still unanswered. The spotlight is on the work of international observer teams, but there are also much wider questions of electoral capacity – problems that extend to the top of the African Union, and thence across the whole continent.

African democracies are in the process of co-ordinating a generation jump in applied technology. So far, they have actually done a remarkable job by global standards.


This is how much Google is spending on cutting edge AI research

Quartz, Mike Murphy


DeepMind lost £123.5 million (roughly $162 million) last year, according to information filed with the UK government Oct. 2. Under UK law, any private company (DeepMind is still registered as a private company in the UK) with over £10.2 million ($13.4 million) in annual revenue, assets worth over £5.1 million, and more than 50 employees is required to disclose their financials to the government.

The company generated £40.2 million ($52.7 million) in revenue last year, which was entirely from work it did for other sections of its parent company Alphabet, rather than external clients. DeepMind also had a £41.1 million “management service fee,” which includes costs such as real estate and the running and maintenance of its computer systems. But the biggest loss by far was the £104.7 million ($137 million) expense it incurred for “staff costs and other related costs,” which includes things like the company’s payroll, travel, office hardware, and software—and presumably, stock-based compensation.


How Computers Turned Gerrymandering Into a Science

The New York Times, Jordan Ellenberg


Gerrymandering used to be an art, but advanced computation has made it a science. Wisconsin’s Republican legislators, after their victory in the census year of 2010, tried out map after map, tweak after tweak. They ran each potential map through computer algorithms that tested its performance in a wide range of political climates. The map they adopted is precisely engineered to assure Republican control in all but the most extreme circumstances.

In a gerrymandered map, you concentrate opposing voters in a few districts where you lose big, and win the rest by modest margins. But it’s risky to count on a lot of close wins, which can easily flip to close losses. Justice Sandra Day O’Connor thought this risk meant the Supreme Court didn’t need to step in. In a 1986 case, she wrote that “there is good reason to think political gerrymandering is a self-limiting enterprise” since “an overambitious gerrymander can lead to disaster for the legislative majority.”
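
To make the trade-off above concrete, here is a minimal Python sketch with made-up numbers. The five districts of 100 voters each, the party labels A and B, and the uniform-swing model are all assumptions for illustration; none of them come from the article or from Wisconsin's actual map.

```python
# Hypothetical map drawn by party B: (votes for A, votes for B) per district.
# Party A is "packed" into district 1 and loses the other four by modest margins.
DISTRICTS = [(90, 10), (40, 60), (40, 60), (40, 60), (40, 60)]


def seats_won_by_b(districts, swing=0):
    """Count districts B wins after `swing` voters per district move from B to A.

    A uniform swing is a simplifying assumption; a tie counts as not won.
    """
    seats = 0
    for a_votes, b_votes in districts:
        moved = min(swing, b_votes)  # cannot move more voters than B has
        if b_votes - moved > a_votes + moved:
            seats += 1
    return seats


total_a = sum(a for a, _ in DISTRICTS)
total_b = sum(b for _, b in DISTRICTS)

if __name__ == "__main__":
    print(f"Statewide vote: A={total_a}, B={total_b}")
    for swing in (0, 5, 9, 10):
        print(f"swing of {swing} voters per district -> B wins "
              f"{seats_won_by_b(DISTRICTS, swing)} of {len(DISTRICTS)} seats")
```

In this toy map the statewide vote is tied, yet B holds four of five seats until the swing reaches 10 voters per district, at which point all four narrow wins flip at once. That cliff is the risk O’Connor described.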


A National Center For Digital Health Innovation

Northwestern University, NUCATS: NU Clinical and Translational Sciences Institute


Northwestern University, in partnership with Oregon Health & Science University, University of Washington, Johns Hopkins University School of Medicine, and Sage Bionetworks, has been awarded a five-year $25M cooperative agreement from the National Center for Advancing Translational Science (NCATS) to create a new CTSA Program National Center for Data to Health (CD2H). This award will coalesce and coordinate informatics activities across the Clinical and Translational Science Award (CTSA) Program, a network of more than 50 medical research institutions, to provide collaborative clinical and translational research infrastructure.

The new CTSA Program National Center for Data to Health will be led by Dr. Melissa Haendel (OHSU), Dr. Kristi Holmes (Northwestern University), Dr. Sean Mooney (University of Washington), Dr. Christopher Chute (Johns Hopkins University School of Medicine), and John Wilbanks (Sage Bionetworks). These institutions, together with The Scripps Research Institute, Washington University in St. Louis, the University of Iowa, and The Jackson Laboratory will work with CTSA Program hubs and the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health to launch the CD2H.


Scientists complete conservation puzzle, shaping understanding of life on earth

University of Oxford


Wildlife map reveals urgent conservation concerns

An international team of scientists have completed the ‘atlas of life’ – the first global review and map of every vertebrate on Earth.

Led by researchers at the University of Oxford and Tel Aviv University, the 39 scientists have produced a catalogue and atlas of the world’s reptiles. By linking this atlas with existing maps for birds, mammals and amphibians, the team have found many new areas where conservation action is vital.


Scientists Use Machine Learning to Translate ‘Hidden’ Information that Reveals Chemistry in Action

Brookhaven National Laboratory, Newsroom


Now scientists have a way to capture the details of chemistry choreography as it happens. The method—which relies on computers that have learned to recognize hidden signs of the steps—should help them improve the performance of catalysts to drive reactions toward desired products faster.

The method—developed by an interdisciplinary team of chemists, computational scientists, and physicists at the U.S. Department of Energy’s Brookhaven National Laboratory and Stony Brook University—is described in a new paper published in the Journal of Physical Chemistry Letters. The paper demonstrates how the team used neural networks and machine learning to teach computers to decode previously inaccessible information from x-ray data, and then used that data to decipher 3D nanoscale structures.


NISS Welcomes James Rosenberger as New Director

Amstat News


National Institute of Statistical Sciences (NISS) chair, Mary Batcher, announced James Rosenberger as the new NISS director, which became effective August 1, during the NISS JSM reception July 31. Rosenberger succeeds Nell Sedransk.

Speaking about Rosenberger at the NISS reception, Batcher said, “Rosenberger is highly regarded and well liked in the statistical community. He has held several leadership roles in the American Statistical Association and is well qualified to lead NISS through a period of expansion. His continuing association with Penn State is also valuable.”


Where is technology taking the economy?

McKinsey & Company, W. Brian Arthur


A year ago in Oslo Airport I checked in to an SAS flight. One airline kiosk issued a boarding pass, another punched out a luggage tag, then a computer screen showed me how to attach it and another where I should set the luggage on a conveyor. I encountered no single human being. The incident wasn’t important but it left me feeling oddly that I was out of human care, that something in our world had shifted.

That shift of course has been going on for a long time. It’s been driven by a succession of technologies—the Internet, the cloud, big data, robotics, machine learning, and now artificial intelligence—together powerful enough that economists agree we are in the midst of a digital economic revolution. But there is less agreement on how exactly the new technologies are changing the economy and whether the changes are deep. Robert Gordon of Northwestern University tells us the computer revolution “reached its climax in the dot-com era of the 1990s.” Future progress in technology, he says, will be slower.

So in what way exactly are the new technologies changing the economy? Is the revolution they are causing indeed slowing—or is it persistent and deep? And if so how will it change the character of the economy?

 
Events



National Transportation Data Challenge Summit

NSF Big Data Innovation Hubs


Washington, DC There are many ways to get involved with the National Transportation Data Challenge in the run-up to our Challenge Summit on November 9. [free]


The Self-Stirred Genome: How and Why? Alexandra Zidovska, Assistant Professor of Physics

NYU


New York, NY October 16, starting at 5 p.m., Irving H. Jurow Lecture Hall (31 Washington Place). Part of the NYU College of Arts & Sciences Scholars Lecture Series. [free]

 
Deadlines



Interpretable ML Symposium – NIPS 2017

Long Beach, CA December 7. “We invite researchers to submit their recent work on interpretable machine learning from a wide range of approaches, including (1) methods that are designed to be more interpretable from the start, such as rule-based methods, (2) methods that produce insight into existing ML models, and (3) perspectives either for or against interpretability in general.” Deadline for submissions is October 20.

PLMW 2018 Programming Languages Mentoring Workshop – POPL 2018

“The Programming Languages Mentoring Workshop (PLMW) aims to encourage graduate students and senior undergraduate students to pursue careers in programming language research.” … The Workshop will be co-located with POPL 2018 in Los Angeles on January 9. Deadline to apply is October 29.
 
Tools & Resources



Visualising Activation Functions in Neural Networks

David Sheehan


In neural networks, activation functions determine the output of a node from a given set of inputs, where non-linear activation functions allow the network to replicate complex non-linear behaviours. As most neural networks are optimised using some form of gradient descent, activation functions need to be differentiable (or at least, almost entirely differentiable- see ReLU). Furthermore, complicated activation functions may produce issues around vanishing and exploding gradients. As such, neural networks tend to employ a select few activation functions (identity, sigmoid, ReLU and their variants).

Select an activation function from the menu below to plot it and its first derivative. Some properties relevant for neural networks are provided in the boxes on the right.
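
As a rough illustration of the functions named above, the following Python sketch evaluates identity, sigmoid and ReLU alongside their first derivatives at a few points. It is not the code behind the interactive page; the function names, the sample points and the choice of 0 for ReLU's derivative at x = 0 are assumptions.

```python
import math


def identity(x):
    return x


def identity_grad(x):
    return 1.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 and shrinks toward 0 for large |x|


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # ReLU is not differentiable at x = 0; returning 0 there is a convention
    # assumed for this sketch.
    return 1.0 if x > 0 else 0.0


ACTIVATIONS = {
    "identity": (identity, identity_grad),
    "sigmoid": (sigmoid, sigmoid_grad),
    "relu": (relu, relu_grad),
}

if __name__ == "__main__":
    for name, (f, df) in ACTIVATIONS.items():
        for x in (-4.0, 0.0, 4.0):
            print(f"{name:8s} x={x:+.1f}  f(x)={f(x):.4f}  f'(x)={df(x):.4f}")
```

The sigmoid rows show the vanishing-gradient concern in miniature: the derivative at x = ±4 is far smaller than at x = 0.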


DevOps Pipeline for a Machine Learning Project

Stats and Bots, Boris Tvaroska


Machine learning is getting more and more popular in applications and software products, from accounting to hot dog recognition apps. When you add machine learning techniques to exciting projects, you need to be ready for a number of difficulties. The Statsbot team asked Boris Tvaroska to tell us how to prepare a DevOps pipeline for an ML based project.


Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

arXiv, Computer Science > Artificial Intelligence; Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese


In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, where bottom-level programs are callable subroutines that interact with the environment. We validate our method in three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchal and compositional structures. The experimental results show that NTP learns to generalize well towards unseen tasks with increasing lengths, variable topologies, and changing objectives.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor of Music Technology



New York University, Steinhardt School of Culture, Education and Human Development; New York, NY

Postdocs

James Webb Space Telescope Postdoctoral Fellowships



NASA Goddard Space Flight Center; Greenbelt, MD
