Data Science newsletter – May 9, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 9, 2017


 
 
Data Science News



Mind-Reading Algorithms Reconstruct What You’re Seeing Using Brain-Scan Data

arXiv, MIT Technology Review



Perceived images are hard to decode from fMRI scans. But a new kind of neural network approach now makes it easier and more accurate.


Research Blog: Updating Google Maps with Deep Learning and Street View

Google Research Blog; Julian Ibarz and Sujoy Banerjee



Every day, Google Maps provides useful directions, real-time traffic information and information on businesses to millions of people. In order to provide the best experience for our users, this information has to constantly mirror an ever-changing world. While Street View cars collect millions of images daily, it is impossible to manually analyze the more than 80 billion high-resolution images collected to date in order to find new or updated information for Google Maps. One of the goals of Google’s Ground Truth team is to enable the automatic extraction of information from our geo-located imagery to improve Google Maps.

In “Attention-based Extraction of Structured Information from Street View Imagery”, we describe our approach to accurately read street names out of very challenging Street View images in many countries, automatically, using a deep neural network.
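
The post and paper have the real architecture, but the generic mechanism behind attention-based reading is easy to sketch: at each output step, the model scores every location of a convolutional feature map against its current decoding state, softmaxes those scores into weights, and decodes the next character of the street name from the weighted sum. Here is a minimal single-step illustration in Python; the dot-product scoring, the shapes, and the names are assumptions for illustration, not details taken from the paper.

    import numpy as np

    def soft_attention_step(features, query):
        """One step of soft attention over flattened CNN feature-map locations."""
        scores = features @ query                 # (num_locations,) relevance scores
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        context = weights @ features              # (feature_dim,) weighted sum
        return weights, context

    # Toy usage: a 14x14 feature map with 64-dim features and a random decoder state.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(14 * 14, 64))
    decoder_state = rng.normal(size=64)
    weights, context = soft_attention_step(features, decoder_state)
    print(weights.shape, context.shape)  # (196,) (64,)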


story time in data science land

Professors like Mariana Mazzucato, EU passport holders working in the UK, are so fed up with the citizenship application process that they are considering a personal #Brexit. A colleague of mine just left a great tenure-track job because his partner couldn’t get permission to work in the US for who knows how long. Everyone I know thinks of the scientific community as a global community, but we are not immune to political boundaries or political games. This Twitter thread is choice. How do you think Brexit and America’s current anti-immigrant climate will affect science?

The Economist has a provocative op-ed asserting that data are now the world’s most valuable resource and that current anti-monopoly policies are a terrible match for the agglomeration of knowledge-power wielded by firms like Google, Amazon, and Facebook. Spoiler: the article uses the word “googlet” to refer to a baby Google. The Economist also wrote about how much easier tacit collusion in price setting becomes when dynamic pricing meets perfect, instantaneous information about competitors’ prices.

Seth Stephens-Davidowitz argues that social scientists should use search data, not survey data, to reveal people’s true intentions. Why? Because people lie on surveys but reveal their truths in search. He expands on how this would work and what it means for social science in his new book Everybody Lies.

Because Tim Berners-Lee has a unique perspective on all things internet, it’s always good to read his interviews about what the web has wrought. He talks about the organizational dynamics that allowed the idea of the web to gestate into being: “It was all unofficial, zero budget, but Mike allowed my 20 percent time to expand to 100 percent time.” Dreamy.

Steve Miller of IBM, writing for KDnuggets, has a comprehensive new report on data science jobs. He chastises academia for focusing on producing data scientists while ignoring “the much larger demand for data-savvy managers (1.5 million new positions)”. The report also identifies the fastest-growing data science roles, the top cities, and the top industries.

If you aren’t reading The Pudding, you should be. This week on their elegant website they determine whether the lyrics in pop songs are becoming more repetitive, using a dataset of 15,000 songs from 1958 to 2017.
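
The Pudding piece describes its own methodology; as an illustration of one common way to quantify repetitiveness (not necessarily theirs), note that repeated words and phrases compress well, so a compression ratio makes a crude repetitiveness score:

    import zlib

    def repetitiveness(lyrics: str) -> float:
        """Crude proxy: the fraction of the text a compressor can squeeze away.
        Repeated words and phrases compress well, so higher = more repetitive
        (short, unique text can even score negative due to header overhead)."""
        raw = lyrics.encode("utf-8")
        return 1 - len(zlib.compress(raw, 9)) / len(raw)

    print(repetitiveness("We will, we will rock you! " * 16))  # high
    print(repetitiveness("Each line here is different, nothing ever comes back."))  # low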

Will AI eliminate jobs? We have heard this question, and plenty of prognostications, quite a bit. John Horton, Microsoft Research alum and NYU Stern professor, explains why we may need to worry more about accountants being displaced than about truck drivers. I’m not entirely sure I agree with all of his reasoning, but his writing and thinking are deep, clear, and engaging. At this point, 10:30 on a Friday night, I’m wishing AI might make part of my job redundant.


VCs Put $110 Million Into Grammar-Checking Software

Bloomberg Technology, Nico Case



Venture capitalists want a piece of just about anything involving artificial intelligence, whether it’s computers learning to drive or helping people shop for clothing. The latest to get a sizable investment is a startup looking to use AI to improve people’s grammar.

General Catalyst, a Silicon Valley venture firm, said Monday that it led a $110 million investment in Grammarly Inc. The San Francisco startup makes software that underlines awkward words and phrases in the user’s writing and makes suggestions, similar to a feature in Microsoft Word.


Data science support offered to UC Berkeley instructors

Berkeley Data Science



The Data Science Education Program (http://data.berkeley.edu/education) has three near-term opportunities to support instructors in learning and incorporating data science approaches. We are offering an early summer workshop, connector course development resources, and course module development support. Please let us know about your interests and needs!


Innovation in Network Measurement Can and Should Affect the Future of Internet Privacy

Princeton CITP, Freedom to Tinker blog, Nick Feamster



As most readers are likely aware, the Federal Communications Commission (FCC) issued a rule last fall governing how Internet service providers (ISPs) can gather and share data about consumers; that rule was recently rolled back through the Congressional Review Act. The media stoked consumer fear with headlines such as “For Sale: Your Private Browsing History” and comments about how ISPs can now “sell your Web browsing history to advertisers”. We also saw large ISPs such as Comcast promising not to do exactly that. What’s next is anyone’s guess, but technologists need not stand idly by.

Technologists can and should play an important role in this discussion, both by conveying knowledge about the capabilities and uses of network monitoring and by developing new monitoring technologies and privacy-preserving capabilities. That work can shape the debate in three important ways: (1) level-setting on the data collection capabilities of various parties; (2) understanding and limiting the power of inference; and (3) developing new monitoring technologies that facilitate network operations and security while protecting consumer privacy.


DARPA specialist to lead Pitt’s School of Computing and Information

TribLIVE, Aaron Aupperlee



A program manager specializing in artificial intelligence with DARPA will lead Pitt’s investment in big data and the computing requirements to analyze it.

Paul R. Cohen was named the founding dean of the University of Pittsburgh’s new School of Computing and Information, Pitt announced Monday.


U.S. Social Sentiment Index

WSJ.com, WSJ Graphics



Measuring the content of millions of Twitter messages each hour is one way to gauge the nation’s changing mood in near-real time. Here, the Wall Street Journal and IHS Markit have plotted that sentiment, comparing it to recent norms.
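
The excerpt doesn’t detail the WSJ/IHS Markit methodology, but the general shape of such an index is easy to sketch: score each message against a sentiment lexicon and average over a time window. The lexicon and scoring rule below are made up for illustration; real indexes use far larger, weighted lexicons.

    # Tiny made-up sentiment lexicon for illustration only.
    LEXICON = {"great": 1, "happy": 1, "love": 1, "bad": -1, "awful": -1, "sad": -1}

    def score(tweet: str) -> int:
        """Sum the lexicon scores of the words in one message."""
        return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in tweet.split())

    # Average the scores over a window of messages to get one mood reading.
    window = ["What a great day, I love it!", "Traffic was awful and I'm sad.", "meh"]
    mood = sum(score(t) for t in window) / len(window)
    print(f"window sentiment: {mood:+.2f}")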


Does my algorithm work? There’s no shortcut for community detection

Santa Fe Institute



Real-world networks are large and complex. Food webs, social networks, or genetic relationships may consist of hundreds, or even millions, of nodes. To understand the overarching layout of a large network, scientists design algorithms to divide the network’s nodes into significant groups, which make the network easier to understand. In other words, community detection allows a researcher to zoom out, seeing big patterns in the forest, instead of being caught up in the trees. In the past, researchers have used metadata as a sort of answer key or “ground truth” to verify that their community detection algorithms are performing well.

“Unfortunately, tempting as this practice is, with real-world data, there is no answer key, no ground truth,” explains Daniel Larremore, one of two lead authors of the paper and an Omidyar Fellow at the Santa Fe Institute. “Our research rigorously shows that using metadata as ground truth to validate algorithms is fundamentally problematic and introduces biases without telling us what we really need to know: does my algorithm work?”
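
To see concretely what the paper is warning against, here is a minimal sketch of the metadata-as-ground-truth practice; the algorithm choice, the karate-club example, and the NMI comparison are my illustration, not the paper’s experiment. The authors’ point is that a high score here only shows the partition correlates with one particular metadata labeling, not that the algorithm “works”.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities
    from sklearn.metrics import normalized_mutual_info_score

    # Zachary's karate club; each node carries a "club" metadata label.
    G = nx.karate_club_graph()
    metadata = [G.nodes[n]["club"] for n in G.nodes]

    # Detect communities by greedy modularity maximization.
    communities = greedy_modularity_communities(G)
    label_of = {n: i for i, comm in enumerate(communities) for n in comm}
    detected = [label_of[n] for n in G.nodes]

    # High agreement means the partition correlates with this metadata,
    # not that the algorithm recovered "the" ground-truth communities.
    print(normalized_mutual_info_score(metadata, detected))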


Can Predictive Analytics Help Avert Pittsburgh’s Next Disaster?

Government Technology, Allen Young



Emergency medical services personnel are increasingly adopting predictive modeling software that identifies patterns of geospatial data to predict future events. The automated platforms can call in extra ambulances from neighboring counties and direct the vehicles into areas with the most vulnerable populations before floods of 911 calls begin. Early detection also allows time to set up emergency shelters and overflow hospital rooms.


Data Storage Appears to be Artificial Intelligence’s New Frontier

Edgy Labs, William McKinney



How is advanced storage helping us now?

One good example of this comes from NVIDIA. They may be famous for their graphics acceleration hardware, but they are also the creator of one of the most prolific deep learning platforms on the market. Give their system enough storage and it can help your business harness the power of AI to handle the immense amount of data you are producing.


U.S. life expectancy varies by two decades depending on location

Reuters, Lisa Rapaport



Nationwide in 2014, the average life expectancy was about 79.1 years, up 5.3 years from 1980, the study found. For men, life expectancy climbed from 70 years to 76.7 years, while for women it increased from 77.5 years to 81.5 years.

But the study also highlighted stark disparities: a baby born in Oglala Lakota County, South Dakota, can expect to live just 66.8 years, while a child born in Summit County, Colorado, can expect to live 86.8 years, on average.

“For both of these geographies, the drastically different life expectancies are likely the result of a combination of risk factors, socioeconomics and access and quality of health care in those areas,” said senior study author Dr. Christopher Murray, director of the Institute for Health Metrics and Evaluation at the University of Washington in Seattle.


How robots can help us embrace a more human view of disability

The Conversation, Thusha Rajendran



When dealing with the otherness of disability, the Victorians in their shame built huge out-of-sight asylums, and their legacy of “them” and “us” continues to this day. Two hundred years later, technologies offer us an alternative view. The digital age is shattering barriers, and what used to be the norm is now being challenged.

What if we could change the environment, rather than the person? What if a virtual assistant could help a visually impaired person with their online shopping? And what if a robot “buddy” could help a person with autism navigate the nuances of workplace politics? These are just some of the questions that are being asked and which need answers as the digital age challenges our perceptions of normality.


Inside Baidu’s Billion Dollar Push To Become An AI Global Leader

Forbes, Yue Wang



Baidu is banking on AI to make a strong comeback. It has amassed a 1,700-member team composed of top talent from the world’s best universities and built four research labs in China and Silicon Valley, with the second Silicon Valley lab ready to accommodate 150 scientists, the company announced in March. In January, it hired Lu Qi, a Microsoft veteran who was the architect of the software giant’s AI effort, as its president and chief operating officer. Now Lu oversees Baidu’s AI research.

“Baidu’s strategic focus, organization, and resources have shifted increasingly towards AI,” said Wang, the company’s vice president.

The company has its advantages. Its online search app boasts 665 million monthly active users, whose search data and behavioral patterns can be harnessed for deep learning, a branch of AI specializing in teaching machines to learn by themselves. And because search encompasses so many different topics, Baidu has more diversified data tranches than other Chinese tech giants, which may allow it to train computers better, said Wang Shengjin, a professor at China’s prestigious Tsinghua University.


Money still missing as the plan to synthesize a human genome takes another step forward

Science, ScienceInsider, Ryan Cross



Tuesday morning, more than 200 biologists, businesspeople, and ethicists will converge on the New York Genome Center in New York City to jump-start what they hope will be biology’s next blockbuster: Genome Project-write (GP-write), a still-unfunded sequel to the Human Genome Project in which, instead of reading a human genome, scientists would create one from scratch and incorporate it into cells for various research and medical purposes. For example, proponents suggest that they could design a synthetic genome to make human cells resistant to viral infections, radiation, and cancer. Those cells could be used immediately for industrial drug production. With additional genome tinkering to avoid rejection by the immune system, they could be used clinically as a universal stem cell therapy.

The project got off to a bumpy start last year, and despite the central rallying cry of a synthetic human genome, many of those attending the conference will bring different expectations and ambitions. Some resent the unwanted attention and criticism that the project’s public objective has brought, saying it distracts from the goal of improving DNA synthesis technologies, because cheaper and faster methods to write DNA have many applications in applied and basic research. Others say that a made-to-order human genome is inevitable anyway, and hope to seize the publicity and controversy it creates as an opportunity to educate the public about synthetic biology.


A National Research Agenda for Intelligent Infrastructure

CCC Blog, Beth Mynatt



The Computing Community Consortium (CCC) in collaboration with the Electrical and Computer Engineering Department Heads Association (ECEDHA) recently released eight white papers describing a collective research agenda for intelligent infrastructure. These papers draw from a large network of expertise including CCC Council members, former CCC Council members, CRA Board members, and other members of the academic and industry communities for a total of 40 different authors from 27 different institutions.

We will be blogging about each paper over the next few weeks. Today, we start with the overview paper: A National Research Agenda for Intelligent Infrastructure.


Price-bots can collude against consumers – Trustbusters might have to fight algorithms with algorithms

The Economist



MARTHA’S VINEYARD, an island off the coast of Massachusetts, is a favourite summer retreat for well-to-do Americans. A few years ago, visitors noticed that petrol prices were considerably higher than in nearby Cape Cod. Even those with deep pockets hate to be ripped off. A price-fixing suit was brought against four of the island’s petrol stations. The judges found no evidence of a conspiracy to raise prices, but they did note that the market was conducive to “tacit collusion” between retailers. In such circumstances, rival firms tend to come to an implicit understanding that boosts profits at the expense of consumers.

No one went to jail. Whereas explicit collusion over prices is illegal, tacit collusion is not—though trustbusters attempt to forestall it by, for instance, blocking mergers that leave markets at the mercy of a handful of suppliers. But what if the conditions that foster such tacit collusion were to become widespread? A recent book by Ariel Ezrachi and Maurice Stucke, two experts on competition policy, argues this is all too likely. As more and more purchases are made online, sellers rely increasingly on sophisticated algorithms to set prices. And algorithmic pricing, they argue, is a recipe for tacit collusion of the kind found on Martha’s Vineyard.
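
As a toy illustration of the mechanism (a sketch of mine, not a model from the book): give two sellers a simple match-or-follow rule plus perfect, instantaneous visibility of each other’s prices, and prices ratchet up to the ceiling without any communication or agreement.

    import random

    # Two price-bots with perfect, instantaneous visibility of each other's
    # price. Assumed rule: match any undercut at once, follow the rival
    # upward, and occasionally probe a higher price.
    PRICE_FLOOR, PRICE_CEIL, STEP = 1.00, 2.00, 0.05

    def bot(my_price, rival_price):
        if rival_price < my_price:
            return rival_price                       # match an undercut instantly
        if rival_price > my_price:
            return rival_price                       # follow an upward probe
        if random.random() < 0.1:
            return min(my_price + STEP, PRICE_CEIL)  # occasionally probe upward
        return my_price

    random.seed(42)
    a = b = PRICE_FLOOR                              # start at the competitive price
    for _ in range(500):
        a = bot(a, b)
        b = bot(b, a)
    print(f"prices after 500 rounds: a={a:.2f}, b={b:.2f}")
    # Instant matching makes undercutting pointless, so prices only ratchet
    # upward: no communication, no agreement, yet both end at the ceiling.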

 
Events



Vancouver Sports Analytics Symposium and Hackathon

SFU Sports Analytics Club



Vancouver, BC, Canada July 8-9 at Simon Fraser University, Harbour Centre. [free]


SciPy 2017 Accepted Talks and Posters

SciPy



Austin, TX July 10-16. [$$$]


Cascadia R Conference

Portland R User Group



Portland, OR June 3 at OHSU Collaborative Life Science Building. [$$]

 
Deadlines



DARPA Broad Agency Announcement- Lifelong Learning Machines (L2M)

DARPA just released a Broad Agency Announcement on Lifelong Learning Machines (L2M) with a June 21, 2017, response date.

Macfang – Complexity Foundations and Applications of Network Geometry

Barcelona, Spain. The Macfang workshop focuses on the role of space in complex networks. We bring exciting speakers from around the world to foster a leading collaborative view on the emergent field of Network Geometry. Deadline for abstract submissions is June 26.
 
Tools & Resources



A researcher is a beggar

klab, Jonas Kubilius



While an outsider might imagine academia as being solely concerned with the eternal pursuit of Knowledge, in practice the head of a lab might be more concerned with raising enough funds to sustain the lab. Money is at the core of all operations. More money leads to more hires, and more hires produce more papers at more prestigious venues that drive more money and enable more ambitious research goals.

This puts a researcher in a very uncomfortable position: that of a beggar. With nothing to sell, only an ever-deeper pile of papers, there is no way to “earn” funding, leaving a researcher only with the hope of alms from those above her.


Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See

Mapillary



“We present the Mapillary Vistas Dataset—the world’s largest and most diverse publicly available, pixel-accurately and instance-specifically annotated street-level imagery dataset for empowering autonomous mobility and transport at the global scale.”


Machine Learning Pipelines for R

Alex Ioannides, When Localhost Isn't Enough blog



“Building machine learning and statistical models often requires pre- and post-transformation of the input and/or response variables, prior to training (or fitting) the models. For example, a model may require training on the logarithm of the response and input variables. As a consequence, fitting and then generating predictions from these models requires repeated application of transformation and inverse-transformation functions – to go from the domain of the original input variables to the domain of the original output variables (via the model). This is usually quite a laborious and repetitive process that leads to messy code and notebooks.”

“The pipeliner package aims to provide an elegant solution to these issues.”
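
pipeliner itself is an R package; as a language-neutral sketch of the pattern it automates (the class name and the scikit-learn usage are my choices, not part of the package), the idea is to bind a response transformation and its inverse to the model so both are applied consistently at fit and predict time:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    class TransformedTargetModel:
        """Bind a response transformation and its inverse to a model, so fitting
        happens in the transformed domain (e.g. log(y)) and predictions are
        mapped back to the original domain automatically."""

        def __init__(self, model, func=np.log, inverse_func=np.exp):
            self.model = model
            self.func = func
            self.inverse_func = inverse_func

        def fit(self, X, y):
            self.model.fit(X, self.func(y))        # train on the transformed response
            return self

        def predict(self, X):
            return self.inverse_func(self.model.predict(X))  # back to original units

    # Usage: a response that is linear in log-space.
    rng = np.random.default_rng(1)
    X = rng.uniform(1, 10, size=(100, 1))
    y = np.exp(0.3 * X[:, 0] + rng.normal(scale=0.1, size=100))
    model = TransformedTargetModel(LinearRegression()).fit(X, y)
    print(model.predict(X[:5]))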


Kubernetes clusters for the hobbyist.

GitHub – hobby-kube



This guide answers the question of how to set up and operate a fully functional, secure Kubernetes cluster on a cloud provider such as DigitalOcean or Scaleway. It explains how to overcome the lack of external ingress controllers, fully isolated secure private networking, and persistent distributed block storage.

Be aware that the following sections might be opinionated. Kubernetes is an evolving, fast-paced environment, which means this guide will probably be outdated at times, depending on the author’s spare time and individual contributions. For that reason, contributions are highly appreciated.


Experimental: Syntax AI with React.js

Codebox, Jon Sharrat



So we have a big vision here at Codebox to keep experimenting. We are looking to start pushing the boundaries of what dev tools with intelligence could look like.

One such experiment, for which we have an internal working prototype, is something called Syntax AI.

 
Careers


Postdocs

Postdoctoral researcher in Natural Language Understanding



University of Amsterdam; Amsterdam, The Netherlands
