Data Science newsletter – June 23, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 23, 2017


 
 
Data Science News



DeepMind’s kidney disease-fighting Streams app is coming to a new hospital

Digital Trends, Kyle Wiggers


Google’s DeepMind, the British artificial intelligence firm behind the human-besting AlphaGo software, launched a healthcare platform in partnership with the U.K.’s Moorfields Eye Hospital and Royal Free London in 2015. Since then, it has applied computer smarts to eye diagnoses, cancer screening, and electronic patient record management. On Wednesday, DeepMind broadened its efforts to Musgrove Park Hospital.

Starting this month, doctors and nurses at Musgrove Park will get DeepMind’s Streams app for iPhone, which helps spot early signs of acute kidney injury. DeepMind stresses that it is intended to aid, not replace, practitioners — the app will allow clinical staff to view “results of X-rays, scans or blood tests, in one place at the touch of a button.”


A couple more items not to miss …

Google released my favorite dataset ever: 10,000 images of street art. I cannot wait for data artists to set upon it with verve and creativity.

There’s an extremely helpful flowchart for figuring out if you can legally use or re-use an image. Let’s all try not to be sloppy when it comes to sharing. Pro-tip: sharing is not the same as publishing. To clarify what artists want, go play “Exposure Work” bingo by illustrator Immy Smith who has a great sense of irony about getting ripped off.

This data-driven piece on the opioid epidemic adds some new insights, like which other diagnoses accompany opioid use disorder: Hepatitis C, alcoholism, suicidal ideation, bipolar disorder, PTSD, failed back syndrome. It also provides data visualizations for elements we already know, like where cases are concentrated and the steep trend line in usage of the diagnosis “opioid use disorder”. They call it America’s worst drug crisis ever. Not sure I agree. Alcoholism has had a sad and grueling presence for decades.

Seafloor geodesy. That’s your vocab term for the week. Geodesy means “the branch of mathematics dealing with the shape and area of the earth or large portions of it” and is used by people like 79-year-old Jerry Paros who created a sensor to monitor minute changes near the ocean floor at fault lines to better predict earthquakes.

Someone had fun feeding a neural network proverbs and then letting it write its own. It got a few right, like “A good wine makes the best sermon,” but mostly it failed: “A mouse is a good bound to receive.” Proverbs are uniquely human, often playing on multiple meanings of the same word or employing rarely used words, like ‘ox’. Not surprising a neural net would struggle.


Eyes on Nature: How Satellite Imagery Is Transforming Conservation Science

Yale E360, Richard Conniff


High-resolution earth imagery has provided ecologists and conservationists with a dynamic new tool that is enabling everything from more accurate counting of wildlife populations to rapid detection of deforestation, illegal mining, and other changes in the landscape.


Why newsrooms should be thinking differently about social data

News Whip, Lisa Corcoran


We look at three ways that publishers should be rethinking how they use social metrics in their workflows.

We’re in an age where people across the world find social media to be their number one source of information each day.

This week, new research surveying 13,000 people in 13 countries found that social media is now judged to be the most relevant source of information worldwide. 59% cite it as important to their ‘personal understanding’ – ahead of television (57%), “word of mouth from friends and family” (45%), and online news (41%).


The Changing Interplay Between News, Government and Society

The Big Boulder Initiative


While we were enjoying Big Boulder 2016, a global event of far-reaching implications took place halfway across the world: Britain, against all polling predictions, voted to leave the European Union. Only five months later, the United States experienced our own upset with the presidential election, resulting in an outcome that even our best data analysts didn’t see coming. The aftermath of these events left a lot of people asking the same question: How were we all so wrong?

Deb Roy, Director of the Laboratory for Social Machines at the MIT Media Lab and Chief Media Scientist at Twitter, was already uniquely set up to study this very question. Using the collective power of the MIT Media Lab professionals, the team began collecting data in the months leading up to the election, attempting to determine the outcome. When those results were flagrantly different than anticipated, the team wanted to know why.


Jeff Bezos’ lessons from Washington Post for news industry

CNBC, Matt Rosoff


After nearly four years running the Post, which Bezos says turned a profit in 2016 and is expected to do the same this year, Bezos has some valuable lessons to pass along to the rest of the news industry, which is struggling to compete for ad dollars against online juggernauts Google and Facebook.

Bezos delivered some of this advice at the Future of Newspapers conference in Turin, Italy, on Wednesday. Here are the highlights.


Open Academic Search

Semantic Scholar, Marie Hagman


The internet was originally created for the distribution of academic information, but decades later much of the scholarly work produced is not easily accessible. Fortunately, many disparate efforts in the open access movement are working to change that. AI2 formed the Open Academic Search (OAS) working group to accelerate the pace of progress for groups engaged in these efforts.

Working in silos means duplication of foundational pieces and time and resources wasted. For example, anyone working on an academic search solution needs to parse and normalize metadata from PDFs, which means everyone has developed a different solution to this problem. What’s the best way and how much better could it be if we collaborated on a single solution?


What the Street!?

moovel lab


Let us first look at how much space there is in a city for moving around, and how it is allocated between bikes, rails, and cars. With What the Street!? – The Mobility Space Report, we set out to provide a public tool for exploring this urban mobility space and to answer our questions systematically, interactively, and above all, in a fun way. Inspired by recently developed techniques in data visualization of unrolling, packing, and ordering irregular shapes, we packed and rolled all mobility spaces into rectangular bins to visualize the areas they take up.


Ancient wisdom from the neural network

Janelle Shane, Letting neural networks be weird blog


A recurrent neural network (like the open-source char-rnn framework used here) can teach itself to imitate recipes, paint colors, band names, and even guinea pig names. By examining a dataset, it learns to formulate its own rules about it, and can use these rules to generate new text that – according to the neural network – resembles the dataset. But since the neural network is doing all this without cultural context, or any knowledge of what the words really mean, the results are often a bit bizarre.

In this example, the dataset is a list of more than 2000 ancient proverbs, collected by reader Anthony Mandelli. Some of these are well-known, such as “You can lead a horse to water, but you can’t make it drink.” and “Where there’s a will, there’s a way.” Others are frankly a bit strange: “Where there’s muck there’s brass.” and “A curst cow has short horns.” and “Be not a baker if your head is made of butter.”


Fog computing may be IoT’s computational model

Network World, Steven Max Patterson


Fog computing and fog networking could fill the latency and range gap in the internet of things (IoT). For the last couple of years, researchers have been reporting on developments in fog’s role in completing IoT’s ubiquitous connectivity. It is similar to cloud computing architectures, but it brings the cloud to the edge to meet the different demands of IoT.

The underlying concept is that the cloud could be too slow for some real-time IoT services because the quality of service (QoS) specifications of the IoT application exceed the cloud’s QoS. The solution is to move the cloud out into the network.


Illinois researchers build Dropbox-like storage, analytical system for scientific data

University of Illinois, Coordinated Science Laboratory


As an Illinois graduate student who specializes in fabricating new semiconductor lasers, Tommy O’Brien is like a chef, making new wafers out of his own special recipe of materials, temperature, and baking time. But unlike the technology he’s designing, his method for preserving and analyzing his recipe is anything but cutting edge: he copies each recipe into a notebook by hand.

“I basically have to write down everything step by step, and it’s incredibly time-consuming if I have to go back and search for information later,” said O’Brien, who is pursuing his doctorate in electrical and computer engineering. “It’s equally cumbersome to share information about successes or failures with my lab mates. It often doesn’t happen at all.”


Teaching the Web of Things at MIT

MIT Media Lab, Web of Things, Dominique Guinard


What once seemed a rather sub-optimal idea has now grown into a de facto standard for connected products in the IoT. As a result of increasing research and industry interest around the Web of Things, we were invited by MIT, as part of their IoT Bootcamp, to craft a WoT lecture. The goal of the lecture was to teach the fundamentals of the Web of Things but also to go deeper into the practical benefits of using the Web to build the IoT.

It then becomes more technical and follows the layers of the Web of Things Architecture proposed in Dom’s PhD thesis and described in our book “Building the Web of Things”. Basically, the second part of the lecture is a fast overview of all the chapters of the book and offers a selection of labs to give the audience a sense of what the Web of Things has to offer.

Teaching this lecture at MIT was obviously a fantastic experience, not least because a lot of the WoT research was and is done within MIT walls (in particular at the AutoID labs but now also at W3C). For this IoT boot camp, the organisers at the MIT AutoID labs had selected an impressive crowd of about 80 applicants for a most intense one-week lecture on all things IoT. Participants ranged from key technical people at major corporations to sales teams and talented students from all over the world.


New IST research leverages big data to predict severe weather

Penn State University, Penn State News


Every year, severe weather endangers millions of people and causes billions of dollars in damage worldwide. But new research from Penn State’s College of Information Sciences and Technology (IST) and AccuWeather has found a way to better predict some of these threats by harnessing the power of big data.

The research team, including doctoral student Mohammad Mahdi Kamani and his thesis adviser, IST professor James Wang, computer science and engineering doctoral student Farshid Farhat, and AccuWeather forensic meteorologist Stephen Wistar, has developed a new approach for identifying bow echoes in radar images, a phenomenon associated with fierce and violent winds.


Farming in the Fourth Dimension

AgWeb, Ben Potter


Currently, farmers can use precision ag to construct 2-D images or even 3-D reconstructions of their fields. But a collaborative research project from three Georgia institutions wants to take monitoring abilities into the fourth dimension.


The Human Brain Project Reboots: A Search Engine for the Brain Is in Sight

IEEE Spectrum, Megan Scudellari


“We don’t have the faintest idea of the molecular basis for diseases like Alzheimer’s or schizophrenia or others. That’s why there are no cures,” says Paolo Carloni, director of the Institute for Computational Biomedicine at Jülich. “To make a big difference, we have to dissect [the brain] into little pieces and build it up again.”

That’s why there’s no choice but to move from small-scale investigations to large, collaborative efforts. “The brain is too complex to sit in your office and solve it alone,” says neuroscientist Katrin Amunts, who coleads the 3D-PLI project at Jülich. Neuroscientists need to make the same transition that physicists and geneticists once did—from solo practitioners to consortia—and that transformation won’t be easy.


Growing More with Less Using Cell Phones and Satellite Data

Eos, Faisal Hossain, Nishan Biswas, Muhammad Ashraf, and Ahmad Zeeshan Bhatti


Researchers from the University of Washington and Pakistan are using 21st century technology to revive farming as a profitable profession in the Indus Valley.

 
Events



LEVERS FOR HEALTH – State of the Science Summit

NCEAS, SNAPP


Santa Barbara, CA June 27. Speakers will consider pressing public health and conservation problems and how ecological approaches can help find joint solutions. [free]


Data Driven Storytelling: A Deep Dive into Visualization Techniques

AAAS


Washington, DC Friday, July 14. A data visualization and storytelling event organized by AAAS in collaboration with the South Big Data Hub and West Big Data Innovation Hub. [free, registration required]


IJCAI-17 Workshop on Explainable Artificial Intelligence (XAI)

International Joint Conference on Artificial Intelligence


Melbourne, Australia August 20. [$$$]

 
Deadlines



92Y / Gates DataDive Data Ambassador Application Form

DataKind will be running a DataDive August 4-6 with the 92Y and Bill & Melinda Gates Foundation to tackle exciting data challenges related to philanthropic giving and we need YOU to make it a success. … “We’re looking for 8-12 incredible Data Ambassadors – volunteer liaisons – to work with us ahead of time to scope these projects, prepare the data, and lead teams to glory during the event.”

Big Data in Health and Biology

Hinxton, England September 25-27. Produced by Wellcome Genome Campus and EMBL. Deadline for abstracts is July 18.

General AI Challenge, The Warm-up Round

You will be programming and training an AI agent that will engage in a dialogue with the CommAI-Env environment. They will exchange bytes of information, and in addition the environment will give feedback signals to the agent to guide its behavior. Deadline for solution submissions is August 14.

Passenger Screening Algorithm Challenge

Improve the accuracy of the Department of Homeland Security’s threat recognition algorithms. Deadline for pre-trained models and external data posting is December 4.
 
Tools & Resources



Probabilistic programming from scratch

O'Reilly Radar, Mike Williams


This article contains highlights from a series of three Orioles (interactive online tutorial notebooks) on probabilistic programming from scratch. In the Orioles, Mike Lee Williams explains both the mathematical background and the Python code more deeply, and delves into a variety of real-world statistical problems.


Learning to Reason with Neural Module Networks

The Berkeley Artificial Intelligence Research Blog, Jacob Andreas


“The dominant paradigm in deep learning is a “one size fits all” approach: for whatever problem we’re trying to solve, we write down a fixed model architecture that we hope can capture everything about the relationship between the input and output, and learn parameters for that fixed model from labeled training data.”

“But real-world reasoning doesn’t work this way: it involves a variety of different capabilities, combined and synthesized in new ways for every new challenge we encounter in the wild. What we need is a model that can dynamically determine how to reason about the problem in front of it—a network that can choose its own structure on the fly. In this post, we’ll talk about a new class of models we call neural module networks (NMNs), which incorporate this more flexible approach to problem-solving while preserving the expressive power that makes deep learning so effective.”


[1706.06905] Learnable pooling with Context Gating for video classification

arXiv, Computer Science > Computer Vision and Pattern Recognition; Antoine Miech, Ivan Laptev, Josef Sivic


Common video representations often deploy an average or maximum pooling of pre-extracted frame features over time. Such an approach provides a simple means to encode feature distributions, but is likely to be suboptimal. As an alternative, we here explore combinations of learnable pooling techniques such as Soft Bag-of-words, Fisher Vectors, NetVLAD, GRU and LSTM to aggregate video features over time. We also introduce a learnable non-linear network unit, named Context Gating, aiming at modeling interdependencies between features. We evaluate the method on the multi-modal Youtube-8M Large-Scale Video Understanding dataset using pre-extracted visual and audio features. We demonstrate improvements provided by the Context Gating as well as by the combination of learnable pooling methods. We finally show how this leads to the best performance, out of more than 600 teams, in the Kaggle Youtube-8M Large-Scale Video Understanding challenge.
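For readers who want a feel for the Context Gating unit, here is a minimal sketch in plain Python. It assumes the gating form y = sigmoid(Wx + b) * x (element-wise), which is how I understand the paper's formulation; the abstract above does not spell it out. The function name, toy weights, and input vector are all hypothetical, and the real model learns W and b inside a much larger network.

```python
import math

def sigmoid(z):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def context_gating(x, W, b):
    """Assumed form: y = sigmoid(W x + b) * x, element-wise.

    x: feature vector of length n
    W: n x n weight matrix (list of rows); learned in the real model
    b: bias vector of length n; learned in the real model
    """
    gates = [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]
    # Each feature is scaled by a gate that depends on ALL input features.
    return [g * x_i for g, x_i in zip(gates, x)]

# Toy input: with all-zero weights every gate is sigmoid(0) = 0.5,
# so each feature is simply halved.
x = [1.0, -2.0, 0.5]
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
y = context_gating(x, W, b)
print(y)
```

Because each gate is computed from the whole input vector, a feature can be suppressed or passed through depending on which other features are present, which is one way to read "modeling interdependencies between features."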


6 ways people are making money with machine learning

Medium, Aaron Dell


There have been some astonishing leaps forward in artificial intelligence and machine learning, but none of it is going to matter if it doesn’t offer a return on your investment. So how do you make machine learning useful? Here are some real life examples of how machine learning is saving companies time and money:

1. Find stuff


Accelerating Deep Learning Research with the Tensor2Tensor Library

Google Research Blog, Łukasz Kaiser


“We are happy to release Tensor2Tensor (T2T), an open-source system for training deep learning models in TensorFlow. T2T facilitates the creation of state-of-the art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers (Attention Is All You Need, Depthwise Separable Convolutions for Neural Machine Translation and One Model to Learn Them All) to help kick-start your own DL research.”


Knowledge Base

FutureTDM


“FutureTDM Knowledge Base showcases structured collections of resources on Text and Data Mining (TDM) that has been gathered throughout the FutureTDM project phase. The collections encompass experts as projects or organisations focusing on TDM, as well as technologies and resources that are useful for TDM practitioners (i.e. TDM methods and TDM tools).”


Creating Protest Event Databases: A New Protocol

Pamela Oliver


In constructing and analyzing new data on Black protests, I’ve come to realize the importance of improving standard protocols for collecting protest event data. Although protest event data has one record per event, these events are recorded in news accounts that talk about multiple events in one article, and the same event is often described in several different articles. The usual data collection protocols do not retain this information. The important Dynamics of Collective Action data set records only one article per event and the total number of articles, and discards identifiers for additional articles about the event. All research teams building protest event databases have to reconcile “duplications,” events that are described in more than one report. Past researchers have done this reconciliation “off stage” and yielded only one composite record per event pointing to only one news article per event (if they document the publication source at all).

Replicable protest event data construction requires a structured process that creates a chain of files that documents the data collecting and coding process. The sources of event data can be documented with at least two shareable files, an event file and an event-article crosswalk file. A modification of past research procedures can make it possible to collect and process protest event data in a way that preserves this multiple-article-per-event data and makes the coding process better documented and more replicable with little if any additional work. The result will be higher quality data.
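The two-file idea can be sketched in a few lines of Python. This is only an illustration of an event file plus an event-article crosswalk, not the author's actual protocol: the field names, IDs, dates, and places below are hypothetical, and it assumes a coder has already decided which mentions refer to the same event.

```python
# Hypothetical coded mentions: one row per (article, event) pair found in the news.
mentions = [
    {"article_id": "A1", "event_id": "E1", "date": "1994-05-01", "place": "Chicago"},
    {"article_id": "A1", "event_id": "E2", "date": "1994-05-03", "place": "Detroit"},
    {"article_id": "A2", "event_id": "E1", "date": "1994-05-01", "place": "Chicago"},
]

def build_files(mentions):
    """Return (event_file, crosswalk_file) from coded mentions.

    event_file: one record per event (first mention supplies the fields).
    crosswalk_file: one record per event-article link, so every article
    that describes an event stays traceable.
    """
    events = {}
    crosswalk = []
    seen = set()
    for m in mentions:
        events.setdefault(
            m["event_id"],
            {"event_id": m["event_id"], "date": m["date"], "place": m["place"]},
        )
        pair = (m["event_id"], m["article_id"])
        if pair not in seen:
            seen.add(pair)
            crosswalk.append({"event_id": pair[0], "article_id": pair[1]})
    return list(events.values()), crosswalk

event_file, crosswalk_file = build_files(mentions)

# Every article describing event E1 is still recoverable from the crosswalk.
articles_for_e1 = [r["article_id"] for r in crosswalk_file if r["event_id"] == "E1"]
print(len(event_file), len(crosswalk_file), articles_for_e1)
```

The event file still has one record per event, while the crosswalk keeps the many-to-many links between events and articles that a single composite record would throw away.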

 
Careers


Full-time positions outside academia

Core Data Science Team



Facebook; New York, NY

Full-time, non-tenured academic positions

Research Associate, Spatiotemporal modelling



University of Münster; Münster, Germany
