Data Science newsletter – May 31, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 31, 2018

GROUP CURATION: N/A

 
 
Data Science News



Yes, Alexa is recording mundane details of your life, and it’s creepy as hell

MIT Technology Review, Rachel Metz



Since last year I’ve had a smart speaker in my living room—an Echo Dot. My family uses it mostly to ask Amazon’s digital assistant, Alexa, to play music. But after I saw a report that an Alexa-enabled speaker owned by a family in Portland, Oregon, had recorded a conversation and sent it to a contact, I started wondering: what is it picking up on at my house when we’re not talking to it directly?

So I checked my Alexa history (you can do that through the “settings” portion of the Amazon Alexa smartphone app) to see what kinds of things it recorded without my knowledge.

That’s when the hairs on the back of my neck started to stand up.

Beyond all the things I’ve clearly asked Alexa to do, in the past several months it has also tuned in, frequently several times a day, for no obvious reason. It’s heard me complain to my dad about something work-related, chide my toddler about eating dinner, and talk to my husband—the kinds of normal, everyday things you say at home when you think no one else is listening.

And that’s precisely why it’s terrifying: this sort of mundane chitchat is my mundane chitchat. I invited Alexa into our living room to make it easier to listen to Pandora and occasionally check the weather, not to keep a log of intimate family details or record my kid saying “Mommy, we going car” and forward it to Amazon’s cloud storage.


Applying Machine Learning Tools to Earthquake Data Offers New Insights

Columbia University, Earth Institute, State of the Planet blog



For all that seismologists have learned about earthquakes, new technologies show how much remains to be discovered.

In a new study in Science Advances, researchers at Columbia University show that machine learning algorithms could pick out different types of earthquakes from three years of earthquake recordings at The Geysers in California, one of the world’s oldest and largest geothermal fields. The repeating patterns of earthquakes appear to match the seasonal rise and fall of water-injection flows into the hot rocks below, suggesting a link to the mechanical processes that cause rocks to slip or crack, triggering an earthquake.

“It’s a totally new way of studying earthquakes,” said study coauthor Benjamin Holtzman, a geophysicist at Columbia’s Lamont-Doherty Earth Observatory. “These machine learning methods pick out very subtle differences in the raw data that we’re just learning to interpret.”
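
The study's actual pipeline is more sophisticated, but the core move (letting an unsupervised algorithm group seismic events by waveform similarity) can be sketched in a few lines. Everything below, from the synthetic "waveforms" to the spectral features, is an illustrative assumption, not the paper's method:

    # Illustrative sketch only: cluster synthetic "seismic events" by spectral
    # similarity, loosely analogous to grouping earthquakes by waveform type.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    def synthetic_event(freq_hz, n=512, fs=100.0):
        # Fake waveform: a decaying sinusoid plus noise, standing in for a real trace.
        t = np.arange(n) / fs
        return np.exp(-3 * t) * np.sin(2 * np.pi * freq_hz * t) + 0.1 * rng.standard_normal(n)

    # Two hypothetical event "types" with different dominant frequencies.
    events = np.array([synthetic_event(f) for f in [5.0] * 30 + [15.0] * 30])

    # Crude waveform fingerprint: the magnitude spectrum of each event.
    features = np.abs(np.fft.rfft(events, axis=1))

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print(labels)  # events with similar spectra land in the same cluster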


Query Understanding and Voice Interfaces

Medium, Query Understanding, Daniel Tunkelang



At best, speech recognition reduces the problem of query understanding with voice to the problem of query understanding. But the state of query understanding is far less mature than that of speech recognition. Indeed, the ability of computers to recognize speech but not understand it can be particularly frustrating for searchers who don’t distinguish between the two problems.

And the biggest challenges come from interface constraints.


Subscribing to Your Patients — Reimagining the Future of Electronic Health Records

New England Journal of Medicine; Katherine Choi, M.D., Yevgeniy Gitelman, M.D., and David A. Asch, M.D.



Nearly all U.S. health care systems and many physician practices have by now migrated from paper charts to electronic health records (EHRs). But though this shift could have been a transformative change, current EHRs are largely digital remakes of traditional systems, just as many early motion pictures were merely plays captured on celluloid. In time, movies began using on-location settings and special effects to make the two-dimensional screen deeper than the three-dimensional stage.

As compared with other digital transformations that have redefined the way we consume information, the effect of EHRs on clinicians’ engagement seems limited and effortful. Physicians in the hospital can keep up with feeds on the Philadelphia Eagles, Taylor Swift, and the price of Bitcoin without consulting a newspaper. Yet they must still go to the chart to check on their patients. What would it be like to instead subscribe to Ms. Jones in room 328?

For one thing, receipt of important information on patients’ conditions would depend less on physicians’ remembering to search the chart. For stewardship of antibiotics and antiepileptics for inpatients, for example, Penn Medicine had established automatic medication expiration, but the system required that residents remember when renewals were due.
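
The "subscribe to Ms. Jones" idea is, at bottom, a publish/subscribe pattern applied to chart events: updates are pushed to the clinicians following a patient instead of waiting to be found by a chart search. A minimal sketch of the pattern (the class, events, and names are hypothetical illustrations, not any EHR vendor's API):

    # Minimal publish/subscribe sketch of "subscribing to a patient" rather
    # than polling the chart. All names and events are hypothetical.
    from collections import defaultdict
    from typing import Callable

    class PatientFeed:
        def __init__(self):
            self._subscribers = defaultdict(list)  # patient_id -> callbacks

        def subscribe(self, patient_id: str, callback: Callable[[str], None]) -> None:
            self._subscribers[patient_id].append(callback)

        def publish(self, patient_id: str, event: str) -> None:
            # Push the update to everyone following this patient.
            for callback in self._subscribers[patient_id]:
                callback(event)

    feed = PatientFeed()
    feed.subscribe("room-328", lambda e: print(f"paged: {e}"))
    feed.publish("room-328", "antibiotic order expires in 24 hours")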


Smart-Kitchen Startups Give Cooks Digital Help Via Internet of Things

Xconomy, Angela Shah



When it comes to connected homes, the hottest spot is located in your kitchen.

The Internet of Things, artificial intelligence, and related technologies are being used to connect ovens, refrigerators, and other kitchen appliances to the Web. Feeling unsure about cooking that fish dish? Smart ovens can more accurately calibrate temperatures to avoid over-cooking. There’s even a robotics company in London called Moley that says it is developing an entire kitchen—complete with robotic arms—that can learn how grandma made her meatloaf and cook it for you, long after she’s gone.


Inside the livecoding algorave movement, and what it says about music

CDM Create Digital Music, Peter Kirn



Using code for live music has gone from geeky fringe to underground revolution, offering a fresh approach to music and pattern, even for first-time coders. Alex McLean is one of the people at the center of this medium’s growth.

“Code” in this environment isn’t just about software language. The free, open source tool Alex McLean created, TidalCycles, found its original inspiration in research into analyzing Indian tabla rhythms. These environments are very much about getting closer to the essence of what makes music tick – the underlying technology of musical culture.

With Alex set to play and teach this weekend in Berlin, it seemed the perfect time to get his personal and musical story, and to have him take us on a tour of live coding’s latest frontiers. Alex is a laptop musician and developer, as well as a social organizer for a movement that’s spreading these tools worldwide. Johnny Appleseed style, Alex and others are planting seeds and building up a network of people who share knowledge and support one another. From my vantage point, the live coding scene represents a kind of free, open, collective model for supporting musical practice and culture, in contrast to the generally competitive dominant paradigm.


Facebook Is Giving Scientists Its Data to Fight Misinformation

WIRED, Science, Robbie Gonzalez



Facebook is keeping a close eye on misinformation in the lead-up to 2018’s elections. Which elections, exactly? All of them, according to the team working within the company to combat fake news. That means Turkey in June, Mexico in July, Rwanda in September, Brazil in October, and the US in November, to list just a few. It’s a lot to keep track of, even—or perhaps especially—for a company as large, influential, and scrutinized as Facebook.

Which is why the company wants help. Last month, Facebook, together with the non-profit Social Science Research Council, announced an initiative that will connect independent researchers with Facebook’s vast and, until now, largely inaccessible troves of data on human behavior. The goal: investigate social media’s impact on elections and democracy.

The initiative is significant for many reasons, but here’s the big one: It will, for the first time, enable researchers not only to access Facebook’s data but also to publish findings from that data without pre-approval from Facebook. That means if scientists uncover something in the social network’s data that makes it look bad, Facebook won’t be able to prevent them from making that information public.


Air Force-backed center to make machine learning more independent, predictable, secure

University of Wisconsin, News



Artificial intelligence has become so smart and commonplace that most people accept computer-generated restaurant recommendations or movie suggestions without blinking an eye. Underneath the virtual surface, however, much remains mysterious in the realm of machine learning, where systems attempt to mimic the remarkable way humans learn.

Machine learning capabilities aren’t yet up to the task of handling highly complex, rapidly changing or uncertain environments, and artificial intelligence can easily be tricked by false information from a clever adversary — critical situations for national defense.

In an effort to build the next generation of machine-learning methods to support its needs, the Air Force Office of Scientific Research and the Air Force Research Laboratory have awarded $5 million to establish a university center of excellence devoted to efficient and robust machine learning at the University of Wisconsin–Madison. The center also includes researchers from the Toyota Technological Institute at Chicago (TTIC).
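
The "tricked by false information" problem refers to adversarial examples: tiny, deliberately chosen input perturbations that flip a model's prediction. A toy sketch of the fast-gradient-sign idea on a linear classifier (my illustration of the general phenomenon, not the center's methods):

    # Toy fast-gradient-sign sketch on a linear model: a small, targeted nudge
    # to the input can change the prediction. Illustration only.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    x, label = X[0], y[0]
    eps = 0.5  # perturbation budget (an assumed value)

    # For logistic regression the input-gradient of the log loss is (p - y) * w,
    # so the gradient-sign step that increases the loss is:
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    x_adv = x + eps * np.sign((p - label) * model.coef_[0])

    print("original: ", model.predict(x.reshape(1, -1))[0])
    print("perturbed:", model.predict(x_adv.reshape(1, -1))[0])  # often flips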


New Machine Learning Approach Could Accelerate Bioengineering

Lawrence Berkeley Lab



Scientists from the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) have developed a way to use machine learning to dramatically accelerate the design of microbes that produce biofuel.

Their computer algorithm starts with abundant data about the proteins and metabolites in a biofuel-producing microbial pathway, but no information about how the pathway actually works. It then uses data from previous experiments to learn how the pathway will behave. The scientists used the technique to automatically predict the amount of biofuel produced by pathways that have been added to E. coli bacterial cells.
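
In spirit, this is supervised regression: measurements from previously engineered pathways serve as features, and observed production is the target. A hedged sketch with synthetic stand-in data (the study's actual features, models, and data are richer):

    # Hedged sketch: predict biofuel output from pathway measurements using
    # past experiments. Synthetic stand-in data throughout.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)

    # Rows = strains from prior experiments; columns = hypothetical protein
    # and metabolite levels measured along the pathway.
    X_past = rng.uniform(0, 1, size=(60, 8))
    y_past = 2 * X_past[:, 0] + X_past[:, 3] + 0.1 * rng.standard_normal(60)  # "titer"

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_past, y_past)

    # Score new, untested pathway designs before anyone builds them.
    X_new = rng.uniform(0, 1, size=(3, 8))
    print(model.predict(X_new))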

The new approach is much faster than the current way to predict the behavior of pathways, and promises to speed up the development of biomolecules for many applications in addition to commercially viable biofuels, such as drugs that fight antibiotic-resistant infections and crops that withstand drought.


AI and the Economy

SSRN; Jason Furman and Robert Seamans



We review the evidence that artificial intelligence (AI) is having a large effect on the economy. Across a variety of statistics—including robotics shipments, AI startups, and patent counts—there is evidence of a large increase in AI-related activity. We also review recent research in this area, which suggests that AI and robotics have the potential to increase productivity growth but may have mixed effects on labor, particularly in the short run. In particular, some occupations and industries may do well while others experience labor-market upheaval. We then consider current and potential policies around AI that may help to boost productivity growth while mitigating the labor-market downsides, including the pros and cons of an AI-specific regulator, expanded antitrust enforcement, and alternative strategies for dealing with the labor-market impacts of AI, such as universal basic income and guaranteed employment.


How To Democratize Healthcare: AI Gives Everyone The Very Best Doctor

Andreessen Horowitz, Vijay Pande



The pressing question today is: Can new technologies slow or even reverse the exponentially rising costs to help truly democratize healthcare? The wealthiest patients today benefit not only from being able to afford the top medical services but also from being able to fly somewhere to get the opinions of more than one of the top doctors in the world. Imagine we could all do this – if, to diagnose any condition, every patient could convene, say, a conference call of the top 50 specialists in their field, who all drew upon their unique experiences and knowledge to confer and reach a consensus on an accurate diagnosis and treatment for that patient – whom they had been following for years and years. Pretty good medical care, right? And probably the current best possible treatment for that patient. Unfortunately, there’s no way this kind of approach or scenario is cost-feasible – or scalable.

Magnifying and speeding up the human skill of evidence gathering and analysis is exactly what artificial intelligence and machine learning do best. They can bring 50 experts to bear for a single patient – by codifying the knowledge, taxonomy, and understanding of those experts. Machine learning is built on what the best doctors have learned and now know: whether a suspicious-looking mole is malignant or benign, whether an irregular heartbeat might be atrial fibrillation. Machine learning would be nothing without this essential human input; the technology trains on and scales the knowledge of the best doctors. And modern AI has the remarkable ability to keep learning, continuing to identify new features in the data that will give the most accurate diagnoses. This data is drawn not from a handful of patients seen in an exam room but from thousands and thousands of examples – more than most specialists will ever see in a lifetime.
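
The "conference call of 50 specialists" analogy maps naturally onto ensemble learning, in which independent models vote and the majority becomes the consensus. A toy sketch (synthetic data, generic classifiers, and emphatically not a medical system):

    # Toy "panel of experts": independent classifiers vote and the majority
    # becomes the consensus. Illustration of the analogy only.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    panel = VotingClassifier(
        estimators=[
            ("expert_a", LogisticRegression(max_iter=1000)),
            ("expert_b", RandomForestClassifier(random_state=0)),
            ("expert_c", KNeighborsClassifier()),
        ],
        voting="hard",  # majority vote as the panel's consensus
    ).fit(X, y)

    print(panel.predict(X[:5]))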


Innovation moving faster than precision medicine policy

MobiHealthNews, Tom Sullivan



As precision medicine and its related technologies take off in healthcare with rocket-like speed, many players in the space are advocating for a balance between innovation and regulation. Without policies in place, there are major mistakes that hospitals, researchers, clinicians, and policymakers will need to avoid to ensure that the healthcare system remains available to all, according to speakers at last week’s HIMSS Precision Medicine Summit in Washington DC.

“Precision medicine is here,” Dr. Michael Dulin, director of the Academy for Population Health Innovation at the University of North Carolina Charlotte, said. “We’re living it.”

Dulin, a physician-patient who himself has benefitted from precision medicine, cautioned that there are enormous mistakes to avoid.

“Our healthcare system is broken,” he said. “It’s inefficient, and precision medicine is an amazing tool, but if we layer it on top of a broken system we can actually make things worse. Health disparities can get worse; we need to be very proactive when we’re thinking about how to integrate this into our delivery system.”

 
Events



AI Leaders Summit, Boston, June 21-22: Meet 100 AI innovators from top firms

The Forward Leading LTD



Boston, MA June 21-22. “A unique 2 day event to learn from and network with 100 AI innovators, robotics scientists, machine learning forerunners, data strategists and business leaders.” [$$$$]

 
Tools & Resources



Open Science Projects Collaborate on Joint Roadmap

JROST



A group of organizations building nonprofit, open-source tools for scholarship and publication has joined with open-science researchers in a new collaboration to develop a Joint Roadmap for Open Science Tools (JROST).

While open technologies and services are becoming essential in science practices, so far, there has been no holistic effort to align these tools into a coherent ecosystem that can support the scientific experience of the future. To draw this missing map, we’ve formed the Joint Roadmap as an informal group of like-minded people and organizations with shared goals.


Kepler.gl

Uber Open Source



Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.
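
As a hedged sketch of how it can be used from a notebook, assuming the keplergl Jupyter package is installed (the DataFrame and column names below are made up; kepler.gl detects latitude/longitude columns automatically):

    # Hedged sketch: viewing point data with kepler.gl inside a Jupyter
    # notebook via the keplergl Python package. Data below is made up.
    import pandas as pd
    from keplergl import KeplerGl

    df = pd.DataFrame({
        "latitude": [37.7749, 34.0522],
        "longitude": [-122.4194, -118.2437],
        "pickups": [10, 20],
    })

    m = KeplerGl(height=500)            # create the map widget
    m.add_data(data=df, name="points")  # lat/lon columns are auto-detected
    m                                   # render the interactive map in the notebook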


Word Spaces – visualizing word2vec to support media analysis

MIT Media Lab, Media Cloud project



Media Cloud is a database of online news content, and a suite of tools for analyzing online media ecosystems. Researchers using it tend to focus on questions of attention, representation, influence, and language. We’re introducing a new feature today to support the last one on that list – visualizing language via word2vec word embeddings with what we’re calling the “word space” chart. This feature was created and designed by Becky Bell and Rahul Bhargava. The post walks through an example: the word space for a corpus of reporting on climate change in 2016 from American news sources.
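
Media Cloud's chart is its own design, but the underlying recipe (train or load word2vec embeddings, then project them to two dimensions for plotting) can be sketched with gensim. The corpus and parameters below are toy assumptions:

    # Hedged sketch of a "word space": train word2vec on a toy corpus, then
    # project the vectors to 2-D for plotting. Media Cloud's real chart uses
    # its own corpus and design; everything here is illustrative.
    import matplotlib.pyplot as plt
    from gensim.models import Word2Vec
    from sklearn.decomposition import PCA

    sentences = [
        ["climate", "change", "warming", "emissions"],
        ["carbon", "emissions", "policy", "climate"],
        ["warming", "temperature", "carbon", "policy"],
    ] * 50  # repeat so the toy corpus has enough co-occurrences

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)

    words = list(model.wv.index_to_key)
    coords = PCA(n_components=2).fit_transform(model.wv[words])

    plt.scatter(coords[:, 0], coords[:, 1])
    for word, (px, py) in zip(words, coords):
        plt.annotate(word, (px, py))
    plt.show()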


Why you need to improve your training data, and how to do it

Pete Warden's blog



Andrej Karpathy showed a slide as part of his talk at Train AI, and I loved it! It captures the difference between deep learning research and production perfectly. Academic papers are almost entirely focused on new and improved models, with datasets usually chosen from a small set of public archives. Everyone I know who uses deep learning as part of an actual application spends most of their time worrying about the training data instead.

There are lots of good reasons why researchers are so fixated on model architectures, but it does mean that there are very few resources available to guide people who are focused on deploying machine learning in production. To address that, my talk at the conference was on “the unreasonable effectiveness of training data”, and I want to expand on that a bit in this blog post, explaining why data is so important along with some practical tips on improving it.
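
One concrete tactic in this spirit (my example, not necessarily Warden's) is to use out-of-fold predictions to surface training examples whose labels the model confidently disputes; those are frequent suspects for labeling errors worth manual review:

    # Hedged sketch: flag training examples whose labels the model confidently
    # disputes, using out-of-fold predictions so no example is scored by a
    # model that trained on it. Likely label errors float to the top.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.05,  # ~5% label noise
                               random_state=0)

    proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=5, method="predict_proba")

    confidence_in_label = proba[np.arange(len(y)), y]
    suspects = np.argsort(confidence_in_label)[:10]  # least consistent with labels
    print("indices worth re-labeling by hand:", suspects)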

 
Careers


Full-time, non-tenured academic positions

Research Software Engineer



Princeton University, Research Computing department; Princeton, NJ
