Data Science newsletter – August 2, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for August 2, 2019


Data Science News

First Programmable Memristor Computer

IEEE Spectrum, Samuel K. Moore


Hoping to speed AI and neuromorphic computing and cut down on power consumption, startups, scientists, and established chip companies have all been looking to do more computing in memory rather than in a processor’s computing core. Memristors and other nonvolatile memory seem to lend themselves to the task particularly well. However, most demonstrations of in-memory computing have been in standalone accelerator chips that either are built for a particular type of AI problem or need the off-chip resources of a separate processor in order to operate. University of Michigan engineers are claiming the first memristor-based programmable computer for AI that can work all on its own.

“Memory is really the bottleneck,” says University of Michigan professor Wei Lu. “Machine learning models are getting larger and larger, and we don’t have enough on-chip memory to store the weights.” Going off-chip for data, to DRAM, say, can take 100 times as much computing time and energy. Even if you do have everything you need stored in on-chip memory, moving it back and forth to the computing core also takes too much time and energy, he says. “Instead, you do the computing in the memory.”

Nature Cover Story | Chinese Team’s ‘Tianjic Chip’ Bridges Machine Learning and Neuroscience in Pursuit of AGI



Many AI experts believe humanlike artificial general intelligence (AGI) is but a far-fetched dream, while others find their inspiration in the quest for AGI. Speaking at last November’s AI Frontiers Conference, OpenAI Founder and Research Director Ilya Sutskever said “We (OpenAI) have reviewed progress in the field over the past few years. Our conclusion is near-term AGI should be taken as a serious possibility.”

Today, respected scientific journal Nature boosted the case for AGI with a cover story on a new research paper, Towards artificial general intelligence with hybrid Tianjic chip architecture, which aims to stimulate AGI development by adopting generalized hardware platforms.

Data collected by governments can be useful to researchers, but only when accessed carefully

The Conversation; Jack Teng, Kieran C O'Doherty, Kimberlyn McGrail, Michael Burgess


Personal data are being used to make decisions in the public and private sectors. Public concerns over how data are used and stored, and who has access to them, are forcing government agencies to take a new look at the data they collect and what they do with them.

Our recent research, published in the International Journal of Population Data Science, reports the recommendations made by 24 members of the public during a public deliberation — a kind of public engagement event. This event was held in Vancouver, B.C., and was hosted by Population Data BC, an organization at the University of British Columbia that facilitates data sharing for research and provides education and training on data use.

Ag-Analytics helps farmers and researchers use AI to prepare for climate change

The Official Microsoft Blog, Microsoft On the Issues, Bonnie Lei


We’re excited to have team members from Microsoft and Ag-Analytics together at the InfoAg Conference in Missouri, discussing their latest and greatest advances, including a host of new datasets and application programming interfaces (APIs). APIs are a key component of the AI for Earth program, because easier recreation of our grantees’ successes enables broader, more scalable impact. As a result, we help grantees expand their models into these reproducible APIs, providing them with expertise and back-end support.

One of Ag-Analytics’ new APIs is a key example of this value. Focused on locational data, the API service is useful for Ag-Analytics users, enabling them to retrieve statistics and a full imaging history for a set of coordinates. This data can educate them on the conditions of that area, which in turn can help them make more informed decisions. Through inclusion in Microsoft’s API gallery, it can go a step further, helping decision makers gather that key data on sustainable farming, or aiding researchers on a myriad of important studies.

New Minor in Data Science and Engineering Offers Students Key Skills

Rensselaer Polytechnic Institute, RPI News


Beginning in the fall semester, all undergraduates at Rensselaer Polytechnic Institute will have the opportunity to pursue a multidisciplinary minor in Data Science and Engineering. The course of study will prepare students to combine techniques from systems engineering and data science to address global challenges.

Sampling the Neighborhoods of the Gut Microbiome

Columbia University Irving Medical Center


One big challenge is the spatial distribution of different microbes, which are not evenly distributed throughout the gut. The gut microbiome is like a large city, with multiple neighborhoods, each with its own mix of occupants and features.

A new method developed by researchers at Columbia University Vagelos College of Physicians and Surgeons should help scientists locate and characterize these neighborhoods, which could shed light on how microbes influence the health of their hosts.

New computational method could advance precision medicine

Oregon Health Sciences University, OHSU News


Scientists have devised a new computational method that reveals genetic patterns in the massive jumble of individual cells in the body.

The discovery, published in the journal eLife, will be useful in discerning patterns of gene expression across many kinds of disease, including cancer. Scientists worked out the formulation by testing tissue taken from the testes of mice. Results in hand, they’re already applying the same analysis to biopsies taken from men with unexplained infertility.

“There have been very few studies that attempt to find the cause of any disease by comparing single-cell expression measurements from a patient to those of a healthy control. We wanted to demonstrate that we could make sense of this kind of data and pinpoint a patient’s specific defects in unexplained infertility,” said co-senior author Donald Conrad.

Will you make your flight? Pittsburgh International now has technology to tell you

Pittsburgh Post-Gazette, Mark Belko


Airport officials have teamed with Oakland-based software firm Zensors to tell travelers how long they will be waiting to clear security, give or take a couple of minutes.

The information is available on monitors in the airport’s Landside Building that keep track of arrivals and departures, as well as on its website.

Finding novel materials for practical devices

MIT News, MIT Energy Initiative


In recent years, machine learning has been proving a valuable tool for identifying new materials with properties optimized for specific applications. Working with large, well-defined data sets, computers learn to perform an analytical task to generate a correct answer and then use the same technique on an unknown data set.

While that approach has guided the development of valuable new materials, they’ve primarily been organic compounds, notes Heather Kulik PhD ’09, an assistant professor of chemical engineering. Kulik focuses instead on inorganic compounds — in particular, those based on transition metals, a family of elements (including iron and copper) that have unique and useful properties. In those compounds — known as transition metal complexes — the metal atom occurs at the center with chemically bound arms, or ligands, made of carbon, hydrogen, nitrogen, or oxygen atoms radiating outward.

Transition metal complexes already play important roles in areas ranging from energy storage to catalysis for manufacturing fine chemicals — for example, for pharmaceuticals. But Kulik thinks that machine learning could further expand their use.

Potholes, Machine Learning, and Compelling Content

Inside Higher Ed, Eric Stoller


Engineering students at LMU were interested in how a new technology – machine learning – could be used to help detect potholes using camera footage in Los Angeles. It was a fascinating premise. With thousands of potholes, the state of L.A.’s roads is like that of any major city: cracked and pitted.

Using an open-source machine learning platform from Google, LMU’s students were able to train a model (an algorithm + data) to detect road cracks and potholes.

DeepMind touts predictive healthcare AI ‘breakthrough’ trained on heavily skewed data

TechCrunch, Natasha Lomas


DeepMind, the Google-owned U.K. AI research firm, has published a research letter in the journal Nature in which it discusses the performance of a deep learning model for continuously predicting the future likelihood of a patient developing a life-threatening condition called acute kidney injury (AKI).

The company says its model is able to accurately predict that a patient will develop AKI “within a clinically actionable window” up to 48 hours in advance.

In a blog post trumpeting the research, DeepMind couches it as a breakthrough — saying the paper demonstrates artificial intelligence can predict “one of the leading causes of avoidable patient harm” up to two days before it happens.

How New York can be the new Silicon Valley — without Amazon

New York Post, Oliver Libby


What New York doesn’t have is a culture that plucks local talent and pushes it into tech. It lacks an environment that makes opening a business and meeting local standards straightforward and reasonable, and not ruinously expensive. It fails to provide enough affordable housing or uniformly good public schools where members of the startup economy can live and raise their families.

There are many ways to fix this. The success of Cornell Tech is a start.

Artificial Intelligence May Be Key to Better Weather Forecasts

Eos; Sid-Ahmed Boukabara, Vladimir Krasnopolsky, Jebb Q. Stewart, Stephen G. Penny, Ross N. Hoffman, and Eric Maddy


Weather forecasting has evolved significantly from the late 1800s, when telegraph and telephone technology first allowed multiple weather stations to share observations and develop a synoptic view of weather systems as they moved across the country. In the late 20th century, satellites enabled another leap forward, providing an “eye in the sky” to monitor Earth system phenomena that include tropical storms evolving into hurricanes. Now, computer models integrate huge volumes of data, producing everything from simulations of long-term climate trends to nowcasts that predict small weather events just a few minutes or hours into the future.

To adapt machine learning to weather-related applications, it is critical to meet certain fundamental needs at multiple spatial and temporal scales for diverse geophysical domains.

And the progress continues: Nowcasting and weather forecasting are on the brink of a major paradigm shift. New approaches are required to enable full Earth system prediction and to make cost-effective use of the dramatic increase in volume, diversity, and capabilities of observations (particularly satellite observations) and environmental products. We predict that machine learning (ML) and other artificial intelligence techniques will have to supplement or supplant major components of operational systems. Fortunately, the fields of satellite remote sensing and numerical weather prediction (NWP) are poised to take advantage of recent years’ rapid advances in ML.

Press releases are bullshit, exhibit umpteen thousand.

Twitter, Calling Bullshit


But that’s not the worst thing about the story. The worst thing is that these are leaked data. This study clearly lacks informed consent by the participants. Now I’m sure an IRB (an institutional ethics oversight board) signed off on it. But they shouldn’t have. … The use of stolen data in academic research furthers my conviction that data scientists, like other professions, need a code of ethics. Given the extraordinary power to extract patterns, cross-correlate data, and re-identify individuals, respecting privacy should be paramount.

Building the case for actionable ethics in digital health research supported by artificial intelligence

BMC Medicine; Camille Nebeker, John Torous & Rebecca J. Bartlett Ellis


The digital revolution is disrupting the ways in which health research is conducted, and subsequently, changing healthcare. Direct-to-consumer wellness products and mobile apps, pervasive sensor technologies and access to social network data offer exciting opportunities for researchers to passively observe and/or track patients ‘in the wild’ and 24/7. The volume of granular personal health data gathered using these technologies is unprecedented, and is increasingly leveraged to inform personalized health promotion and disease treatment interventions. The use of artificial intelligence in the health sector is also increasing. Although rich with potential, the digital health ecosystem presents new ethical challenges for those making decisions about the selection, testing, implementation and evaluation of technologies for use in healthcare. As the ‘Wild West’ of digital health research unfolds, it is important to recognize who is involved, and identify how each party can and should take responsibility to advance the ethical practices of this work. While not a comprehensive review, we describe the landscape, identify gaps to be addressed, and offer recommendations as to how stakeholders can and should take responsibility to advance socially responsible digital health research.


Library Carpentry, Providence 2019



Providence, RI October 21, starting at 9 a.m. “NEASIST, Brown University Library, and the National Network of Libraries of Medicine, New England Region are pleased to sponsor a two-day library carpentry workshop.” [$$]


NeurIPS 2019 Workshop on Machine Learning with Guarantees

Vancouver, BC, Canada “This one-day workshop will bring together researchers in both theoretical and applied machine learning, across areas such as statistical learning theory, adversarial learning, fairness and privacy, to discuss the problem of obtaining performance guarantees and algorithms to optimize them.” Deadline for submissions is September 18.

NeurIPS 2019 Accepted Competitions

Deadlines for competitions fall between September 27 and November 7.

CHIIR 2020 – Calls

Vancouver, BC, Canada March 14-18. “CHIIR provides a forum for the dissemination and discussion of research on the user-centered aspects of information interaction and information retrieval.” Deadline for Workshop and Tutorial proposals is October 1. Deadline for full papers and Perspectives papers is October 15.

Tools & Resources

Robust Neural Machine Translation

Google AI Blog, Yong Cheng


In “Robust Neural Machine Translation with Doubly Adversarial Inputs” (to appear at ACL 2019), we propose an approach that uses generated adversarial examples to improve the stability of machine translation models against small perturbations in the input. We learn a robust NMT model to directly overcome adversarial examples generated with knowledge of the model and with the intent of distorting the model predictions. We show that this approach improves the performance of the NMT model on standard benchmarks.

How Not to Run a Panel – Panel discussions can be very boring, but they don’t have to be if you follow these six rules.

The Atlantic, Yascha Mounk


If you are ever in charge of organizing a panel, it is your professional responsibility—nay, your sacred moral duty—to avoid some of the tortures daily visited upon unsuspecting conference-goers. In that spirit, here are my six rules for (panel) life.

1. Don’t have more than four people onstage.

Announcing PartiQL: One query language for all your data

AWS Open Source Blog; Yannis Papakonstantinou, Almann Goo, Brad Ruppert, and Jon Wilsdon


We are happy to announce PartiQL, a SQL-compatible query language that makes it easy to efficiently query data, regardless of where or in what format it is stored. As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured and nested data in open data formats (such as an Amazon S3 data lake), and even schema-less data in NoSQL or document databases that allow different attributes for different rows. We are open sourcing the PartiQL tutorial, specification, and a reference implementation of the language under the Apache 2.0 license, so that everyone can participate, contribute, and use it to drive widespread adoption for this unifying query language.
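
To make the idea of querying nested, schema-less data concrete, here is a minimal sketch in plain Python of the kind of "unnest and filter" operation such a query language is meant to express. It is not PartiQL and does not use any PartiQL implementation; the record layout, the field names, and the function `unnest_projects` are all hypothetical and chosen only for illustration.

```python
# Hypothetical nested records, shaped like documents in a data lake or
# document database. Rows may carry different attributes.
employees = [
    {"name": "Bob Smith",
     "projects": [{"name": "Spectrum querying"}, {"name": "Redshift security"}]},
    {"name": "Susan Smith", "title": "Dev Mgr", "projects": []},
    {"name": "Jane Smith",
     "projects": [{"name": "Redshift security"}]},
]


def unnest_projects(rows, keyword):
    """Pair each employee with each nested project whose name contains keyword.

    In SQL-style terms this resembles joining every row with its own
    nested 'projects' collection and then filtering on the project name.
    """
    return [
        {"employeeName": e["name"], "projectName": p["name"]}
        for e in rows
        for p in e.get("projects", [])  # rows without 'projects' yield nothing
        if keyword in p["name"]
    ]


if __name__ == "__main__":
    for row in unnest_projects(employees, "security"):
        print(row["employeeName"], "->", row["projectName"])
```

A query language that handles nesting natively lets the same intent be written declaratively rather than as hand-written loops over each data source.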

Machine Learning That’s Light Enough for an Arduino

IEEE Spectrum, Limor Fried


Pete Warden and his team have done amazing work in bringing TensorFlow Lite to chips based on ARM’s Cortex family of processors. This was great to discover, because at my open-source hardware company, Adafruit Industries, our current favorite processor is the 32-bit SAMD51, which incorporates a Cortex-M4 CPU. We’ve used the SAMD51 as the basis for many of our recent and upcoming Arduino-compatible boards, including the PyGamer, a simple battery-powered gaming handheld. What if we could use it to literally put AI into people’s hands?

Warden had created a speech-recognition model that can identify the words “yes” and “no” in an analog audio feed. I set about seeing if I could bring this to the PyGamer, and what I might do with a model that could recognize only two words. I wanted to create a project that would spark the imagination of makers and encourage them to start exploring machine learning on this kind of hardware.
