Data Science newsletter – November 1, 2020

Newsletter features journalism, research papers and tools/software for November 1, 2020

GROUP CURATION: N/A

 

The Rise of ML Ops: Why Model Performance Monitoring Could Be the Next Billion-Dollar Industry

Two Sigma Ventures, Frances Schwiep and Vinay Iyengar


Over the past two decades, a wave of application performance monitoring (APM) companies has fundamentally changed the course of software development; Datadog, New Relic, PagerDuty, AppDynamics, Dynatrace, and Splunk have created nearly $90B in market cap, and counting. APM has enabled companies to prevent outages and monitor uptime, and has ultimately catalyzed digital transformation and the migration to the cloud. These days it is difficult to imagine mission-critical software that relies on manual troubleshooting and spot-checking instead of systematic tooling. But such ad hoc, unscalable workflows are shockingly commonplace in the world of machine learning systems.

Much like cloud-native computing has ushered in a new era of software development and tooling, we believe data-driven systems, enabled by machine learning, have the power to unlock the next wave of innovation. In turn, we believe there will be a need for an analogous set of model performance monitoring (MPM) tools to help with data quality, drift, model performance, and system reliability. Companies like WhyLabs, Toro Data, Mona Labs, Monte Carlo Data, LightUp, and Soda Data, among many others, are just beginning to capitalize on what we see as a multi-decade trend.


Why Data Science Must Inherently Be Interdisciplinary

Harvard Alumni Entrepreneurs podcast


As data science has a growing impact on more aspects of our lives, it is no longer exclusively a topic for scientists and engineers. Xiao-Li Meng, Professor of Statistics and Founding Editor-in-Chief of the Harvard Data Science Review, discusses the importance of a broader vision of our overall relationship with data and “data science”. He explains why philosophers and others should be an integral part of these discussions, especially given some of the serious ethical and social challenges we face as a result of data automation. [audio, 29:29]


Have you no humanities?

Santa Fe Institute


Later this year, supported by a grant from the National Endowment for the Humanities, SFI will launch a new “NEH institute,” Foundations and Applications of Humanities Analytics, to introduce early-career humanities scholars to new ways of studying culture with a wide range of computational tools. Leading researchers in the humanities and sciences will draw on models and ideas in information theory, statistics, and computer science for new insights into the dynamics of literature, history, and the arts.

In its first year, the institute will be offered online to a wide audience through SFI’s Complexity Explorer platform; in years two and three, a smaller group of students will be invited to SFI for an in-person intensive school.

“This institute puts SFI in a position to bridge cultural gaps that don’t need to be there,” says philosopher David Kinney, an Omidyar Postdoctoral Fellow at SFI who is leading the new institute.


The Brown Institute’s Local News Lab is developing “smart paywalls” for local newsrooms

Nieman Journalism Lab, Vignesh Ramachandran


Paywalls are nothing new. But using advances in machine learning to make paywalls “smarter” could help resource-strapped local newsrooms boost reader engagement — and, in turn, hopefully revenue.

That’s the first bet from The Brown Institute for Media Innovation’s new Local News Lab, an interdisciplinary team of journalists, engineers, data scientists, and designers. The team is in the early days of partnering with small- and medium-sized local newsrooms to help them build more adaptable or “smart” paywalls — taking advantage of machine learning that larger newsrooms may already have the resources to explore.


Learning the language of sugars: Deep learning and bioinformatics tools enable in-depth study of glycan molecules for understanding infections and disease

Harvard University, Wyss Institute


A team of scientists from the Wyss Institute for Biologically Inspired Engineering at Harvard University and the Massachusetts Institute of Technology (MIT) has cracked the glycan code by developing new machine learning and bioinformatics methods that enable researchers to systematically study glycans and identify sequences that play a role in the interactions of microbes and their host cells, as well as other still-unknown functions. The tools are presented in a new paper published today in Cell Host & Microbe, and are available online as a free Wyss WebApp that researchers can use to perform their own analyses of thousands of glycans.

“The language-based models that we have created can be used to predict whether and how a given glycan will be detected by the human immune system, thus helping us determine whether a strain of bacteria that harbors that glycan on its surface is likely to be pathogenic,” said first author Daniel Bojar, Ph.D., a Postdoctoral Fellow at the Wyss Institute and MIT.


Lundin Family giving gift to University of Arizona to create interdisciplinary school of mining and mineral resources

PR Newswire, Lundin Group


The program aims to encourage and prepare a new generation of professionals to enter the mining industry from a range of educational disciplines and specialties, including finance, law, computer science, and the environmental and social sciences.

“We are very excited to be supporting such an important initiative alongside the University of Arizona. The drive toward a safer, more sustainable and efficient mining operation requires the very best talent across all disciplines, not just mining engineering and geology,” said Jack Lundin, President and CEO of Bluestone Resources Inc., one of the Lundin Group companies.


Princeton project expands to create a worldwide fleet of robotic floats to monitor ocean health

Princeton University, News


Scientists at Princeton University, Monterey Bay Aquarium Research Institute (MBARI), University of Washington, Scripps Institution of Oceanography at UC San Diego and Woods Hole Oceanographic Institution will use this grant to build and deploy 500 robotic ocean monitoring floats around the globe. The new program builds on the successful Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) project based at Princeton that has deployed similar floats in the ocean around Antarctica, proving their usefulness as year-round reporters of ocean chemistry and biological activity.

The new network of floats, called the Global Ocean Biogeochemistry Array (GO-BGC Array), will collect observations of ocean chemistry and biology between the surface and a depth of 2,000 meters.


Harnessing Big Data to Save Endangered Coral

Tufts University, Tufts Now


Finding a way to help the coral survive is at the heart of a National Science Foundation-funded project co-led by Lenore Cowen, a professor of computer science. She and researchers from four other universities are part of an interdisciplinary team trying to use data science methods to understand the exact factors that determine whether corals will be more or less resilient to stresses caused by climate change.

The researchers are taking a multi-disciplinary approach: using emerging bioinformatics data together with classical measures of coral health to understand differences across coral, and testing their hypotheses in laboratory conditions.


The math behind epidemics

Physics Today, Alison Hill


This article introduces the essential mathematical quantities that characterize an outbreak, summarizes how scientists calculate those numbers, and clarifies the nuances in interpreting them. For COVID-19, estimates of those quantities are being shared, debated, and updated daily. Physicists are used to distilling real-world complexity into meaningful, parsimonious models, and they can serve as allies in communicating those ideas to the public.


Commuting during COVID-19: Using Open Data to Predict NYC Subway Crowds

Medium, Data Clinic


Having recently released a set of data products from our NYC subway accessibility project, including cleaned and interpolated hourly turnstile data, a subway network map, and a crosswalk that joins subway stations, equipment, turnstile data, and General Transit Feed Specification (GTFS) data, we turned first to these sources. We envisioned an app that would provide the city and commuters a way of understanding how crowded their route might be from the perspective of an average subway car. The goal would be to identify when train cars on a specific route were likely to be most crowded so that individuals might alter their travel time or trip if possible.
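As a rough illustration of how hourly turnstile counts can be turned into a relative crowding signal, here is a small sketch. The station names and counts are invented for the example, not real MTA data, and this is not the Data Clinic's actual method.

```python
# Hypothetical example: scale each station's hourly entry counts to a 0-1
# crowding index relative to that station's busiest observed hour.
from collections import defaultdict

def crowding_index(hourly_entries):
    """Map {(station, hour): entries} to {(station, hour): index in [0, 1]}."""
    peak = defaultdict(int)
    for (station, _hour), n in hourly_entries.items():
        peak[station] = max(peak[station], n)
    return {(station, hour): (n / peak[station] if peak[station] else 0.0)
            for (station, hour), n in hourly_entries.items()}

# Illustrative counts for one station at 8am, 9am, and 2pm.
entries = {("96 St", 8): 1200, ("96 St", 9): 900, ("96 St", 14): 300}
idx = crowding_index(entries)
print(idx[("96 St", 14)])  # 0.25: a quarter of that station's peak hour
```

A commuter-facing app could surface such an index per route and hour, letting riders shift a trip away from the busiest windows.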


Unlocking AI’s Potential for Social Good

Harvard Business Review, Mark Purdy


New developments in AI could spur a massive democratization of access to services and work opportunities, improving the lives of millions of people around the world and creating new commercial opportunities for businesses. Yet they also raise the specter of potential new social divides and biases, sparking a public backlash and regulatory risk for businesses. For the U.S. and other advanced economies, which are increasingly fractured along income, racial, gender, and regional lines, these questions of equality are taking on a new urgency. Will advances in AI usher in an era of greater inclusiveness, increased fairness, and widening access to healthcare, education, and other public services? Or will they instead lead to new inequalities, new biases, and new exclusions?

Three frontier developments stand out in terms of both their promised rewards and their potential risks to equality. These are human augmentation, sensory AI, and geographic AI.


The Computational Limits of Deep Learning

The Data Exchange podcast, Ben Lorica


In this episode of the Data Exchange I speak with Neil Thompson, Research Scientist at the Computer Science and Artificial Intelligence Lab (CSAIL) and the Initiative on the Digital Economy, both at MIT. I wanted Neil on the podcast to discuss a recent paper he co-wrote entitled “The Computational Limits of Deep Learning” (summary version here). The paper provides estimates of the amount of computation, the economic costs, and the environmental impact that come with increasingly large and more accurate deep learning models. [audio, 43:04]


Great promise but potential for peril – Ethical concerns mount as AI takes bigger decision-making role in more industries

Harvard Gazette


Its growing appeal and utility are undeniable. Worldwide business spending on AI is expected to hit $50 billion this year and $110 billion annually by 2024, even after the global economic slump caused by the COVID-19 pandemic, according to a forecast released in August by technology research firm IDC. Retail and banking industries spent the most this year, at more than $5 billion each. The company expects the media industry and federal and central governments will invest most heavily between 2018 and 2023 and predicts that AI will be “the disrupting influence changing entire industries over the next decade.”

“Virtually every big company now has multiple AI systems and counts the deployment of AI as integral to their strategy,” said Joseph Fuller, professor of management practice at Harvard Business School, who co-leads Managing the Future of Work, a research project that studies, in part, the development and implementation of AI, including machine learning, robotics, sensors, and industrial automation, in business and the work world.


4 higher education experts on how COVID-19 is upending the college experience

Fast Company, Ainsley Harris


Leaders from 2U, Global Citizen Year, Minerva, and Pearson reveal which elements of the higher-ed ecosystem are getting stronger and which are poised to collapse.


An ed-tech specialist spoke out about remote testing software — and now he’s being sued

The Verge, Monica Chin


In early September, [Ian] Linkletter got an unexpected call from a reporter at the Vancouver Sun. Proctorio was suing him for tweeting the videos, as well as a screenshot of its website — the company claimed he’d infringed its copyright and distributed confidential material. Linkletter was under an injunction order.


Tools & Resources



A Taxonomy of Training Data: Disentangling the Mismatched Rights, Remedies, and Rationales for Restricting Machine Learning

Info Justice, Benjamin Sobel


“The chapter taxonomizes different applications of machine learning according to the qualities of their training data. Four categories emerge: (1) public-domain training data, (2) licensed training data, (3) market-encroaching uses of copyrighted training data, and (4) non-market-encroaching uses of copyrighted training data.”


Behavioral Testing of NLP models with CheckList

Amit Chaudhary


“In this post, I will explain the overall concept of CheckList and the various components that it proposes for evaluating NLP models.”
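For a flavor of the idea, here is a minimal sketch of a CheckList-style Minimum Functionality Test (MFT) in plain Python. The toy sentiment "model" and the templates are invented for illustration; this does not use the CheckList library's actual API.

```python
# Toy stand-in model: flags a sentence as negative only if it contains "not".
def toy_sentiment(text):
    return "neg" if "not" in text.split() else "pos"

def mft(model, template, fillers, expected):
    """Minimum Functionality Test: fill the template with each word,
    run the model, and return the failure rate against the expected label."""
    cases = [template.format(w) for w in fillers]
    failures = [c for c in cases if model(c) != expected]
    return len(failures) / len(cases)

# The model handles plain "not" negation...
ok = mft(toy_sentiment, "The food was not {}.", ["good", "great", "tasty"], "neg")
# ...but a probe template using "never" exposes a blind spot.
bad = mft(toy_sentiment, "I would never call this {}.", ["good", "great", "tasty"], "neg")
print(ok, bad)  # 0.0 1.0
```

This is the core behavioral-testing move: batteries of templated probes surface failure modes that a single held-out accuracy number hides.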


The best Low-Code Machine Learning Libraries in Python

Medium, Spatial Data Science, Abdishakur


“Low-code/No-code platforms and libraries enable users to run machine learning models easily by providing a ready-to-use code and functions. You can access these functions either through a web interface or writing minimal code.”

“While no-code platforms are the easiest way to train a Machine Learning model through drag and drop interface, they lack flexibility.”

“On the other hand, the low-code ML is the sweet spot and middle ground. They offer both flexibility and easy to use code. You still have to write some code, but that is bare minimum compared to other typical machine learning libraries.”
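To illustrate the low-code pattern the post describes, a single call that hides the usual load/split/fit/score boilerplate, here is a toy sketch. The `auto_train` function and its majority-class baseline are invented for the example and belong to no real library.

```python
# Toy illustration of the low-code pattern: one function call returns a
# trained model and its score. The "model" is a trivial majority-class
# baseline, standing in for whatever a real low-code library would fit.
from collections import Counter

def auto_train(rows, label_key):
    """One-call training: returns (predict_fn, training_accuracy)."""
    labels = [r[label_key] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    predict = lambda row: majority  # always predicts the most common label
    acc = sum(predict(r) == r[label_key] for r in rows) / len(rows)
    return predict, acc

data = [{"x": 1, "y": "a"}, {"x": 2, "y": "a"}, {"x": 3, "y": "b"}]
model, acc = auto_train(data, "y")
print(acc)  # 2 of 3 training rows match the majority class
```

Real low-code libraries wrap far more (preprocessing, model selection, tuning) behind a similarly small surface, which is exactly the flexibility-versus-convenience trade-off the post weighs.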


How do the first and second wave compare in Europe?

Twitter, Max Roser


On @OurWorldInData we have a new page on *hospitalization data* in European countries.


Careers


Internships and other temporary positions

Undergraduate Research Internship – Computing
Microsoft Research; Redmond, WA; New York, NY; Cambridge, MA

Postdocs

Post-doc position available on Neuropixels Ultra project
University of Washington, Department of Biological Structure; Seattle, WA

Full-time positions outside academia

Staff Economist (Data Scientist), Home Timeline
Twitter; Seattle, WA, or San Francisco, CA

Tenured and tenure track faculty positions

Faculty Candidates
University of Washington, Paul G. Allen School of Computer Science & Engineering; Seattle, WA
