Data Science newsletter – July 27, 2021

Newsletter features journalism, research papers and tools/software for July 27, 2021


Push to Address Long-Standing Challenges for Parents in STEMM

The Scientist, Amanda Heidt


The organizers behind a Mothers in Science conference say that it’s time academia provide more support to researchers who are pregnant or looking after children.

UvA: Max Welling will lead new Microsoft Research Lab in Amsterdam

Science|Business, University of Amsterdam


As of September 2021, Max Welling, professor at the Informatics Institute, will be joining Microsoft to head the new lab. He will keep his appointment at the University of Amsterdam as full professor and continue to supervise his students and postdocs.

Chris Bishop and Max Welling announced the new satellite Microsoft Research Lab in Amsterdam, which will focus on molecular simulation using machine learning in collaboration with researchers at the labs in Cambridge, UK, and Beijing, China.

Machine learning, molecular simulation, and the opportunity for societal good

Unlocking the challenge of molecular simulation has the potential to yield significant breakthroughs in how we tackle such societal issues as climate change, drug discovery, and the treatment of disease, and Microsoft is ramping up its efforts in the space.

More Variants Are Coming, and the U.S. Isn’t Ready to Track Them

Bloomberg Businessweek, Cynthia Koons


In late June, [Pandemic Response Lab] started gearing up to screen samples for “delta-plus,” a mixture of mutations first identified in India and South Africa. “On a weekly basis these variant definitions can change,” says Henry Lee, director of genomics at PRL. When they do, software must be updated, samples must be reanalyzed, and authorities must figure out how to respond. It’s like watching a Covid wildfire: Find a new mutation, watch it spread, calibrate a response, find a newer mutation, and watch the process repeat, hoping the response will be more effective this time out.

Lee’s lab sends its sequencing data to New York City’s health department, then epidemiologists use it to study questions such as whether a case is a reinfection or whether it happened in a vaccinated person. In this manner, they assess how dangerous a mutation is—how fast it might spread, how severe it might be, how effective vaccines are against it. Then public-health officials can decide whether to enact safety measures such as masking and quarantining. Sequencing data can also contribute to the long-term fight to end the pandemic by helping pharmaceutical companies plan booster shots.

And yet for all of the value genomic sequencing provides, no one paid the Pandemic Response Lab to start doing it.

Maine’s colleges are ready to shape the ‘future’ with new funds

News Center Maine, Sam Rogers


Governor Janet Mills signed into law a plan that will allocate more than $1 billion to various industries around Maine, with $35 million set to go to the UMaine System.

“Develop a workforce for Maine, develop talent for Maine, is a particular focus we are able to now act on,” she added.

Ferrini-Mundy added that, although the flagship campus is based in Orono, these dollars will be awarded to a set of projects at campuses around Maine that will make the biggest difference in two to three years.

Tokyo Olympics are the ultimate Covid-19 experiment

POLITICO, Ryan Heath


Tokyo 2020 CEO Toshiro Muto told reporters Tuesday: “We cannot predict what the epidemic will look like in the future.” He didn’t rule out a last-minute cancellation “should there be any surge of positive cases.”

Fewer than one-third of Japan’s citizens are vaccinated, and no one else has even thought to bring together tens of thousands of people from over 200 countries during the pandemic. There’s simply no way to know whether new variants might arise and spread.

International Olympic Committee (IOC) President Thomas Bach said on July 15 that there was “zero risk” of athletes passing on the virus to local residents. His first vice president, American Anita DeFrantz, disagreed, telling POLITICO in an interview: “We all have to be worried. We know that this is a highly dangerous virus that’s going around, and of course the variants are even worse.”

This Is How All Laptops Should Be Made!! – Framework Laptop Teardown

YouTube, PCWorld


Gordon unboxes and tears down the Framework Laptop – a modular laptop built around easy repair and replacement. [video, 2:58]

Beeck Center, NGA launch ‘Data Labs’ to help states recover from pandemic

StateScoop, Colin Wood


The Beeck Center for Social Impact and Innovation at Georgetown University on Monday announced a new partnership with the National Governors Association to assist state government officials working on data-driven projects as they help their communities recover from the COVID-19 pandemic.

The training and technical assistance program, called Data Labs: Roadmap to Recovery, intends to pair state governments with data scientists and subject matter experts to take on issues including housing and homelessness, small business support, higher education and workforce.

“We’re really looking to equip states with an actionable plan to address one of these four issues,” said Tyler Kleykamp, a Beeck Center fellow and director of the State Chief Data Officers Network. “I almost look at it as these are things people would be doing over email, and if we can just bring them together for some focused activity around this in developing this plan, that they’ll actually be set up for success to implement it.”

Tuition at University of California Schools Is Going Up

Los Angeles Magazine, Oren Peleg


The University of California Board of Regents voted on Thursday to approve a plan increasing tuition beginning in the fall of 2022 and continuing for five years, at which time it will be up for a revote. The plan, which has been in the works since 2019 but was sidelined by the COVID-19 pandemic, raises tuition and fees for incoming undergraduate classes by 2 percent plus inflation. Rate hikes would be capped at 5 percent, and each class’s rate would stay in effect for six years.
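The reported rule, an increase of 2 percent plus inflation capped at 5 percent, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names `capped_increase` and `incoming_class_tuition`, the $10,000 starting tuition, and the inflation figures are hypothetical, and compounding each new class’s rate on the previous class’s rate is an assumption, not a detail from the article.

```python
def capped_increase(inflation, premium=0.02, cap=0.05):
    """Fractional increase: `premium` plus inflation, never more than `cap`."""
    return min(premium + inflation, cap)


def incoming_class_tuition(base_tuition, inflation_by_year):
    """Starting tuition for each successive incoming class.

    Assumes (hypothetically) that each new class's rate is the previous
    class's rate raised by that year's capped increase.
    """
    rates = []
    tuition = base_tuition
    for inflation in inflation_by_year:
        tuition *= 1 + capped_increase(inflation)
        rates.append(round(tuition, 2))
    return rates


# Hypothetical $10,000 starting tuition with 2% and then 4% inflation.
print(incoming_class_tuition(10_000, [0.02, 0.04]))  # prints [10400.0, 10920.0]
```

With 2 percent inflation the increase is 4 percent; with 4 percent inflation the uncapped figure of 6 percent is held to the 5 percent cap.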

Governor Gavin Newsom has opposed tuition increases, instead allotting a 3 percent base funding increase for the UC and California State University systems to close budget gaps deepened by the pandemic. That amounts to a roughly $1.3 billion funding increase for the ten campuses of the UC system—its largest ever single-year increase.

#ICML2021 in tweets

AI Hub, Lucy Smith


The thirty-eighth International Conference on Machine Learning (ICML) is now in full swing and will run for the entirety of this week (18–24 July) in a virtual-only format. This compilation of tweets gives a flavour of the talks, workshops, socials, and other events taking place.

Big Data-Derived Tool Facilitates Closer Monitoring Of Recovery From Natural Disasters

Texas A&M University, Texas A&M Today


By analyzing people’s visitation patterns to essential establishments like pharmacies, religious centers and grocery stores during Hurricane Harvey, researchers at Texas A&M University have developed a framework to assess the recovery of communities after natural disasters in near real time.

They say the information gleaned from their analysis would help federal agencies allocate resources equitably among communities affected by a disaster.

“Neighboring communities can be impacted very differently after a natural catastrophic event,” said Ali Mostafavi, associate professor in the Zachry Department of Civil and Environmental Engineering and director of the Urban Resilience.AI Lab. “And so, we need to identify which areas can recover faster than others and which areas are impacted more than others so that we can allocate more resources to areas that need them more.”
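One simple way to turn visitation counts into a recovery signal is to compare post-event visits with a pre-event baseline. The sketch below is a hypothetical illustration, not the Texas A&M researchers’ framework: the function names, the weekly granularity, the 0.9 recovery threshold, and the visit counts are all assumptions.

```python
from statistics import mean


def recovery_ratios(baseline_visits, post_event_visits):
    """Ratio of each post-event week's visits to the mean pre-event weekly visits."""
    baseline = mean(baseline_visits)
    if baseline == 0:
        raise ValueError("baseline period has no visits")
    return [visits / baseline for visits in post_event_visits]


def weeks_to_recover(baseline_visits, post_event_visits, threshold=0.9):
    """First post-event week (1-indexed) whose ratio reaches `threshold`, else None.

    The 0.9 default is a hypothetical choice, not a value from the study.
    """
    ratios = recovery_ratios(baseline_visits, post_event_visits)
    for week, ratio in enumerate(ratios, start=1):
        if ratio >= threshold:
            return week
    return None


# Hypothetical weekly visit counts to essential establishments in two communities.
community_a = weeks_to_recover([100, 100, 100], [40, 70, 95])
community_b = weeks_to_recover([200, 200], [50, 80, 120, 150])
print(community_a, community_b)  # prints "3 None"
```

Here the first hypothetical community reaches the threshold in its third post-event week while the second has not yet recovered, the kind of contrast between neighboring areas that the quote describes.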

Google Maps feature lets users check how crowded public transit is to help maintain social distancing

MobiHealthNews, Laura Lovett


Google is looking to help folks socially distance themselves with a Maps feature that lets users check how crowded public transit is in their area.

The tool uses artificial intelligence, historical location trends and crowdsourced information from Google Maps users to predict how busy public transit will be. The crowdedness predictions will be available for more than 10,000 transit agencies in 100 countries.

Currently, with the help of local transport agencies, Google is piloting a feature in New York and Sydney that will let users see how full each transit car is.

BCS, Royal Statistical Society, Alan Turing Institute combine to cement data science professionalism

Computer Weekly (UK), Brian McKenna


The BCS, the Royal Statistical Society and the Alan Turing Institute are among the bodies that have combined to set up a data science “alliance” to establish professional and ethical standards across the profession. … Paul Fletcher, CEO of the BCS, said: “We are excited to be working with our alliance partners to define and manage standards in the burgeoning area of data science. The pandemic period has further demonstrated that data science provides significant value in creating insight and improving decision making.”

‘Pressing national challenge’: WSU to lead new regional cybersecurity education and research institute

The Seattle Times, The Spokesman-Review (Spokane, WA), Greg Mason


Washington State University Professor Bernard Van Wie points to data breaches at banks and other cyberattacks to illustrate the importance of cybersecurity in today’s world.

So to help train the next line of defense, WSU will use a two-year, $1.5 million Department of Defense grant to establish the Northwest Virtual Institute for Cybersecurity Education and Research.

The cyberoperations research and teaching center, set to start this fall, will offer four-year degree and certificate programs in cyber basics, cyber operations and cyber defense to students in computer science and other majors, as well as to ROTC members and Department of Defense-skilled civilian workers.

University offers new bachelor’s degree in data science

Washington University in St. Louis, The Source


Careers in data science are among the most in-demand in today’s world, yet there are more jobs available than qualified individuals to fill them. The McKelvey School of Engineering and the College of Arts & Sciences at Washington University in St. Louis are now offering a bachelor’s degree in data science to help meet this growing need.

The program is a collaboration between the Department of Computer Science & Engineering in McKelvey Engineering and the Department of Mathematics and Statistics in Arts & Sciences. Core faculty who will oversee the program include Marion Neumann, senior lecturer in computer science and engineering; Ron Cytron, professor of computer science and engineering; José E. Figueroa-López, professor of mathematics and statistics and director of undergraduate studies, and Soumendra Lahiri, the Stanley A. Sawyer Professor in Mathematics and Statistics.

Prince Charles and Jony Ive launch design lab with Amazon backing

Fast Company, Mark Wilson


Saving the environment is one of the most pressing design challenges of our time. And with just 100 days before the UN’s COP26 climate conference—where the world will convene to discuss action on climate change—the most prominent designer of our time has made a big announcement.

Sir Jony Ive is partnering with Prince Charles to open a design lab through London’s Royal College of Art. Called the Terra Carta Design Lab, its goal is to work with students to “create small designs that can make a big impact for the world’s transition to a sustainable future,” according to the press release.

The lab stems from the Terra Carta (which means “Earth Charter” in Latin). It’s a 17-page environmental charter also backed by Prince Charles and designed by Ive. Released earlier this year, it provides an environmental innovation framework to be backed by $10 billion in planned investments raised from the private sector. The lab itself is being funded by investments from Octopus Energy, the Islamic Development Bank, and Amazon.


Registration Open Through July 29 for LLNL’s Machine Learning for Industry Forum

HPC Wire, Lawrence Livermore National Laboratory


Online August 10-12. “The forum aims to foster and illustrate the adoption of machine learning methods for practical industrial outcomes, with a strong emphasis on manufacturing. Over the course of the event, attendees will engage in dialog about applications, tools and techniques and special topics centered around machine learning’s impact and potential in industry.”



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

OpenAI Codex shows the limits of large language models

VentureBeat, Ben Dickson


In a new paper, researchers at OpenAI have revealed details about Codex, a deep learning model that generates software source code. Codex powers Copilot, an “AI pair programmer” tool developed jointly by OpenAI and GitHub. Copilot is currently available in beta test mode to a limited number of users.

The paper is a fascinating read that explains the process through which the scientists at OpenAI managed to repurpose their flagship language model GPT-3 to create Codex. But more importantly, the paper also sheds much-needed light on how far you can trust deep learning in programming.

Machine learning in a hurry: what I’ve learned from the SLICED ML competition

Variance Explained blog, David Robinson


This summer I’ve been competing in the SLICED machine learning competition, where contestants have two hours to open a new dataset, build a predictive model, and be scored as a Kaggle submission. Contestants are graded primarily on model performance, but also get points for visualization and storytelling, and from audience votes. Before SLICED I had almost no experience with competitive ML, so I learned a lot!

As of today I’m 5th in the standings, short of the cutoff for the playoffs, so if you want to see me continue you can vote for me as an “Audience Choice” here! (Though one of my favorite things about the competition is that audience members can compete along with Kaggle submissions, so I expect I’ll join in in any case!)

For four of the SLICED episodes (including the two weeks I was competing) I shared a screencast of my process.



PostDoc Associate

University of Colorado Boulder, Institute of Behavioral Science; Boulder, CO
Full-time positions outside academia

Chief Data Officer

Sage Bionetworks; Seattle, WA
