Data Science newsletter – June 2, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for June 2, 2020


Data Science News

Machine learning helps map global ocean communities

MIT News


One way scientists delineate marine communities is through satellite images of chlorophyll, the green pigment produced by phytoplankton. Chlorophyll concentrations can indicate how rich or productive the underlying ecosystem might be in one region versus another. But chlorophyll maps can only give an idea of the total amount of life that might be present in a given region. Two regions with the same concentration of chlorophyll may in fact host very different combinations of plant and animal life.

“It’s like if you were to look at all the regions on land that don’t have a lot of biomass, that would include Antarctica and the Sahara, even though they have completely different ecological assemblages,” says Maike Sonnewald, a former postdoc in MIT’s Department of Earth, Atmospheric and Planetary Sciences.

Now Sonnewald and her colleagues at MIT have developed an unsupervised machine-learning technique that automatically combs through a highly complicated set of global ocean data to find commonalities between marine locations, based on the ratios of, and interactions between, multiple phytoplankton species.
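
To make the idea concrete, here is a minimal sketch of clustering ocean locations by community composition rather than by total biomass. It uses plain k-means as a stand-in; the MIT team's actual unsupervised method is not described in this excerpt and is likely different. The abundance numbers are hypothetical, and the three columns stand for three phytoplankton groups.

```python
import math
from typing import List, Sequence

def to_proportions(abundances: Sequence[float]) -> List[float]:
    """Convert raw per-group abundances at one location into fractions of the
    total, so locations are compared by community make-up, not total biomass."""
    total = sum(abundances)
    return [a / total for a in abundances]

def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means clustering; the first k points seed the centroids so the
    result is deterministic. Returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical abundances of three phytoplankton groups at four locations.
raw = [[8, 1, 1], [1, 8, 1], [16, 2, 2], [2, 16, 2]]
labels = kmeans([to_proportions(r) for r in raw], k=2)
print(labels)  # [0, 1, 0, 1]
```

Locations 1 and 2 have the same total (10) but land in different clusters, while location 3 has twice the total of location 1 with the same make-up, so it joins location 1's cluster. That is the distinction a chlorophyll-only map cannot draw.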

Challenges to youth health equity during COVID-19

Public Health Newswire, Kristin Mmari and Tamar Mendelson


The ongoing pandemic is highlighting and worsening longstanding health inequities resulting from income inequality and structural racism. For young people of color in low-income communities, the effects of COVID-19 are especially dire, as many have already been disconnected from key systems of support.

Adolescence and young adulthood are critical developmental stages, when rapid physical, cognitive and emotional changes occur. Young people learn to navigate peer and romantic relationships, evolve their personal values and identity and become increasingly independent. During the pandemic, however, many of these important milestones have been disrupted.

Social distancing measures and school closures have not only led to cancellations of big events, such as prom and in-person graduation ceremonies, but also have limited important day-to-day opportunities for peer social connection, identity development and independence. With structured in-person education not possible, many students will fall behind academically.

Stanford Medicine study details molecular effects of exercise

Stanford University, Stanford Medicine, News Center


A simple blood test may be able to determine how physically fit you are, according to a new study conducted by scientists at the Stanford University School of Medicine.

The test could complement treadmill tests, a more traditional clinical evaluation of fitness, and provide individuals with far more nuanced information about their body’s molecular response to exercise.

The blood test is an offshoot of a complex study conducted by a team of researchers that took hundreds of thousands of molecular measurements from a group of individuals before and after exercising.

“Everybody knows exercise is good for you, but we really don’t know what drives that at a molecular level,” said Michael Snyder, PhD, professor and chair of genetics. “Our goal at the outset was to conduct a highly comprehensive analysis of what’s happening in the body just after exercising.”

UVA to Lead Study of How Big Data Can Battle Future Pandemics

WVTF, Sandy Hausman


With ten million dollars from the National Science Foundation, computer scientists at the University of Virginia, Virginia Tech and 13 other schools have begun to tackle a massive problem with the power of big data and computers. Chris Barrett at UVA’s Biocomplexity Institute says now – in the 21st century – mankind is armed to confront the ancient problem of plagues.

“We have a nice small word for it – pandemic – but it’s a very big, complicated worldwide object,” he explains. “To hold it in your head is impossible. We have to use computers to do that.”

And Professor Madhav Marathe, the lead researcher on this project, says it will involve people from many disciplines. [audio, 1:20]

Duke Launches Graduate Certificate in AI for Product Innovation

Duke University, Pratt School of Engineering


Artificial intelligence (AI) and machine learning are creating immense opportunities for innovation in products and services across every industry, from health care to energy to manufacturing—and beyond.

Remarkable growth in the availability of raw data, combined with major advances in computational power and new modeling techniques and tools, has driven a boom in the AI/ML field. To remain competitive, organizations are racing to leverage the value of their data through the application of advanced analytics.

To meet this challenge, Duke University’s Pratt School of Engineering is launching in summer 2020 an online Graduate Certificate in AI Foundations for Product Innovation (AIPI) that will fuel innovation by equipping participants with AI/machine learning skills and a strong understanding of how the technologies can be applied to build new products and services.

AI Helps College Students Manage Housing Risks, Propels The ‘Wave Of The Future’ In Legal Services

90.5 WESA, An-Li Herring


Many college students don’t know when they’ll return to campus due to COVID-19, but when they do, they will need a place to live. Some might feel pressure to commit to housing despite the uncertainty, and Pittsburgh lawyer Marcy Smorey is using artificial intelligence to help students be smart about signing apartment leases.

Through Smorey’s recently launched company, CloverContracts, renters can upload their leases to be reviewed automatically. The software uses machine learning and natural language processing to scan the document and then offer guidance based on what it finds. Smorey said the tool can help tenants to decide whether to sign a rental agreement or to learn their rights under an existing one. [audio, 1:30]
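
CloverContracts' model is not public, so the sketch below is only a much simpler keyword-matching stand-in for the kind of lease review described: it splits a lease into rough sentences and flags the ones that match a few hypothetical clause patterns. A machine-learning system would learn such signals from labeled leases instead of relying on hand-written regular expressions.

```python
import re
from typing import Dict, List

# Hypothetical clause categories and trigger phrases, for illustration only.
PATTERNS = {
    "auto_renewal": re.compile(r"\b(automatically renew|auto-renew)", re.IGNORECASE),
    "late_fee": re.compile(r"\blate (fee|charge)", re.IGNORECASE),
    "security_deposit": re.compile(r"\bsecurity deposit\b", re.IGNORECASE),
}

def flag_clauses(lease_text: str) -> Dict[str, List[str]]:
    """Split a lease into rough sentences (on '.' or ';' followed by whitespace)
    and collect, per category, the sentences that match its pattern."""
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", lease_text) if s.strip()]
    flags: Dict[str, List[str]] = {}
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            if pattern.search(sentence):
                flags.setdefault(label, []).append(sentence)
    return flags

lease = ("This lease will automatically renew each year. Rent is due on the 1st. "
         "A late fee of $50 applies after the 5th; the security deposit is $500.")
for label, hits in flag_clauses(lease).items():
    print(label, "->", hits)
```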

Scale AI invests $10 million to create ten AI university chairs across Canada

BetaKit, Isabelle Kirkwood


Québec-based artificial intelligence (AI) supercluster Scale AI is investing $10 million to support the creation of at least ten university chairs dedicated to AI across the country. This contribution will double the amount committed by Canadian universities, bringing the total commitment to $20 million.

The goal of the university chairs is to attract and retain AI researchers, support research programs, and develop AI experts at post-secondary institutions. The supercluster said this is a part of its AI talent development strategy in Québec and across the country.

As George Floyd Protests Rock Cities, Students and Presidents Condemn Systemic Racism

The Chronicle of Higher Education, Andy Thomason


Students on campuses across the country joined in protesting. At the University of Missouri at Columbia, students marched on the quad, covering the head of a statue of Thomas Jefferson with a plastic bag.

At the University of Mississippi, someone spray-painted the words “Spiritual Genocide” on a Confederate monument on Saturday. The chancellor, Glenn Boyce, responded by saying he supported the relocation of the monument, a process that was already underway.

In Atlanta, two police officers were fired after they were shown on video dragging a student at Morehouse College and a student at Spelman College from a car and hitting them with a stun gun.

Canada waits for China to strike back after court rules against Huawei executive

The Financial Post, Gabriel Friedman


The Canadian business community is watching nervously as tensions with China, a key trading partner accounting for roughly $100 billion in annual activity, continue to deteriorate, prompting expectations of a backlash from the growing superpower.

On Wednesday, a British Columbia Supreme Court judge blocked the latest attempt by Meng Wanzhou, chief financial officer of Chinese telecom giant Huawei Technologies Co. Ltd., to halt her extradition to the U.S., where prosecutors allege she violated Iran sanctions.

In the aftermath of Meng’s arrest in December 2018, China took a number of steps, widely viewed as retaliatory, including imprisoning two Canadians, and erecting tariffs on canola and other products.

COVID-19 Has Exposed Critical Weaknesses in Global Higher Education

Quillette, Christos A. Makridis and Soula Parassidis


The traditional educational services sector in the United States and the world at large, including institutions of higher education, was not prepared for the COVID-19 pandemic, leading to significant disruptions in learning outcomes and budgets. Notified at the last minute, many students found themselves having to pack up their bags and leave campus dorms, sometimes with nowhere to go. Although the dust is still settling, four-year colleges might experience a 20 percent decline in fall enrollment, accelerating a trend already in place since 2011. In fact, 500 to 1,000 colleges might be put completely out of business.

These new challenges add to already deteriorating outcomes among college graduates, ranging from an all-time high of nearly $1.6 trillion in student debt as of 2020 to a flattening college wage premium. Moreover, a national 2018 survey of employers found that only around 40 percent say that recent college graduates exhibit professionalism, show a good work ethic, and have decent oral and written communication skills, and only 33 percent say that recent graduates possess leadership skills. This is particularly concerning given that soft skills and their combination with technical skills are becoming more important than ever.

One of the potential factors behind these deteriorating outcomes stems from the increasing proliferation of degree programs that are disconnected from the needs of the labor market.

I’ll leave discussion of the physical protests to others for now, but I’m really worried about infiltration of (online) activist movements by domestic provocateurs and foreign agents — who will try to shape these movements towards their own objectives.

Twitter, Kate Starbird


Please vet your sources. Don’t retweet someone you don’t know without figuring out who they are and why they’re there. Don’t add new follows without doing a deep dive of those accounts. We are really vulnerable to information operations during these tumultuous times.

All of Us Research Program Begins Beta Testing of Data Platform

National Institutes of Health (NIH), All of Us Research Program


In partnership with our participants—now nearly 350,000 and counting—we’re working to build one of the world’s largest and most diverse datasets to advance health research. Today, I’m happy to announce that we’ve opened our research platform, the All of Us Researcher Workbench, for beta testing. Now, researchers can begin using our initial dataset and tools in studies and tell us what’s working and what we can improve. This moment is an important step in our effort to accelerate new discoveries.

During this beta testing phase, researchers may notice that All of Us does things a little differently than other research programs, including data access. For starters, we aren’t waiting to share data until after participant recruitment and data collection have ended. We launched national enrollment just two years ago and continue to enroll new participants each week, on our way to our goal of one million.

Speeding Up the Dissemination of Scholarly Information

Ithaka S+R, Oya Y. Rieger


We are once again on the threshold of another economic recession. Numerous open access models have emerged since the last economic downturn, but preprints continue to play an important role. Today, almost 70 platforms are branding themselves as preprint services. The brief I’m publishing today provides an overview of the current preprint landscape, describing the rapid changes to how they are both perceived and utilized and the challenges they face.

Counting During Covid-19 – Is the coronavirus pandemic impacting the 2020 census count in New York City?

Medium, Data Clinic, Stuart Lynn


The broader question of how the 2020 census is shaping up with respect to self-response is being tackled in many different ways. For a national perspective, the CUNY Mapping Service out of the Center for Urban Research, CUNY Graduate Center, has created the Hard to Count Map to support an accurate count in 2020. In partnership with Hester Street, a nonprofit involved with census outreach efforts in New York City (NYC), we produced this online dashboard tailored to local data. As a product of this collaboration, we began thinking about how Covid-19 might be influencing self-response at the local NYC level and how patterns in response rates compare to what we’d expect in non-pandemic times.

While it is impossible to assess the effects of different factors (pandemic, online option, etc.) on the final census count while it remains ongoing, we can derive some insight by analyzing the daily self-response rate released through the census API. The API provides both the online/internet self-response rate and the total (internet plus mail-in) rate at the census tract level. Given the previous day’s data is overwritten daily at 3pm EST, we wrote a script to automatically download and store the daily rates for later analysis. You can access this data on our GitHub repo, and it is also being collected and stored independently in this collection.
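
A minimal sketch of the snapshot idea, assuming the rates have already been fetched as one row per tract: because the API overwrites each day's figures, every run writes that day's rows to a dated CSV so the history accumulates locally. The column names and helper functions are hypothetical, and the actual API request is left out because its endpoint and variable names are not given here.

```python
import csv
import datetime as dt
from pathlib import Path
from typing import Dict, Iterable, Tuple

FIELDS = ["tract", "internet_rate", "total_rate"]  # hypothetical column names

def snapshot_rates(rows: Iterable[dict], out_dir, day: dt.date) -> Path:
    """Save one day's tract-level rates to out_dir/rates_YYYY-MM-DD.csv.
    Re-running on the same day simply rewrites that day's file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"rates_{day.isoformat()}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow({name: row[name] for name in FIELDS})
    return path

def load_history(out_dir) -> Dict[dt.date, Dict[str, Tuple[float, float]]]:
    """Read every stored snapshot back as {day: {tract: (internet, total)}}."""
    history = {}
    for path in sorted(Path(out_dir).glob("rates_*.csv")):
        day = dt.date.fromisoformat(path.stem[len("rates_"):])
        with path.open(newline="") as f:
            history[day] = {
                r["tract"]: (float(r["internet_rate"]), float(r["total_rate"]))
                for r in csv.DictReader(f)
            }
    return history

def mail_in_rate(internet: float, total: float) -> float:
    """The total rate counts internet plus mail-in, so mail-in is the difference."""
    return total - internet
```

Run once a day after the 3pm update (for example from a scheduler), the stored files let you compare daily response curves per tract after the fact.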

It’s college decision day — here’s where students stand

CNBC, Jessica Dickler


Hundreds of colleges have declared June 1 National College Decision Day, although there is still extreme uncertainty about the fall semester.

Some students are choosing to play it safe by enrolling in an in-state public school or community college or even taking a gap year.


Anaconda Con 2020

Anaconda, Inc.


Online June 3-5. “Join Anaconda and the growing data science community for a free, multi-day virtual experience on June 3-5, 2020. Our virtual event will feature the keynotes, breakout sessions, and networking opportunities that you’ve come to expect from AnacondaCON — with no travel required!” [registration required]

Tools & Resources

Measuring Bias in Natural Language Models

Center for Data Innovation


“Researchers from MIT, Facebook, Intel, and McGill University in Canada have released StereoSet, a dataset of 17,000 sentences that researchers can use to measure a natural language processing model’s bias towards stereotypes. The dataset tasks models with choosing among options to fill in a blank in a sentence or to provide additional information after receiving an input sentence. The options include stereotypes, anti-stereotypes, and unrelated information. To score well, a model should prefer options that provide relevant info but not prefer options conveying a stereotype over those that do not.”
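
The scoring idea can be sketched as follows. This is a simplified, per-example reading of the StereoSet paper's metrics as we understand them: a language-modeling score (lms, how often a meaningful option beats the unrelated one), a stereotype score (ss, how often the stereotype beats the anti-stereotype, where 50 is ideal), and their combination, ICAT = lms × min(ss, 100 − ss) / 50. The input format (one model score per option, higher meaning preferred) is an assumption.

```python
from typing import Dict, List

def stereoset_scores(examples: List[Dict[str, float]]) -> Dict[str, float]:
    """Each example maps 'stereotype', 'anti_stereotype' and 'unrelated' to a
    model score (e.g. a log-likelihood); higher means the model prefers it."""
    n = len(examples)
    meaningful_wins = 0  # a meaningful option beats the unrelated one
    stereo_wins = 0      # the stereotype beats the anti-stereotype
    for ex in examples:
        for option in ("stereotype", "anti_stereotype"):
            if ex[option] > ex["unrelated"]:
                meaningful_wins += 1
        if ex["stereotype"] > ex["anti_stereotype"]:
            stereo_wins += 1
    lms = 100 * meaningful_wins / (2 * n)  # two meaningful-vs-unrelated comparisons per example
    ss = 100 * stereo_wins / n
    icat = lms * min(ss, 100 - ss) / 50     # 100 only when lms = 100 and ss = 50
    return {"lms": lms, "ss": ss, "icat": icat}
```

A model that always picks a relevant option and shows no preference between stereotype and anti-stereotype gets lms = 100, ss = 50, and ICAT = 100; a model that always picks the stereotype gets ICAT = 0 however fluent it is.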

Ten Simple Rules for Starting Research in Your Late Teens

SocArXiv; Cameron Mura, Mike Chalupa, Abigail M. Newbury, Jack Chalupa, Philip Bourne


“These Ten Simple Rules (TSR) aim primarily at individuals in their late teenage years–late high school (LHS) or early college (EC)–who are either considering or actively searching for a first research opportunity in a university lab. The Rules are motivated by empirical observations and our own experiences. First, and most immediately and practically, as university researchers ourselves (CM & PEB) we have been approached by HS students (JC among others) interested in securing a summer research experience. The requests have become more frequent in recent years and, while the general area has been computational biology, that’s probably secondary: the Rules articulated here may apply equally well across many disciplines. Second, we note at the outset the astonishing increase in the intensity of today’s HS students, on many fronts– in terms of technical skill-sets (e.g., mastery of programming languages), academic preparation and scientific sophistication (e.g., courses in advanced math), and beyond (e.g., career-related ambitions, such as searching for research opportunities at universities!). Third, we should mention that we have had HS students in our laboratory who subsequently went on to productive and rewarding research careers, underscoring that this is a highly formative stage. We hope that this TSR supplies some useful tips on whether research is right for you, how to go about procuring a summer research position, and the broader topic of navigating the late-high school/early college (LHS/EC) stage of your own scientific trajectory.”

Argonne’s new menu of data storage software helps scientists realize findings earlier

Argonne National Laboratory, Press Release


“Anyone can imagine a custom storage system to solve a particular science problem, but it would take years to get it fully complete and ready for production,” said Phil Carns, principal software development specialist in the Mathematics and Computer Science (MCS) division at the U.S. Department of Energy’s (DOE) Argonne National Laboratory.

Carns is technical lead of a team set to solve this problem by identifying a collection of building blocks scientists can pull together to craft a data storage system designed to address their own specific needs. Rob Ross, senior computer scientist in MCS, is principal investigator for the new technology, which he and Carns call Mochi. The Mochi team includes researchers at Argonne, DOE’s Los Alamos National Laboratory, Carnegie Mellon University and The HDF Group, an Illinois-based nonprofit dedicated to advancing state-of-the-art open source data management technologies.
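
The building-block idea can be illustrated with a small sketch of storage components that share one interface and can be stacked in different combinations. This is not Mochi's API, which the release does not describe; the class names and the key-value interface are hypothetical.

```python
import zlib
from typing import Dict, Protocol

class KeyValueStore(Protocol):
    """The one interface every component below speaks."""
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class MemoryStore:
    """Backend component: keeps values in a dict."""
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

class CompressingStore:
    """Layer component: zlib-compresses values before passing them to any inner store."""
    def __init__(self, inner: KeyValueStore) -> None:
        self.inner = inner

    def put(self, key: str, value: bytes) -> None:
        self.inner.put(key, zlib.compress(value))

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.inner.get(key))

class MirroredStore:
    """Layer component: writes every value to two stores, reads from the primary."""
    def __init__(self, primary: KeyValueStore, replica: KeyValueStore) -> None:
        self.primary, self.replica = primary, replica

    def put(self, key: str, value: bytes) -> None:
        self.primary.put(key, value)
        self.replica.put(key, value)

    def get(self, key: str) -> bytes:
        return self.primary.get(key)

# Compose a system from parts: compression on top of two mirrored in-memory backends.
backend_a, backend_b = MemoryStore(), MemoryStore()
store = CompressingStore(MirroredStore(backend_a, backend_b))
store.put("run-001", b"0.0 " * 1000)
```

Because each layer only depends on the shared interface, swapping the in-memory backend for a different one, or dropping the mirroring layer, changes one line of composition rather than the whole system.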

Code in the browser with GitHub Classroom

The GitHub Blog, Arelia Jones


We’ve heard from you that coding online will help support distance learning, so we’ve made that happen in GitHub Classroom. Classroom now integrates with two online IDEs: Microsoft MakeCode Arcade and Both allow coders to work on any device with a web browser, such as tablets or smartphones. The development environment lives online, so there’s no need for students to install code locally or worry about their software version. You and your team can get right to teaching.


Internships and other temporary positions

Visiting Assistant Professor – Disinformation

Clemson University, College of Business: John E. Walker Department of Economics; Clemson, SC

Full-time positions outside academia

Director of Sports Science & Analytics

Austin FC; Austin, TX

Project Manager

DataKind; Brooklyn, NY
