Data Science newsletter – July 6, 2021

Newsletter features journalism, research papers and tools/software for July 6, 2021


Nearly 5 of 6 cases were undetected early in COVID pandemic

Los Angeles Times, Amina Khan


Scientists at the National Institutes of Health who studied blood samples from across the United States have discovered that for every coronavirus infection recorded during the spring and summer of 2020, nearly five more went undetected — amounting to nearly 17 million additional cases by July 2020.

The discovery, published this week in the journal Science Translational Medicine, reveals that the coronavirus was far more widespread in the early months of the COVID-19 pandemic than previously thought, and could help scientists and health officials better respond to future outbreaks.
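Here is how the excerpt's ratio maps onto the headline. Write d for recorded infections and u for undetected ones. Take the rounded ratio of five undetected infections per recorded one, so u ≈ 5d. The study's figure is "nearly" five, so the true share is slightly lower. The undetected share of all infections is then:

```latex
\frac{u}{d + u} \approx \frac{5d}{d + 5d} = \frac{5}{6} \approx 0.83
```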

What is it about the economics of pizza delivery, that makes it so much more viable than any other kind of food delivery?

Twitter, Anthony Lee Zhang


Evidence: pizza delivery has been a thing in the US for decades, while everything-else delivery has only arrived recently, and there still seems to be a fairly large price gap
– Pizza ingredients cost nontrivially less than those of other foods
– Pizza did the “cloud kitchen” model from the start; everyone else wastes money on rent and has to price that in
– Pizza travels better (boxes stack easily and stay warm)

Why Sony believes paying attention to ethics is key for its A.I. future

Fortune, Eye on A.I., Jonathan Vanian


Last year, Sony said it would implement “A.I. ethics assessments” to investigate how certain A.I.-powered products could pose societal harm, said Alice Xiang, a Sony AI senior research scientist. Sony’s camera division, for instance, has been developing sensors to power computer-vision tasks, like recognizing cars in videos or photos. Part of Xiang’s work is to help Sony study how to mitigate potential racial bias problems created by such systems, which have been shown to be better at recognizing white men than women and people of color. By working with Sony’s business units “who are struggling with these issues,” Xiang hopes the company can prevent A.I. ethical disasters.

Like researchers at other tech companies, Xiang’s team plans to publish papers about what it finds, but Sony is still debating how much it wants to share about its internal work. In theory, the assessments could lead to Sony abandoning certain products, but Xiang doesn’t want to be put in a position where she has to publicly “point a finger at someone and be like, ‘Yeah, this product was unethical.’”

It’s this kind of struggle (companies wanting to publicize their A.I. ethics research without revealing details about their internal business decisions) that complicates matters. A lack of transparency is one reason there is skepticism about corporate A.I. ethics. It’s easy for companies to make a vague statement like “A.I. can be used for good,” but it’s difficult for them to say anything more substantial.
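Assessments like the ones Xiang describes often rely on disaggregated evaluation. This means scoring a model separately for each demographic group rather than only in aggregate, so that gaps like the one described above show up as numbers. The following is a generic illustration and not Sony's assessment process. The group names and detection results are hypothetical. Labels and predictions are 1 when a person is present or detected, and 0 otherwise.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {group: accuracy} for (group, label, prediction) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, label, prediction in records:
        total[group] += 1
        correct[group] += int(label == prediction)
    return {group: correct[group] / total[group] for group in total}

def max_accuracy_gap(per_group):
    """Largest accuracy difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical detector output: (group, true label, prediction), 1 = person present/detected
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 1, 1), ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 1, 1),
]
per_group = accuracy_by_group(records)
print(per_group)                    # {'group_a': 0.75, 'group_b': 0.5}
print(max_accuracy_gap(per_group))  # 0.25
```

Aggregate accuracy on these records is 0.625, which hides the fact that one group is served noticeably worse. A real assessment would also compare error types (misses versus false alarms) and check how many examples each group has.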

Top 20 Master’s Programs in AI & Data Science 2021

DataDrivenInvestor, Raymond Cheng


As somebody who just went through the entire application process, I have compiled a list of the top universities for a Master’s in AI/Data Science in the US.

Please note that these rankings are based upon my own research and thus are subjective. In no particular order, here are the top 20 Master’s Programs:

1. Carnegie Mellon University

How Data Science Can Make Hollywood More Diverse

Knowledge@Wharton, Wharton Business Daily podcast


Hollywood has a diversity problem, and Wharton operations, information, and decisions professor Kartik Hosanagar wants to use data science to fix it.

Inspired by his lifelong passion for storytelling and filmmaking, Hosanagar has launched Jumpcut, a startup to help Hollywood create more inclusive content by relying on data to show industry leaders that audiences are hungry for a wider range of representation. The business also serves as an incubator for undiscovered, diverse talent.

Need help building IKEA furniture? This robot learns your preferences and lends a hand.

University of Southern California, USC Viterbi School of Engineering


Researchers at USC created an algorithm that allows machines to learn from humans and predict their next move.

NSWC Crane, IU, Notre Dame, and Purdue team up to provide Trusted AI workforce development and research

WBIW (Indiana), Crane


“As AI becomes increasingly pervasive in the technologies the world relies upon, failure of these systems is not an option for the modern soldier. Therefore, training future leaders in this field, including current undergraduates, graduate students, and postdoctoral scholars, will ensure we are building a modern and prepared workforce to help support and meet the Trusted AI needs of a complex world,” said Christopher Sweet, Associate Director for Cyberinfrastructure Development in the Center for Research Computing and Assistant Research Professor in Computer Science and Engineering at the University of Notre Dame. “Notre Dame is committed to this mission and excited to partner with our colleagues in IU, NSWC Crane, and Purdue on this globally impactful work.”

The Trusted AI SCALE academic leads, in collaboration with the NSWC Crane AI Development Laboratory (CrAIDL) technical leads, have identified five research themes required to develop the frameworks, methodologies, and tools necessary to assess the level of trust of AI/ML-integrated systems: Trust and Verifiability, Statistical Framework for Data/Model Analysis, Knowledge Graph Enhanced Natural Language Processing, Human-Machine Pairing, and Framework Infrastructure Development.

NVIDIA and Google Cloud to Create Industry’s First AI-on-5G Lab to Speed Development of AI Everywhere

NVIDIA Newsroom


NVIDIA today announced that it is partnering with Google Cloud to establish the industry’s first AI-on-5G Innovation Lab, enabling network infrastructure players and AI software partners to develop, test and adopt solutions that will help accelerate the creation of smart cities, smart factories and other advanced 5G and AI applications.

The lab will provide enterprises with access to Google Cloud’s Anthos platform and NVIDIA accelerated computing hardware and software platforms that let them harness data and AI to drive business performance, improve operational efficiency and optimize safety and reliability. The companies will begin development in the second half of the year.

WHO Report Targets AI in Health, Cautions Against Health Disparities – The report released by WHO explains the uses of AI in health and six principles to prevent health disparities.

Health IT Analytics, Erin McNemar


The World Health Organization (WHO) released the first global report on artificial intelligence in healthcare and six principles to avoid health disparities in the field.

The report, Ethics and governance of artificial intelligence for health, is the outcome of two years of consultations held by a panel of WHO-appointed international experts.

“Like all new technology, artificial intelligence holds enormous potential for improving the health of millions of people around the world, but like all technology it can also be misused and cause harm,” Tedros Adhanom Ghebreyesus, PhD, MSc, WHO director-general, said in a press release. “This important new report provides a valuable guide for countries on how to maximize the benefits of AI, while minimizing its risks and avoiding its pitfalls.”

How this Hamilton County court is using artificial intelligence to save young lives

WKRC (Cincinnati, OH), Courtney Wheaton


A local court system has become the first in the nation to use artificial intelligence to save young lives. The technology will allow Hamilton County Juvenile Court to better detect suicide and other behavioral risks in children.

The Hamilton County Juvenile Court has had its fair share of cases as of late.

“That’s because of isolation. The stress they are under because of a lot of different things. You’re seeing an increase in violence in Cincinnati and all over the country, and those things are contributing to difficulties for children,” said former Juvenile Court judge John Williams.

The court said it has nine young people in custody on murder charges, a 51% increase in felonious assaults and more kids contemplating self-harm. But what if you could identify those stressors and risk factors in children before they lead to harm?

The Rotting Internet Is a Collective Hallucination

The Atlantic, Jonathan Zittrain


This absence of central control, or even easy central monitoring, has long been celebrated as an instrument of grassroots democracy and freedom. It’s not trivial to censor a network as organic and decentralized as the internet. But more recently, these features have been understood to facilitate vectors for individual harassment and societal destabilization, with no easy gating points through which to remove or label malicious work not under the umbrellas of the major social-media platforms, or to quickly identify their sources. While both assessments have power to them, they each gloss over a key feature of the distributed web and internet: Their designs naturally create gaps of responsibility for maintaining valuable content that others rely on. Links work seamlessly until they don’t. And as tangible counterparts to online work fade, these gaps represent actual holes in humanity’s knowledge.

Mount Union Announces New Minor in Data Science and Analytics

PR Newswire, University of Mount Union


The University of Mount Union’s Board of Trustees recently approved the addition of a minor in data science and analytics to its academic offerings.

The new program reflects the increased demand for data scientists across a variety of disciplines. According to LinkedIn’s 2020 Emerging Jobs Report, “data science is a field that is seeing continued growth on a tremendous scale.” The Bureau of Labor Statistics projects growth in data-related fields of between 16% and 31% through 2028.
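A cumulative projection like the BLS range is easier to compare with other figures when expressed as a compound annual rate, r = (1 + g)^(1/n) - 1, for total growth g over n years. This minimal sketch assumes the range covers a ten-year window. BLS employment projections are published for ten-year periods, and 2018 to 2028 is assumed here.

```python
def annualized_rate(total_growth, years):
    """Convert cumulative growth over `years` into a compound annual rate."""
    return (1 + total_growth) ** (1 / years) - 1

# Assumption: the 16% to 31% range spans a ten-year projection window (2018 to 2028).
for total in (0.16, 0.31):
    print(f"{total:.0%} over 10 years ≈ {annualized_rate(total, 10):.1%} per year")
# 16% over 10 years ≈ 1.5% per year
# 31% over 10 years ≈ 2.7% per year
```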

UConn’s TIP Digital Brings Disruptive Companies On Board in First Months

University of Connecticut, UConn Today


UConn’s TIP Digital in Stamford has hit the ground running, harnessing the momentum of the city’s growing data science industry to support 12 new companies. It is part of the larger Stamford Data Science Initiative launched earlier this year.

The startups at TIP Digital are using the promise of machine learning to address persistent problems with novel solutions. Many use technology to put users in control of issues that matter to them.

“Our companies are moving the needle on critical issues. They are disrupting the status quo and making a positive impact not only in their industries, but right here in Stamford,” says Margaret Feeney, Director of TIP Digital.

Botanists Use Machine Learning to Accelerate Research

JSTOR Daily, Samantha Drake


A team of scientists in Austria has created a new, user-friendly artificial intelligence program to speed up their research by automating the analysis of huge numbers of plant images. They made the initial version of the source code publicly available in April 2020.

The study of plants involves identifying both their genotype (genetic makeup) and their phenotype (observable physical characteristics). Accessing the genomic sequence of an organism is a fundamental part of the study of biology. It allows researchers to make connections between a certain phenotype, such as height or color, and the genes responsible for it, says Patrick Hüther, a scientist who was then at the Gregor Mendel Institute of Molecular Plant Biology (GMI) at the Austrian Academy of Sciences in Vienna (the team is now at Ludwig-Maximilians University in Munich). Hüther is co-lead author of a research article on the development of this new artificial intelligence program, dubbed “ARADEEPOPSIS.”

Machine learning algorithm predicts how genes are regulated in individual cells

University of Illinois-Chicago, UIC Today


A team of scientists at the University of Illinois Chicago has developed a software tool that can help researchers more efficiently identify the regulators of genes. The system leverages a machine learning algorithm to predict which transcription factors are most likely to be active in individual cells.

Transcription factors are proteins that bind to DNA and control what genes are turned “on” or “off” inside a cell. These proteins are relevant to biomedical researchers because understanding and manipulating these signals in the cell can be an effective way to discover new treatments for some illnesses. However, there are hundreds of transcription factors inside human cells and it can take years of research, often through trial and error, to identify which are most active — those that are expressed, or “on” — in different types of cells and which could be leveraged as drug targets.

“One of the challenges in the field is that the same genes may be turned ‘on’ in one group of cells but turned ‘off’ in a different group of cells within the same organ,” said Jalees Rehman, UIC professor in the department of medicine and the department of pharmacology and regenerative medicine at the College of Medicine. “Being able to understand the activity of transcription factors in individual cells would allow researchers to study activity profiles in all the major cell types of major organs such as the heart, brain or lungs.”
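Rehman's point, that one gene can be on in one group of cells and off in another, is easy to see in single-cell data. Compare, group by group, the share of cells in which the gene is detected. The sketch below does only that, and it is not the UIC team's machine learning method, which predicts transcription factor activity. The gene name, counts and groups are hypothetical, and a simple count threshold stands in for "on".

```python
from collections import defaultdict

def fraction_on(cells, gene, threshold=0):
    """Share of cells in each group whose count for `gene` exceeds `threshold`."""
    on = defaultdict(int)
    total = defaultdict(int)
    for group, counts in cells:
        total[group] += 1
        if counts.get(gene, 0) > threshold:  # a gene missing from the dict counts as 0
            on[group] += 1
    return {group: on[group] / total[group] for group in total}

# Hypothetical single-cell counts: (cell group, {gene: count})
cells = [
    ("group_1", {"GENE_X": 5}), ("group_1", {"GENE_X": 3}),
    ("group_1", {"GENE_X": 0}), ("group_1", {"GENE_X": 7}),
    ("group_2", {"GENE_X": 0}), ("group_2", {}),
    ("group_2", {"GENE_X": 1}), ("group_2", {"GENE_X": 0}),
]
print(fraction_on(cells, "GENE_X"))  # {'group_1': 0.75, 'group_2': 0.25}
```

Real single-cell measurements are sparse and noisy, and a zero count does not always mean a gene is off. That is part of why model-based approaches are preferred over a raw threshold.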


Platform Strategy Research Symposium

Boston University, Questrom School of Business


Boston, MA, July 16, starting at 8:30 a.m. “We are delighted to host the 2021 platform research symposium at Boston University bringing together global scholars to advance the latest research.” [$$$]


Wageningen University Launches the Third Autonomous Greenhouse Challenge

“Netherlands-based Wageningen University is holding the third edition of its Autonomous Greenhouse Challenge, where international teams compete to grow crops in greenhouses using AI and automation.” Part 1 ends on July 14.

Sports Info Solutions is excited to announce its 2nd annual Football Analytics Challenge.

“For those interested in the football analytics track, we’re asking you to identify the most popular route concepts in the NFL in 2020, as well as ascertain which concepts perform the best versus the different coverage types. We will furnish all the necessary data for you to attack this problem, provided you make a minimum $1 donation to The Boys and Girls Clubs of America.” Deadline for submissions is July 25.
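The challenge's two questions reduce to a frequency count and a grouped average. This minimal sketch assumes each play arrives as a (route concept, coverage, outcome) tuple, where the outcome is a per-play value such as expected points added. The actual fields in the SIS data are not described here, so the layout, concept names and numbers are all hypothetical.

```python
from collections import defaultdict

def summarize(plays):
    """Rank concepts by usage and average the outcome per (concept, coverage)."""
    usage = defaultdict(int)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for concept, coverage, outcome in plays:
        usage[concept] += 1
        totals[(concept, coverage)] += outcome
        counts[(concept, coverage)] += 1
    averages = {key: totals[key] / counts[key] for key in counts}
    most_popular = sorted(usage, key=usage.get, reverse=True)
    return most_popular, averages

def best_by_coverage(averages):
    """Concept with the highest average outcome against each coverage type."""
    best = {}
    for (concept, coverage), value in averages.items():
        if coverage not in best or value > averages[(best[coverage], coverage)]:
            best[coverage] = concept
    return best

# Hypothetical plays: (route concept, coverage, per-play outcome)
plays = [
    ("Mesh", "Cover 1", 0.5), ("Mesh", "Cover 3", -0.25), ("Mesh", "Cover 3", 0.25),
    ("Smash", "Cover 3", 0.75), ("Smash", "Cover 1", -0.5),
    ("Flood", "Cover 3", 0.0),
]
most_popular, averages = summarize(plays)
print(most_popular)                # ['Mesh', 'Smash', 'Flood']
print(best_by_coverage(averages))  # {'Cover 1': 'Mesh', 'Cover 3': 'Smash'}
```

With real data, averages over a handful of plays are noisy. A minimum play count per (concept, coverage) pair would be a sensible filter before ranking.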

The Shanahan Foundation Fellowship at the Interface of Data and Neuroscience

“Announcing the Shanahan Foundation Fellowship at the Interface of Data and Neuroscience co-hosted by the Allen Institute and the University of Washington (through its eScience Institute and the Computational Neuroscience Center) and directed by Christof Koch, Ph.D., Chief Scientist for the MindScope Program at the Allen Institute. The fellowships are funded by the Shanahan Family Foundation and the Allen Institute.” Deadline for applications is August 1.



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

GitHub Copilot

GitHub, OpenAI


With GitHub Copilot, get suggestions for whole lines or entire functions right inside your editor.

Ten Ways to Apply Machine Learning in Earth and Space Sciences

Eos, Jacob Bortnik and Enrico Camporeale


Machine learning (ML), loosely defined as the “ability of computers to learn from data without being explicitly programmed,” has become tremendously popular in technical disciplines over the past decade or so, with applications including complex game playing and image recognition carried out with superhuman capabilities. The Earth and space sciences (ESS) community has also increasingly adopted ML approaches to help tackle pressing questions and unwieldy data sets. From 2009 to 2019, for example, the number of studies involving ML published in AGU journals approximately doubled.

In many ways, ESS present ideal use cases for ML applications because the problems being addressed (like climate change, weather forecasting, and natural hazards assessment) are globally important; the data are often freely available, voluminous, and of high quality; and computational resources required to develop ML models are steadily becoming more affordable. Free computational languages and ML code libraries are also now available (e.g., scikit-learn, PyTorch, and TensorFlow), making entry barriers lower than ever. Nevertheless, our experience has been that many young scientists and students interested in applying ML techniques to ESS data do not have a clear sense of how to do so.
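The definition quoted above ("learn from data without being explicitly programmed") can be made concrete with the simplest possible case. The two parameters of a straight line are estimated from example data by ordinary least squares rather than written into the program. The sketch uses plain Python with made-up numbers. Libraries such as scikit-learn offer the same fit (for example `sklearn.linear_model.LinearRegression`) alongside far more capable models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations; the rule y = 1 + 2x is never written into the code.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # 1.0 2.0
```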

How to integrate AI into engineering

The Engineer (UK), Jos Martin


Engineers are increasingly looking to integrate AI into projects and applications while attempting to climb their own AI learning curve. To tackle AI, engineers should start with understanding what AI is and how it fits into their current workflow, which might not be as straightforward as it seems. A simple search of “What is AI?” yields millions of results on Google, varying widely in technical depth and relevance.

So, what is AI to engineers?

Most of the attention in AI goes to the model itself, which drives engineers to dive straight into the modelling aspect. After a few starter projects, engineers learn that AI is not just modelling, but rather a complete set of steps that includes data preparation, modelling, simulation and test, and deployment.
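The four stages named above can be strung together in a toy end-to-end sketch. Everything in it is an assumption for illustration:

- a hypothetical one-feature sensor dataset
- a nearest-centroid classifier standing in for the model
- evaluation on held-out data standing in for simulation and test
- an accuracy threshold acting as the deployment gate

```python
def prepare(rows):
    """Data preparation: drop rows with a missing feature or label."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def train(rows):
    """Modelling: nearest-centroid classifier on one numeric feature."""
    centroids = {}
    for label in sorted({y for _, y in rows}):
        values = [x for x, y in rows if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def evaluate(centroids, rows):
    """Test: accuracy on held-out rows the model never saw."""
    hits = sum(predict(centroids, x) == y for x, y in rows)
    return hits / len(rows)

def deploy(centroids, accuracy, min_accuracy=0.9):
    """Deployment gate: release a predict function only if testing passed."""
    if accuracy < min_accuracy:
        raise RuntimeError(f"accuracy {accuracy:.2f} is below {min_accuracy}")
    return lambda x: predict(centroids, x)

# Hypothetical vibration readings labelled by machine state
raw = [(0.1, "ok"), (0.2, "ok"), (None, "ok"), (0.15, "ok"),
       (0.9, "fault"), (1.1, "fault"), (1.0, "fault"), (0.95, None)]
held_out = [(0.12, "ok"), (0.3, "ok"), (0.85, "fault"), (1.2, "fault")]

model_params = train(prepare(raw))
accuracy = evaluate(model_params, held_out)
model = deploy(model_params, accuracy)
print(accuracy, model(0.05), model(0.95))  # 1.0 ok fault
```

Each stage is a separate function, so one can be swapped out (for example, a richer model in `train`) without touching the others. This mirrors the point that modelling is only one step of the workflow.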
