Data Science newsletter – February 23, 2022

Newsletter features journalism, research papers and tools/software for February 23, 2022


UT Austin President Jay Hartzell defends faculty tenure after Lt. Gov. Dan Patrick proposes to end it at all public universities

Texas Tribune, Kate McGee


Patrick’s suggestion to end tenure at public universities received swift criticism from faculty and higher education experts who said it would negatively impact the reputation of Texas’ colleges and universities.

$1.4M Gates Foundation Award to Study National Effect of Test-Optional Admissions

University of Maryland, Maryland Today


A $1.4 million grant from the Bill and Melinda Gates Foundation will support a University of Maryland-led study to assess the impact of a nationwide move to make the SAT and ACT tests optional for college admissions, or to eliminate their use altogether.

Associate Professor of Education Julie J. Park and colleagues will examine whether higher education institutions’ swift moves to alter their admissions processes in response to the global COVID-19 pandemic improved college access or equity for different student populations.

Why FDA Must Increase Transparency of Medical Devices Powered by Artificial Intelligence

The Pew Charitable Trusts, Health Care Products, Liz Richardson


Not all AI-enabled tools fall under the purview of the Food and Drug Administration, but the agency regulates any software intended to treat, diagnose, cure, mitigate, or prevent disease or other conditions before it can be marketed and sold commercially. In recent years, FDA has been considering an updated approach to oversight of these products, including steps to improve how developers communicate about four key factors: a product’s intended use, how it was developed, how well it performs, and the logic it uses to generate a result or recommendation.

If companies do not disclose these details, prescribers and patients may be more likely to use the products inappropriately, and that can lead to inaccurate diagnoses, improper treatment, and harm. Here’s how and why this information matters to patients and prescribers.

A large-scale study on research code quality and execution

Nature, Scientific Data; Ana Trisovic, Matthew K. Lau, Thomas Pasquier & Mercè Crosas


This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals’ collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
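The paper’s pipeline is far more involved, but the core of such a study — run each script in a fresh environment, record whether it exits cleanly, and tally the failure rate — can be sketched in Python. The directory layout and success criterion (exit code 0) here are illustrative assumptions, not the authors’ exact setup:

```python
import subprocess
from pathlib import Path

def reexecute_scripts(script_dir, timeout=3600):
    """Run every R script in script_dir in a fresh subprocess and
    record whether each completes without error (exit code 0)."""
    results = {}
    for script in sorted(Path(script_dir).glob("*.R")):
        try:
            proc = subprocess.run(
                ["Rscript", str(script)],
                capture_output=True, timeout=timeout,
            )
            results[script.name] = (proc.returncode == 0)
        except subprocess.TimeoutExpired:
            # A hung script counts as a failed re-execution
            results[script.name] = False
    return results

def failure_rate(results):
    """Fraction of scripts that did not complete cleanly."""
    if not results:
        return 0.0
    return sum(1 for ok in results.values() if not ok) / len(results)
```

Running the same harness before and after automatic code cleaning (e.g., rewriting absolute file paths) is what lets the authors compare the 74% and 56% failure rates.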

AI Machines Have Beaten Moore’s Law Over The Last Decade, Say Computer Scientists

Discover Magazine, The Physics arXiv Blog


No equivalent ranking exists for AI systems despite deep learning techniques having led to a step change in computational performance. These machines have become capable of matching or beating humans at tasks such as object recognition, the ancient Chinese game of Go, many video games and a wide variety of pattern recognition tasks.

For computer scientists, that raises the question of how to measure the performance of these AI systems, how to study the rate of improvement and whether these improvements have followed Moore’s Law or outperformed it.

Now we get an answer thanks to the work of Jaime Sevilla at the University of Aberdeen in the UK and colleagues, who have measured the way computational power in AI systems has increased since 1959. This team says the performance of AI systems during the last ten years has doubled every six months or so, significantly outperforming Moore’s Law.
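The gap these growth rates imply is easy to check with back-of-the-envelope arithmetic: over a decade, a six-month doubling time compounds vastly faster than Moore’s Law’s roughly two-year doubling (rough illustrative figures, not the paper’s exact numbers):

```python
def growth_factor(years, doubling_time_years):
    """Total multiplicative growth after `years` given a fixed doubling time."""
    return 2 ** (years / doubling_time_years)

# Moore's Law: transistor counts double roughly every two years.
moore = growth_factor(10, 2.0)   # 2^5 = 32x over ten years

# AI compute per the trend described: doubling every ~6 months.
ai = growth_factor(10, 0.5)      # 2^20 ~ 1,000,000x over ten years
```

A factor of 32 versus a factor of a million over the same decade is why the authors describe the trend as leaving Moore’s Law behind.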

Faculty express mixed feelings over move to renovated Kline Tower

Yale University, Yale Daily News student newspaper, Anika Smith and Evan Gorelick


Statistics and data science department chair Joseph Chang also sees the move as a way to better unify the department, which is currently spread across three separate buildings. Like the mathematics department, the statistics and data science department has begun to exceed its current capacity.

Basu noted that a key advantage of the new Kline Tower space is its full compliance with the Americans with Disabilities Act, or ADA. Currently, she said, only one of the three astronomy buildings on Hillhouse Avenue is ADA-compliant. [Sarbani] Basu added that the Astronomy Department’s current common room, as well as the room where the department hosts most of its events, is not ADA-compliant, potentially posing a serious barrier for students with disabilities.

But Basu added that impacted departments had limited input on space allocation within the building.

Widespread use of National Academies consensus reports by the American public

Proceedings of the National Academy of Sciences; Diana Hicks, Matteo Zullo, Ameet Doshi, Omar I. Asensio


In seeking to understand how to protect the public information sphere from corruption, researchers understandably focus on dysfunction. However, parts of the public information ecosystem function very well, and understanding this as well will help in protecting and developing existing strengths. Here, we address this gap, focusing on public engagement with high-quality science-based information, consensus reports of the National Academies of Science, Engineering, and Medicine (NASEM). Attending to public use is important to justify public investment in producing and making freely available high-quality, scientifically based reports. We deploy Bidirectional Encoder Representations from Transformers (BERT), a high-performing, supervised machine learning model, to classify 1.6 million comments left by US downloaders of National Academies reports responding to a prompt asking how they intended to use the report. The results provide detailed, nationwide evidence of how the public uses open access scientifically based information. We find half of reported use to be academic—research, teaching, or studying. The other half reveals adults across the country seeking the highest-quality information to improve how they do their job, to help family members, to satisfy their curiosity, and to learn. Our results establish the existence of demand for high-quality information by the public and that such knowledge is widely deployed to improve provision of services. Knowing the importance of such information, policy makers can be encouraged to protect it.
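The paper’s classifier is a fine-tuned BERT model, which is too heavy to sketch here; as a lightweight stand-in, the same supervised task — mapping a free-text comment to an intended-use category — looks like this with a tiny bag-of-words naive Bayes classifier. The training comments and category names below are hypothetical examples in the spirit of the download prompt, not the paper’s data:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesIntentClassifier:
    """Tiny bag-of-words stand-in for a BERT intent classifier:
    supervised classification of comments into intended-use categories."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # log prior + log likelihood with add-one smoothing
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training comments in the spirit of the download prompt.
clf = NaiveBayesIntentClassifier()
clf.fit(
    ["for my dissertation research", "teaching a graduate seminar",
     "to improve safety practices at my job", "helping my employer update policy"],
    ["academic", "academic", "professional", "professional"],
)
```

At the scale of 1.6 million comments, a pretrained transformer earns its keep; the structure of the task, though, is the same.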

SMU Study Uses Drones and Machine Learning to Find ‘Infrastructure Desert’ Neighborhoods in Dallas

Dallas Innovates, David Seeley


What makes a neighborhood safe, livable, and economically viable? Good infrastructure, says a new study from Southern Methodist University—things like crosswalks, noise walls, grocery stores, hospital access, bike trails, and more.

A team of civil engineering researchers at SMU has identified 62 Dallas neighborhoods as “highly deficient” in infrastructure, resulting in “infrastructure deserts” that lessen the quality of life for their residents. As seen in the dark red areas in the map above, most of the neighborhoods are located in the southern part of the city and are home to primarily low-income, Black and Hispanic residents.

The study was supported by a five-year, $584,000 National Science Foundation grant which funds the development of open-source data management software called Clowder, a web-based content repository for finding and utilizing data, which was developed at the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign.

IRS to adopt Login.gov as user authentication tool

FedScoop, John Hewitt Jones


The Internal Revenue Service has committed to Login.gov as a user authentication tool after earlier this month agreeing to abandon the use of a commercial tool that featured third-party facial recognition technology.

In a statement on Monday, the Treasury Department said it is working with the General Services Administration to achieve the “security standards and scale” required to adopt the platform.

It comes after IRS earlier this month announced a plan to move away from using a third-party service for facial recognition to authenticate taxpayers creating new online accounts.

We’re headed to a quiescent phase, which is great. But the virus will continue to evolve and adapt.

Twitter, Eric Topol


Art Created By Artificial Intelligence Can’t Be Copyrighted

dot.LA, Christian Hetrick


Computers can now write poems, paint portraits and produce music better than many humans. But when it comes to the realm of intellectual property law, artwork made by machines can’t receive copyright protection, a federal agency has decided.

The U.S. Copyright Office refused to grant a copyright this month for an image made by an artificial intelligence program called Creativity Machine—ruling that “human authorship is a prerequisite to copyright protection.” The case will now head to federal court as the AI program’s owner, Stephen Thaler, plans to file an appeal, according to Ryan Abbott, a Los Angeles-based attorney representing Thaler.

Thaler, the founder of the Missouri-based AI firm Imagination Engines, tried to copyright “A Recent Entrance to Paradise,” a picture that was autonomously created by Creativity Machine’s algorithm without any human help. Thaler listed the program as the artwork’s author and sought a copyright as the machine’s owner.

Operationalizing a routine wastewater monitoring laboratory for SARS-CoV-2

PLOS Water; Rose S. Kantor, Hannah D. Greenwald, Lauren C. Kennedy, Adrian Hinkle, Sasha Harris-Lovett, Matthew Metzger, Melissa M. Thornton, Justin M. Paluba, Kara L. Nelson


Wastewater-based testing for SARS-CoV-2 is a novel tool for public health monitoring, but additional laboratory capacity is needed to provide routine monitoring at all locations where it has the potential to be useful. Few standardization practices for SARS-CoV-2 wastewater analysis currently exist, and quality assurance/quality control procedures may vary across laboratories. Alongside counterparts at many academic institutions, we built out a laboratory for routine monitoring of wastewater at the University of California, Berkeley. Here, we detail our group’s establishment of a wastewater testing laboratory including standard operating procedures, laboratory buildout and workflow, and a quality assurance plan. We present a complete data analysis pipeline and quality scoring framework and discuss the data reporting process. We hope that this information will aid others at research institutions, public health departments, and wastewater agencies in developing programs to support wastewater monitoring for public health decision-making.
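A quality scoring framework of the kind the paper describes boils down to checking each result against a set of QC criteria and reporting what fraction passed. The specific checks and thresholds below are illustrative assumptions, not the authors’ published criteria:

```python
def quality_score(sample):
    """Score one wastewater qPCR result against simple QC checks.
    Checks and thresholds are illustrative, not the paper's."""
    checks = {
        # Spiked recovery control detected at an acceptable level
        "recovery_ok": sample["recovery_pct"] >= 10,
        # Technical replicates agree (coefficient of variation)
        "replicates_ok": sample["replicate_cv"] <= 0.35,
        # No-template control stayed negative (no contamination)
        "ntc_negative": not sample["ntc_detected"],
    }
    passed = sum(checks.values())
    return passed / len(checks), checks

score, flags = quality_score(
    {"recovery_pct": 22.0, "replicate_cv": 0.18, "ntc_detected": False}
)
```

Attaching a score and per-check flags to every reported measurement is what lets downstream public health users weigh how much trust to place in a given data point.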

UC Berkeley enrollment drama: The students who are at risk

Los Angeles Times, Justin Ray


There are many sides to the debate, but one I want to explore is a recent development that is partially fueling the demand for UC enrollment.


More high school students are meeting UC eligibility requirements due to the UC Board of Regents decision to drop the use of the SAT and ACT for admissions decisions through 2024. The board argued that the tests exacerbated disparities involving race and income. Since the tests were dropped, applications have skyrocketed.

DPI awarded $1 million for computer science education research

University of Illinois Urbana-Champaign, The Daily Illini student newspaper, Royal Shrestha


The Illinois Workforce and Education Research Collaborative as part of the Discovery Partners Institute was awarded a $1 million gift for research on computer science education in Illinois on Feb. 2.

This gift will help fund a four-year research study between DPI and the College of Education at the University.

Raya Hegeman-Davis, co-principal investigator on the grant and part of DPI, described how some form of computer science education is becoming increasingly important.

Data scientists have access to your sensitive data—that’s driving more schools to teach ethics

Fortune, Education, Dawn Rzeznikiewicz


What obligation do data scientists have to be good stewards of the data they’re collecting and analyzing? This type of question has been asked in ethics courses in data science programs for years, though ethics remains a hot topic for the technology industry. That’s because ethical considerations are often still secondary, as companies instead prioritize technological developments that increase the amount of data collected, analyzed and stored.

“Frankly, there’s really no reason for the industry to act differently,” says Julia Stoyanovich, an associate professor at New York University’s Tandon School of Engineering, adding that the newest generation of engineers is much more aware of the social issues and implications of emerging technologies. “There is a big, big push from these engineers to start changing the practice.”

Commit to transparent COVID data until the WHO declares the pandemic is over

Nature, World View, Edouard Mathieu


Over and over, I’ve seen governments emphasize making dashboards look good when the priority should be making data available. A simple text file would do. After all, research groups like mine and citizens with expertise in data-visualization tools are more than willing to create a useful website or mobile app. But to do so, we need the raw material in a machine-readable format.
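“Machine-readable” here means nothing fancier than a file a script can parse directly: plain CSV with stable column names, one row per date. The file contents below are illustrative data, but they show how little is needed for any downstream dashboard or analysis:

```python
import csv
import io

# What a minimal machine-readable release might look like.
# (Illustrative data, not a real country's figures.)
raw = """date,new_cases,new_deaths,tests
2022-02-20,1204,7,15830
2022-02-21,1189,5,14922
2022-02-22,1043,6,15007
"""

rows = list(csv.DictReader(io.StringIO(raw)))
total_cases = sum(int(r["new_cases"]) for r in rows)
```

A government publishing exactly this, updated daily at a fixed URL, would satisfy the “simple text file” standard Mathieu asks for.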

Yann LeCun: AI Doesn’t Need Our Supervision

IEEE Spectrum, Eliza Strickland


LeCun, VP and chief AI scientist of Meta (formerly Facebook), believes that the next AI revolution will come about when AI systems no longer require supervised learning. No longer will they rely on carefully labeled data sets that provide ground truth in order for them to gain an understanding of the world and perform their assigned tasks. AI systems need to be able to learn from the world with minimal help from humans, LeCun says. In an email Q&A with IEEE Spectrum, he talked about how self-supervised learning can create more robust AI systems imbued with common sense.

He’ll be exploring this theme tomorrow at a virtual Meta AI event titled Inside the Lab: Building for the Metaverse With AI. That event will feature talks by Mark Zuckerberg, a handful of Meta’s AI scientists, and a discussion between LeCun and Yoshua Bengio about the path to human-level AI.



Data Science for Social Good program accepting applications

University of Washington, eScience Institute


The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2022 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

cnvrg.io, an Intel company, announces AI Blueprints, an open-source suite of developer-friendly, ready-made ML pipelines

PR Newswire

cnvrg.io, an Intel company and provider of the artificial intelligence (AI) and machine learning (ML) platform, today announced the release of AI Blueprints to help developers quickly deliver AI applications, with easy-to-use ML pipelines that can run on any infrastructure. AI Blueprints are a curated, verified and open-source library of ML model APIs and customizable pipelines that enable software developers to easily enhance their applications with AI to solve any business or product problem.

Humanitarian crises are dynamic and so is the #data needed to understand them. In this new report from @humdata , learn more about the state of humanitarian data

Twitter, SDSN Trends, OCHA Centre for Humanitarian Data


OUT NOW: The State of Open Humanitarian Data 2022!

In our 3rd year of producing this report, we can see progress with closing data gaps across 27 humanitarian crises.

YData launches data-centric AI community for data scientists

VentureBeat, Chris J. Preimesberger


Chief data officers, data scientists and data analysts of all stripes may be interested in a new AI support and information community that’s debuting today.

That special-interest group is the idea of a startup, YData, a self-described “data-centric AI community” that created what it claims is the first development platform for data quality to accelerate the development of AI solutions. The new company aims to break down barriers for data science teams, researchers and beginners to create a “friendly place where data quality issues are discussed and solved,” CEO and founder Gonçalo Martins Ribeiro told VentureBeat via email.
