Data Science newsletter – February 13, 2021

Newsletter features journalism, research papers and tools/software for February 13, 2021



The most important legacy of the NFL season may have nothing to do with football

The Week, Zach Schonbrun


… Late last month, the CDC released a study it co-authored with several of the NFL’s leading medical experts based on a trove of data gathered from players and personnel over the first three months of the season. The news got buried under the avalanche of Super Bowl-related headlines. But it contained several eye-opening revelations.

Early-Decision, Early-Action Applications Fall – Despite record totals in the Ivies and other competitive colleges, they are dropping at most institutions.

Inside Higher Ed, Scott Jaschik


According to new data from Hobsons’ Naviance service, the number of students applying early decision (nationally) is down 4.6 percent. And the number of students who applied early action fell 5.7 percent. Melissa Meyer of Hobsons said the numbers suggest that yield will be even more difficult for colleges to predict — except perhaps those in the top ranks that are receiving plenty of early applications.

What’s going on?

In part, it is another piece of the picture in admissions this year, in which colleges that are wealthy and that serve many wealthy students are doing well. And other colleges are not.

Tweet round-up from the second half of #AAAI2021

ΑΙhub, Lucy Smith


You may have seen our tweet round-up from the first few days of the AAAI 35th conference on artificial intelligence. During the second part of the conference we’ve enjoyed further invited talks, interesting panel sessions, and workshops. In our second round-up, we collect tweets from Saturday 6 to Tuesday 9 February.

Why using A.I. can be risky business

Fortune, Eye on A.I., Jonathan Vanian


Some businesses are now even publicly distancing themselves from what used to be a standard practice, such as using machine learning to predict customer behavior. Alex Spinelli, the chief technologist for business software maker LivePerson, recently told Fortune that he has cancelled some A.I. projects at his current company and at previous employers because those undertakings conflicted with his own ethical beliefs about data privacy.

For Aza Raskin, the co-founder and program advisor for the Center for Humane Technology non-profit, technology—and by extension A.I.—is experiencing a moment akin to climate change.

OpenAI and Stanford researchers call for urgent action to address harms of large language models like GPT-3

VentureBeat, Khari Johnson



Microsoft plans to make Atlanta “one of our important hubs,” says Brad Smith

The Seattle Times, The New York Times, Andy Peters


Microsoft President Brad Smith said the company plans to make metro Atlanta a major hub as it embarks on a significant expansion.

The tech giant is adding two regional data centers and could bring thousands of jobs to 90 acres of land it bought in recent months at the stalled Quarry Yards development on the city’s Westside.

“You don’t buy 90 acres if you don’t have plans to grow substantially,” Smith told The Atlanta Journal-Constitution in an interview.

WVU responds to data revolution with new major

Newswise, West Virginia University


To meet these 21st century workforce demands, West Virginia University is launching a new undergraduate data science major in fall 2021, the first of its kind in the state.

“I am delighted that we are able to launch this new and innovative degree program. The data science degree will provide an excellent opportunity for our students to learn and understand the intricacies of ‘big data’ and, more importantly, be able to utilize vast amounts of information and data to solve real-world problems,” said Gregory Dunaway, dean of the Eberly College of Arts and Sciences, home to the new major. “I am particularly excited that this degree intersects so many other areas of academic study so that students from all over the University may benefit.”

Tuscaloosa, Ala., tackles blight with garbage-truck cameras and AI

StateScoop, Colin Wood


To remove unsightly rubbish and reduce the contact between city employees and the public during the pandemic, Tuscaloosa, Alabama, has partnered with the neighboring University of Alabama to develop technology that can automatically detect blighted properties.

The university’s business college last month unveiled the technology, which uses cameras mounted on city vehicles — in particular garbage trucks — to collect images of properties throughout Tuscaloosa. A machine-learning model was trained on photos of blighted properties that had been collected by the city over more than a decade.

Why the U.S. Is Struggling to Track Coronavirus Variants

Smithsonian Magazine, Lila Thulin


There’s a reason why scientists in the United Kingdom, and not other nations, were the first to pinpoint a more transmissible variant of the virus that causes Covid-19. It wasn’t because the B.1.1.7 variant had necessarily originated from a patient in their country—scientists still don’t know that. Rather, British researchers had spotted the mutant spreading through London and southeast England because, more than any other country in the world, Britain was actively looking. Thanks to a $27 million government investment at the start of the pandemic, the country has analyzed the entire genetic makeup of more than 210,000 samples of SARS-CoV-2, the virus that causes Covid-19. That’s 43 percent of the total coronavirus genomes sequenced worldwide, and 5 percent of the country’s overall cases.

Meanwhile, the United States, home to a disproportionate one-fourth of the pandemic’s Covid-19 patients, has sequenced only about 96,000—a fraction of 1 percent—of its 27 million (and counting) cases. As of early February, this sequencing rate places the country 34th in the world, according to researchers at the Broad Institute. American scientists and public health authorities have been flying, if not blind, then at least with serious tunnel vision.
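The percentages in this excerpt can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the "implied" totals are back-of-envelope values derived from those percentages and are assumptions of this example, not numbers reported in the article.

```python
# Figures quoted in the excerpt above (early February 2021).
uk_genomes = 210_000        # SARS-CoV-2 samples sequenced by the UK
uk_share_of_world = 0.43    # UK share of all genomes sequenced worldwide
uk_share_of_cases = 0.05    # UK genomes as a share of UK cases
us_genomes = 96_000         # samples sequenced by the US
us_cases = 27_000_000       # US cases

# Fraction of US cases that were sequenced (roughly 0.36%).
us_rate = us_genomes / us_cases

# Back-of-envelope totals implied by the quoted percentages
# (derived here for illustration, not reported in the article).
implied_world_genomes = uk_genomes / uk_share_of_world
implied_uk_cases = uk_genomes / uk_share_of_cases

# How many times higher the UK sequencing rate is than the US rate.
rate_ratio = uk_share_of_cases / us_rate

print(f"US sequencing rate: {us_rate:.2%}")
print(f"Implied genomes sequenced worldwide: {implied_world_genomes:,.0f}")
print(f"Implied UK cases: {implied_uk_cases:,.0f}")
print(f"UK rate / US rate: {rate_ratio:.1f}x")
```

On these numbers the US rate works out to about a third of one percent, consistent with the "fraction of 1 percent" in the excerpt, and the UK was sequencing at roughly fourteen times that rate.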

Major camera company can sort people by race, alert police when it spots Uighurs

Los Angeles Times, Johana Bhuiyan


Facial recognition software developed by China-based Dahua, one of the world’s largest manufacturers of video surveillance technology, purports to detect the race of individuals caught on camera and offers to alert police clients when it identifies members of the Turkic ethnic group Uighurs.

Dahua, though among Chinese companies sanctioned by the U.S. government, has a growing presence in the country with sales and support offices in Irvine and Houston. Despite restrictions on its business within the country, the company struck a deal, reportedly valued at $10 million, with Amazon for 1,500 thermal cameras and there are 80 public contracts to purchase the company’s equipment in California alone.

US study to broaden detection of coronavirus in wastewater

Engineering & Technology magazine


“Research has shown that people will start excreting the virus in their faeces and urine before showing symptoms of Covid-19, so the idea is to use wastewater measurements as an early warning for a potential Covid-19 outbreak in the community,” said Helena Solo-Gabriele, a professor of environmental engineering at the [University of Miami].

Solo-Gabriele, along with a team of experts across fields such as pharmacology, biomedicine, and medicine, received a major grant from the National Institutes of Health (NIH) to conduct the research, with funding starting this month. The two-year, $5m (£3.6m) grant will help broaden the university’s own pilot programme to detect the virus in wastewater.

Amsterdam displaces London as Europe’s top stocks centre after Brexit

Reuters, Huw Jones


Amsterdam has displaced London as Europe’s biggest share trading centre after Britain left the European Union’s single market, and picked up a chunk of UK derivatives business along the way, according to data published on Thursday.

Twitter CEO Jack Dorsey explains his vision for social media’s future

CNET, Queenie Wong


Twitter CEO Jack Dorsey said Wednesday he envisions his short-message service becoming less centralized over time, a development that will give social media users more control over their data and the type of content they see.

The remarks build on earlier comments Dorsey has made about the future of social media. In 2019, Twitter said it was working on Bluesky, a project focused on creating “an open and decentralized standard for social media” that could help the site better combat online abuse and misinformation. The social network also began testing a forum called Birdwatch that lets users spot misleading tweets and weigh in on why they think the information is wrong.

Artificial Intelligence gets major boost with Amsterdam University of Applied Sciences new centre of expertise

Science|Business, Amsterdam University of Applied Sciences


The Amsterdam University of Applied Sciences (AUAS) is the first knowledge institution in the Netherlands to create a Centre of Expertise on Applied Artificial Intelligence, together with partners from the business community and the public sector. This will give a major boost to knowledge on applied Artificial Intelligence (AI) in the Amsterdam region. A unique feature of this Centre of Expertise is that students from all degree programmes at the AUAS can learn through labs how to apply AI in their fields of study; from nursing and ICT to teacher training.

Currently, AI algorithms, which are really ‘self-learning’ systems, support us in things like navigation on the web or with chatbots. More and more possibilities are being developed, and this has an impact on all fields and on what employees need to be able to do. In the Netherlands, there is therefore an urgent need for more knowledge and talent in relation to AI, particularly from SMEs and the public sector.

As the largest knowledge institution in the Amsterdam region, the AUAS is responding to this need by establishing a Centre of Expertise on Applied Artificial Intelligence to help businesses and public organisations in this transition.

Tim Cook Pivots to Fitness

Outside Online, Michael Roberts


“We all know intuitively, and now with research, that physical activity is a key part of longevity and quality of life,” Cook says. His own training time is sacrosanct, the one portion of his day when he’s unreachable. “I’m off-grid for that period,” he says. “And I am religious about doing that regardless of what’s going on at the time.”

No surprise that he pays close attention to the fitness data captured by his Apple Watch. “I want to know what I’m doing, not what I think I’m doing,” he says. “Because I can always convince myself that I’m doing more than I really am. So for me, it’s a motivator.”

A few weeks before we spoke, Apple introduced the Watch Series 6 with the slogan “The future of health is on your wrist.” Now, as we walk along a pathway winding between shrubs and dry grasses, Cook makes the case that the Watch has ushered in a new era of fitness tracking, and not just for dedicated athletes.


The Learning Salon is a weekly forum in which we explore bridges and contentions in biological and artificial learning.

The Learning Salon


Online March 5, starting at 4 p.m. Eastern. Speaker: Randall O’Reilly [rsvp required]



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

Uncovering Unknown Unknowns in Machine Learning

Google AI Blog, Lora Aroyo and Praveen Paritosh


The absence of ambiguous real-world examples in evaluation undermines the ability to reliably test machine learning performance, which makes ML models prone to develop “weak spots”, i.e., classes of examples that are difficult or impossible for a model to accurately evaluate, because that class of examples is missing from the evaluation set.

To address the problem of identifying these weaknesses in ML models, we recently launched the Crowdsourcing Adverse Test Sets for Machine Learning (CATS4ML) Data Challenge at HCOMP 2020 (open until 30 April, 2021 to researchers and developers worldwide). The goal of the challenge is to raise the bar in ML evaluation sets and to find as many examples as possible that are confusing or otherwise problematic for algorithms to process. CATS4ML relies on people’s abilities and intuition to spot new data examples about which machine learning is confident, but actually misclassifies.
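The kind of example the challenge is after, one the model is confident about but gets wrong, can be illustrated with a simple filter over labeled predictions. This is a minimal sketch, not the CATS4ML tooling: the `Prediction` record, the `confident_misclassifications` helper, the 0.9 threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model output paired with a human-verified label (hypothetical record)."""
    example_id: str
    true_label: str
    predicted_label: str
    confidence: float  # model's probability for predicted_label, between 0 and 1


def confident_misclassifications(preds, threshold=0.9):
    """Return wrong predictions made with confidence >= threshold, most confident first."""
    hits = [
        p for p in preds
        if p.predicted_label != p.true_label and p.confidence >= threshold
    ]
    return sorted(hits, key=lambda p: p.confidence, reverse=True)


# Made-up sample data for illustration only.
sample = [
    Prediction("img-001", "cat", "cat", 0.98),      # confident and correct
    Prediction("img-002", "dog", "cat", 0.95),      # confident and wrong
    Prediction("img-003", "bird", "plane", 0.55),   # wrong, but not confident
    Prediction("img-004", "muffin", "dog", 0.99),   # confident and wrong
]

for p in confident_misclassifications(sample):
    print(p.example_id, p.predicted_label, p.confidence)
```

A filter like this only surfaces such errors when a trusted label already exists for each example, which is why the challenge leans on people to find and verify new ones.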

Why I Built Litestream

Litestream, Ben Johnson


I built Litestream to bring back sanity to application development. Litestream is a tool that runs in a separate process and continuously replicates a SQLite database to Amazon S3. You can get up and running with a few lines of configuration. Then you can set-it-and-forget-it and get back to writing code.

You might think this sounds expensive to continuously write to cloud storage that provides 99.99% uptime and 99.999999999% durability but it’s astoundingly cheap. Typical costs are only about $1 per month. Litestream is free and open-source too so there’s never a license to pay for.

Explore and Visualize Data the Apache Superset Way

The New Stack, Susan Hall


Superset enables users to explore data and build visualizations using a no-code visualization builder and SQL editor. It competes with tools like Tableau, Looker, Chartio and others.

Scaling Neuroscience Research on AWS

Amazon, AWS Architecture Blog, Konrad Rokicki


Our neuroscience research application involves image searches that are computationally intensive but have unpredictable and sporadic usage patterns. The conventional on-premises approach is to purchase a powerful and expensive workstation, install and configure specialized software, and download the entire dataset to local storage. With 16 cores, a typical search of 50,000 images takes 30 seconds. A serverless architecture using AWS Lambda allows us to do this job in seconds for a few cents per search, and is capable of scaling to larger datasets.
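The speedup described here comes from splitting one large search into many small ones that run at the same time and then merging the partial results. The sketch below shows that fan-out pattern locally, with a thread pool standing in for parallel serverless invocations. It is an assumption-laden illustration, not the authors' architecture: `make_batches`, `search_batch`, `fan_out_search`, the batch size, and the toy scoring function are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(n_items, batch_size):
    """Split item ids 0..n_items-1 into contiguous batches."""
    return [
        range(start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]


def search_batch(batch, score, top_k):
    """Score every item in one batch and keep that batch's best top_k hits."""
    scored = [(score(item_id), item_id) for item_id in batch]
    return sorted(scored, reverse=True)[:top_k]


def fan_out_search(n_items, batch_size, score, top_k=5, workers=8):
    """Search all batches in parallel, then merge the per-batch top hits."""
    batches = make_batches(n_items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: search_batch(b, score, top_k), batches)
        merged = [hit for part in partials for hit in part]
    return sorted(merged, reverse=True)[:top_k]


def toy_score(image_id):
    """Stand-in for an expensive image-similarity computation (hypothetical)."""
    return (image_id * 7919) % 50_021


results = fan_out_search(n_items=50_000, batch_size=500, score=toy_score)
print(results)
```

Because each batch returns its own top hits, the merge step only has to sort a few hundred candidates, and the number of batches can grow with the dataset without changing the merging logic.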
