Data Science newsletter – November 30, 2020

Newsletter features journalism, research papers and tools/software for November 30, 2020



Testing sewage can give school districts, campuses and businesses a heads-up on the spread of COVID-19

The Conversation, Charles Gerba and Robert Glennon


November has brought encouraging news about several COVID-19 vaccines. But members of the general public will probably not be vaccinated before the spring or summer of 2021 at the earliest. Americans will be living with this pandemic for some time to come.

We are a microbiologist and a water policy specialist, and believe that wastewater-based epidemiology, which tests raw sewage, has an important role to play. Studies have shown that testing wastewater offers an early warning signal that the SARS-CoV-2 virus, which causes COVID-19, is present in a community.

Although this approach is less targeted than testing individuals, we believe it also has advantages. Tracking infection rates through clinical testing is effective only if tests are available and reliable and people are willing to get tested. Sewage sampling doesn’t have these problems because everyone uses the bathroom. Cities as large as Boston and Chicago are collecting sewage samples as one way to track the coronavirus.

Marketing the Future: How Data Analytics Is Changing

University of Pennsylvania, The Wharton School, Knowledge @ Wharton


Experts warn that data analytics is at an inflection point. Growing concerns about security risks, privacy, bias and regulation are bumping up against all the benefits offered by machine learning and artificial intelligence. Layer those concerns on top of worries about the coronavirus pandemic and how it has rapidly changed consumer behavior, and the challenges become clear.

“What we’re seeing is a lot of chaos in terms of what is the right answer. And what we’re seeing is a change in strategy,” said Neil Hoyne, chief measurement strategist at Google and a senior fellow at Wharton Customer Analytics.

For €9500, Nature journals will now make your paper free to read

Science, Jeffrey Brainard


The elite Nature family of journals, including the flagship Nature, today announced it is taking the plunge into open access in scientific publishing. The journals will become among the first highly selective titles to allow any author to pay a publishing fee to make articles immediately free to read when published. Such open-access arrangements are being required by some European funders and foundations that seek to eliminate subscription paywalls in order to speed the flow of scientific information.

Nature’s author fee, €9500, is thought to be the highest of any journal. But the Nature Research publishing group says it is necessary to cover the costs of the full-time editors and others who produce Nature and its 32 other primary research journals.

College Finance Data Can Help Regulators Protect Students

New America, Education Policy; Clare McCann, Sophie Nguyen, and Wesley Whistle


With another recession now underway, college enrollment remains unpredictable. Data from the National Student Clearinghouse show that undergraduate enrollment at private, non-profit colleges is down more than 2 percent from last year, with first-time student enrollment down by more than 8 percent; early data on FAFSA completion from the National College Attainment Network indicate that applications for the coming academic year may be more deeply impacted. International student enrollment, which has bolstered tuition revenue for many institutions, is down by 43 percent, according to the Institute of International Education. In other words, colleges once struggling to make ends meet are now at even greater risk of possible closure, especially since Congress hasn’t provided relief funding for institutions since March.

Earlier this year, the founders at the consumer-information website Edmit decided students would need to take matters into their own hands, and produced a model that uses publicly available data to identify private non-profit institutions that have basic liquidity issues: their expenses are high relative to their revenue and assets, and/or enrollment isn’t keeping up enough to make the current path sustainable. In contrast to the complicated formula the federal government uses to develop a composite score (a three-pronged test that accounts for financial viability, capital resources and borrowing ability, and ability to operate within the school’s means), the metric uses a decade of data to assess which schools are trending toward unsustainability.

Precise maps of millions of bright quasars show our place in the cosmos as never before

Science, Joshua Sokol


For the past few decades, astronomers have based their celestial grid on radio observations of several thousand quasars. These radio beacons not only guide the pointing of telescopes, but they are also the bedrock of the reference frame for the spinning, bucking Earth. Without them, GPS devices would lose their accuracy and many ultraprecise studies of processes such as plate tectonics and climate change would be impossible. But observations of these beacons are costly and rely on radio telescopes.

By 2018, when New Horizons was approaching Arrokoth, Gaia had produced its own version of a reference frame, based on half a million quasars seen in the visible wavelengths most astronomers use, not radio. Buie persuaded the New Horizons team to trust the new framework. A correction based on the Gaia positions went up to the probe.

The team got it right: When the closest flyby images came back, Arrokoth was framed perfectly. “None of that would have happened if we hadn’t had the Gaia catalog,” Buie says. “It’s a fundamental rewriting of how we do positional astronomy.”

The rewriting has continued. Next week, on 3 December, Gaia will release, along with the latest data about billions of Milky Way stars, its newest reference frame, built from 1.6 million quasars scattered across the sky. “It is improved, larger, better, more beautiful,” says François Mignard, an astronomer at the Côte d’Azur Observatory in France who leads Gaia’s reference frame team.

Astronomers say they’ve solved the mystery of the ‘Blue Ring Nebula’

Science, Daniel Clery


Astronomers believe they have solved the 16-year-old mystery of a star surrounded by an enigmatic ring of ultraviolet light. If they’re right, the “Blue Ring Nebula” offers a glimpse of a fleeting phenomenon: a star still reeling from its birth via the merger of two other stars.

“Finding a bona fide merger event will be very helpful in developing our understanding of stellar mergers,” says astronomer Boris Gaensicke of the University of Warwick, who was not involved in the study. And because many, if not most, stars originate as binaries, mergers could drive the births of countless stars, says astronomer Morgan Fraser at University College Dublin, also not involved with the work. “There’s a lot we don’t know about how stellar mergers work.”

Trees are losing their leaves earlier because of climate change

CNN, Amy Woodyatt


“Previous models assumed that because autumns will get warmer and warmer over the coming century, autumn will get delayed — growing seasons will overall get longer, and autumn will get delayed by two to three weeks,” ecosystem ecologist Constantin Zohner said.

However, Zohner and a team of researchers have said their findings reverse this prediction.

“We actually predict by the end of the century, leaves might even fall off three to six days earlier,” Zohner, a corresponding author on the paper published Friday in the journal Science, added.

A group of recent Princeton grads is winning lotteries across U.S.

Indianapolis Star, Tim Evans and Tony Cook


A group of recent Ivy League graduates is making a run on lotteries across America.

So far they’ve won more than $6 million from lotteries in Indiana, Missouri, Washington and the District of Columbia.

Exactly how they’re doing it — and how much they are profiting, if any — remains a mystery.

The unusual winning streak first came to light in Indiana after the apparent leader of the group, a 27-year-old Princeton University graduate named Manuel Montori IV, cashed in 61 winning Hoosier Lottery scratch-off tickets on a single day in September.

How Arizona universities are riding the waves of COVID-19

AZ Big Media, Cronkite News


As cases of COVID-19 continue to surge in Arizona and the rest of the nation, the state’s three public universities are wrangling their approaches to the pandemic in similar but separate ways.

Arizona State University developed a saliva-based test and aims to monitor the spread through frequent mass testing.

The University of Arizona, unlike its counterparts, invested in a wastewater test to monitor the spread in highly populated places on campus and suggested a schoolwide shelter-in-place initiative.

Northern Arizona University, the smallest of the three, has changed the least. It adopted ASU’s saliva test and shares UArizona’s system for contact tracing, but it has been the most lenient with in-person education, offering classes with fewer than 45 students.

COVID-19 continues to mutate — UNLV researcher hopes to use sewage to track new strains

KSNV, 3 News Las Vegas, Phillip Moyer


With multiple COVID-19 vaccines on the horizon, it looks like months of spread and restrictions may finally have an end in sight.

An effective vaccine won’t necessarily end the spread of the disease, however. As a virus spreads, it mutates. With enough mutations, a strain of COVID-19 might come about that the new vaccines can’t stop.

“I can envision a possibility where we create the vaccine, but the virus has mutated to a point in some communities where that vaccine is less effective for people who’ve contracted a different strain,” said UNLV researcher Edwin Oh.

Algorithmic Fairness: Choices, Assumptions, and Definitions

Annual Review of Statistics and Its Application; Shira Mitchell, Eric Potash, Solon Barocas, Alexander D’Amour, and Kristian Lum


A recent wave of research has attempted to define fairness quantitatively. In particular, this work has explored what fairness might mean in the context of decisions based on the predictions of statistical and machine learning models. The rapid growth of this new field has led to wildly inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions. This article attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decision-making. Next, we show how such choices and assumptions can raise fairness concerns and we present a notationally consistent catalog of fairness definitions from the literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision-making.
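To make the idea of a quantitative fairness definition concrete, here is a minimal sketch in Python. It compares two definitions that are commonly cited in the broader literature (often called demographic parity and equal opportunity); the excerpt above does not name them, so treat the choice of metrics, the function names, and the toy data as assumptions made for illustration only.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Fraction of positive decisions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for g, d in zip(groups, decisions):
        totals[g] += 1
        positives[g] += d
    return {g: positives[g] / totals[g] for g in totals}


def true_positive_rates(groups, decisions, outcomes):
    """Fraction of positive decisions among members whose true outcome is 1."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, d, y in zip(groups, decisions, outcomes):
        if y == 1:
            totals[g] += 1
            hits[g] += d
    return {g: hits[g] / totals[g] for g in totals}


def max_gap(rates):
    """Largest difference between any two groups' rates."""
    values = list(rates.values())
    return max(values) - min(values)


# Hypothetical toy data: group label, model decision, observed outcome.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 0, 0, 1, 0, 0, 0]
outcomes = [1, 0, 1, 0, 1, 1, 0, 0]

# Gap in selection rates (a demographic-parity style check).
parity_gap = max_gap(selection_rates(groups, decisions))
# Gap in true positive rates (an equal-opportunity style check).
opportunity_gap = max_gap(true_positive_rates(groups, decisions, outcomes))

print(parity_gap, opportunity_gap)
```

On this toy data the two checks disagree: the groups are selected at different rates, yet their true positive rates match. That kind of disagreement is one reason a consistent catalog of definitions is useful.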

“Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening” now published in @ScienceAdvances.

Twitter, Dan Larremore


Its conclusion is as urgent as ever: stopping COVID is possible with repeated, rapid-turnaround tests. My thread on findings below.

Teaming for Technology – The campus and Amazon team up on new research initiative in machine learning and data science

University of California-Santa Barbara, The Current


To advance knowledge in critical next-generation technologies, Amazon has made a substantial gift to UC Santa Barbara. The investment will support graduate student researchers in computer science and in statistics and applied probability, as well as an ongoing lecture series in machine learning and data science.

The gift includes underwriting one graduate student fellowship in the Department of Computer Science (CS) and four graduate fellowships in the Department of Statistics and Applied Probability (PSTAT). It also supported the second annual Responsible Machine Learning Summit recently hosted by UC Santa Barbara’s Center For Responsible Machine Learning.

This new research and workforce development effort was catalyzed by Amazon’s office in downtown Santa Barbara.

Largest ever research integrity survey flounders as universities refuse to cooperate

Science, Jop de Vrieze


The world’s largest multidisciplinary survey on research integrity is in danger of falling short of its goals after two-thirds of invited institutions declined to collaborate, citing the sensitivity of the subject and fearing negative publicity. That left researchers leading the Dutch National Survey on Research Integrity on their own to scrape many email addresses and solicit responses. The survey will close on 7 December, but the team has gathered responses from less than 15% of 40,000 targeted participants.

“It was supposed to be a collaborative effort, but it ended up as a satellite on its own in the Solar System, trying to send out signals,” says Gowri Gopalakrishna, a postdoctoral researcher at the Amsterdam University Medical Center (AUMC) who is coordinating the €800,000 survey.

MSU Library partners on grant to build data science capacity

Montana State University, MSU News


In partnership with other academic institutions, Montana State University’s library is working to build data science capacity with the help of a grant from the Institute of Museum and Library Services.

The project, known as LIS Education and Data Science-Integrated Network Group, or LEADING, will involve partners MSU, the University of California San Diego Library and OCLC, a global library cooperative. It will be led by the College of Computing and Informatics at Drexel University.

LEADING is a Laura Bush 21st Century Librarian National Digital Infrastructures and Initiatives project. It launched Nov. 1 with the goal of boosting data science capacity through recruiting, training and developing the library workforce.


JournalismAI Festival 2020

London School of Economics, JournalismAI and Google News Initiative


Online December 7-11. “The Festival will be a celebration of the most exciting developments at the intersection of journalism and AI that we have witnessed over these 24 months, and a unique occasion to bring together our global network.” [newsletter sign up required]

PIDapalooza 2021: A 24-Hour Nonstop (Virtual) PID Party

CDL, Crossref, DataCite, NISO, and ORCID


Online January 27-28. “PIDapalooza is going online in 2021! We wish we could all be together in person for the fifth (!) festival of persistent identifiers, but we’re excited to bring the world’s largest—and longest—virtual PID party directly to your desk, your couch, your balcony—really, anywhere there’s a strong WiFi signal.” [free, registration required]


Call for Applications: participate online + become a certified @datacarpentry instructor!

Deadline to apply is December 15.

Trust and Doubt in Public-Sector Data Infrastructures

“On Thursday, March 25, 2021, Data & Society Founder and President danah boyd and Executive Director Janet Haven will host an online academic workshop on trust and doubt in public-sector data infrastructures. Our goal is to bring together researchers who are thinking about government data from multiple nations/states, the infrastructures surrounding that data, and the epistemic battle over that data’s legitimacy.” Deadline for applications to participate is January 8.

Tools & Resources

Navigating ML Deployment

Towards Data Science, Ryan Dawson


We often think of ‘deployment’ as packaging software into an artifact and moving it to an environment to run on. For Machine Learning it can be better to think of deployment as “the action of bringing resources into effective action” (one of Oxford’s definitions of ‘deployment’).

There are a range of patterns for using ML models to make business decisions. Deploying machine learning models can mean different things, depending on the context. Understanding the key prediction-making patterns can help to decide on which tools apply to your use case.

We’ve released 1.2 million reviews in 6 languages in the Multilingual Amazon Reviews Corpus (MARC)!

Twitter, Phillip Keung


“w/ Y. Lu, G. Szarvas, @nlpnoah”

USC & Amazon ‘SLADE’ Self-Training Framework Uses Unlabelled Data to Improve Information Retrieval



Let’s face it: the advanced AI systems of the future are unlikely to be built with data that’s been hand-labelled by humans. Finding ways to eliminate the time-consuming data-labelling process has been a challenge in the machine learning (ML) community for some time. A new paper from researchers at the University of Southern California and Amazon approaches the problem with a self-training framework that can improve information retrieval performance using unlabelled data.
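For readers unfamiliar with self-training, the sketch below shows the generic idea (sometimes called pseudo-labelling) on a one-dimensional toy problem. It is not the SLADE method and says nothing about how the paper handles retrieval; the nearest-centroid model, the `min_margin` threshold, and all names and data here are assumptions chosen to keep the example small.

```python
def centroids(points, labels):
    """Mean position of the points assigned to each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Return the nearest label and its margin over the runner-up."""
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - cents[second]) - abs(x - cents[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, rounds=3):
    """Repeatedly fit, pseudo-label confident points, and refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)
        remaining = []
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                # Confident prediction: treat it as a training label.
                xs.append(x)
                ys.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing new was labelled, so stop early
        pool = remaining
    return centroids(xs, ys), pool


# Hypothetical data: two labelled points and five unlabelled ones.
model, leftover = self_train(
    [0.0, 10.0], ["neg", "pos"], [1.0, 2.0, 9.0, 8.0, 5.2]
)
print(model, leftover)
```

The confidence threshold is the main design choice in this sketch: points the current model is unsure about stay unlabelled rather than feeding possibly wrong labels back into training.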

We are pleased to announce the first official release of CAMeL Tools

Twitter, CAMeL Lab


“An open-source Python Arabic NLP toolkit for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.”


Tenured and tenure track faculty positions

Assistant Professor (Non-tenure) | Faculty Recruiting | University of Rochester

University of Rochester, Goergen Institute for Data Science; Rochester, NY
Full-time, non-tenured academic positions

Research Fellow: Economic rents in the digital platform economy

University College London, Institute for Innovation and Public Purpose; London, England
