Data Science newsletter – January 26, 2021

Newsletter features journalism, research papers and tools/software for January 26, 2021

GROUP CURATION: N/A

 

America’s Most Reliable Pandemic Data Are Now at Risk

The Atlantic, Alexis C. Madrigal


In a short time, control over this hospitalization data became one of the most hotly contested elements of the American response, as fears of Trump-administration meddling cast doubt on the Department of Health and Human Services. Now the Biden administration is poised to take over as the country faces the worst surge yet, and hospitalization data may be the most important information it will have in the fight to save lives. The administration must decide where those data will live.


I could not be more thrilled that the Biden-Harris Administration has announced an intention to create a National Center for Epidemic Forecasting and Outbreak Analytics

Twitter, Caitlin Rivers


This idea has a long history, with dozens of brilliant scientists, policymakers and innovators behind the wheel, including many who roll up their sleeves every day to make sure our leaders have what they need to fight outbreaks effectively @dylanbgeorge


The US National AI Initiative seal: an American eagle wrestling a patriotic octopus inside a neural net.

Twitter, Will Knight



Eye on A.I. podcast – Geoff Hinton on his quest to decode learning in the brain

Eye on A.I., Episode #63, Craig Smith


CRAIG: I don’t think I need to introduce you or that you need to introduce yourself. I do want to sort of recap what’s gone on in the last year. It’s been quite a year. Capsule networks had sort of faded from view, at least from the layman’s point of view, and resurfaced at NeurIPS last December with your introduction of stacked capsule autoencoders.
00:01:47



Then, in February at the AAAI conference, you talked about capsule networks as key to unsupervised learning. And in April you revived the idea of backpropagation as a learning function in the brain with the introduction of neural gradient representation by activity differences, or NGRADs.
00:02:08

GEOFF: I think it would be better if we started with capsules and we do three different topics.
00:02:13

We do capsules.
00:02:14

CRAIG: Okay.
00:02:14

GEOFF: Then we do SimCLR. And then we do the NGRAD stuff.


North Dakota bill would ban wastewater testing for COVID-19

Associated Press


A North Dakota lawmaker wants to ban testing of wastewater for the coronavirus, saying the practice violates the privacy rights of college students.

Republican Rep. Claire Cory, of Grand Forks, introduced a bill that would prohibit wastewater testing “for genetic material or evidence of disease.” She said she brought the proposal over concerns from people about wastewater testing tracing the virus on university campuses.

“They should still have privacy and things like that,” Cory said, acknowledging that the testing doesn’t trace the virus to individuals.


COVID Modeling Demands Mean Big Storage Boost for IHME

The Next Platform, Nicole Hemsoth


Over the course of the pandemic, you have probably encountered data from IHME, the Institute for Health Metrics and Evaluation. The grant-supported organization based at the University of Washington’s School of Medicine is global in reach, providing everything from population health, death, and other statistics to help guide policymakers and worldwide health efforts.

As one might imagine, bringing together population data from massive sources (not to mention in multiple formats, which is its own data integration challenge, consuming up to 40% of IHME developer time) and analyzing it to produce reports for data laymen takes significant computational, data science, storage, and other resources. With COVID-19, those demands were sent soaring.

This new push to beef up infrastructure, which already includes a 27,000-core cluster on-prem (AMD and Intel combination with increased AMD on the horizon) includes a rethink of the storage backbone. As Serkan Yalcin, Director of IT and Technology Infrastructure/DevOps, tells The Next Platform, they were not sure what needs their modelers would have, but they invested in four different Qumulo appliances, from flash to high-capacity, to fit demand.

IHME is already using around 80% of the Qumulo capacity for its work but expects future projects to keep pushing limits on what they require.


How COVID-19 has changed the culture of science

Chemical & Engineering News; Bethany Halford, Laura Howes, Andrea Widener


Ever since the first genome sequence of the novel coronavirus was released to the world in February 2020, science has been supercharged. The speed and volume of discovery over the past year have been remarkable, with researchers managing to unravel the molecular details of the virus, understand how it spreads and who is most at risk, and invent tests, drugs, and vaccines to tackle it. The public has seen what scientists can do under pressure.

Throughout 2020, clinical trial data regularly garnered headlines, and certain academics emerged as authoritative voices of the pandemic.

“There’s a group of people who have become almost household names,” says Holden Thorp, chemist and editor in chief of the Science family of journals. Early on, the University of Sydney chemist Pall Thordarson became a viral sensation for explaining how something as simple as washing your hands can protect against infections. In Germany, podcasts about the pandemic by virologist Christian Drosten, who developed the first diagnostic test for SARS-CoV-2, became must-listens for the German-speaking public. And Anthony Fauci, director of the US National Institute of Allergy and Infectious Diseases, emerged as the steady source of advice in the US and overseas—he also became a popular icon, with his image appearing on T-shirts, socks, and coffee mugs.

Scientists who spoke to C&EN believe that the reputation of the pharmaceutical industry, in particular, has improved during the pandemic, as the public closely followed the development of vaccines. Companies’ rapid mobilization also highlighted that science is collaborative, works across borders, and is performed by diverse teams. “Hopefully, this will be seen more unambiguously as a triumph for science,” Thorp says.


Developing Algorithms That Might One Day Be Used Against You

Gizmodo, Ryan F. Mandelbaum


Machine learning algorithms serve us the news we read, the ads we see, and in some cases even drive our cars. But there’s an insidious layer to these algorithms: They rely on data collected by and about humans, and they spit our worst biases right back out at us. For example, job candidate screening algorithms may automatically reject names that sound like they belong to nonwhite people, while facial recognition software is often much worse at recognizing women or nonwhite faces than it is at recognizing white male faces. An increasing number of scientists and institutions are waking up to these issues, and speaking out about the potential for AI to cause harm.

Brian Nord is one such researcher weighing his own work against the potential to cause harm with AI algorithms. Nord is a cosmologist at Fermilab and the University of Chicago, where he uses artificial intelligence to study the cosmos, and he’s been researching a concept for a “self-driving telescope” that can write and test hypotheses with the help of a machine learning algorithm. At the same time, he’s struggling with the idea that the algorithms he’s writing may one day be biased against him—and even used against him—and is working to build a coalition of physicists and computer scientists to fight for more oversight in AI algorithm development.


Destruction Emerges as a Powerful Enforcement Measure for AI: FTC Requires Company to Delete Models Trained with Improperly Utilized Consumer Data

NYU School of Law, Program on Corporate Compliance and Enforcement; Jeremy Feigelson, Avi Gesser, Jim Pastore, Justin C. Ferrone, Anna R. Gressel, Paul D. Rubin, and Melissa Runsten


For those following emerging artificial intelligence (“AI”) regulations and enforcement closely, one issue of great interest is remedies. In particular: in what circumstances, if any, would regulators or courts find that a flawed machine learning or AI model must be scrapped entirely? A hot-off-the-press decision from the U.S. Federal Trade Commission (the “FTC”) suggests regulators will not shy away from saying “scrap it.”

The issue arises when a complex model is trained utilizing data that the model owner was not legally authorized to use for that purpose.


School, Police Data Sharing Attracts Federal Scrutiny

GovTech, Tampa Bay Times, Jeffrey S. Solochek


There’s been a long-running debate in Florida and the nation over the collection and use of all types of student information, ranging from personal identifiers to academic performance. Schools stress that the information helps them determine which students need the most help. Some parents worry that, in the wrong hands, their children could be exposed to negative consequences. The controversy has come into focus in Pasco County, in a story that has gone viral. Read on for the latest on that item and more Florida education news.

The Pasco County school district has come under fire for sharing student data with law enforcement. Now the chair of the U.S. House education committee has called for an investigation into the practice.

The presidential inauguration is today. Palm Beach County school district officials made clear that it’s okay for classes to watch and discuss the event, the Palm Beach Post reports. More from WLRN.

Across the nation, some schools are refusing to show the inauguration, Education Week reports.


‘Inspired choice’: Biden appoints sociologist Alondra Nelson to top science post

Nature, Nidhi Subbaraman


Scientists praise US president’s selection of the bioethics and social inequality specialist to help lead the Office of Science and Technology Policy.


Study: COVID-19 Plus the Recession is Leading to Decreased Enrollment

Diverse: Issues In Higher Education, Lois Elfman


Low-income students are particularly hard hit as people cancel or reduce postsecondary plans.

An analysis from the Georgetown University Center on Education and the Workforce (CEW), titled “When Back to School Meets Stay at Home,” noted that 75% of households in which at least one person intended to take postsecondary classes changed their plans. Some opted not to take any classes (37%), and some reduced their course load or changed programs, institutions or format.


Period apps are a privacy nightmare – should you still use them?

Business Insider, Isobel Asher Hamilton


When you tell an app about your period, it’s hard to know exactly where that data is going.

Period-tracking apps offer clear health benefits to users, allowing them to track and anticipate symptoms, as well as providing an aid for people hoping to conceive. They are also hugely popular – period tracker app Flo has more than 50 million downloads on the Google Play store. Its next-biggest competitor, Clue, has more than 10 million. It’s a competitive market, and even Apple launched its own period-tracking app in 2019.

Unfortunately, menstruation apps also have a track record of throwing up big privacy red flags.

This manifested last week, when popular period-tracking app Flo reached a settlement with the Federal Trade Commission (FTC) after the FTC alleged it shared sensitive user data with third parties including Facebook and Google – a practice that was revealed by a Wall Street Journal investigation in 2019.


Wildfire Smoke Is Alive

VICE News, Shayla Love


Until recently, smoke was thought of as an inert substance. In 2018, researchers found it’s teeming with microbial life.


LAUNCHING IN FALL 2021: #NYUCUSP is pleased to announce our NEW Online Advanced Certificate in Applied #UrbanScience & Informatics!

Twitter, NYU CUSP


This 12-credit program can be taken from anywhere & built around your busy schedule.


Events



Building for Digital Health 2021, Tech Talks and Hackathon

MIT Hacking Medicine


Online February 1-7. Organized by MIT Hacking Medicine in partnership with Google Cloud. Register by January 29.


Deadlines



AI for Public Health Workshop at ICLR’21

Online May 8. Deadline for submissions is February 26.

SPONSORED CONTENT




The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



We’re launching the new http://huggingface.co and it’s incredible

Twitter, Hugging Face


Play live with 10+ billion-parameter models, deploy them instantly in production with our hosted API, join the 500 organizations using our hub to host/share models & datasets


A new version of tidycensus, 0.11.4, is now up on CRAN!

Twitter, Kyle Walker


The main new features include 1) support for the just-released 2019 5-year ACS microdata in `get_pums()`, and 2) the ability to grab ZCTAs by state in the 2015-2019 ACS, which hasn’t been possible for previous years #rstats


RxR: A Multilingual Benchmark for Navigation Instruction Following

Google AI Blog, Alexander Ku and Peter Anderson


This challenge, referred to as vision-and-language navigation (VLN), demands a sophisticated understanding of spatial language. For example, the ability to identify the position “behind the chair at the head of the table” requires finding the table, identifying which part of the table is considered to be the “head”, finding the chair closest to the head, identifying the area behind this chair and so on. While people can follow these instructions easily, these challenges cannot be easily solved with current ML-based methods, requiring systems that can better connect language to the physical world it describes.
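The step-by-step decomposition described for “behind the chair at the head of the table” can be made concrete with a toy, hand-written grounding routine. This is a minimal sketch, not RxR code or any VLN model: the 2D scene, the `Obj` type, and the helpers `find_first`, `head_of_table`, `closest`, and `behind` are all hypothetical. It assumes the table lies along the x-axis with its “head” at the larger-x end, and that “behind” the chair means continuing past it, away from the head.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    name: str   # hypothetical identifier, e.g. "chair_1"
    kind: str   # object category, e.g. "table" or "chair"
    x: float
    y: float


def find_first(objects, kind):
    # Step 1: find the table (first object of the requested kind).
    for obj in objects:
        if obj.kind == kind:
            return obj
    raise ValueError(f"no {kind} in scene")


def head_of_table(table, table_length):
    # Step 2 (assumption): the table runs along the x-axis and its
    # "head" is the end with the larger x coordinate.
    return (table.x + table_length / 2, table.y)


def closest(objects, kind, point):
    # Step 3: the object of `kind` nearest to `point` (Euclidean distance).
    candidates = [obj for obj in objects if obj.kind == kind]
    return min(candidates, key=lambda obj: math.dist((obj.x, obj.y), point))


def behind(obj, reference, distance=1.0):
    # Step 4 (assumption): "behind" the chair means continuing past it,
    # away from the reference point, by `distance` units.
    dx, dy = obj.x - reference[0], obj.y - reference[1]
    norm = math.hypot(dx, dy) or 1.0  # avoid dividing by zero
    return (obj.x + distance * dx / norm, obj.y + distance * dy / norm)


def ground_instruction(objects, table_length=2.0):
    # "behind the chair at the head of the table", one step at a time.
    table = find_first(objects, "table")
    head = head_of_table(table, table_length)
    chair = closest(objects, "chair", head)
    return chair.name, behind(chair, head)


scene = [
    Obj("table_1", "table", 0.0, 0.0),
    Obj("chair_1", "chair", 1.5, 0.0),   # sits at the head end
    Obj("chair_2", "chair", -1.5, 0.0),  # sits at the foot end
    Obj("chair_3", "chair", 0.0, 1.0),   # sits along one side
]
print(ground_instruction(scene))  # ('chair_1', (2.5, 0.0))
```

Every rule here (which end is the “head”, which direction is “behind”) is hard-coded; the point of a dataset like RxR is that an agent has to infer these relations from language and what it sees instead.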

To help spur progress in this area, we are excited to introduce Room-Across-Room (RxR), a new dataset for VLN. Described in “Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding”, RxR is the first multilingual dataset for VLN, containing 126,069 human-annotated navigation instructions in three typologically diverse languages — English, Hindi and Telugu.


Careers


Full-time, non-tenured academic positions

Senior Lecturer on Statistics



Harvard University, Department of Statistics; Cambridge, MA
