Data Science newsletter – November 9, 2020

Newsletter features journalism, research papers and tools/software for November 9, 2020

GROUP CURATION: N/A

 

A digital reconstruction of the 1630–1631 large plague outbreak in Venice

Nature, Scientific Reports, Marcel Salathé et al.


from

The plague, an infectious disease caused by the bacterium Yersinia pestis, is widely considered to be responsible for the most devastating and deadly pandemics in human history. Starting with the infamous Black Death, plague outbreaks are estimated to have killed around 100 million people over multiple centuries, with local mortality rates as high as 60%. However, detailed pictures of the disease dynamics of these outbreaks centuries ago remain scarce, mainly due to the lack of high-quality historical data in digital form. Here, we present an analysis of the 1630–1631 plague outbreak in the city of Venice, using newly collected daily death records. We identify the presence of a two-peak pattern, for which we present two possible explanations based on computational models of disease dynamics. Systematically digitized historical records like the ones presented here promise to enrich our understanding of historical phenomena of enduring importance. This work contributes to the recently renewed interdisciplinary foray into the epidemiological and societal impact of pre-modern epidemics.
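The abstract refers to computational models of disease dynamics. As background only, here is a minimal SIR-style sketch in Python; the population size, rates, and seeding are illustrative assumptions, not values fitted to the Venice death records:

```python
import numpy as np

def sir(beta=0.4, gamma=0.2, days=180, N=140_000, I0=10):
    """Toy discrete-time SIR model: returns the number infected each day."""
    S, I, R = N - I0, I0, 0
    infected = []
    for _ in range(days):
        new_inf = beta * S * I / N        # new infections this day
        new_rec = gamma * I               # recoveries/removals this day
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        infected.append(I)
    return np.array(infected)

curve = sir()
print("peak day:", int(curve.argmax()), "peak infected:", int(curve.max()))
```

A single homogeneous SIR run like this produces only one peak; reproducing the two-peak pattern the authors report requires richer assumptions of the kind explored in the paper.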


Researchers show that computer vision algorithms pretrained on ImageNet exhibit multiple, distressing biases

VentureBeat, Kyle Wiggers


from

State-of-the-art image-classifying AI models trained on ImageNet, a popular (but problematic) dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. That’s according to new research from scientists at Carnegie Mellon University and George Washington University, who developed what they claim is a novel method for quantifying biased associations between representations of social concepts (e.g., race and gender) and attributes in images. When compared with statistical patterns in online image datasets, the findings suggest models automatically learn bias from the way people are stereotypically portrayed on the web.
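The article summarizes the method only at a high level; it belongs to the family of embedding association tests. A minimal, hypothetical sketch of a WEAT-style differential-association score over image embeddings (illustrative, not the paper's implementation; the embedding-extraction step is assumed):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # how much closer an embedding w sits to attribute set A than to B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    # WEAT-style differential association of two target-concept sets (X, Y)
    # with two attribute sets (A, B), normalized by the pooled std dev
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# X, Y, A, B would be lists of image embeddings extracted from a pretrained
# vision model; that extraction step is omitted here.
```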


How L’Oréal and Other Innovative Brands Are Reinventing the Store Locator

Street Fight, Dustin Hayes


from

A notable case in point is the cosmetics giant L’Oréal, which recently unveiled the “Make-up Not Make Waste” campaign in the UK with its Maybelline brand to enable consumers to recycle their used or unwanted cosmetics, which are notoriously difficult to recycle. In collaboration with the recycling firm TerraCycle, L’Oréal established makeup recycling stations at popular retail chains such as Tesco, Boots, and Superdrug. The brand had previously conducted a survey of Maybelline customers that found nearly 50% were unaware that makeup products could be recycled.

As noted in the BBC’s coverage of the campaign, the primary means by which L’Oréal promoted the availability of recycling stations was through the Maybelline store locator, built for the company by Brandify. As the article’s screenshot shows, the Maybelline locator now includes a prominent filter that helps customers easily find nearby makeup recycling locations.


Will Quantum Computing Supercharge Artificial Intelligence?

Nextgov, John Breeden II


from

But for the sake of argument, let’s assume that quantum supremacy has been achieved. Like the dog that finally catches one of the cars it has always been chasing, the question then becomes: what do we do now? Certainly, we will continue to improve the speed and especially the accuracy of quantum computing. One of the biggest problems right now is that quantum machines return a lot of junk answers (what scientists refer to as noise) mixed in with the actual solution to almost any question. You can check out this Nextgov article about the specific differences between quantum and traditional computers if you want to know more about how quantum works. I have little doubt quantum will get faster and more accurate, but what can we do with that power right now?


Can contact tracing work for #COVID19? Much effort assumes contact tracing works for COVID – surveillance testing assumes so. I’ve had major concerns – this paper reinforces them.

Twitter, Michael Mina, JAMA


from


Palantir in talks to help Britain with Covid contact tracing

CNBC, Sam Shead


from

The U.K. is reportedly considering using Palantir software for coronavirus contact tracing.

Officials are interested in the company’s Foundry software, which could be used to manage sensitive contact-tracing data.


BEYOND program to collaborate on drone integration into U.S. airspace

Robot Report, Virginia Tech


from

Last week, U.S. Secretary of Transportation Elaine Chao announced the launch of a new federal drone-integration initiative. The BEYOND program is intended to advance the safe integration of drones into the national airspace by creating a framework for collaboration among the diverse stakeholders committed to realizing the potential of this new form of aviation. The Virginia Polytechnic Institute and State University said the program gives drone researchers another opportunity to help shape the evolution of an emerging technology.

BEYOND builds on the Unmanned Aircraft Systems (UAS) Integration Pilot Program, which was led by the U.S. Department of Transportation and managed by the Federal Aviation Administration (FAA). That three-year program, which ended on Oct. 25, brought together companies, research organizations, state and local governments, and federal agencies on teams focused on drone-integration projects.

The FAA said the BEYOND program is intended to help make beyond visual line-of-sight (BVLOS) operations scalable and economically viable, better quantify the benefits of UAS operations, and engage the community to learn its concerns.


Artificial lighting is ‘extremely disruptive’ to the natural world

European Scientist, Siobhan Dunphy


from

Artificial night-time lighting has widespread impacts on the natural world, according to a new paper published on 3 November in the journal Nature Ecology & Evolution (1). In particular, night-time lighting disrupts the hormone levels and waking and sleeping patterns of many species. Artificial lighting should therefore be treated as another pollutant and only used where and when it is needed, the authors say.


How Test and Trace failed

Wired UK, Gian Volpicelli


from

In early July, as the rest of England eased out of its first lockdown, Leicester was just getting started. Cases were rising rapidly and the city found itself put under a new lockdown from which it is yet to escape. Leicester was a warning. As Covid-19 spread, officials realised that NHS Test and Trace did not seem to be working in the city. Only eight out of ten people who had tested positive for Covid-19 were being reached by contact-tracers. The percentage of traced contacts stood, and stands to this day, at around 53 per cent, putting Leicester amongst the ten worst performing regions in England.

On July 14, Leicester took matters into its own hands and set up its own local contact-tracing system. It works in concert with NHS Test and Trace, taking over local cases that the national system’s call centres struggle to reach, and tracking them down on the phone, or by physically going to their homes. Once it has made contact with people who are positive for Covid-19, it traces their contacts, and feeds that information back to the national system. The initiative has been hailed as a success by Leicester’s authorities: in early November, the mayor, Peter Soulsby, said that the city’s local contact-tracing system managed to reach about 90 per cent of the cases it was handed; he asked for more powers to be given to local authorities in regard to the handling of the pandemic.


New Multidisciplinary Data Science Scholarship Program Now Open

University of Arkansas, News


from

The College of Engineering is pleased to announce the Multidisciplinary Data Science scholarship program. … The program aims to establish recruitment and retention programs for new data science majors and other related STEM disciplines, to help meet the growing workforce demand for qualified STEM graduates with data science skills in Arkansas and the nation. The program will bring faculty and business leaders together to mentor students interested in data science and related STEM disciplines, support career development initiatives, and provide skill sets sought after in industry, with the aim of improving the graduation rate of students interested in careers related to data science.


New regional collaborations will accelerate innovation in data-intensive medical science

Fred Hutchinson Cancer Research Center, Fred Hutch News Service


from

The three Cascadia Collaboration Awards to cross-institutional teams at the Alliance’s member organizations — Fred Hutchinson Cancer Research Center, the University of Washington eScience Institute, BC Cancer, the University of British Columbia Data Science Institute and the Knight Cancer Institute at Oregon Health & Science University — represent more than $1.2 million in funding and credits for Microsoft’s Azure cloud computing service.


When Algorithmic Fairness Fixes Fail: The Case for Keeping Humans in the Loop

Stanford University, Stanford Institute for Human-Centered Artificial Intelligence


from

Nigam Shah, associate professor of medicine (biomedical informatics) and of biomedical data science at Stanford University and an affiliated faculty member of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), and graduate students Stephen Pfohl and Agata Foryciarz wondered whether algorithmic fixes were really the answer.

In a recent paper, the team found that the various methods that have been proposed to address algorithmic fairness indeed make algorithms fairer, but they can also make them perform more poorly. “You might actually make the algorithm worse for everybody,” Shah says.

The upshot, Shah says, is that when institutions are dealing with issues of fairness in prediction algorithms for clinical outcomes, applying an algorithmic fix is one of three options that should be on the table. The second is to keep a human in the loop to make sure subgroups are treated fairly; and the third is to ditch the algorithm altogether. Knowing which option is most appropriate will require a good understanding of the broader context in which the perceived unfairness arises, he says.
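The kind of trade-off described is typically surfaced by auditing a model's performance per subgroup. A toy sketch on synthetic data (not the study's code, cohorts, or metrics):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic outcomes and risk scores for two hypothetical patient subgroups;
# the "model" is deliberately noisier for group B to mimic a performance gap.
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=2000)
y_true = rng.integers(0, 2, size=2000)
signal = np.where(group == "A", 0.8, 0.3)       # weaker signal for group B
y_score = signal * y_true + rng.random(2000)    # higher scores for true cases

for g in ("A", "B"):
    mask = group == g
    print(g, round(roc_auc_score(y_true[mask], y_score[mask]), 3))
```

A gap like this is the starting point for deciding among the three options Shah describes: an algorithmic fix, human oversight, or dropping the model.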


Biologists Create “Atlas” of Gene Expression in Neurons, Documenting the Diversity of Brain Cells

New York University, News Release


from

New York University researchers have created a “developmental atlas” of gene expression in neurons, using gene sequencing and machine learning to categorize more than 250,000 neurons in the brains of fruit flies. Their study, published in Nature, finds that neurons exhibit the most molecular diversity during development and reveals a previously unknown type of neuron present only before flies hatch.
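The atlas rests on clustering single-cell expression profiles. As a generic illustration of that workflow on synthetic data (not the study's actual pipeline, which uses far richer preprocessing), cells can be embedded and grouped like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic cells x genes count matrix standing in for real sequencing data
rng = np.random.default_rng(0)
expression = rng.poisson(1.0, size=(5_000, 2_000))
log_counts = np.log1p(expression)            # standard log-transform of counts

embedding = PCA(n_components=30).fit_transform(log_counts)
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(labels))                   # cells per putative neuron type
```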

“Diversity of the different cell types that make up our brains can only be fully understood in light of their developmental history,” said NYU Biology Professor Claude Desplan, the study’s senior author.


The carbon footprint of AI and cloud computing

Geographical Magazine, Jacob Dykes


from

In 2019, researchers at the University of Massachusetts Amherst explored the carbon emissions released when building and training natural language processing (NLP) models – AI systems that process human language. By converting the energy consumption in kilowatt-hours to equivalent CO2 emissions, they showed that training a single NLP model emitted 300,000 kg of CO2, equivalent to 125 round-trip flights between New York and Beijing.
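The conversion itself is simple bookkeeping. A back-of-the-envelope sketch with assumed numbers (chosen only to land near the article's 300,000 kg figure, not the study's exact inputs):

```python
# Assumed, illustrative inputs: total training energy and average grid carbon
# intensity. Real estimates use measured power draw, PUE, and region-specific
# emission factors.
energy_kwh = 630_000          # hypothetical energy for one large training run
kg_co2_per_kwh = 0.48         # assumed grid emission factor
flight_kg_co2 = 2_400         # assumed CO2 per NY-Beijing round trip

emissions_kg = energy_kwh * kg_co2_per_kwh
print(f"{emissions_kg:,.0f} kg CO2 ≈ {emissions_kg / flight_kg_co2:.0f} round-trip flights")
```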

Roel Dobbe, AI researcher at Delft University of Technology and formerly at the AI Now Institute, explains that the carbon-intensiveness of many AI systems is driven by a belief in the power of what’s known as ‘big compute’. ‘In the AI field, there is a dominant but false belief that “bigger is better”, and that assumption drives the use of increased computation and bigger data sets in the development of AI models,’ he says. ‘As AI relies on more computational power, its carbon footprint increases.’

‘AI and big computing are on an exponential trajectory,’ he adds. In 2018, OpenAI reported that ‘since 2012, the amount of compute used to train the largest AI systems has doubled every 3.4 months’. That equates to a 300,000-fold increase in the amount of computing power used in AI training runs. ‘It’s not just that you train these AI systems once: companies obtain data to keep training their AI systems, making some very carbon intensive,’ says Dobbe.
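A quick back-of-the-envelope check on how that doubling rate compounds (just the arithmetic, no new data):

```python
import math

doubling_months = 3.4                    # OpenAI's reported doubling time
doublings = math.log2(300_000)           # doublings needed for a 300,000x increase
years = doublings * doubling_months / 12
print(f"{doublings:.1f} doublings ≈ {years:.1f} years")   # ~18.2 doublings ≈ 5.2 years
```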


The Norwegian Covid-19 tracing app experiment revisited

Big Data & Society journal, Kristin B Sandvik


from

In this blog post, I want to contemplate a set of issues pertaining specifically to the legacy of Smittestopp, but of relevance to other Covid-19 tracing apps. This relates to how democratic government actors respond to criticism of digital initiatives in the context of emergencies; and the type of challenges civil society actors face in holding public and private sector actors accountable. For context, I begin by giving a recap (a longer version here) of the rise and fall of Smittestopp. All translations from Norwegian are my own.


Deadlines



Super excited to announce the NEW NSF Mid-Career Advancement (MCA) program with participation across five directorates: BIO, GEO, ENG, EHR, SBE.

Solicitation NSF 21-516 with target date of 1 Feb 2021 (1/7 thread)

Tools & Resources



Building Netflix’s Distributed Tracing Infrastructure

Medium, Netflix Tech Blog


from

This is an example of a question our on-call engineers need to answer to help resolve a member issue — which is difficult when troubleshooting distributed systems. Investigating a video streaming failure consists of inspecting all aspects of a member account. In our previous blog post we introduced Edgar, our troubleshooting tool for streaming sessions. Now let’s look at how we designed the tracing infrastructure that powers Edgar.
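The blog post goes on to describe Netflix's actual design; as general background only, the core mechanic of any distributed tracer is propagating a trace ID and parent span ID across service boundaries so that spans emitted by many services can be stitched into one trace. A minimal, hypothetical sketch (not Netflix's implementation):

```python
import time
import uuid

def new_span(trace_id=None, parent_id=None, name=""):
    """Create a span record; reuse trace_id when continuing an existing trace."""
    return {
        "trace_id": trace_id or uuid.uuid4().hex,  # shared across the whole request
        "span_id": uuid.uuid4().hex,               # unique per operation
        "parent_id": parent_id,                    # links child span to its caller
        "name": name,
        "start": time.time(),
    }

def inject_headers(span):
    # Outgoing calls carry the trace context so the next service can continue it
    return {"X-Trace-Id": span["trace_id"], "X-Parent-Span-Id": span["span_id"]}

# Service A starts a trace; Service B continues it from the propagated headers
root = new_span(name="playback-request")
headers = inject_headers(root)
child = new_span(trace_id=headers["X-Trace-Id"],
                 parent_id=headers["X-Parent-Span-Id"],
                 name="license-check")
print(root["trace_id"] == child["trace_id"])   # both spans belong to one trace
```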


The Future of Distributed Machine Learning

Coiled, Andreas Müller & Hugo Bowne-Anderson


from

“We recently chatted with Andy Müller, core developer of scikit-learn and Principal Research Software Development Engineer at Microsoft. Andy is one of the most influential minds in data science with a CV to match. He shares his thoughts on distributed machine learning with open-source tools like Dask-ML as well as proprietary tools from the big cloud providers.”
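For readers new to the tool, a minimal sketch of the kind of workflow Dask-ML supports, run on a local cluster (illustrative only; the cluster sizing, dataset, and model choice are assumptions):

```python
from dask.distributed import Client, LocalCluster
from dask_ml.datasets import make_classification
from dask_ml.linear_model import LogisticRegression

# Local worker processes stand in for a real cluster; in production the Client
# would instead point at a distributed scheduler address.
client = Client(LocalCluster(n_workers=2, threads_per_worker=1))

# Data lives in chunked Dask arrays rather than a single in-memory array
X, y = make_classification(n_samples=100_000, n_features=20, chunks=10_000)

model = LogisticRegression()
model.fit(X, y)                  # the solver is coordinated across the workers
print(model.coef_[:5])           # fitted coefficients (a small NumPy array)
```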


Careers


Full-time, non-tenured academic positions

Digital Solutions Developer



Harvard University, Harvard Graduate School of Education; Cambridge, MA

Associate Research Scientist, CCA (Joint with Cooper Union)



Simons Foundation/Flatiron Institute & Cooper Union; New York, NY
