The UN Committee of Experts on Big Data and Data Science for Official Statistics is launching a pilot lab programme to make international data sharing more secure by using Privacy Enhancing Technologies (PETs).
Announced today at Dubai Expo 2020, the ‘UN PET Lab’ pilot programme will demonstrate how PETs can enable fully compliant data sharing and insights between organisations, using publicly available trade data from UN Comtrade.
Four National Statistical Offices (NSOs) — the US Census Bureau; Statistics Netherlands; the Italian National Institute of Statistics; and the UK’s Office for National Statistics — will be involved in the project.
The PET Lab will see statistical bodies collaborate with technology providers that offer PETs.
The team at @2i2c_org has finished our first ~full year of operations! To celebrate, we’re writing a 3-post series describing our experiences and challenges.
Our first post: designing and piloting services for use-cases in research and education
Web dumps are one of the key data sources for training LMs. But in their raw form, they have lots of undesirable content (e.g., boilerplate, hatespeech). Researchers typically apply tools called *quality filters* to select text considered “high quality” from these sources. /2
We argue that quality filtering implies a language ideology – a sociolinguistics term for a subjective belief about language use. These ideologies are often implicit/undocumented. What language is high quality enough to be included in the corpus? Whose language is excluded? /3
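As a rough illustration (not any specific team's pipeline), heuristic quality filters applied to web dumps often score documents on simple surface features; the thresholds below are hypothetical:

```python
def passes_quality_filter(text: str,
                          min_words: int = 50,
                          max_symbol_ratio: float = 0.1,
                          min_mean_word_len: float = 3.0,
                          max_mean_word_len: float = 10.0) -> bool:
    """Toy heuristic filter; thresholds are illustrative, not from any
    published corpus pipeline. Production filters also use learned
    classifiers, deduplication, and blocklists."""
    words = text.split()
    if len(words) < min_words:
        return False  # too short to be a useful training document
    symbols = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    if symbols / max(len(text), 1) > max_symbol_ratio:
        return False  # likely boilerplate or markup residue
    mean_len = sum(len(w) for w in words) / len(words)
    # Thresholds like this are where implicit language ideology creeps in:
    # text in non-standard dialects or other scripts can fail them too.
    return min_mean_word_len <= mean_len <= max_mean_word_len
```

Even this toy version makes the thread's point concrete: every cutoff is a judgment about whose language counts as "high quality."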
from The Daily Californian student newspaper, Clara Rodas
In an effort to make the data privacy sector more accessible to underrepresented communities, the campus Privacy Office hosted a panel with data privacy experts from across California on Wednesday.
Panelists spoke about their own experiences in the field while also advising prospective attendees on what actions to take if they wished to pursue a career in data privacy. Many panelists emphasized that a background in data science, computer science or the law was not necessary, and encouraged those in other fields to consider how data privacy affected their own interests.
“There’s absolutely nothing that distinguishes me from anybody else who’s interested and wants to learn about data privacy,” said Thea Bullock, UC Irvine campus privacy official. “This field is not going to look the same in five years or 10 years — it will change and you can choose to come in the door and be part of the change.”
UC San Diego campus privacy officer Pegah Parsi recommended that prospects seek out people in the data privacy field and offer to help because privacy departments are often understaffed.
from Harvard University, Institute for Quantitative Social Science
The Dataverse Project at IQSS is joining the Office of Data Science Strategy (ODSS) at the National Institutes of Health (NIH) and five other data repositories in launching a new data curation, sharing, and interoperability initiative. Through this collaboration, the Generalist Repository Ecosystem Initiative (GREI), the Dataverse Project plans to facilitate access to NIH-funded data by building on the existing Harvard Dataverse Repository. To supplement the NIH’s existing domain-specific repositories, GREI aims to expand the NIH data ecosystem into additional repositories so that researchers can more easily and effectively find and share data from NIH-funded studies.
[Qiaozhu] Mei and colleagues at the University of Michigan School of Information developed a strategy to not only monitor the emotional health of workers but also predict work behaviors. In a new study in PLOS ONE, the team tracked emoji use as a marker of emotions and examined how emoji use in work communications can predict remote worker dropout.
“We saw a report from GitHub about the status of developers at the early stage of the COVID-19 pandemic,” said study lead author Xuan Lu, research fellow at UMSI.
Developers were showing signals of burnout at the start of the pandemic. The report spurred the team to look at how to better track the state of mind of remote workers, she said.
In August of 2021, the Santa Fe Institute hosted a workshop on collective intelligence as part of its Foundations of Intelligence project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists, biologists, philosophers, social scientists, and others to share their insights about how intelligence can emerge from interactions among multiple agents, whether those agents are machines, animals, or human beings. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.
from Nature Medicine; Pranav Rajpurkar, Emma Chen, Oishi Banerjee & Eric J. Topol
Artificial intelligence (AI) is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share key developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human–AI collaboration. Finally, we consider serious technical and ethical challenges in issues spanning from data scarcity to racial bias. As these challenges are addressed, AI’s potential may be realized, making healthcare more accurate, efficient and accessible for patients worldwide.
With more than half of states having completed their newly redrawn congressional maps, a few trends have emerged from the U.S. redistricting process. One is that there are fewer highly competitive seats up for grabs compared to the old maps, which Michael Li of the Brennan Center for Justice predicted in MapLab last November would likely advantage Republicans in coming elections.
Another is that the more politicized the redistricting process, the more partisan the resulting map. That’s according to the latest grades on the Redistricting Report Card, a project led by the Princeton Gerrymandering Project and RepresentUs that scores new maps using data-driven metrics. To learn more about that effort and what it reveals, I spoke with Joe Kabourek, senior campaign director at RepresentUs. This interview has been edited for clarity and length.
from Vox Media, The Verge, Epic Magazine, Kristen Radtke and Amelia Holowaty Krales
In the early ’80s, Susan and her friends pulled increasingly elaborate phone scams until they nearly shut down phone service for the entire city. As two of her friends, Kevin Mitnick and Lewis DePayne, were being convicted for cybercrime, she made an appearance on 20/20, demonstrating their tradecraft to Geraldo Rivera. Riding her celebrity, she went briefly legit, testifying before the US Senate and making appearances at security conventions, spouting technobabble in cowboy boots and tie-dye. Then, without a trace, she left the world behind.
I went looking for the great lost female hacker of the 1980s. I should have known that she didn’t want to be found.
The algorithm, known as Pattern, overpredicted the risk that many Black, Hispanic and Asian people would commit new crimes or violate rules after leaving prison. https://n.pr/35rr3zJ
It took just one virus to cripple the world’s economy and kill millions of people; yet virologists estimate that trillions of still-unknown viruses exist, many of which might be lethal or have the potential to spark the next pandemic. Now, they have a new—and very long—list of possible suspects to interrogate. By sifting through unprecedented amounts of existing genomic data, scientists have uncovered more than 100,000 novel viruses, including nine coronaviruses and more than 300 related to the hepatitis Delta virus, which can cause liver failure.
“It’s a foundational piece of work,” says J. Rodney Brister, a bioinformatician at the National Center for Biotechnology Information’s National Library of Medicine who was not involved in the new study. The work expands the number of known viruses that use RNA instead of DNA for their genes by an order of magnitude. It also “demonstrates our outrageous lack of knowledge about this group of organisms,” says disease ecologist Peter Daszak, president of the EcoHealth Alliance, a nonprofit research group in New York City that is raising money to launch a global survey of viruses. The work will also help launch so-called petabyte genomics—the analyses of previously unfathomable quantities of DNA and RNA data. (One petabyte is 10¹⁵ bytes.)
Somewhere between $1 million and $3 million of the university’s portion of American Rescue Plan Act money will go to the computing school, Baldwin said. This is federal money meant to help states recover economically and socially from the COVID-19 pandemic. Wyoming is set to receive about $1 billion in ARPA money over the next two years.
While the ARPA money jump started the program, the university had been planning on allocating money to the project since 2020.
The university also has been eliminating and merging other academic programs, which freed up some money for new programs. About $3 million is planned to be reallocated to the computing school from other areas in the budget.
A $1.75 million gift from Meta (formerly Facebook) to the Center for Community Engagement, Environmental Justice and Health (CEEJH) in the University of Maryland’s School of Public Health will significantly bolster the group’s ability to advance environmental justice across the United States.
The gift will support new and ongoing activities led by CEEJH, including a new paid internship program, new staff hires and the annual University of Maryland Symposium on Environmental Justice (EJ) and Health Disparities, a multiday event for community organizers, policymakers and environmental health experts using innovative policy, legal and public health tools to address pressing EJ issues and help communities to advocate for themselves.
from Yale University, Yale Daily News student newspaper, Valentina Simon
An interdisciplinary team of Yale researchers has designed brain-machine interface chips that, when implanted in humans, can reduce the rate of epileptic seizures.
More than three million people in the United States experience epileptic seizures, and 60 to 70 percent of patients can successfully treat the condition with medicine. For the remaining individuals, surgically removing the parts of the brain where seizures arise, regardless of their role in everyday function, has been the only path toward mitigating the issue. A team of Yale computer scientists, engineers and surgeons has found that short-circuiting the path neurons fire along during an epileptic seizure can successfully reduce the rate of seizures in patients. The Swebilius Foundation recently awarded the team a grant to continue its research.
“When the signature traits of a seizure are observed, the device stimulates that part of the brain, and it is not curative, but over time 60 percent of patients will get 50 percent fewer seizures than they had before,” said Dennis Spencer, professor emeritus of neurosurgery, who implants these brain-computer interface chips in patients.
from University of Oklahoma, Vice President for Research and Partnerships
“In the peptide-engineering field, the general approach is to take those natural proteins and make incremental changes to identify the properties of the end aggregated products, and then find an application for which the identified properties would be useful,” [Handan] Acar said. “However, there are more than 500 natural and unnatural amino acids. Especially when you consider the size of the peptides, this approach is just not practical.”
Machine learning has great potential to counter this challenge, but Acar says the complex way peptides assemble and disassemble has prevented artificial intelligence methods from being effective so far.
“Clearly, computational methods, such as machine learning, are necessary,” she said. “Yet, the peptide aggregation is very complex. It is currently not possible to identify the effects of individual amino acids with computational methods.”
For almost five years, Boston University and Red Hat, a leading provider of open-source computer software solutions, have collaborated to drive innovative research and education in open-source technology. Now that partnership has announced the first recipients of the Red Hat Collaboratory Research Incubation Awards. (Open source means that the original source code is made available for use or modification by users and developers.)
The awards are administered through BU’s Red Hat Collaboratory, housed within the Rafik B. Hariri Institute for Computing and Computational Science & Engineering, and Red Hat Research. “This collaborative model gives us the opportunity to increase the diversity and richness of open engineering and operations projects we undertake together, and also allows us to pursue fundamental research under one umbrella,” says Heidi Picher Dempsey, Red Hat research director, Northeast United States.
You are being asked to voluntarily participate in a research study. We are doing this study to identify features of biodiversity informatics training valued by the biodiversity research, conservation, management, and education community. If you choose to participate, you will be asked to complete a survey that is anticipated to take up to 20 minutes. You will not be paid to take part in this study.
SPONSORED CONTENT
The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.
2021 saw many exciting advances in machine learning (ML) and natural language processing (NLP). In this post, I will cover the papers and research areas that I found most inspiring. I tried to cover the papers that I was aware of but likely missed many relevant ones. Feel free to highlight them as well as ones that you found inspiring in the comments. I discuss the following highlights:
Universal Models
Massive Multi-task Learning
Beyond the Transformer
Prompting
Efficient Methods
Benchmarking
Conditional Image Generation
ML for Science
Program Synthesis
Bias
Retrieval Augmentation
Token-free Models
Temporal Adaptation
The Importance of Data
Meta-learning
from Twitter, Silviu Paun, Morgan & Claypool Publishers
New! The focus of this book is primarily on Natural Language Processing: models of language interpretation and production. Most of the methods discussed here are applicable to other areas of AI and Data Science.
We are introducing embeddings, a new endpoint in the OpenAI API that makes it easy to perform natural language and code tasks like semantic search, clustering, topic modeling, and classification. Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Our embeddings outperform top models in 3 standard benchmarks, including a 20% relative improvement in code search.
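Whatever model produces them, embeddings are typically compared the same way: related concepts should map to vectors pointing in similar directions. A minimal sketch of cosine similarity over made-up four-dimensional vectors (real embedding vectors have hundreds or thousands of dimensions, and the numbers below are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three phrases.
cat = [0.9, 0.1, 0.05, 0.0]
kitten = [0.85, 0.15, 0.1, 0.05]
invoice = [0.0, 0.1, 0.9, 0.4]

# Semantic search reduces to ranking candidates by similarity to a query.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice)
```

Tasks like clustering, topic modeling, and classification then operate on these vectors rather than on raw text.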