An international team of researchers with an effort called the Zoonomia Project has analyzed and compared the whole genomes of more than 80 percent of all mammalian families, spanning almost 110 million years of evolution. The genomic dataset, published today in Nature, includes genomes from more than 120 species that were not previously sequenced, and captures mammalian diversity at an unprecedented scale.
The dataset is aimed at advancing human health research. Researchers can use the data to compare the genomes of humans and other mammals, which could help identify genomic regions that might be involved in human disease. The authors are also making the dataset available to the scientific community via the Zoonomia Project website, without any restrictions on use.
“The core idea for the project was to develop and use this data to help human geneticists figure out which mutations cause disease,” said co-senior author Kerstin Lindblad-Toh.
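The disease-hunting logic here rests on cross-species conservation: a genomic position that stays the same across many mammalian lineages is likely functional, so a human mutation at such a position is a stronger disease candidate. A minimal sketch of that idea, using a toy alignment and a hypothetical `conservation_scores` helper (not the Zoonomia pipeline, which uses far more sophisticated constraint metrics):

```python
# Toy sketch: column-wise conservation across an alignment of mammalian sequences.
# Highly conserved columns are candidate functional (disease-relevant) sites.

def conservation_scores(aligned_seqs):
    """Fraction of sequences sharing the most common base at each column."""
    n = len(aligned_seqs)
    length = len(aligned_seqs[0])
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        most_common = max(set(column), key=column.count)
        scores.append(column.count(most_common) / n)
    return scores

# Hypothetical alignment of the same short region in four species
alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGGAC",
    "ACGTAC",
]
print(conservation_scores(alignment))  # columns 0-2 and 5 are fully conserved
```

Positions scoring 1.0 are identical in every species sampled; in a real analysis, such columns (measured across 100+ genomes, with phylogenetic weighting) flag regions where human variants deserve scrutiny.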
Since the first bird evolved more than 150 million years ago, its descendants have adapted to a vast range of ecological niches, giving rise to tiny, hovering hummingbirds, plunge-diving pelicans and showy birds-of-paradise. Today, more than 10,000 species of birds live on the planet—and now scientists are well on their way to capturing a complete genetic portrait of that diversity.
In the Nov. 11 issue of the journal Nature, scientists from the Smithsonian Institution, the University of Copenhagen, BGI-Shenzhen, the University of California, Santa Cruz and approximately 100 other institutions report on the genomes of 363 species of birds, including 267 that have been sequenced for the first time. The studied species—from widespread, economically important birds such as the chicken to the lesser known Henderson crake, which lives only on one small island in the Pacific Ocean—represent more than 92% of the world’s avian families. The data from the study will advance research on the evolution of birds and aid in the conservation of threatened bird species.
Last month Nature published a damning response written by 31 scientists to a study from Google Health that had appeared in the journal earlier this year. The paper described successful trials of an AI that looked for signs of breast cancer in medical images. But according to its critics, the Google team provided so little information about its code and how it was tested that the study amounted to nothing more than a promotion of proprietary tech.
“We couldn’t take it anymore,” says Benjamin Haibe-Kains, the lead author of the response, who studies computational genomics at the University of Toronto. “It’s not about this study in particular—it’s a trend we’ve been witnessing for multiple years now that has started to really bother us.”
Haibe-Kains and his colleagues are among a growing number of scientists pushing back against a perceived lack of transparency in AI research. “When we saw that paper from Google, we realized that it was yet another example of a very high-profile journal publishing a very exciting study that has nothing to do with science,” he says. “It’s more an advertisement for cool technology. We can’t really do anything with it.”
On Thursday, the Office of the Privacy Commissioner of Canada (OPC) released recommendations for regulating artificial intelligence in Canada, calling for legislation that would amend the Personal Information Protection and Electronic Documents Act (PIPEDA).
The recommendations from commissioner Daniel Therrien are the result of a public consultation that was launched earlier this year.
“Artificial intelligence has immense promise, but it must be implemented in ways that respect privacy, equality and other human rights,” said Therrien. “A rights-based approach will support innovation and the responsible development of artificial intelligence.”
The Government Accountability Office’s forthcoming artificial intelligence oversight framework will help auditors work with inspectors general to continuously monitor executive agencies’ progress with the technology, says the agency’s first-ever chief data scientist.
Right now GAO is developing a “wireframe” of what oversight might look like in the areas of explainability and transparency, bias and fairness, integrity and resilience, and data quality and lineage, said Taka Ariga, who also directs his agency’s Innovation Lab.
This year marks the tenth anniversary of Globus, which launched at SC10 as the “Globus Online” service to connect researchers and make large-scale data transfer accessible to any researcher with an Internet connection and a laptop. It would have been difficult to predict back then that Globus would become an essential service for over 150,000 researchers around the world. Users in 80 countries have moved over one exabyte of data and 100 billion files, and the service has evolved into a platform that enables universities, national laboratories, government facilities, and commercial organizations to securely manage data throughout the research lifecycle.
As Vittert wrote in a recent editorial, “How were all the pollsters so wrong, again, even after the soul searching and methodological recalibrating that followed 2016?”
The answer, as she explained on Wednesday’s St. Louis on the Air, is that pollsters repeated one of their 2016 mistakes — and made a new one as well.
“It’s the same mistake in the sense that the pollsters really underestimated the white, rural, middle-America vote for Trump,” she said. “And the new mistake is that they did a very bad job of figuring out who the Hispanic vote was going to go for. … In Florida, in Miami, they didn’t think the Hispanic vote was going to go for Trump, and in fact, he actually won significantly more of the Hispanic vote than he got in 2016.”
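The mechanics of the error Vittert describes can be made concrete with a toy calculation: when a subgroup that favors a candidate is under-sampled, the raw poll average understates that candidate's support, and reweighting responses to known population shares corrects it. The group shares and support rates below are invented for illustration, not from any actual poll:

```python
# Toy sketch: how an unrepresentative sample biases a poll estimate.
# Suppose rural voters are 40% of the electorate but only 20% of the sample,
# and support the candidate at a higher rate than urban voters.

def weighted_support(groups):
    """groups: list of (population_share, sample_share, support_rate)."""
    raw = sum(s_share * rate for _, s_share, rate in groups)          # naive poll average
    weighted = sum(p_share * rate for p_share, _, rate in groups)     # reweighted estimate
    return raw, weighted

groups = [
    (0.40, 0.20, 0.60),  # rural: under-sampled, high support
    (0.60, 0.80, 0.40),  # urban: over-sampled, lower support
]
raw, weighted = weighted_support(groups)
print(round(raw, 2), round(weighted, 2))  # prints 0.44 0.48
```

Here the raw average misses true support by four points, which is roughly the scale of polling error seen in 2016 and 2020; real pollsters do weight their samples, but only on the demographics they think to weight on.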
from University of Cambridge, Cambridge Centre for AI in Medicine
Today the University of Cambridge announces a five-year agreement with AstraZeneca and GSK to fund the Cambridge Centre for AI in Medicine (CCAIM). For the five-year duration, AstraZeneca and GSK will support five new PhD studentships per year. This program will enable the best and brightest young minds in machine learning and bioscience to partner with leaders in industry and academia, wherever they may be in the world.
CCAIM has been set up as a cutting-edge research group. Its faculty of 10 University of Cambridge researchers – in addition to world-class PhD students, currently being recruited – have united to develop AI and machine learning (ML) technologies aiming to transform clinical trials, personalised medicine and biomedical discovery.
The centre’s Director is Professor Mihaela van der Schaar, a world-leading researcher in machine learning, and the Co-Director is researcher-clinician Professor Andres Floto (bios below).
A revolution in soybean breeding for root traits has begun, with the presentation of a successful new “pipeline” involving machine learning and computer vision.
Ever since crop breeders learned of single nucleotide polymorphisms — SNPs, differences in a single building block (nucleotide), such as cytosine in place of thymine, in a given stretch of DNA — they have wanted to correlate these differences with plant traits, taking breeding to an entirely new level.
Now, building on past achievements in the field, a team of scientists in the departments of Agronomy and Mechanical Engineering at Iowa State University and at the U.S. Department of Agriculture (USDA) Agricultural Research Service has published results on a new soybean breeding framework that links genetic information with root traits using computer vision and machine learning (ML) tools.
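The simplest version of linking a SNP to a trait is a genotype–phenotype correlation: code each plant's genotype at the SNP as an allele count (0, 1 or 2) and correlate it with a trait measurement extracted from images. A minimal sketch with invented numbers (the published framework uses computer-vision trait extraction and far richer models, not a single Pearson correlation):

```python
# Toy sketch: correlating one SNP with a measured root trait.
# Genotypes coded as allele counts (0, 1, 2); root depths are hypothetical.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

genotypes = [0, 1, 2, 0, 2, 1, 2, 0]  # allele count at one SNP, per plant
root_depth = [11.0, 13.5, 16.2, 10.4, 15.8, 12.9, 16.5, 11.2]  # cm, hypothetical
r = pearson(genotypes, root_depth)
print(round(r, 3))
```

A strong correlation across many plants flags the SNP as a marker a breeder could select on; genome-wide, this scan is repeated over thousands of SNPs with corrections for population structure.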
from Harvard Data Science Review; Eric D. Kolaczyk, Haviland Wright, and Masanao Yajima
Much of the current post-secondary training in core data science fields treats ‘practice’ as something to be relegated to capstone projects or other final preparations before students leave their programs. Here we argue for a paradigm shift, placing a so-called Practicum course at the center of a data science program, intentionally organized as a hybrid between an educational classroom and an industry-like environment. As a case study, we detail our experience of the past five years developing the Statistics Practicum in Boston University’s MS in Statistical Practice (MSSP) program. We describe the motivation, organization, and logistics of our Practicum, as well as both successes and challenges we have faced. In particular, the challenge of fairly and effectively assessing student achievement and program impact in this novel setting is discussed. [Just Accepted – Preview]
As machine translation (MT) improves, it becomes more critical to find a way to measure incremental gains. There is a general consensus that existing evaluation methods do not fit the bill; and, in fact, can even promote the much-maligned simplistic “translationese” that MT tends to produce. Researchers in both academia and industry, therefore, continue to explore alternatives.
Power player Google’s recent research includes work in MT evaluation, such as extending BLEURT beyond English, and the creation of a large-scale knowledge base for 18 languages, for use without reference translations.
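The baseline these newer metrics aim to improve on is modified n-gram precision, the core of BLEU: count how many of the system's n-grams also appear in a reference translation, clipping repeated n-grams. A stripped-down sketch (single reference, no brevity penalty or geometric mean, unlike full BLEU):

```python
# Toy sketch of modified n-gram precision, the core idea behind BLEU
# (simplified: one reference, no brevity penalty).
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clip each candidate n-gram's count by its count in the reference.
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(hyp, ref, 1), ngram_precision(hyp, ref, 2))
```

The weakness critics point to is visible here: the metric rewards surface overlap with the reference, so stilted "translationese" that copies reference phrasing can outscore a fluent paraphrase — which is what learned metrics like BLEURT try to fix.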
Microsoft is teaming up with other companies to form an alliance to advance the field of DNA data storage, which promises to revolutionize the way vital records are kept for the long haul.
The founding members of the DNA Data Storage Alliance, unveiled today at the Flash Memory Summit, include Microsoft as well as Twist Bioscience, Illumina and Western Digital. Twist Bioscience has been partnering with Microsoft and the University of Washington for years on projects aimed at harnessing synthetic DNA for data storage.
Microsoft Research and UW’s Molecular Information Systems Lab have already demonstrated a fully automated DNA-based data storage and retrieval system — and in league with Twist, they’ve shown that their system can store a gigabyte of data in a DNA-based archive.
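The basic encoding idea behind DNA storage is compact: DNA has four bases, so each nucleotide can carry two bits. A minimal round-trip sketch (real systems such as the Microsoft/UW one add error-correcting codes, addressing, and constraints to avoid sequences that are hard to synthesize or read):

```python
# Toy sketch: mapping binary data to DNA bases and back.
BASES = "ACGT"  # two bits per nucleotide: A=00, C=01, G=10, T=11

def bytes_to_dna(data):
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # four 2-bit chunks per byte
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq):
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

encoded = bytes_to_dna(b"hi")
print(encoded, dna_to_bytes(encoded))  # prints CGGACGGC b'hi'
```

At this density a gigabyte is half a billion nucleotides, which is why synthesis cost, rather than encoding, is the practical bottleneck the alliance is organized around.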
from Georgia State University News Hub, Andrew Young School of Policy Studies
Georgia State University Library faculty Bryan Sinclair and Mandy Swygart-Hobaugh have been awarded a $150,000 grant from the New America Public Interest Technology University Network to expand programs that promote “data literacy for the public good” and encourage a more diverse pipeline of students pursuing careers in data science.
The Public Interest Data Literacy (PIDLit) initiative, led by the Georgia State Library with multiple campus partners, will expand programs promoting data literacy and career preparedness with a focus on reaching first-year students and underrepresented groups. The Library provides support and training on a host of data analysis tools and methods along with opportunities for students to network with the broader Atlanta community around practical applications of data science.
Oregon State University brought its TRACE Community COVID-19 testing program to Eugene, sending three-member teams – one OSU student, one UO student and one professional – to city neighborhoods to collect nasal-swab samples from hundreds of residents and sewage samples from around Eugene and Springfield. This will further expand TRACE’s coverage, which includes five similar sweeps in Corvallis, as well as studies in Bend, Hermiston and Newport. TRACE will be working in tandem with UO’s Monitoring and Assessment Program (MAP).
Online November 30-December 2.
“Adding context to data turns it into information. Processing information turns it into knowledge. The keys to these transformations are connections and metadata. Graph databases, semantic technology, knowledge graphs and graph AI are solid building blocks that support this process. Get your Building Blocks.” [$$$]
“The National Science Foundation (NSF) Computer and Information Science and Engineering (CISE) directorate just announced a new newsletter that will share ‘periodic updates about CISE and NSF broadly, including up-to-date information about [their] newest programs and activities.'”
Toyota’s CSRC is investing $1 million in research projects focused on creating systems that are safe and efficient. These projects will focus on enabling safer and more efficient mobility systems by exploring driver behavior in different environments, monitoring driver health and identifying driver error when interacting with advanced driver assistance systems (ADAS) technologies.
Data from each project will be shared across the institutions to help speed research, with the results made public to support the advancement of auto safety industrywide.
Virtual peer-to-peer payments are the currency of the pandemic: the coin of the Covid-19 realm. In a world where no one wants to come close enough to exchange money (and possibly germs too), use of payment apps like PayPal, Venmo, Cash App and Zelle has exploded.
Between April and June, PayPal processed more than $220 billion in transactions, a nearly 30% year-over-year increase. Similarly, in the second quarter of 2020, Venmo, which is owned by PayPal, processed $37 billion in transactions, a 52% increase over the year before. And in July, Early Warning Services, the parent company of the mobile payment platform Zelle, reported that the number of users who had sent a payment through the service in the last 90 days was up 43% over the previous year.
But while payment apps can help protect users from the dangers of person-to-person Covid-19 transmission, these virtual systems introduce problems of their own. Using a payment app of any kind, no matter how privacy-preserving it is, always introduces an intermediary: an agent that uses data to execute a demand. Apps can’t send money if they don’t have some kind of access to the account from which users send and receive money. And not all apps treat that data the same way.