An international team of researchers with an effort called the Zoonomia Project has analyzed and compared the whole genomes of more than 80 percent of all mammalian families, spanning almost 110 million years of evolution. The genomic dataset, published today in Nature, includes genomes from more than 120 species that were not previously sequenced, and captures mammalian diversity at an unprecedented scale.
The dataset is aimed at advancing human health research. Researchers can use the data to compare the genomes of humans and other mammals, which could help identify genomic regions that might be involved in human disease. The authors are also making the dataset available to the scientific community via the Zoonomia Project website, without any restrictions on use.
“The core idea for the project was to develop and use this data to help human geneticists figure out which mutations cause disease,” said co-senior author Kerstin Lindblad-Toh.
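The disease-hunting logic here rests on cross-species conservation: a genomic position that stays the same across many mammalian lineages is likely functional, so a human mutation at such a position is a stronger disease candidate. A minimal sketch of that idea, using a toy alignment and a hypothetical `conservation_scores` helper (not the Zoonomia pipeline, which uses far more sophisticated constraint metrics):

```python
# Toy sketch: column-wise conservation across an alignment of mammalian sequences.
# Highly conserved columns are candidate functional (disease-relevant) sites.

def conservation_scores(aligned_seqs):
    """Fraction of sequences sharing the most common base at each column."""
    n = len(aligned_seqs)
    length = len(aligned_seqs[0])
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        most_common = max(set(column), key=column.count)
        scores.append(column.count(most_common) / n)
    return scores

# Hypothetical alignment of the same short region in four species
alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGGAC",
    "ACGTAC",
]
print(conservation_scores(alignment))  # columns 0-2 and 5 are fully conserved
```

Positions scoring 1.0 are identical in every species sampled; in a real analysis, such columns (measured across 100+ genomes, with phylogenetic weighting) flag regions where human variants deserve scrutiny.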
Since the first bird evolved more than 150 million years ago, its descendants have adapted to a vast range of ecological niches, giving rise to tiny, hovering hummingbirds, plunge-diving pelicans and showy birds-of-paradise. Today, more than 10,000 species of birds live on the planet—and now scientists are well on their way to capturing a complete genetic portrait of that diversity.
In the Nov. 11 issue of the journal Nature, scientists from the Smithsonian Institution, the University of Copenhagen, BGI-Shenzhen, the University of California, Santa Cruz and approximately 100 other institutions report on the genomes of 363 species of birds, including 267 that have been sequenced for the first time. The studied species—from widespread, economically important birds such as the chicken to the lesser known Henderson crake, which lives only on one small island in the Pacific Ocean—represent more than 92% of the world’s avian families. The data from the study will advance research on the evolution of birds and aid in the conservation of threatened bird species.
Last month Nature published a damning response written by 31 scientists to a study from Google Health that had appeared in the journal earlier this year. The paper described successful trials of an AI that looked for signs of breast cancer in medical images. But according to its critics, the Google team provided so little information about its code and how it was tested that the study amounted to nothing more than a promotion of proprietary tech.
“We couldn’t take it anymore,” says Benjamin Haibe-Kains, the lead author of the response, who studies computational genomics at the University of Toronto. “It’s not about this study in particular—it’s a trend we’ve been witnessing for multiple years now that has started to really bother us.”
Haibe-Kains and his colleagues are among a growing number of scientists pushing back against a perceived lack of transparency in AI research. “When we saw that paper from Google, we realized that it was yet another example of a very high-profile journal publishing a very exciting study that has nothing to do with science,” he says. “It’s more an advertisement for cool technology. We can’t really do anything with it.”
On Thursday, the Office of the Privacy Commissioner of Canada (OPC) released recommendations for regulating artificial intelligence in Canada, calling for legislation that would amend the Personal Information Protection and Electronic Documents Act (PIPEDA).
The recommendations from commissioner Daniel Therrien are the result of a public consultation that was launched earlier this year.
“Artificial intelligence has immense promise, but it must be implemented in ways that respect privacy, equality and other human rights,” said Therrien. “A rights-based approach will support innovation and the responsible development of artificial intelligence.”
The Government Accountability Office’s forthcoming artificial intelligence oversight framework will help auditors work with inspectors general to continuously monitor executive agencies’ progress with the technology, says the agency’s first-ever chief data scientist.
Right now GAO is developing a “wireframe” of what oversight might look like in the areas of explainability and transparency, bias and fairness, integrity and resilience, and data quality and lineage, said Taka Ariga, who also directs his agency’s Innovation Lab.
This year marks the tenth anniversary of Globus, which launched at SC10 as the “Globus Online” service to connect researchers and make large-scale data transfer accessible to any researcher with an Internet connection and a laptop. It would have been difficult to predict back then that Globus would become an essential service for over 150,000 researchers around the world. Users in 80 countries have moved over one exabyte of data and 100 billion files, and the service has evolved into a platform that enables universities, national laboratories, government facilities, and commercial organizations to securely manage data throughout the research lifecycle.
As Vittert wrote in a recent editorial, “How were all the pollsters so wrong, again, even after the soul searching and methodological recalibrating that followed 2016?”
The answer, as she explained on Wednesday’s St. Louis on the Air, is that pollsters repeated one of their 2016 mistakes — and made a new one as well.
“It’s the same mistake in the sense that the pollsters really underestimated the white, rural, middle-America vote for Trump,” she said. “And the new mistake is that they did a very bad job of figuring out who the Hispanic vote was going to go for. … In Florida, in Miami, they didn’t think the Hispanic vote was going to go for Trump, and in fact, he actually won significantly more of the Hispanic vote than he got in 2016.”
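The mechanics of the error Vittert describes can be made concrete with a toy calculation: when a subgroup that favors a candidate is under-sampled, the raw poll average understates that candidate's support, and reweighting responses to known population shares corrects it. The group shares and support rates below are invented for illustration, not from any actual poll:

```python
# Toy sketch: how an unrepresentative sample biases a poll estimate.
# Suppose rural voters are 40% of the electorate but only 20% of the sample,
# and support the candidate at a higher rate than urban voters.

def weighted_support(groups):
    """groups: list of (population_share, sample_share, support_rate)."""
    raw = sum(s_share * rate for _, s_share, rate in groups)          # naive poll average
    weighted = sum(p_share * rate for p_share, _, rate in groups)     # reweighted estimate
    return raw, weighted

groups = [
    (0.40, 0.20, 0.60),  # rural: under-sampled, high support
    (0.60, 0.80, 0.40),  # urban: over-sampled, lower support
]
raw, weighted = weighted_support(groups)
print(round(raw, 2), round(weighted, 2))  # prints 0.44 0.48
```

Here the raw average misses true support by four points, which is roughly the scale of polling error seen in 2016 and 2020; real pollsters do weight their samples, but only on the demographics they think to weight on.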
from University of Cambridge, Cambridge Centre for AI in Medicine
Today the University of Cambridge announces a five-year agreement with AstraZeneca and GSK to fund the Cambridge Centre for AI in Medicine (CCAIM). For the five-year duration, AstraZeneca and GSK will support five new PhD studentships per year. This program will enable the best and brightest young minds in machine learning and bioscience to partner with leaders in industry and academia, wherever they may be in the world.
CCAIM has been set up as a cutting-edge research group. Its faculty of 10 University of Cambridge researchers – in addition to world-class PhD students, currently being recruited – have united to develop AI and machine learning (ML) technologies aiming to transform clinical trials, personalised medicine and biomedical discovery.
The centre’s Director is Professor Mihaela van der Schaar, a world-leading researcher in machine learning, and the Co-Director is researcher-clinician Professor Andres Floto (bios below).
A revolution in soybean breeding for root traits has begun, with the presentation of a successful new “pipeline” involving machine learning and computer vision.
Ever since crop breeders learned of single nucleotide polymorphisms — SNPs, differences in a single building block (nucleotide), such as cytosine in place of thymine, in a given stretch of DNA — they have wanted to correlate these differences with plant traits, taking breeding to an entirely new level.
Now, building on past achievements in the field, a team of scientists in the departments of Agronomy and Mechanical Engineering at Iowa State University and at the U.S. Department of Agriculture (USDA) Agricultural Research Service has published results on a new soybean breeding framework that links genetic information with root traits using computer vision and machine learning (ML) tools.
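The simplest version of linking a SNP to a trait is a genotype–phenotype correlation: code each plant's genotype at the SNP as an allele count (0, 1 or 2) and correlate it with a trait measurement extracted from images. A minimal sketch with invented numbers (the published framework uses computer-vision trait extraction and far richer models, not a single Pearson correlation):

```python
# Toy sketch: correlating one SNP with a measured root trait.
# Genotypes coded as allele counts (0, 1, 2); root depths are hypothetical.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

genotypes = [0, 1, 2, 0, 2, 1, 2, 0]  # allele count at one SNP, per plant
root_depth = [11.0, 13.5, 16.2, 10.4, 15.8, 12.9, 16.5, 11.2]  # cm, hypothetical
r = pearson(genotypes, root_depth)
print(round(r, 3))
```

A strong correlation across many plants flags the SNP as a marker a breeder could select on; genome-wide, this scan is repeated over thousands of SNPs with corrections for population structure.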
from Harvard Data Science Review; Eric D. Kolaczyk, Haviland Wright, and Masanao Yajima
Much of the current post-secondary training in core data science fields treats ‘practice’ as something to be relegated to capstone projects or other final preparations before students leave their programs. Here we argue for a paradigm shift, placing a so-called Practicum course at the center of a data science program, intentionally organized as a hybrid between an educational classroom and an industry-like environment. As a case study, we detail our experience of the past five years developing the Statistics Practicum in Boston University’s MS in Statistical Practice (MSSP) program. We describe the motivation, organization, and logistics of our Practicum, as well as both successes and challenges we have faced. In particular, the challenge of fairly and effectively assessing student achievement and program impact in this novel setting is discussed. [Just Accepted – Preview]
As machine translation (MT) improves, it becomes more critical to find a way to measure incremental gains. There is a general consensus that existing evaluation methods do not fit the bill; and, in fact, can even promote the much-maligned simplistic “translationese” that MT tends to produce. Researchers in both academia and industry, therefore, continue to explore alternatives.
Power player Google’s recent research includes work in MT evaluation, such as extending BLEURT beyond English, and the creation of a large-scale knowledge base for 18 languages, for use without reference translations.
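The baseline these newer metrics aim to improve on is modified n-gram precision, the core of BLEU: count how many of the system's n-grams also appear in a reference translation, clipping repeated n-grams. A stripped-down sketch (single reference, no brevity penalty or geometric mean, unlike full BLEU):

```python
# Toy sketch of modified n-gram precision, the core idea behind BLEU
# (simplified: one reference, no brevity penalty).
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clip each candidate n-gram's count by its count in the reference.
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(hyp, ref, 1), ngram_precision(hyp, ref, 2))
```

The weakness critics point to is visible here: the metric rewards surface overlap with the reference, so stilted "translationese" that copies reference phrasing can outscore a fluent paraphrase — which is what learned metrics like BLEURT try to fix.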
Microsoft is teaming up with other companies to form an alliance to advance the field of DNA data storage, which promises to revolutionize the way vital records are kept for the long haul.
The founding members of the DNA Data Storage Alliance, unveiled today at the Flash Memory Summit, include Microsoft as well as Twist Bioscience, Illumina and Western Digital. Twist Bioscience has been partnering with Microsoft and the University of Washington for years on projects aimed at harnessing synthetic DNA for data storage.
Microsoft Research and UW’s Molecular Information Systems Lab have already demonstrated a fully automated DNA-based data storage and retrieval system — and in league with Twist, they’ve shown that their system can store a gigabyte of data in a DNA-based archive.
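The basic encoding idea behind DNA storage is compact: DNA has four bases, so each nucleotide can carry two bits. A minimal round-trip sketch (real systems such as the Microsoft/UW one add error-correcting codes, addressing, and constraints to avoid sequences that are hard to synthesize or read):

```python
# Toy sketch: mapping binary data to DNA bases and back.
BASES = "ACGT"  # two bits per nucleotide: A=00, C=01, G=10, T=11

def bytes_to_dna(data):
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # four 2-bit chunks per byte
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq):
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

encoded = bytes_to_dna(b"hi")
print(encoded, dna_to_bytes(encoded))  # prints CGGACGGC b'hi'
```

At this density a gigabyte is half a billion nucleotides, which is why synthesis cost, rather than encoding, is the practical bottleneck the alliance is organized around.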
from Georgia State University News Hub, Andrew Young School of Policy Studies
Georgia State University Library faculty Bryan Sinclair and Mandy Swygart-Hobaugh have been awarded a $150,000 grant from the New America Public Interest Technology University Network to expand programs that promote “data literacy for the public good” and encourage a more diverse pipeline of students pursuing careers in data science.
The Public Interest Data Literacy (PIDLit) initiative, led by the Georgia State Library with multiple campus partners, will expand programs promoting data literacy and career preparedness with a focus on reaching first-year students and underrepresented groups. The Library provides support and training on a host of data analysis tools and methods along with opportunities for students to network with the broader Atlanta community around practical applications of data science.
Oregon State University brought its TRACE Community COVID-19 testing program to Eugene, sending three-member teams – one OSU student, one UO student and one professional – to city neighborhoods to collect nasal-swab samples from hundreds of residents and sewage samples from around Eugene and Springfield. This will further expand TRACE’s coverage, which includes five similar sweeps in Corvallis, as well as studies in Bend, Hermiston and Newport. TRACE will be working in tandem with UO’s Monitoring and Assessment Program (MAP).
Online November 30-December 2.
“Adding context to data turns it into information. Processing information turns it into knowledge. The keys to these transformations are connections and metadata. Graph databases, semantic technology, knowledge graphs and graph AI are solid building blocks that support this process. Get your Building Blocks.” [$$$]
“The National Science Foundation (NSF) Computer and Information Science and Engineering (CISE) directorate just announced a new newsletter that will share ‘periodic updates about CISE and NSF broadly, including up-to-date information about [their] newest programs and activities.'”
Toyota’s CSRC is investing $1 million in research projects focused on creating systems that are safe and efficient. These projects will focus on enabling safer and more efficient mobility systems by exploring driver behavior in different environments, monitoring driver health and identifying driver error when interacting with advanced driver assistance systems (ADAS) technologies.
Data from each project will be shared across the institutions to help speed research, with the results made public to support the advancement of auto safety industrywide.
Virtual peer-to-peer payments are the currency of the pandemic: the coin of the Covid-19 realm. In a world where no one wants to come close enough to exchange money (and possibly germs too), use of payment apps like PayPal, Venmo, Cash App and Zelle has exploded.
Between April and June, PayPal processed more than $220 billion in transactions, a nearly 30% year-over-year increase. Similarly, in the second quarter of 2020, Venmo, which is owned by PayPal, processed $37 billion in transactions, a 52% increase over the year before. And in July, Early Warning Services, the parent company of the mobile payment platform Zelle, reported that the number of users who had sent a payment through the service in the last 90 days was up 43% over the previous year.
But while payment apps can help protect users from the dangers of person-to-person Covid-19 transmission, these virtual systems introduce problems of their own. Using a payment app of any kind, no matter how privacy-preserving it is, always introduces an intermediary: an agent that uses data to execute a demand. Apps can’t send money if they don’t have some kind of access to the account from which users send and receive money. And not all apps treat that data the same way.