Data Science newsletter – September 1, 2021

Newsletter features journalism, research papers and tools/software for September 1, 2021

 

Open Source Is Throwing AI Policymakers For A Loop: Machine learning isn’t just for big companies any more

IEEE Spectrum, Ned Potter


from

Depending on whom you ask, artificial intelligence may someday rank with fire and the printing press as technology that shaped human history. The jobs AI does today—carrying out our spoken commands, curing disease, approving loans, recommending who gets a long prison sentence, and so on—are nothing compared to what it might do in the future.

But who is drawing the roadmap? Who’s making sure AI technologies are used ethically and for the greater good? Big tech companies? Governments? Academic researchers? Young upstart developers? Governing AI has gotten more and more complicated, in part, because hidden in the AI revolution is a second one. It’s the rise of open-source AI software—code that any computer programmer with fairly basic knowledge can freely access, use, share and change without restriction. With more programmers in the mix, the open-source revolution has sped AI development substantially. According to one study, in fact, 50 to 70 percent of academic papers on machine learning rely on open source.


“Rewilding Your Attention,” or, why it’s good to step away from the recommendation-feeds of big tech

Twitter, Clive Thompson


from

An essay by me, riffing off a great blog post by @CJEller3, who introduced me to that phrase via @tomcritchlow, tweeting about a fab essay by @ali_montag.


The secret bias hidden in mortgage-approval algorithms

Associated Press, The Markup, Emmanuel Martinez and Lauren Kirchner


from

An investigation by The Markup has found that lenders in 2019 were more likely to deny home loans to people of color than to white people with similar financial characteristics — even when we controlled for newly available financial factors the mortgage industry for years has said would explain racial disparities in lending.
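As a rough illustration of what “controlling for” financial factors means here, the sketch below fits a logistic regression of loan denial on race plus financial covariates and reads off odds ratios. This is not The Markup’s actual methodology or data; the column names (denied, race, dti, loan_to_value) and the simulated values are hypothetical.

```python
# Toy sketch: estimate lending disparity after holding financial factors constant.
# NOT The Markup's pipeline; all data below are simulated and column names invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "race": rng.choice(["white", "black", "latino", "asian"], size=n),
    "dti": rng.uniform(10, 60, size=n),             # debt-to-income ratio, %
    "loan_to_value": rng.uniform(50, 100, size=n),  # loan-to-value ratio, %
})
# Simulated outcome: denial probability rises with DTI and LTV only.
logit_p = -4 + 0.05 * df["dti"] + 0.02 * df["loan_to_value"]
df["denied"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression of denial on race, with white applicants as the reference group.
model = smf.logit(
    "denied ~ C(race, Treatment('white')) + dti + loan_to_value", data=df
).fit(disp=False)

# Exponentiated coefficients are odds ratios; a value above 1 for a race category
# would indicate higher denial odds than comparable white applicants.
print(np.exp(model.params))
```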


University of Washington and Microsoft researchers develop “nanopore-tal” enabling cells to talk to computers

University of Washington, Allen School News


from

Genetically encoded reporter proteins have been a mainstay of biotechnology research, allowing scientists to track gene expression, understand intracellular processes and debug engineered genetic circuits. But conventional reporting schemes that rely on fluorescence and other optical approaches come with practical limitations that could cast a shadow over the field’s future progress. Now, thanks to a team of researchers at the University of Washington and Microsoft, scientists are about to see reporter proteins in a whole new light.

In a paper published today in the journal Nature Biotechnology, members of the Molecular Information Systems Laboratory housed at the UW’s Paul G. Allen School of Computer Science & Engineering introduce a new class of reporter proteins that can be directly read by a commercially available nanopore sensing device. The new system ― dubbed “Nanopore-addressable protein Tags Engineered as Reporters,” also known as NanoporeTERs or NTERs for short ― can perform multiplexed detection of protein expression levels from bacterial and human cell cultures far beyond the capacity of existing techniques.
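The multiplexed readout ultimately comes down to classifying which engineered tag produced a given nanopore current trace. The snippet below is a deliberately toy version of that idea — simulated traces, simple summary features, and an off-the-shelf classifier — not the authors’ signal model or pipeline.

```python
# Toy sketch: classify simulated nanopore current traces by which "tag" produced them.
# Purely illustrative; not the NanoporeTERs authors' method or data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_tags, traces_per_tag, trace_len = 4, 200, 500

# Pretend each tag holds the pore current at a slightly different mean level.
tag_levels = np.linspace(0.30, 0.45, n_tags)
X, y = [], []
for tag, level in enumerate(tag_levels):
    traces = level + 0.02 * rng.standard_normal((traces_per_tag, trace_len))
    # Simple per-trace summary features: mean, standard deviation, median.
    feats = np.column_stack([traces.mean(1), traces.std(1), np.median(traces, 1)])
    X.append(feats)
    y.append(np.full(traces_per_tag, tag))
X, y = np.vstack(X), np.concatenate(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```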


The Huge Payoff Of Extreme Co-Design In Molecular Dynamics

The Next Platform, Timothy Prickett Morgan


from

When money is really no object, and the budget negotiations involve taking a small slice of your personal net worth of $7.5 billion out of one pocket and putting it into another, and you have the technical chops to understand the complexities of molecular dynamics, and have a personal mission to cure disease, then you can build any damned supercomputer you want.

So that is what David Shaw, former computer science professor at Columbia University and quant for the hedge fund that bears his name, has done once again with the Anton 3 system.

The Anton 3 processors, their interconnect, and the complete Anton 3 system, which were delivered last September to DE Shaw Research, the scientific and computer research arm of the Shaw empire, were on display at the Hot Chips 33 conference this week. The system is a shining testament to the co-design of hardware and software and finely tuning both for the modeling of very specific physical processes to better understand them.
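For readers unfamiliar with what a molecular dynamics machine actually computes, the core loop is “evaluate forces between particles, then advance positions and velocities by a tiny time step,” repeated billions of times. The NumPy sketch below shows a minimal velocity Verlet step for a handful of Lennard-Jones particles; Anton 3 hard-wires vastly more sophisticated versions of this loop, and nothing here reflects its actual algorithms.

```python
# Toy molecular dynamics loop (velocity Verlet + Lennard-Jones forces).
# Only illustrates the structure of the calculation Anton-class machines accelerate.
import numpy as np

rng = np.random.default_rng(3)
dt, steps = 1e-3, 2000

# Eight particles on a small cubic lattice, slightly jittered to break symmetry.
grid = np.array([0.0, 1.5])
pos = np.array([[x, y, z] for x in grid for y in grid for z in grid])
pos += 0.05 * rng.standard_normal(pos.shape)
vel = np.zeros_like(pos)

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces (no cutoff, no periodic box, unit masses)."""
    diff = pos[:, None, :] - pos[None, :, :]           # displacement vectors (n, n, 3)
    r2 = np.sum(diff**2, axis=-1) + np.eye(len(pos))   # add identity to avoid /0 on diagonal
    inv_r6 = (sigma**2 / r2) ** 3
    coef = 24 * eps * (2 * inv_r6**2 - inv_r6) / r2    # scalar force factor per pair
    np.fill_diagonal(coef, 0.0)
    return np.sum(coef[:, :, None] * diff, axis=1)

forces = lj_forces(pos)
for _ in range(steps):                                 # velocity Verlet integration
    vel += 0.5 * dt * forces
    pos += dt * vel
    forces = lj_forces(pos)
    vel += 0.5 * dt * forces

print("kinetic energy after run:", 0.5 * np.sum(vel**2))
```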


Artist Uses AI to Create Realistic Portraits of Famous Historical Figures

My Modern Met, Margherita Cole


from

Have you ever wondered what famous historical figures like Nefertiti and Cleopatra looked like in real life? Well, Bas Uterwijk might be able to show you a pretty good guess. The Dutch photographer and digital artist creates amazing AI portraits of famous historical figures using innovative neural network reconstructions. His most recent additions to the ongoing series transport viewers to ancient Egypt, the Renaissance, and 18th-century Europe, among other time periods.

To create these portraits, Uterwijk uploads numerous references of the person’s likeness to the AI applications. Then, he makes small adjustments to the program until he is satisfied with the result. “These ‘Deep Learning’ networks are trained with thousands of photographs of human faces and are able to create near-photorealistic people from scratch or fit uploaded faces in a ‘Latent Space’ of a total of everything the model has learned,” Uterwijk explains. “I think the human face hasn’t changed dramatically over thousands of years and apart from hairstyles and makeup, people that lived long ago probably looked very much like us, but we are used to seeing them in the often distorted styles of ancient art forms that existed long before the invention of photography.”
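“Fitting uploaded faces in a latent space” usually means optimizing a latent vector until the generator’s output matches the target image. The sketch below shows that optimization loop in PyTorch with a tiny untrained stand-in generator so it runs anywhere; Uterwijk’s actual workflow, tools, and models are not described here, and a real system would use a pretrained face GAN plus a perceptual loss.

```python
# Minimal latent-space projection sketch: optimize z so generator(z) matches a target.
# The generator below is an untrained stand-in, NOT a real face model.
import torch
import torch.nn as nn

latent_dim, img_pixels = 64, 32 * 32
generator = nn.Sequential(            # stand-in for a pretrained generator
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_pixels), nn.Tanh(),
)
for p in generator.parameters():      # generator weights stay frozen
    p.requires_grad_(False)

target = torch.rand(1, img_pixels) * 2 - 1   # stand-in for the uploaded portrait
z = torch.zeros(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(500):               # gradient descent in latent space
    opt.zero_grad()
    loss = torch.mean((generator(z) - target) ** 2)
    loss.backward()
    opt.step()

print("final reconstruction MSE:", loss.item())
```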


U.S. SEC to scrutinize firms’ digital-engagement practices as investor worries grow

Reuters, Katanga Johnson and Chris Prentice


from

The U.S. Securities and Exchange Commission (SEC) will seek input on whether digital customer engagement innovations used by financial firms should be governed by existing rules or may need new ones, commission chair Gary Gensler told Reuters.

While the SEC’s thinking on the subject is at an “early stage,” its rules may need updating to account for an artificial intelligence-led revolution in predictive analytics, differential marketing and behavioral prompts designed to optimize customer engagement, he said.

The SEC plans to launch a sweeping consultation in coming days that could have major ramifications for retail brokers, wealth managers and robo-advisers, which increasingly use such tools to drive customers to higher-revenue products.

“We’re at a transformational time. I really believe data analytics and AI can bring a lot of positives, but it means we should look back and think about what does this mean for user interface, user engagement, fairness and bias,” said Gensler. “What does it mean about rules written in an earlier era?”


OSU Distinguished Professor receives $7M for wide-ranging study of biological impact of chemicals

Oregon State University, Newsroom


from

An Oregon State University toxicologist has received a $7 million grant to study the biological impacts of chemicals, which could potentially lead to reducing or eliminating the need for chemical testing on animals.

Robyn Tanguay, a distinguished professor in Oregon State University’s College of Agricultural Sciences, is the recipient of the eight-year grant in the field of predictive toxicology from the National Institute of Environmental Health Sciences.

For the past 20 years, Tanguay has pioneered the use of zebrafish in toxicology research, publishing more than 200 papers and leading several large teams from her 17,000-square-foot Sinnhuber Aquatic Research Laboratory.

Zebrafish are ideal for toxicology research because they are vertebrates that grow incredibly fast — a tiny egg will become a recognizable fish in just 24 hours.


UW, Carnegie Mellon to pioneer platforms that harness astrophysical data to unravel the universe’s mysteries

University of Washington, UW News


from

The University of Washington and Carnegie Mellon University have announced an expansive, multiyear collaboration to create new software platforms to analyze large astronomical datasets generated by the upcoming Legacy Survey of Space and Time, or LSST, which will be carried out by the Vera C. Rubin Observatory in northern Chile. The open-source platforms are part of the new LSST Interdisciplinary Network for Collaboration and Computing — known as LINCC — and will fundamentally change how scientists use modern computational methods to make sense of big data.

Through the LSST, the Rubin Observatory, a joint initiative of the National Science Foundation and the Department of Energy, will collect and process more than 20 terabytes of data each night — and up to 10 petabytes each year for 10 years — and will build detailed composite images of the southern sky. Over its expected decade of observations, astrophysicists estimate the Department of Energy’s LSST Camera will detect and capture images of an estimated 30 billion stars, galaxies, stellar clusters and asteroids. Each point in the sky will be visited around 1,000 times over the survey’s 10 years, providing researchers with valuable time series data.
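A quick back-of-the-envelope check on the figures quoted above (the observatory will not observe every night, so “up to 10 petabytes each year” is an upper bound and the nightly rate is broadly consistent with it):

```python
# Arithmetic only, using the numbers quoted in the article excerpt.
tb_per_night = 20
nights_per_year = 365
years = 10

pb_per_year = tb_per_night * nights_per_year / 1000   # ~7.3 PB, consistent with "up to 10 PB"
pb_total = pb_per_year * years                         # ~73 PB of nightly data over the survey

visits_per_point = 1000
visits_per_year = visits_per_point / years             # ~100 revisits of each sky position per year

print(f"{pb_per_year:.1f} PB/year, {pb_total:.0f} PB over {years} years, "
      f"~{visits_per_year:.0f} visits per point per year")
```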


Purdue, Purdue Research Foundation launch ‘Lab to Life’ 6G digital innovation at Discovery Park District

Purdue University, Research Foundation News


from

The Purdue Research Foundation and Purdue University on Tuesday (Aug. 24) announced the launch of the “Lab to Life” (L2L) digital innovation platform in the Discovery Park District adjacent to the university campus. The launch is the next step in creating one of the most connected innovation communities in the nation through the evolution from 5G to 6G.

“6G will go from a domain to a foundation to many vertical domains, from one network to a heterogeneous network of networks. From dynamic sharing and new spectra to edge intelligence and open architecture, 6G research needs to be accelerated through deployment in the U.S.,” said Mung Chiang, Purdue’s executive vice president for strategic initiatives and the John A. Edwardson Dean of the College of Engineering. “This group of prominent industry partners and their leaders will turn Purdue’s Discovery Park District into the country’s first at-scale 6G deployment zone, and, along with other essential elements like learning and the arts, create a work-live-play connected community.”


New imaging, machine-learning methods speed effort to reduce crops’ need for water

University of Illinois, Illinois News Bureau


from

Scientists have developed and deployed a series of new imaging and machine-learning tools to discover attributes that contribute to water-use efficiency in crop plants during photosynthesis and to reveal the genetic basis of variation in those traits.

The findings are described in a series of four research papers led by University of Illinois Urbana-Champaign graduate students Jiayang (Kevin) Xie and Parthiban Prakash, and postdoctoral researchers John Ferguson, Samuel Fernandes and Charles Pignon.

The goal is to breed or engineer crops that are better at conserving water without sacrificing yield, said Andrew Leakey, a professor of plant biology and of crop sciences at the University of Illinois Urbana-Champaign, who directed the research.
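Work in this area commonly links hyperspectral leaf reflectance to physiological traits with regression models. The sketch below shows that general pattern with partial least squares on synthetic data; it is illustrative only and not the pipeline used in the four papers.

```python
# Illustrative sketch: predict a water-use-related trait from hyperspectral reflectance.
# Synthetic data; not the published Illinois pipeline.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_plants, n_bands = 300, 200
reflectance = rng.uniform(0.05, 0.6, size=(n_plants, n_bands))

# Synthetic trait (think: intrinsic water-use efficiency) driven by a few bands plus noise.
trait = (2.0 * reflectance[:, 40] - 1.5 * reflectance[:, 120]
         + 0.1 * rng.standard_normal(n_plants))

pls = PLSRegression(n_components=5)
r2 = cross_val_score(pls, reflectance, trait, cv=5, scoring="r2")
print("cross-validated R^2:", r2.mean().round(2))
```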


Computer Scientist Explains Machine Learning in 5 Levels of Difficulty

YouTube, WIRED


from

WIRED has challenged computer scientist and Hidden Door cofounder and CEO Hilary Mason to explain machine learning to 5 different people: a child, a teen, a college student, a grad student and an expert. [video, 26:08]


Merrill College, iSchool to Offer ‘From-Anywhere’…

University of Maryland, Maryland Today


from

The University of Maryland’s Philip Merrill College of Journalism and College of Information Studies (iSchool) will begin accepting applications this fall for a from-anywhere data journalism master’s degree launching in the Fall 2022 semester.

Courses will be offered online and in person; students can complete the degree entirely virtually, in person or via a combination of the two.


The $3.5 trillion budget bill could transform the US power sector – and slash climate pollution

MIT Technology Review, James Temple


from

In the coming weeks, Congress may pass one of the most important climate policies in US history.

The $3.5 trillion budget plan includes a provision known as the Clean Electricity Payment Program, which would use payments and penalties to encourage utilities to increase the share of electricity they sell from carbon-free sources each year. If it works as hoped, the legislation would ensure that the power sector generates 80% of its electricity from sources like wind, solar, and nuclear plants by 2030, cutting more than a billion tons of annual greenhouse-gas emissions.

The measure would mark a foundational step in President Joe Biden’s ambitious climate plan, which aims to put the nation on track to eliminate climate pollution from electricity generation by 2035—and achieve net-zero emissions across the economy by midcentury.

There are real questions, though, about whether the program will achieve its aggressive targets. How the nation’s complex electricity sector actually responds will depend heavily on how the agency that oversees the program implements it.


Excellent thread. Basically no one is qualified to run a university because it’s virtually impossible to do, even ordinarily. But the logic of leadership is necessary for the corporatization of universities (making them seem ‘runnable’).

Twitter, Brendan Cantwell, Matt Johnson


from

Stress that accompanied campus positions (faculty and staff alike) turned into disillusionment. Senior leaders either weren’t up for the task, were overwhelmed themselves, or were unsure how to help. Their responses to the groundswell of burnout and disenfranchisement were lackluster (2/8)


The Brain Doesn’t Think the Way You Think It Does

Quanta Magazine, Jordana Cepelewicz


from

Neuroscientists are the cartographers of the brain’s diverse domains and territories — the features and activities that define them, the roads and highways that connect them, and the boundaries that delineate them. Toward the front of the brain, just behind the forehead, is the prefrontal cortex, celebrated as the seat of judgment. Behind it lies the motor cortex, responsible for planning and coordinating movement. To the sides: the temporal lobes, crucial for memory and the processing of emotion. Above them, the somatosensory cortex; behind them, the visual cortex.

Not only do researchers often depict the brain and its functions much as mapmakers might draw nations on continents, but they do so “the way old-fashioned mapmakers” did, according to Lisa Feldman Barrett, a psychologist at Northeastern University. “They parse the brain in terms of what they’re interested in psychologically or mentally or behaviorally,” and then they assign the functions to different networks of neurons “as if they’re Lego blocks, as if there are firm boundaries there.”

But a brain map with neat borders is not just oversimplified — it’s misleading.


The why, what and how of predicting evolution across biology: from disease to biotechnology to biodiversity

EcoEvoRxiv, Mieke T. Wortel et al.


from

Evolution has traditionally been a historical field of study and predicting evolution has long been considered challenging or even impossible. However, evolutionary predictions are increasingly being made and used in many situations in medicine, agriculture, biotechnology and conservation biology. Because every field uses its own language and makes predictions from its own background, researchers are not always aware of the breadth of evolutionary predictions. Evolutionary predictions may be used for several purposes such as to prepare for the future, to try and change the course of evolution or simply to determine how well we understand an evolutionary system. Exactly what aspect of an evolving population we want to predict, such as the most common genotype, average or individual fitness, or population size, depends on the situation. There are many uses of evolutionary predictions that may not be recognized as such. Therefore, the main goal of this review is to increase awareness of methods and data that are used to make these predictions in different fields, by showing the breadth of situations in which evolutionary predictions are made. We describe how evolutionary predictions are highly diverse, but nevertheless share a common structure described by the predictive scope, horizon, precision and risk. Then, by using examples ranging from SARS-CoV-2 and influenza to CRISPR-based gene drives and sustainable product formation by microorganisms, we discuss the methods for predicting evolution, factors that affect the predictability, and how predictions can be used to prevent unwanted evolution or promote beneficial evolution. We hope that this review will increase collaboration between fields by creating a common language for evolutionary predictions.
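As a concrete (and deliberately simple) example of what a short-horizon evolutionary prediction can look like, the snippet below uses the textbook haploid selection recursion to predict when a beneficial variant (say, a drug-resistant strain) overtakes half the population. This is a standard toy model, not an example taken from the review.

```python
# Toy evolutionary prediction: deterministic spread of a variant with selection coefficient s.
def predict_frequency(p0, s, generations):
    """Frequency trajectory under the haploid recursion p' = p(1+s) / (1 + p*s)."""
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        freqs.append(p * (1 + s) / (1 + p * s))
    return freqs

# Predict how many generations a variant at 1% frequency with a 10% fitness
# advantage needs to exceed 50% of the population.
traj = predict_frequency(p0=0.01, s=0.10, generations=100)
crossing = next(t for t, p in enumerate(traj) if p > 0.5)
print(f"crosses 50% at generation {crossing}")
```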


Events



Lisa Nakamura’s work is and has always been Star-struck and I am super proud that she is the next speaker in MSR’s (free, public, recorded, awesome) Race and Tech lecture series.

Twitter, Nancy Baym


from

Online September 22, starting at 10 a.m. Pacific.


Melissa Dell – LayoutParser: A Unified Toolkit for Deep Learning-Based Document Image Analysis

Stanford Digital Economy Lab


from

Online October 4, starting at 12 p.m. Pacific. Stanford Digital Economy Lab Seminar Series: “Professor Melissa Dell of Harvard University will join S-DEL Director Erik Brynjolfsson to discuss LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applications.” [registration required]
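For a sense of what the library does, here is a quickstart-style sketch based on LayoutParser’s documented API (detect labeled layout regions in a scanned page with a PubLayNet-trained model). The file names are placeholders, and this is a generic example rather than anything specific to the talk.

```python
# Quickstart-style LayoutParser sketch; file names are placeholders.
import layoutparser as lp
import cv2

image = cv2.imread("page.png")                       # any scanned document page
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)       # BGR -> RGB

model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

layout = model.detect(image)                         # Layout of labeled bounding boxes
text_blocks = lp.Layout([b for b in layout if b.type == "Text"])
lp.draw_box(image, text_blocks, box_width=3).save("page_blocks.png")
```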

SPONSORED CONTENT


The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



We’re excited to open-source Mistral – a codebase for accessible large-scale LM training, built as part of Stanford’s CRFM (https://crfm.stanford.edu).

Twitter, Siddharth Karamcheti


from

We’re releasing 10 GPT-2 Small & Medium models with different seeds & 600+ checkpoints per run!
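Since the released models are standard GPT-2 architectures, they should load with the usual Hugging Face transformers calls shown below. The model identifier here is a placeholder; substitute the actual checkpoint names published by the Mistral/CRFM team.

```python
# Hedged sketch: loading a GPT-2-style checkpoint with transformers.
# MODEL_ID is a placeholder, not an official Mistral release name.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_ID = "gpt2"  # swap in a Mistral-released checkpoint identifier

tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_ID)
model = GPT2LMHeadModel.from_pretrained(MODEL_ID)

prompt = "Open-source tooling for large-scale language model training"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```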


ECRcentral: A central platform for the early career researcher community

ECR Central


from

By early career researchers (ECRs) for ECRs to find and discuss opportunities, share experiences, mentor peers, and create impact through community engagement.


Making Sense of Satellite Data, Open Source Workflow part 1

Nightingale, Robert Simmon


from

We are living in a golden age of access to information about our home planet. Just twenty years ago, access to satellite data was limited to the world’s biggest militaries, wealthiest corporations, and a handful of government scientists with access to scientific workstations, boutique software, and (most importantly) postdocs. Now, petabytes of data are available to just about anyone—we can see the whole Earth every day, in unprecedented detail, such as in the image above. This is what the Earth looked like on March 21, 2018. One instrument — VIIRS — collected this data over 24 hours while circling the globe 16 times, crossing the equator from south to north at 13:30 local time on each orbit.

Or drill down to Los Angeles International Airport and count planes on the tarmac.
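In the spirit of the article’s open-source workflow, here is a minimal sketch of reading a georeferenced image with rasterio and saving an RGB preview. The file name is a placeholder for any three-band GeoTIFF (for example, exported VIIRS true-color data); this is not the article’s exact recipe.

```python
# Minimal open-source satellite-imagery workflow sketch; file name is a placeholder.
import numpy as np
import rasterio
import matplotlib.pyplot as plt

with rasterio.open("true_color.tif") as src:
    rgb = src.read([1, 2, 3]).astype("float32")    # bands as (3, rows, cols)
    print("CRS:", src.crs, "| extent:", src.bounds)

rgb = np.transpose(rgb, (1, 2, 0))                 # to (rows, cols, 3) for plotting
rgb = np.clip(rgb / np.percentile(rgb, 98), 0, 1)  # simple contrast stretch

plt.imshow(rgb)
plt.axis("off")
plt.savefig("true_color_preview.png", dpi=150, bbox_inches="tight")
```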


Workshop on Foundation Models

YouTube, Stanford HAI


from

The Center for Research on Foundation Models (CRFM), a new initiative of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), invites you to the Workshop on Foundation Models from August 23-24, 2021. By foundation model (e.g. BERT, GPT-3, DALL-E), we mean a single model that is trained on raw data, potentially across multiple modalities, which can be usefully adapted to a wide range of tasks. These models have demonstrated clear potential, which we see as the beginnings of a sweeping paradigm shift in AI. They represent a dramatic increase in capability in terms of accuracy, generation quality, and extrapolation to new tasks, but they also pose clear risks such as use for widespread disinformation, potential exacerbation of historical inequities, and problematic centralization of power. [video, 5:13:05]
