Data Science newsletter – September 2, 2021

Newsletter features journalism, research papers and tools/software for September 2, 2021

 

Judge freezes UC Berkeley’s student enrollment at 2020-21 levels.

Berkeleyside, Frances Dinkelspiel



An Alameda County judge has ordered UC Berkeley to freeze its enrollment at 2020-21 levels going forward until the university redoes a supplemental environmental impact report (SEIR) for a complex on Hearst Avenue that includes housing and a new academic building for the Goldman School of Public Policy. The freeze will take effect in 2022-23.

The order to freeze enrollment comes a little more than a month after Judge Brad Seligman ruled that UC Berkeley abused its discretion when it failed to study the impacts of increasing its enrollment by 33.7%, or 11,285 students, from 2005 to 2020. That was just one of the deficiencies in the supplemental environmental impact report for what is called the Upper Hearst Project, the judge ruled.

Seligman also ordered the UC Board of Regents to void its 2018 approval of the Upper Hearst project and to decertify the supplemental environmental impact report.


Teaching the Stories of Science

Issues in Science and Technology, Erin Lavik



In “Shining a Light on the Impacts of Our Innovations” (Issues, Spring 2021), Ainissa Ramirez makes a compelling argument for bringing the stories of scientists and the impact of their research into our textbooks and classrooms. When I went through my degrees, I was taught about science. I learned about polymers—their synthesis, properties, and uses. I learned about interfacial polymerization and the “nylon rope trick” to demonstrate the preparation of a synthetic polymer. It became one of my favorite demonstrations at outreach events. It wasn’t until many years after I was teaching that I realized Stephanie Kwolek, the DuPont chemist known for inventing a family of synthetic fibers of exceptional strength and stiffness, developed the nylon rope trick as a teaching tool. I wished I had known more about her much earlier in my career. Being able to see someone like me in science—someone who had always been there but whose story was rarely told in classes—would have been transformational.

We owe it to our students to teach science in context not only so they might see themselves in the scientists who came before, but also so they might understand the impact science can have. When the Scottish inventor John Boyd Dunlop developed pneumatic tires, he may not have considered the impact on the raw materials’ sources or on the disposal of tires. But the desire for natural rubber exploded with the pneumatic tire, and the exploitation, torture, enslavement, and murder of Black people in the Congo and Indigenous tribes in the Amazon grew out of this invention. What started as a better tricycle tire for Dunlop’s child became the mutilation and murder of a five-year-old named Boali in the Congo. Alice Seeley Harris, a missionary, tried to stop such atrocities from continuing by documenting this horror with one of the first consumer cameras.


From Tracking to Classroom Instruction, Ed Tech Based on Biased Data Can Make Inequities Worse. What Designers and Engineers Can Do

The 74, Ryan Baker



It’s essential for learning engineers and ed tech designers to create and research interventions that are sensitive to the differences among learners and the needs of a range of disadvantaged student populations. This means getting away from a top-down way of designing that simply identifies abstract learning principles and integrates them into the newest technology. The impetus for ed tech design must come from the needs on the ground, in the particular learning environment in which the tools will be used.

By moving away from “the fallacy of the ‘average’” student, data-driven educational technologies can locate struggling kids quickly and personalize interventions. For example, new writing feedback technology can give students personalized responses and help direct teacher attention where it’s needed, so students recover from setbacks quickly. Since underserved students tend to face more barriers and are often forced to adjust to new environments, these tools can become important contributors to a more equitable education system.

At the same time, algorithms based on convenience samples — information taken from easy-to-reach student populations — can be less effective for specific groups of learners. And privacy concerns often discourage learning platform developers from collecting the types of demographic data necessary to be sure a technology is not biased.


Dust collected from campus buildings will help track COVID-19

The Ohio State University, Ohio State News



Researchers are collecting dust from 50 buildings on The Ohio State University campus this fall to monitor the prevalence of COVID-19 and track the virus’s variants. Their analyses and experiments are designed to help the university understand where COVID-19 pockets might exist as the campus opens to near-pre-pandemic levels this fall.

Their study is built on previous research that showed that RNA — part of the genetic material from inside the COVID-19 virus — can persist up to a month in dust. For that study, researchers collected dust from residence hall rooms on campus that housed students known to be infected with COVID-19.

Additional research showed viruses do not remain infectious in dust, leaving only the genetic material behind. Analyzing that genetic material offers a noninvasive, low-cost way of monitoring for COVID-19.


Seasonal Arctic sea ice forecasting with probabilistic deep learning

Nature Communications, Tom R. Andersson et al.



Anthropogenic warming has led to an unprecedented year-round reduction in Arctic sea ice extent. This has far-reaching consequences for indigenous and local communities, polar ecosystems, and global climate, motivating the need for accurate seasonal sea ice forecasts. While physics-based dynamical models can successfully forecast sea ice concentration several weeks ahead, they struggle to outperform simple statistical benchmarks at longer lead times. We present a probabilistic, deep learning sea ice forecasting system, IceNet. The system has been trained on climate simulations and observational data to forecast the next 6 months of monthly-averaged sea ice concentration maps. We show that IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. This step-change in sea ice forecasting ability brings us closer to conservation tools that mitigate risks associated with rapid sea ice loss. [full text]
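
The paper's full architecture is beyond a newsletter blurb, but the shape of the task is easy to sketch: a network maps stacked climate-variable maps to per-pixel class probabilities at each monthly lead time. Below is a minimal toy sketch in PyTorch; `ToySeaIceForecaster`, the layer sizes, and the input dimensions are all hypothetical stand-ins, not IceNet's actual architecture.

```python
import torch
import torch.nn as nn

class ToySeaIceForecaster(nn.Module):
    """Toy stand-in for a probabilistic sea ice forecaster (not IceNet itself).

    Maps a stack of climate-variable maps to per-pixel class probabilities
    for each of the next 6 monthly lead times.
    """
    def __init__(self, in_channels=12, n_classes=3, n_leads=6):
        super().__init__()
        self.n_classes, self.n_leads = n_classes, n_leads
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            # One set of class logits per lead time, per grid cell.
            nn.Conv2d(64, n_classes * n_leads, 1),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        logits = self.net(x).view(b, self.n_classes, self.n_leads, h, w)
        # Softmax over the class dimension yields a probability map
        # for each monthly lead time -- the "probabilistic" part.
        return logits.softmax(dim=1)

model = ToySeaIceForecaster()
climate_inputs = torch.randn(1, 12, 128, 128)  # fake input fields
probs = model(climate_inputs)                  # shape: (1, 3, 6, 128, 128)
print(probs.shape)
```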


RNA Structures Predicted with Uncanny Accuracy by Unbiased AI

Genetic Engineering News, Camille L.L. Townsend



The researchers have shown that their AI approach succeeds even when it must learn from only a few known structures. The researchers hope that their approach will help scientists to explain how different molecules work, with applications ranging from fundamental biological research to informed drug design practices.

“Proteins are molecular machines that perform all sorts of functions,” Eismann said. “To execute their functions, proteins often bind to other proteins. If you know that a pair of proteins is implicated in a disease and you know how they interact in 3D, you can try to target this interaction very specifically with a drug.”

Instead of specifying what makes a structural prediction more or less accurate, the researchers let the algorithm discover these molecular features for itself. They did this because they found that the conventional technique of providing such knowledge can sway an algorithm in favor of certain features, thus preventing it from finding other informative features.
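
To make that distinction concrete, here is a toy sketch of feature-free scoring: the network sees only raw atom coordinates and element types and must learn what matters on its own. Everything here (`ToyStructureScorer`, the mean-pooling readout) is a hypothetical illustration, not the published network; among other simplifications, the sketch ignores rotational symmetry.

```python
import torch
import torch.nn as nn

class ToyStructureScorer(nn.Module):
    """Toy illustration of feature-free structure scoring.

    Input is only raw per-atom data: 3D coordinates plus a one-hot
    element type. No hand-crafted chemistry features are supplied;
    the network is left to discover useful patterns itself.
    """
    def __init__(self, n_elements=4, hidden=64):
        super().__init__()
        self.atom_mlp = nn.Sequential(
            nn.Linear(3 + n_elements, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.readout = nn.Linear(hidden, 1)  # predicted accuracy score

    def forward(self, coords, elements):
        # coords: (n_atoms, 3); elements: (n_atoms, n_elements) one-hot
        per_atom = self.atom_mlp(torch.cat([coords, elements], dim=-1))
        return self.readout(per_atom.mean(dim=0))  # pool atoms -> score

scorer = ToyStructureScorer()
coords = torch.randn(500, 3)                        # fake candidate structure
elements = torch.eye(4)[torch.randint(0, 4, (500,))]
print(scorer(coords, elements))                     # scalar "accuracy" score
```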


NSF grant will support research to address waste plastics

Washington State University, WSU Insider



A Washington State University and University of Washington team has received a $2 million grant from the National Science Foundation's Emerging Frontiers in Research and Innovation (NSF EFRI) program for research to recycle and create valuable products out of municipal waste plastics. … As part of the new project, the researchers are developing an innovative chemical upcycling process that will “sort” waste plastics chemically rather than mechanically in a step-by-step process, converting the co‑mingled waste plastics to monomers, which are the building blocks of plastics, as well as other valuable chemicals.


I just noticed that the giant spreadsheet of tech ethics syllabi is really close to 300 ENTRIES!!!

Twitter, Casey Fiesler



I think it would be neat to hit that! Do you know of any classes you could add to this list? Or share with someone who teaches such a class? You can edit!


Excited to share my newest article @BrookingsInst “Tax not the robots”

Twitter, Rob Seamans



I argue against “robot taxes” as a way to help labor. Two main arguments against a robot tax are: (1) all the existing firm-level research shows that robot-adopting firms increase employment (so, if anything consider a robot subsidy!) …

the other argument: (2) it is really hard to define a “robot.”


Liquidity is key to unlocking the value in data, researchers say

MIT Sloan, Ideas Made to Matter, Tam Harbert



To increase data liquidity, organizations need to decontextualize the data, divorcing it from a specific condition or context. The authors suggest using best practices in data management — including metadata management, data integration and taxonomy/ontology — to ensure each data asset is accurate, complete, current, standardized, searchable, and understandable throughout the enterprise.

Such data management practices build key enterprise capabilities like data platform, data science, acceptable data use, and customer understanding, which increases data’s monetization potential.

“As a company’s strategic data assets become more highly liquid and their number grows, data is made increasingly available for conversion to value, and the company’s data monetization accelerates,” write the authors.
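
As a concrete illustration of the metadata-management practice the authors recommend, here is a minimal sketch of a data-catalog entry covering the six quality attributes named above. The field names and example values are hypothetical, not drawn from the MIT Sloan work.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    """Hypothetical catalog entry for one enterprise data asset."""
    name: str
    description: str              # understandable: plain-language meaning
    schema_standard: str          # standardized: shared format/taxonomy
    last_refreshed: str           # current: ISO date of last update
    completeness_pct: float       # complete: share of non-null records
    validated: bool               # accurate: passed validation checks
    tags: list = field(default_factory=list)  # searchable keywords

catalog = [
    DataAsset(
        name="customer_orders",
        description="One row per order across all sales channels",
        schema_standard="enterprise-ontology-v2",
        last_refreshed="2021-08-30",
        completeness_pct=99.2,
        validated=True,
        tags=["sales", "orders", "customer"],
    )
]

# Searchable: find every asset tagged with a given keyword.
print([a.name for a in catalog if "sales" in a.tags])
```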


How Can Wristbands Monitor Pollution, PAHs, and Prenatal Care?

Eos, Alka Tripathy-Lang



Wildfires, vehicle emissions, petroleum by-products, and even cooking can conjure images of climate change. Each category also produces polycyclic aromatic hydrocarbons, or PAHs, which are products of incomplete combustion. This group of hundreds of chemical species is toxic to human health, and as the world warms, more extreme weather will further exacerbate their presence in the atmosphere, said Natalie Johnson, an environmental toxicologist at Texas A&M University. Monitoring human exposure to these air pollutants, she said, is a public health issue.

In a new study published in the Journal of Exposure Science and Environmental Epidemiology, Johnson and her colleagues used silicone wristbands—like the ones worn by people supporting various causes—to track pregnant women’s exposure to PAHs. Their study took place in McAllen, Texas, which has high rates of premature births and childhood asthma—adverse health outcomes associated with poor air quality.


In Berlin, a new WHO center aims to keep an eye on emerging diseases

Science, Kai Kupferschmidt



Germany and the World Health Organization (WHO) have teamed up to launch a new hub here that aims to accelerate efforts to detect and respond to new disease outbreaks. The German government pledged $100 million to stand up the WHO Hub for Pandemic and Epidemic Intelligence, which was formally inaugurated yesterday.

One goal is to bring together, in real time, information on emerging public health crises. But exactly how the new hub will operate is still under discussion, says director Chikwe Ihekweazu, a public health specialist who has led Nigeria’s Centre for Disease Control for the past 5 years. “I am actually very comfortable with the vagueness in the current ideas,” he says. “It does need a lot of initial brainstorming … then building up the consensus, the political will, and then mobilizing resources to invest in this area.”


The New Pathways for Partnership for Solar and Artificial Intelligence

Solar Magazine, Ed Kennedy



A key area where integrating solar power and AI holds immense promise is the solar panels themselves. Even solar systems operating in locales with a high rate of average annual sunshine can fail to capture a significant amount of the available energy each day because the panels are positioned less than ideally relative to the sun.

Many solar systems have static panels that never change position. Even a well-positioned system (panels facing south in the Northern Hemisphere or north in the Southern Hemisphere, orientations that receive more daily sunshine than others) still misses a great deal of optimal sun exposure.

Yet AI can help address this shortfall by using advanced tracking to more effectively position the panels.
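
The article doesn't give the tracking math, but the baseline a data-driven tracker improves on is standard solar geometry. Here is a minimal sketch using textbook formulas for declination and elevation; the function and its example values are illustrative only.

```python
import math

def solar_elevation(latitude_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle (degrees) from textbook formulas.

    A rule-based tracker tilts panels toward this angle; the AI-driven
    systems the article describes would refine such baselines with data
    (weather, soiling, historical output).
    """
    # Solar declination (Cooper's approximation), in degrees.
    decl = 23.45 * math.sin(math.radians(360 / 365 * (284 + day_of_year)))
    # Hour angle: 15 degrees per hour away from solar noon.
    hour_angle = 15 * (solar_hour - 12)
    lat, dec, ha = map(math.radians, (latitude_deg, decl, hour_angle))
    elevation = math.asin(
        math.sin(lat) * math.sin(dec)
        + math.cos(lat) * math.cos(dec) * math.cos(ha)
    )
    return math.degrees(elevation)

# Example: solar noon in Berkeley, CA (lat ~37.9 N) on the June solstice.
print(round(solar_elevation(37.9, 172, 12), 1))  # roughly 75 degrees
```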


The 30 Universities With the Largest Endowments

Newsweek, Tom Fish



The data, sourced from the Department of Education's IPEDS data center using the most recently reported endowment data (2018-2019), shows that the endowments of all but one of the top ten institutions (Northwestern University) grew in the past year.

Read on to discover the American universities with the largest endowments, according to College Raptor’s analysis.


Georgia Tech Study Seeks to Bring More Diverse Voices into Computing Ethics Education

Georgia Institute of Technology, Georgia Tech Research Horizons



Jason Borenstein of the School of Public Policy, Ellen Zegura of the School of Computer Science, and Charles Isbell, dean of the College of Computing, will lead a three-year, National Science Foundation-funded study seeking to “better understand and amplify the diverse range of voices that may have been absent during the development of a traditional computing ethics curriculum.”

Borenstein is the project’s principal investigator. “The main goal of this grant is to enable groups historically underrepresented in computing to have more of a direct say in what’s offered in the computing ethics curriculum,” said Borenstein, who teaches ethics in the Ivan Allen College of Liberal Arts and is the director of graduate research ethics programs for Georgia Tech. [$398,288]


Events



Introducing the Bay Area Open Science Group

University of California-San Francisco, Library



The Bay Area Open Science Group brings together students, faculty, and staff from the Stanford, UCSF, and Berkeley communities to learn about open science, discuss the application of open science practices in a research context, and meet other members of the community who are interested in incorporating open science practices into their work, or who already do.

The first meetup is September 28, with Dr. Biftu Mengesha of UCSF discussing Innovating Education in Reproductive Health.


Deadlines



Become a UW Data Science Postdoctoral Fellow

“Receive associated benefits like research funding and community engagement opportunities such as weekly activities and annual data science conferences. Incoming UW postdocs should apply by Sept 15th.”

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



Scaling Laws for Deep Learning

arXiv, Computer Science > Machine Learning; Jonathan S. Rosenfeld



The renaissance of machine learning (ML) and deep learning (DL) over the last decade is accompanied by an unscalable computational cost, limiting its advancement and weighing on the field in practice. In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws — for state of the art models and tasks, spanning image classification and language modeling, as well as for state of the art model compression via iterative pruning. Predictability, via the establishment of these scaling laws, provides the path for principled design and trade-off reasoning, currently largely lacking in the field. We then continue to analyze the sources of the scaling laws, offering an approximation-theoretic view and showing through the exploration of a noiseless realizable case that DL is in fact dominated by error sources very far from the lower error limit. We conclude by building on the gained theoretical understanding of the scaling laws’ origins. We present a conjectural path to eliminate one of the current dominant error sources — through a data bandwidth limiting hypothesis and the introduction of Nyquist learners — which can, in principle, reach the generalization error lower limit (e.g. 0 in the noiseless case), at finite dataset size.
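
To make "predictable and governed by scaling laws" concrete, here is a minimal sketch of fitting a saturating power law error(n) = a * n^(-b) + c to measured errors and extrapolating. The functional form is the standard one in the scaling-law literature; the data below is synthetic, purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    """Saturating power law: error falls as a * n**-b toward a floor c."""
    return a * n ** (-b) + c

# Synthetic "measured" errors at a few dataset sizes (illustration only).
sizes = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
errors = scaling_law(sizes, a=5.0, b=0.3, c=0.08) \
    + np.random.normal(0, 0.002, 5)

(a, b, c), _ = curve_fit(scaling_law, sizes, errors, p0=[1.0, 0.3, 0.1])
# With the fitted law, one can extrapolate: predict the error a 10x
# larger dataset would give before paying for the training run.
print(f"fit: a={a:.2f}, b={b:.2f}, c={c:.3f}")
print(f"predicted error at n=1e8: {scaling_law(1e8, a, b, c):.4f}")
```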


Research Spotlight: CDS and Grid AI researchers propose new method of self-supervised learning

Medium, NYU Center for Data Science



In the realm of self-supervised learning, state-of-the-art algorithms can broadly be divided into two groups: contrastive approaches and non-contrastive approaches. But what if there could be a third, unique approach? This is the main question answered in a recent publication authored by CDS PhD student William Falcon, along with researchers at Grid AI and CDS faculty member Kyunghyun Cho. The paper, titled “AAVAE: Augmentation-Augmented Variational Autoencoders,” introduces the titular augmentation-augmented variational autoencoder, or “AAVAE.”

The research team looked at techniques from a range of domains, including vision, NLP, and reinforcement learning, and identified data augmentation as crucial in all of them. With this in mind, the team went back to a somewhat outdated technique, the variational autoencoder (VAE), as a foundation for a new self-supervised learning method. They identified the KL divergence term in VAEs as a weak point and replaced it with a denoising criterion and domain-specific data augmentations. The result of this change is the AAVAE.
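
As the summary describes it, the key move is to drop the KL term and train the autoencoder to reconstruct the original input from an augmented view. Here is a minimal sketch of that objective, assuming toy networks and standard torchvision augmentations; it illustrates the stated idea, not the paper's exact model or loss.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# Toy encoder/decoder; the real work uses far larger networks.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))
decoder = nn.Sequential(nn.Linear(128, 32 * 32 * 3), nn.Sigmoid())

# Domain-specific augmentations (here, standard image transforms).
augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(),
])

def aavae_style_loss(x):
    """Sketch of the denoising criterion the summary describes: encode
    an augmented view, then reconstruct the ORIGINAL input. Unlike a
    standard VAE, no KL-divergence term pulls the latent toward a prior.
    """
    x_aug = augment(x)                # augmentation plays the role of noise
    z = encoder(x_aug)                # deterministic latent (no KL term)
    x_hat = decoder(z).view_as(x)
    return ((x_hat - x) ** 2).mean()  # reconstruct the clean original

x = torch.rand(8, 3, 32, 32)          # fake image batch
loss = aavae_style_loss(x)
loss.backward()
print(float(loss))
```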


The dos and don’ts of machine learning research — read it, nerds

The Next Web, Ben Dickson



In a paper recently published on the arXiv preprint server, Michael Lones, Associate Professor in the School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, provides a list of dos and don’ts for machine learning research.

The paper, which Lones describes as “lessons that were learnt whilst doing ML research in academia, and whilst supervising students doing ML research,” covers the challenges of different stages of the machine learning research lifecycle. Although aimed at academic researchers, the paper’s guidelines are also useful for developers who are creating machine learning models for real-world applications.

Here are my takeaways from the paper, though I recommend that anyone involved in machine learning research and development read it in full.
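
As one example of the kind of pitfall such guidelines address, letting test-set information leak into training is a classic "don't." A minimal scikit-learn sketch of the mistake and its fix (my illustration, not code from the paper):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# DON'T: fitting the scaler on all data lets test-set statistics
# leak into the training pipeline, inflating reported performance.
X_leaky = StandardScaler().fit_transform(X)

# DO: split first, then fit preprocessing on the training fold only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)  # test data transformed, never fitted
```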


Careers


Postdocs

Postdoctoral position in computational chemistry/QM/ML



Carnegie Mellon University, Department of Chemistry; Pittsburgh, PA

Full-time positions outside academia

Research Scientist



OpenAI, Policy Research Team; San Francisco, CA
