Data Science newsletter – January 21, 2022

Newsletter features journalism, research papers and tools/software for January 21, 2022

 

‘Community of the eager’: Pitt’s Year of Data and Society plans for spring

University of Pittsburgh, The Pitt News



The Office of the Provost identifies Pitt’s “Year of” theme for the upcoming academic year at the end of every school year. Provost Ann Cudd announced last April that the 2021-22 academic year would be the Year of Data and Society, and after a semester of activities, more initiatives are planned for the spring semester.

Eleanor Mattern, chair of the Year of Data and Society steering committee — which helps plan events related to this year’s theme — said the Year of Data and Society initiative hosts events that explore socially responsible data practices and the societal implications of data and its uses.


BioNTech and London A.I. firm create ‘early warning system’ to spot dangerous new COVID-19 variants before they spread

Fortune, Jeremy Kahn



BioNTech, the German biotech company that pioneered the messenger RNA technology behind the Pfizer COVID-19 vaccine, has teamed up with London-based A.I. company InstaDeep to create what the two firms say is an effective “early warning system” for spotting potentially dangerous new coronavirus variants.

In tests, the two companies said that their early warning system was able to pick up 12 of the 13 coronavirus variants that the World Health Organization has so far designated as potentially dangerous, doing so on average two months before the WHO reached that conclusion. For the Omicron variant, the system identified it as potentially dangerous on the same day its genetic sequence was first made available, according to a paper BioNTech and InstaDeep published on the non-peer-reviewed academic repository bioRxiv.org on Wednesday.


Machine Learning Improves Fusion Modeling

EE Times, Maurizio Di Paolo Emilio



Researchers at MIT are employing machine learning techniques to better understand turbulent plasma phenomena in fusion devices. According to MIT News, a new deep learning framework was developed that leverages artificial neural networks to represent a reduced turbulence theory.

The research is described in two papers, published in Physical Review E and Physics of Plasmas.

If researchers hope to control fusion for energy production, they need a better understanding of the turbulent motion of ions and electrons in plasmas moving through fusion reactors. The magnetic field lines of toroidal devices known as tokamaks confine the plasma particles; the intent is to hold them long enough to produce significant net energy gains, which is a challenge given the extraordinarily high temperatures and small spaces involved.


Foundation awards $9.7M to democratize data and harness the power of AI

Patrick J. McGovern Foundation



Also included in the portfolio is The Coleridge Initiative, which works with US federal and state governments to ensure that their data is leveraged most effectively to make evidence-based decisions for the public good. Coleridge builds partnerships with government agencies to create value for the taxpayer: new technologies enable approved secure access to and sharing of confidential microdata; new approaches to data science training ensure that agency staff can use the data to create evidence to inform decisions.

“At this moment, we have a unique opportunity to support the capacity of many federal, state, and local governments to use data in making their decisions,” said Julia Lane, co-founder of the Coleridge Initiative and a Professor and a Provostial Fellow at NYU. “This opportunity to support government agencies can have transformative effects on their ability to serve the public good.”


Sports Tech Leader SMT to Launch Analytics at University of the Pacific

SportTechie, Andrew Cohen



The program’s course topics will span math, business and computer science for students to learn how to apply analytics to evaluate and predict team and athlete performance. Each student will also be assigned to work on real-time data collection and analysis for one of Pacific’s 17 sports teams, most of which compete in the Division I West Coast Conference.


Schmidt Futures and Leading Research Universities Launch Software Engineering Centers to Accelerate Scientific Discovery

Schmidt Futures Foundation



Schmidt Futures announced today the establishment of the Virtual Institute of Scientific Software (VISS), starting with a network of four inaugural centers based at the University of Cambridge, Georgia Institute of Technology, the Johns Hopkins University and the University of Washington. This interdisciplinary virtual institute will address the growing demand for software engineers with backgrounds in science, complex data and mathematics who can build dynamic, scalable, open software to facilitate accelerated scientific discovery across fields.


$317K grant helps launch ‘desperately’ needed data science training program

University of Hawaii System News



A $317,955 National Institutes of Health (NIH) grant awarded to the University of Hawaiʻi at Mānoa will help launch a Hawaiʻi data science training program based at the John A. Burns School of Medicine (JABSOM). The grant is part of JABSOM’s NIH-funded Ola HAWAII – Health and Wellness Attained by Impacting Inequities in Hawaii, a designated Research Center for Minority Institutions (RCMI) specialized center. It will allow the UH medical school to offer lectures, seminars, bootcamps and workshops to RCMI trainers, investigators, students and community partners.


Why games may not be the best benchmark for AI

VentureBeat, Kyle Wiggers



In 2019, San Francisco-based AI research lab OpenAI held a tournament to tout the prowess of OpenAI Five, a system designed to play the multiplayer battle arena game Dota 2. OpenAI Five defeated a team of professional players — twice. And when made publicly available, OpenAI Five managed to win against 99.4% of people who played against it online.

OpenAI has invested heavily in games for research, developing libraries like CoinRun and Neural MMO, a simulator that plops AI in the middle of an RPG-like world. But that approach is changing. According to a spokesperson, OpenAI hasn’t been using games as benchmarks “as much anymore” as the lab shifts its focus to other domains, including natural language processing.


CDC Announces Plan To Send Every U.S. Household Pamphlet On Probabilistic Thinking

The Onion



Stressing that the effort represented the best chance of ensuring Americans make responsible choices around the pandemic, the Centers for Disease Control and Prevention announced Thursday that it planned to send every U.S. household a pamphlet on probabilistic thinking and decision-making. “What we’re hoping to do is give every American a quick refresher on how to use statistical analysis to assess their priors and make Bayesian inferences, thereby ensuring they overcome their innate psychological biases—simple stuff, but important nonetheless,” said CDC director Rochelle Walensky, estimating that the pamphlets’ lessons on the baseline fallacy alone would save far more lives than mask-wearing, handwashing, and the Covid-19 vaccine combined. “Obviously, most everyone in the nation has already at least skimmed the seminal studies of psychologist Danny Kahneman on the fallibility of human thinking. So the pamphlet will skip over some of the basic stuff like the availability heuristic and head straight to prospect theory. Hopefully none of this feels too patronizing.” Walensky added that if Americans took away one easy lesson from the pamphlet, she hoped it would be P(H|E) = P(E|H) * P(H) / P(E).
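The satire aside, the formula at the end is Bayes' theorem, and the base-rate point is easy to see with numbers. A minimal sketch, assuming a hypothetical test with 1% prevalence, 99% sensitivity (P(E|H)) and a 5% false-positive rate (P(E|not H)); these figures are illustrative and not from the piece. P(E) is expanded with the law of total probability.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(H|E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    # P(E) by the law of total probability: true positives plus false positives
    p_evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_evidence

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(H|E) = {p:.3f}")  # prints "P(H|E) = 0.167"
```

Even with a highly sensitive test, a positive result here means only about a one-in-six chance that H is true, because false positives from the large healthy majority outnumber the true positives.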


AI spots antibiotic resistance 24 hours faster than old methods

Futurity, ETH Zurich



Researchers in Basel have developed a new method that extends the uses of mass spectrometry to include the identification of antibiotic resistance. To build the underlying dataset, the teams extracted more than 300,000 mass spectra of individual bacteria from four laboratories in northwestern Switzerland and linked these to the results of the corresponding clinical resistance tests. The result is a new, publicly available dataset covering around 800 different bacteria and over 40 different antibiotics.

“Our next step was to train artificial intelligence algorithms with this data such that they could learn to detect antibiotic resistance on their own,” says Karsten Borgwardt, professor in the biosystems science and engineering department at ETH Zurich in Basel, who led the study with [Adrian] Egli.
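Raw spectra have varying numbers of peaks, so a common preprocessing step before training classifiers on them is to sum intensities into fixed-width m/z bins, giving every spectrum a feature vector of the same length. The sketch below assumes each spectrum is a list of (m/z, intensity) pairs; the function name `bin_spectrum`, the 2,000 to 20,000 m/z range and the bin width of 3 are illustrative assumptions, not settings confirmed by the study.

```python
from typing import Iterable, List, Tuple

def bin_spectrum(peaks: Iterable[Tuple[float, float]],
                 mz_min: float = 2000.0,
                 mz_max: float = 20000.0,
                 bin_width: float = 3.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins and normalise to total 1."""
    n_bins = int((mz_max - mz_min) // bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:  # peaks outside the range are dropped
            idx = int((mz - mz_min) // bin_width)
            if idx < n_bins:  # guard against floating-point edge cases
                vec[idx] += intensity
    total = sum(vec)
    if total > 0:
        # Normalise so spectra with different overall signal are comparable
        vec = [v / total for v in vec]
    return vec

vec = bin_spectrum([(2001.0, 5.0), (2002.5, 5.0), (2004.0, 10.0), (25000.0, 99.0)])
print(len(vec), vec[0], vec[1])  # prints "6000 0.5 0.5"
```

Vectors like these, paired with the clinical resistance labels, can then be fed to any standard supervised classifier.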


Who should pursue a master’s degree in data science?

Fortune, Education, Rich Griset



People often conflate data science with data analytics. While both fields work with data, data analysts look at trends in large data sets to create charts and visual presentations to help organizations make strategic decisions.

Data scientists, on the other hand, design and build new processes for data modeling by using algorithms, prototypes, predictive models, and custom analysis. By asking questions and creating algorithms and statistical models, data scientists work to estimate the unknown, using multiple tools simultaneously to arrange undefined sets of data and build their own automation systems and frameworks. People with data science degrees often branch out into technical roles, including data engineer and data architect.

“Data science has a more technical focus,” says Arthur Spirling, deputy director of New York University’s Center for Data Science and a professor of politics and data science. “Data science is going to train you in cutting edge statistics, machine learning, and programming, and to do those things the second you are out of the program.”


How can deep reinforcement learning contribute to our understanding of complex social interactions like social norms and the evolution of culture?

Twitter, DeepMind




Introduction to Explainable ML

Twitter, The Institute for Ethical AI & Machine Learning



The team behind the Alibi Explain project has put together a comprehensive introduction and deep dive into the key concepts of explainability in machine learning, including intuition, background, examples and references.
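To make one of those concepts concrete, here is a minimal, library-free sketch of permutation feature importance, a common model-agnostic explanation technique: shuffle one feature column, re-score the model, and treat the drop in accuracy as that feature's importance. This is not taken from Alibi Explain and does not use its API; `permutation_importance`, the toy `predict` model and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict: Callable[[Matrix], List[int]],
                           X: Matrix, y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model that only ever looks at feature 0
def predict(rows: Matrix) -> List[int]:
    return [int(r[0] > 0.5) for r in rows]

X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = predict(X)  # labels agree with the model, so baseline accuracy is 1.0
imps = permutation_importance(predict, X, y)
print(imps)
```

Because the toy model ignores feature 1, shuffling it never changes a prediction and its importance comes out as exactly zero, while shuffling feature 0 lowers accuracy.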


DARPA Releases GARD Program to Help Strengthen AI Against Attacks

Computing Community Consortium, The CCC Blog



The Defense Advanced Research Projects Agency (DARPA) released a new program, Guaranteeing AI Robustness against Deception (GARD), to aid artificial intelligence (AI) developers in testing their models’ defenses against attacks. A team drawn from academia and industry, including IBM, MITRE, the University of Chicago and Google Research, collaborated to make a set of open-source testing tools. The tools, which include a virtual evaluation testbed, a benchmark dataset and “test dummies,” help identify vulnerabilities in AI systems and make systems more robust against an increasingly complex range of attacks.
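For a sense of what "attacks" means here, the sketch below applies the fast gradient sign method (FGSM), a standard textbook evasion attack, to a hypothetical two-feature logistic-regression model: it nudges each input feature by `eps` in the direction that increases the model's loss. The weights, input and `eps` are made up for illustration, and nothing here reproduces the GARD tools themselves.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def _sign(g: float) -> int:
    return (g > 0) - (g < 0)

def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Perturb x by eps * sign(d loss / d x) for binary cross-entropy loss."""
    p = predict_proba(w, b, x)
    # For logistic regression, the gradient of the loss w.r.t. x is (p - y) * w
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
x, y = [0.3, 0.2], 1     # correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # True False
```

A small, bounded change to each feature is enough to flip the prediction, which is the kind of weakness robustness testbeds are designed to surface.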


College of Education Researcher: Rethink Our Focus on Weight

University of Oregon, Around the O



At a recent back-to-school checkup, a nurse practitioner greeted Nichole Kelly and her daughter and glanced at the girl’s weight and height measurements. Then she pointed to a chart on a wall and began talking about the six-year-old’s “body mass index,” or BMI.

Kelly, an Evergreen Professor in the College of Education, studies weight stigma, eating behaviors, and chronic disease risk. She noticed her daughter listening intently to the provider’s focus on body mass; instead, Kelly would have preferred discussing eating habits, exercise, and sleep—all better predictors of health, she notes.

“[My daughter] was getting the message that whatever this BMI thing is, it’s important,” Kelly says. “I don’t want to reinforce any belief that her BMI is central to her health or has anything to do with her value.”


UC San Diego Receives $14M to Drive Precision Nutrition with Gut Microbiome Data

University of California, San Diego, UC San Diego News



The National Institutes of Health (NIH)’s All of Us Research Program is a national effort to build a large, diverse database of health data from 1 million or more people that researchers can use to study health and disease.

The NIH is now awarding $170 million in grant funding to centers across the country to create a new consortium known as Nutrition for Precision Health, powered by the All of Us Research Program®. The consortium will recruit a diverse pool of 10,000 All of Us Research Program participants to develop algorithms to predict individual responses to food and inform more personalized nutrition recommendations.


California to pay low income students for community service

Los Angeles Times, Colleen Shalby



Kaelyn Hughes, 21, has wanted to teach since she was in third grade. For her, school has always been a place of solace — and she wants to share what education has meant to her with other children.

“I would like to show to other Black girls that you can do this — you can go to college. I came from a low-income area, and I didn’t think I would go to college. I didn’t think I would go far,” she said. “But you use education to help you get somewhere better.”

The UC Merced senior spends time in elementary school classrooms working with students struggling with math or English as a volunteer — service that would otherwise be a luxury she could not afford. But through her participation in the state’s pilot Civic Action Fellowship program, she has been paid a stipend for tuition and basic needs, such as rent, allowing her to pursue service instead of another job to make ends meet.


Deadlines



‘Building Trust Together: Integrating, Collaborating & Sharing’ is the wonderful theme of Open Repositories 2022.

Submit your proposals by February 28, 2022 and hopefully see you in person in Denver in early June to build trust together!

SPONSORED CONTENT

Assets  




The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



McKinsey donates open-source tool Kedro to the Linux Foundation

McKinsey & Company, McKinsey Blog



In 2019, QuantumBlack, our AI firm, launched Kedro, its first open-source software tool for data scientists and data engineers. It’s a library of code that can be used to create data and machine-learning pipelines, the building blocks of any analytics project.

In the two and a half years it has been available on the open-source platform GitHub, the Kedro community and user base have continued to grow, with more than 200,000 monthly downloads, over 100 contributors, and a growing number of enterprises that choose Kedro as their standard for data-science code. For example, a team at NASA used Kedro to model air-traffic patterns, and Telkomsel, Indonesia’s largest wireless network provider, uses Kedro as a standard across its data science organization.

Today, we are taking the next step in our open-source journey and donating Kedro to the Linux Foundation.
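Kedro's own API is not reproduced here. As a rough, library-free illustration of the general idea of a pipeline built from nodes with named inputs and outputs, the hypothetical `Node` class and `run_pipeline` function below work out the execution order from data dependencies and pass results through an in-memory dictionary standing in for a data catalog.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Node:
    func: Callable[..., Any]
    inputs: List[str]  # names of datasets this node consumes
    output: str        # name of the dataset this node produces

def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run each node once all of its inputs exist, regardless of list order."""
    data = dict(catalog)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in data for i in n.inputs)]
        if not ready:
            missing = {i for n in pending for i in n.inputs if i not in data}
            raise ValueError(f"Unresolvable inputs: {sorted(missing)}")
        for n in ready:
            data[n.output] = n.func(*(data[i] for i in n.inputs))
            pending.remove(n)
    return data

# Nodes are deliberately listed out of order; dependencies decide the order.
nodes = [
    Node(lambda xs: sum(xs) / len(xs), ["clean"], "mean"),
    Node(lambda raw: [x for x in raw if x is not None], ["raw"], "clean"),
]
result = run_pipeline(nodes, {"raw": [1, None, 2, 3]})
print(result["mean"])  # prints 2.0
```

Declaring data flow by name rather than by call order is what lets a pipeline tool reorder, visualize or partially rerun steps; a real tool adds persistence, configuration and parallel runners on top.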


Microsoft has released new and updated building footprints

Microsoft Bing Blogs, Maps Blog



Microsoft continues to make significant investments in deep learning, computer vision, and AI. The Microsoft Maps Team has been leveraging that investment to identify map features at scale and produce high-quality building footprint data sets, with the overall goal of contributing to the OpenStreetMap and MissingMaps humanitarian efforts.

As of this post, the following locations are available and Microsoft offers access to this data under the Open Data Commons Open Database License (ODbL).
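A typical first use of footprint polygons is estimating building area. The sketch below assumes each footprint is a GeoJSON-style ring of [longitude, latitude] positions and uses a local equirectangular projection plus the shoelace formula. That approximation is reasonable for building-sized shapes but ignores antimeridian crossings and is no substitute for a proper equal-area projection; the function name and sample square are hypothetical.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

def approx_ring_area_m2(ring: Sequence[Sequence[float]]) -> float:
    """Approximate area in square metres of a small [lon, lat] polygon ring."""
    lon0, lat0 = ring[0][0], ring[0][1]
    mean_lat = math.radians(sum(p[1] for p in ring) / len(ring))
    # Project to a local plane centred on the first vertex (improves precision)
    xs = [math.radians(p[0] - lon0) * EARTH_RADIUS_M * math.cos(mean_lat) for p in ring]
    ys = [math.radians(p[1] - lat0) * EARTH_RADIUS_M for p in ring]
    # Shoelace formula; a repeated closing vertex contributes a zero term
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        j = (i + 1) % n
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0

# A 0.001-degree square at the equator, closed as GeoJSON rings are
square = [[0.0, 0.0], [0.001, 0.0], [0.001, 0.001], [0.0, 0.001], [0.0, 0.0]]
area = approx_ring_area_m2(square)
print(round(area))
```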
