Data Science newsletter – February 28, 2022

Newsletter features journalism, research papers and tools/software for February 28, 2022


UTEP Launches Systems Modeling and Simulation Concentration with $1M Grant from U.S. Dept. of Education

University of Texas at El Paso, UTEP Newsfeed


The University of Texas at El Paso’s College of Engineering will establish a concentration in Systems Modeling and Simulation under the Industrial, Manufacturing and Systems Engineering (IMSE) Department, thanks to a $1 million grant from the U.S. Department of Education.

The new degree path will allow graduate engineering students to enhance their knowledge in data analytics, computer simulation, augmented and virtual reality (AR/VR), and machine learning for dynamic production and operations systems. These are systems that tend to be complex in nature due to rapid changes in customer needs, operational parameters, technology involvement and interactive cyber-physical systems.

A Conversation with Margo Seltzer and Mike Olson

Communications of the ACM, Kirk McKusick


Kirk McKusick sat down with Margo Seltzer and Mike Olson to discuss the history of Berkeley DB, for which they won the ACM Software System Award in 2021. McKusick has spent his career as a BSD and FreeBSD developer. Margo Seltzer has spent her career as a professor of computer science and as an entrepreneur of database software companies. Mike Olson started his career as a software developer and later started and managed several open source software companies. Berkeley DB is a production-quality, scalable, NoSQL, open source platform for embedded transactional data management.

Foundational blueprint sought to expand computer science education in schools

Minnesota House of Representatives, Pratik Joshi


In 2018, there were only 1,296 computer science graduates in the state. And Minnesota currently ranks last among the 50 states for having the lowest percentage (24%) of high schools offering a foundational computer science course.

The state needs digitally skilled computational thinkers, said Jeff Tollefson, president and CEO of the Minnesota Technology Association, and Minnesota’s tech-enabled companies need talent that can reflect the diversity of the community.

What learning how to code taught me as a UC Berkeley student

The Daily Californian student newspaper, Irina Sakharova


The first week of classes made me realize how much coding was not for me. All of the assignments required not just effort, but devotion to the topic, which explained why CS majors do tend to spend the majority of their time studying. Unfortunately for me, I realized that the course was not for me, and perhaps the major wasn’t either.

The charm of the tech world lures in many students at UC Berkeley, but they are never reminded that being in STEM isn’t necessary. What is most important is to keep up the good work in whatever department you truly feel involved in.

Studies examine effects of California’s push for computer science education

University of Illinois Urbana-Champaign, Illinois News Bureau, Research News


New studies of computer science education at California high schools found that a greater emphasis on computer science education did not produce the anticipated spillover effects, neither improving nor harming students’ math or English language arts skills, according to school-level test scores.

However, one trade-off of increased enrollments in computing courses may be that students are taking fewer humanities courses such as the arts and social studies, researchers at the University of Illinois Urbana-Champaign found.

Paul Bruno and Colleen M. Lewis examined the implications of California’s recent state policies promoting computer science education and the proliferation of these courses in the state’s high schools. Bruno is a professor of education policy, organization and leadership, and Lewis is a professor of computer science in the Grainger College of Engineering, both at Illinois.

MSU announces a $50 million gift from the Gianforte Family Foundation to further computer science, electrical and computer engineering, cybersecurity, optics and photonics, and creative industries

Montana State University, MSU News


Montana State University today announced a $50 million gift from the Gianforte Family Foundation, continuing more than 20 years of philanthropy to the university. The Gianforte Family Foundation’s gift is dedicated to constructing a new building to house the Gianforte School of Computing and computing-related fields such as cybersecurity, optics and photonics, electrical and computer engineering, and creative industries. The gift ties for the second largest in the university’s history and is one of the largest philanthropic gifts in the history of Montana.

The giant plan to track diversity in research journals

Nature, News Feature, Holly Else & Jeffrey M. Perkel


In the next year, researchers should expect to face a sensitive set of questions whenever they send their papers to journals, and when they review or edit manuscripts. More than 50 publishers representing over 15,000 journals globally are preparing to ask scientists about their race or ethnicity — as well as their gender — in an initiative that’s part of a growing effort to analyse researcher diversity around the world. Publishers say that this information, gathered and stored securely, will help to analyse who is represented in journals, and to identify whether there are biases in editing or review that sway which findings get published. Pilot testing suggests that many scientists support the idea, although not all.

The effort comes amid a push for a wider acknowledgement of racism and structural racism in science and publishing — and the need to gather more information about it. In any one country, such as the United States, ample data show that minority groups are under-represented in science, particularly at senior levels. But data on how such imbalances are reflected — or intensified — in research journals are scarce. Publishers haven’t systematically looked, in part because journals are international and there has been no measurement framework for race and ethnicity that made sense to researchers of many cultures.

The Culture of Engineering Overlooks the People It’s Supposed to Serve

Scientific American, Grace Wickerson


I seemed to be the only one concerned with subjectivities around data collection and interpretation, and I worried about my peers’ lack of awareness of how bias could creep into their designs and exact real harm on people. The process through which facts are developed matters.

I left the class disillusioned and even more uncertain about my identity as an engineer.

Feeling like they don’t fit into the culture is one of the primary reasons why women, people of color, LGBTQ+ individuals and other systemically marginalized people drop out of STEM fields. We feel out of place because of the disconnect between our desires to use technology to address inequities, and engineering curricula with little representation of social consciousness or awareness of social issues. If we do not change the way we teach engineering, we will continue to lose talented people from a diversity of backgrounds whose ideas could solve a host of the world’s problems.

Science faculty share enthusiasm about University investment plan

Yale University, Yale Daily News student newspaper, Anika Smith and Evan Gorelick


Since the University’s announcement of significant investments in STEM-related faculty and the formation of the School of Engineering and Applied Sciences as a distinct budgetary unit, faculty have expressed excitement about the future of the sciences at Yale.

On Tuesday, the University announced a major wave of investments — primarily into science and engineering — over the next 10 years, including new faculty and facilities in both the Faculty of Arts and Sciences and SEAS, which is set to gain autonomy from the FAS this July. In response, faculty members across scientific disciplines shared their excitement, noting that individual departmental gain will depend somewhat on the distribution of faculty hires.

“The increased emphasis on state-of-the-art facilities of the science departments is appreciated, and we hope it will allow us to compete with our peers,” Astronomy Department Chair Sarbani Basu wrote in an email to the News.

My thoughts (rant) on data science consulting

Reddit, r/datascience


This is gonna be mostly a rant but may make someone think twice if they are thinking of joining a consulting firm as a data scientist.

So, last year I completed my masters and joined one of the big 4 firms as a data scientist. As excited as I was in the beginning, 6 months down the line I’ve started to hate my job.

I always thought working a data science job would make my knowledge base grow, but it seems like in consulting no one gives a damn about your knowledge because no one cares if you’re right, they just want to please the client. [162 comments]

Comparing Machine Learning Models for Earthquake Detection

Eos, Kate Wheeler


Machine learning is transforming data-heavy fields across the sciences, and seismology is no exception. Several machine learning methods have emerged for earthquake detection, phase identification, and phase picking. However, choosing which method to use is still a challenge because it’s not always clear how these deep learning models will respond to data that differ from the data sets they were trained on.

To provide some insight, Münchmeyer et al. compared six deep learning models (and one classical picking model) to find out which perform best across various data sets. The team looked at BasicPhaseAE, CNN-RNN Earthquake Detector (CRED), DeepPhasePick (DPP), Earthquake Transformer (EQTransformer), Generalized Phase Detection (GPD), and PhaseNet. They evaluated each model’s performance on three common tasks: event detection, phase identification, and onset time picking.

The researchers found that when the models were trained and evaluated using data sets with identical characteristics, EQTransformer performed best on all three tasks, followed closely by PhaseNet and GPD. For event detection, CRED showed excellent performance as well.
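Onset time picking is usually scored by comparing each predicted arrival time with a reference (analyst) pick. The sketch below is an illustrative assumption, not the benchmark's actual evaluation code: it computes pick residuals, treats residuals larger than a hypothetical 1-second threshold as outliers, and reports the mean absolute error of the remaining picks. All pick times are made up.

```python
from statistics import mean

def pick_residuals(predicted, reference):
    """Predicted minus reference onset times, in seconds."""
    return [p - r for p, r in zip(predicted, reference)]

def summarize_picks(residuals, outlier_threshold=1.0):
    """Split residuals at a hypothetical threshold and summarize the inliers."""
    inliers = [r for r in residuals if abs(r) <= outlier_threshold]
    return {
        "mae_inliers": mean(abs(r) for r in inliers),
        "outlier_fraction": 1 - len(inliers) / len(residuals),
    }

# Made-up P-wave onset times (s) for four events; not data from the study.
reference = [10.00, 15.50, 20.00, 30.00]
predicted = [10.02, 15.48, 20.10, 31.50]

stats = summarize_picks(pick_residuals(predicted, reference))
print(stats)
```

Reporting the outlier fraction separately from the inlier error keeps a few badly missed picks from dominating the average.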

Three universities receive $20 million to make fuel management data more usable for managers

Wildfire Today, Bill Gabbert


The award will be made to the Southwest Ecological Restoration Institutes (SWERI), which include the Colorado Forest Restoration Institute (CFRI), New Mexico Highlands University’s New Mexico Forest and Watershed Restoration Institute, and Northern Arizona University’s Ecological Restoration Institute. The SWERI were created through congressional legislation passed in 2004, which charged the three institutes with promoting adaptive management practices to restore the health of fire-adapted forest and woodland ecosystems of the Interior West.

The Colorado Forest Restoration Institute is housed in the Department of Forest and Rangeland Stewardship in Colorado State University’s Warner College of Natural Resources.

New $1M grant to bridge educational gaps between modeling and simulation and machine learning

University of Iowa, College of Engineering


For decades, much of engineering research has focused on developing models and simulations based on physics and real-world properties that would produce data to confirm or reject a hypothesis. More recently, many engineering scholars have begun using machine learning, which relies on vast quantities of data and algorithms to eventually produce a hypothesis. Now, the United States Department of Education has awarded a $1 million grant to Ching-Long Lin, Edward M. Mielnik and Samuel R. Harding Professor and chair of the Department of Mechanical Engineering at the University of Iowa, to develop artificial intelligence, modeling and simulation (AIMS) programs that will bridge the gap between these two research approaches.

“As we train the next generation of engineers, we want to ensure that they have the full suite of research tools available to them,” said Lin. “By integrating modeling and simulation work with machine learning, we can apply the physical principles that are central to modeling and simulation with smart, intelligent machines that do not have access to real-world interactions.”

BDI researchers create largest ever human family tree

University of Oxford, Big Data Institute


BDI researchers have taken a major step towards mapping the entirety of genetic relationships among humans: a single genealogy that traces the ancestry of all of us. The study has been published today in Science.

The past two decades have seen extraordinary advancements in human genetic research, generating genomic data for hundreds of thousands of individuals, including from thousands of prehistoric people. This raises the exciting possibility of tracing the origins of human genetic diversity to produce a complete map of how individuals across the world are related to each other.

Until now, the main challenges to this vision were working out a way to combine genome sequences from many different databases and developing algorithms to handle data of this size. However, a new method published today by BDI researchers can easily combine data from multiple sources and scale to accommodate millions of genome sequences.

Need to catch some Zzzz’s? Try taking a trip.

Nature, Behavioral & Social Science blog; James Bagrow, Sigga Svala Jonasdottir, and Sune Lehmann


Using a large-scale dataset describing the sleep of wearables users, along with their approximate home location, we studied the typical sleep pattern of individuals at home and away from home. We initially expected to find that travel generally lowered sleep quantity, but to our surprise, we found something more nuanced: for people who usually sleep excessively, yes, less sleep was achieved when traveling. But for people who usually get less than the recommended quantity of sleep, we found that more sleep was had when on the road. In other words, travel tends to have a balancing effect on the quantity of sleep.

Let’s get into the nitty-gritty. Our study explores the change in sleep duration during travel relative to typical sleep duration at home. Overall we find a clear dependence between the change in sleep duration due to travel and the quantity of typical nighttime sleep individuals obtain at home.
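The study compares each person's sleep while traveling with that same person's typical sleep at home. A minimal sketch of that comparison follows, assuming hypothetical per-night durations for three made-up users (not the study's data). It computes each user's change in mean nightly sleep and fits an ordinary least-squares slope of that change against home sleep. A negative slope is what a "balancing effect" would look like. The function names are illustrative, not the authors' code.

```python
from statistics import mean

def sleep_change(home_nights, travel_nights):
    """Change in mean nightly sleep (hours) while traveling vs. at home."""
    return mean(travel_nights) - mean(home_nights)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical users: (nights at home, nights while traveling), hours of sleep.
users = {
    "short_sleeper": ([5.5, 6.0, 5.8], [6.4, 6.6]),
    "typical_sleeper": ([7.2, 7.0, 7.1], [7.0, 7.2]),
    "long_sleeper": ([9.0, 9.4, 9.2], [8.3, 8.5]),
}

home_means = [mean(home) for home, _ in users.values()]
changes = [sleep_change(home, travel) for home, travel in users.values()]
slope = ols_slope(home_means, changes)
print(f"slope of travel change vs. home sleep: {slope:.2f}")
```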

The New Rules of Data Privacy

Harvard Business Review, Hossein Rahnama and Alex "Sandy" Pentland


The data harvested from our personal devices, along with our trail of electronic transactions and data from other sources, now provides the foundation for some of the world’s largest companies. Personal data is also the wellspring for millions of small businesses and countless startups, which turn it into customer insights, market predictions, and personalized digital services. For the past two decades, the commercial use of personal data has grown in wild-west fashion. But now, because of consumer mistrust, government action, and competition for customers, those days are quickly coming to an end.

For most of its existence, the data economy was structured around a “digital curtain” designed to obscure the industry’s practices from lawmakers and the public. Data was considered company property and a proprietary secret, even though the data originated from customers’ private behavior. That curtain has since been lifted, and a convergence of consumer, government, and market forces is now giving users more control over the data they generate. Instead of treating personal data as a resource that can be freely harvested, countries in every region of the world have begun to treat it as an asset owned by individuals and held in trust by firms.

Moderating Online Content by Highlighting High-Quality Comments

Medium, Technically Social, Yixue Wang


Inspired by this question, we analyzed the correlation between NYT Picks and commenters’ behaviors in our recent paper. Specifically, our results show that NYT Picks are correlated with an improvement/increase in

  • First-time receivers’ next approved comment quality
  • Newcomer receivers’ commenting frequency
  • Observers’ comment quality

To come to these conclusions, we collected and analyzed more than 13 million approved NYT comments (~2.5% were NYT Picks) via the NYT Community API from 2007 to 2015. To quantify comment quality, we defined a comment’s quality score as the probability of being selected as an NYT Pick and used a machine learning model to predict the score. We then validated our model’s scores by comparing them against human ratings of comment quality.
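The quality score described above is the predicted probability that a comment is selected as an NYT Pick, later checked against human ratings. The post does not say which model was used, so the sketch below substitutes plain logistic regression trained by gradient descent, with two hypothetical numeric features per comment and made-up labels and ratings. It then validates the scores on held-out comments with a Spearman rank correlation against hypothetical human ratings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def quality_score(x, w, b):
    """Quality score = predicted probability that the comment becomes a Pick."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def spearman(xs, ys):
    """Spearman rank correlation; assumes no tied values."""
    def ranks(v):
        r = [0] * len(v)
        for rank, i in enumerate(sorted(range(len(v)), key=lambda k: v[k])):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical training comments: two made-up features each, label 1 = NYT Pick.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X_train, y_train)

# Hypothetical held-out comments and human quality ratings (1-5 scale).
X_test = [[0.15, 0.15], [0.4, 0.3], [0.6, 0.7], [0.85, 0.8]]
human_ratings = [1, 2, 4, 5]
scores = [quality_score(x, w, b) for x in X_test]
print("model scores:", [round(s, 3) for s in scores])
print("Spearman rho vs. human ratings:", spearman(scores, human_ratings))
```

A rank correlation fits this validation step because the score is used to order comments, while human ratings sit on a different scale.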

Artificial intelligence recreates Andy Warhol’s voice for Netflix series

The Boston Globe, Matthew Walker


Ryan Murphy has produced a six-part documentary series about Andy Warhol that premieres on Netflix on March 9. And that’s great, right? If you’re interested in art, culture, or the American temperament, you’ll probably find something in “The Andy Warhol Diaries,” which is directed by Andrew Rossi. Among those interviewed: John Waters, Spike Lee, Rob Lowe, Julian Schnabel, and Debbie Harry.

But there’s a twist. The series has constructed Warhol’s voice with the help of artificial intelligence, and we will hear that voice reading passages from his diaries.

Liquid Crystal smart antennas to tackle 5G roll-out challenges

New Electronics (UK), Neil Tyler


Building on research undertaken at Darmstadt University, ALCAN has developed a Liquid Crystal (LC) based phased array smart antenna. Liquid Crystal – most commonly used in TV and smartphone screens – is used to design low-profile antennas.

This technology offers electronically reconfigurable beam-steering technology at a lower cost compared to typical silicon-based options and can be integrated into a range of forms including Customer Premises equipment (CPE), repeater, and Reconfigurable Intelligent Surface-Based antennas (RIS), along with land mobile solutions.

A patented technology, it has previously been used to serve the satellite market, offering a low-cost option for satellite ground terminals. The similarities between satellite and 5G frequencies, combined with the convergence between telecoms and satellite over the past 12 months, made it a natural evolution for ALCAN systems.

A Silicon Valley love triangle: Hiring algorithms, pseudo-science, and the quest for auditability

Cell, Patterns journal; Mona Sloane, Emanuel Moss, Rumman Chowdhury


Automated hiring tools are increasingly subjected to technical audits for their performance across legally protected groups to ensure that they do not inject additional biases into the hiring process, but these audits often fail to examine the assumptions that underpin the claims made by such tools. This paper outlines a socio-technical approach to audit automated hiring tools. It introduces a matrix that provides a method for inspecting the assumptions that underpin a system and how they are operationalized technically. These assumptions often rest on contradictory or pseudo-scientific theories about job applicants. We offer this matrix to facilitate holistic audits that go beyond technical performance. [full text]

Challenges and Opportunities for Wearable Sensing Systems

ACS Sensors journal, Shana O. Kelley


There is a tremendous amount of activity in Chemistry and related fields focused on wearable sensor development at present. In ACS Sensors alone, >200 papers have been published in this area over the last five years. An editorial published in the first volume of the journal (1) summarized the promise and challenges in this field, and significant progress has been made since this piece appeared. Many different material systems, analytes, and devices have been featured in ACS Sensors. Markers relevant to depression (2), metabolism (3), stress (4), circadian rhythm (5), breath biochemistry (6), neurodegenerative diseases (7), and many other conditions have been tracked with interesting new sensors and devices.

In a recent review published by the Wang group at UCSD (8), wearable and mobile sensors for nutrition were covered. Devices that can track nutrients, chewing, and swallowing, while also tracking intake of different foods, hold promise as a means to allow individuals to understand their nutritional needs on the basis of real-time data. The review highlights that many of the sensors needed to realize this type of capability are available, but also brings to light that the sensors available are quite idiosyncratic and specific for particular analytes. Given the complexity of manufacturing sensor-based products, significant challenges will likely be encountered as different sensor chemistries are integrated.


Microbiome Symposium to explore relationship between data science and the microbiome

Purdue University, News


“Purdue University on May 9-11 will host the second annual Microbiome Symposium. The symposium will investigate the application of data science in the study of microbiomes and how experimental data and computational data can be leveraged to learn more about microbiome systems.” [registration required]

Women in Data Science (WiDS) Cambridge

WiDS Cambridge


Cambridge, MA Friday, March 11, starting at 10 a.m. “For the sixth year in a row, Harvard, MIT, and Microsoft Research New England are proud to collaborate with Stanford University to bring the Women in Data Science (WiDS) conference to Cambridge, Massachusetts.” [registration required]


Applications for Bloomberg’s Data Science Ph.D. Fellowship for the 2022-2023 academic year are now being accepted

submission deadline: April 15



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

10 ways to use fewer colors in your data visualizations

Datawrapper, Lisa Charlotte Muth


How can we avoid using that many colors in a data visualization? There are a few options. You can use this article as a checklist the next time you’re building a too-colorful chart.
