Data Science newsletter – July 12, 2021

Newsletter features journalism, research papers and tools/software for July 12, 2021


Mate Selection for Modernity

Quillette, Vincent Harinam


The underlying structure of modern mate selection is fundamentally mathematical. For us to truly understand the causes and consequences of the modern sexual marketplace, a bit of math is required.

The Atlanta Fed and Georgia State University Partner to Address Online Financial Fraud

Georgia State University, News Hub


The Federal Reserve Bank of Atlanta and Georgia State University’s Evidence-Based Cybersecurity Research Group (EBCS) are forming a three-year partnership to address online payments-related financial fraud.

The EBCS will surveil the darknet and encrypted web channels to better understand how online payments-related criminal activity affects the payments landscape and promote safer, more-efficient payment solutions and practices.

The Future of Deep Learning Is Photonic

IEEE Spectrum, Ryan Hamerly


My description has made it sound as though each of these light beams must be held steady. In fact, you can briefly pulse the light in the two input beams and measure the output pulse. Better yet, you can feed the output signal into a capacitor, which will then accumulate charge for as long as the pulse lasts. Then you can pulse the inputs again for the same duration, this time encoding two new numbers to be multiplied together. Their product adds some more charge to the capacitor. You can repeat this process as many times as you like, each time carrying out another multiply-and-accumulate operation.

Using pulsed light in this way allows you to perform many such operations in rapid-fire sequence. The most energy-intensive part of all this is reading the voltage on that capacitor, which requires an analog-to-digital converter. But you don’t have to do that after each pulse—you can wait until the end of a sequence of, say, N pulses. That means that the device can perform N multiply-and-accumulate operations using the same amount of energy to read the answer whether N is small or large. Here, N corresponds to the number of neurons per layer in your neural network, which can easily number in the thousands. So this strategy uses very little energy.
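
A toy numerical model may make the energy argument concrete. The sketch below is not the hardware design described in the article. It assumes inputs and weights scaled to [0, 1], an idealized capacitor that simply sums the products, and a hypothetical 8-bit converter (`adc_read`). It counts how many conversions are needed for N multiply-and-accumulate steps.

```python
def adc_read(voltage, full_scale, bits=8):
    """Hypothetical ADC model: quantize a voltage to one of 2**bits codes."""
    levels = 2 ** bits - 1
    clipped = min(max(voltage, 0.0), full_scale)
    code = round(clipped / full_scale * levels)
    return code * full_scale / levels


def pulsed_mac(inputs, weights, bits=8):
    """Accumulate N products as 'charge', then digitize once at the end.

    Returns the quantized dot product and the number of ADC reads used.
    Assumes every input and weight lies in [0, 1].
    """
    if not inputs or len(inputs) != len(weights):
        raise ValueError("inputs and weights must be non-empty and equal in length")

    charge = 0.0
    for x, w in zip(inputs, weights):
        charge += x * w  # each pulse pair adds its product to the capacitor

    # The largest possible sum is N, so use that as the converter's range.
    full_scale = float(len(inputs))
    reading = adc_read(charge, full_scale, bits)
    adc_reads = 1  # a single conversion, regardless of N
    return reading, adc_reads


if __name__ == "__main__":
    reading, reads = pulsed_mac([0.5, 0.25, 1.0, 0.0], [0.5, 1.0, 0.75, 0.9])
    print(f"quantized dot product: {reading:.4f} using {reads} ADC read")
```

However long the input lists are, this model triggers one conversion, so the conversion cost per multiply-and-accumulate falls as 1/N.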

Department of Administration Hires Chief Data Officer

State of Montana Newsroom


The Department of Administration Director Misty Ann Giles announced the hiring of Adam Carpenter to serve as Department of Administration’s Chief Data Officer for the State of Montana.

“Adam’s experience identifying key performance indicators by utilizing data and analytics to help guide business processes is a welcome addition to our team,” Director Giles said. “His skill set complements our goal of improving the customer experience for the State of Montana.”

Most recently, Adam served as a Machine Learning Engineer Manager at Sogeti USA in Texas. In that role he was responsible for guiding clients across multiple industries through the data maturity journey so organizations could make data- and insight-driven decisions quickly and efficiently.

Surface Activity-Tuned Metal Oxide Chemiresistor: Toward Direct and Quantitative Halitosis Diagnosis

ACS Sensors, Kak Namkoong and Il-Doo Kim et al.


Continuous monitoring of hydrogen sulfide (H2S) in human breath for early-stage diagnosis of halitosis is of great significance for prevention of dental diseases. However, fabrication of a highly selective and sensitive H2S gas sensor material still remains a challenge, and direct analysis of real breath samples has not been properly attempted, to the best of our knowledge. To address the issue, herein, we introduce facile cofunctionalization of WO3 nanofibers with alkali metal (Na) and noble metal (Pt) catalysts via the simple addition of sodium chloride (NaCl) and Pt nanoparticles (NPs), followed by an electrospinning process. The Na-doping and Pt NPs decoration in WO3 grains induce the partial evolution of the Na2W4O13 phase, causing the buildup of Pt/Na2W4O13/WO3 multi-interface heterojunctions that selectively interact with sulfur-containing species. As a result, we achieved the highest-ranked sensing performances, that is, response (Rair/Rgas) = 780 @ 1 ppm and selectivity (RH2S/REtOH) = 277 against 1 ppm ethanol, among the chemiresistor-based H2S sensors, owing to the synergistic chemical and electronic sensitization effects of the Pt NP/Na compound cocatalysts. The as-prepared sensing layer was proven to be practically effective for direct and quantitative halitosis analysis based on the correlation (accuracy = 86.3%) between the H2S concentration measured using the direct breath signals obtained by our test device (80 cases) and gas chromatography. This study offers possibilities for direct, highly reliable and rapid detection of H2S in real human breath without the need for any collection or filtering equipment.

[2107.02185] Growing Urban Bicycle Networks

arXiv, Physics > Physics and Society; Michael Szell, Sayat Mimar, Tyler Perlman, Gourab Ghoshal, Roberta Sinatra


Cycling is a promising solution to unsustainable car-centric urban transport systems. However, prevailing bicycle network development follows a slow and piecewise process, without taking into account the structural complexity of transportation networks. Here we explore systematically the topological limitations of urban bicycle network development. For 62 cities we study different variations of growing a synthetic bicycle network between an arbitrary set of points routed on the urban street network. We find initially decreasing returns on investment until a critical threshold, posing fundamental consequences to sustainable urban planning: Cities must invest into bicycle networks with the right growth strategy, and persistently, to surpass a critical mass. We also find pronounced overlaps of synthetically grown networks in cities with well-developed existing bicycle networks, showing that our model reflects reality. Growing networks from scratch makes our approach a generally applicable starting point for sustainable urban bicycle network planning with minimal data requirements.

Wayne Community College launches one of the nation’s first Artificial Intelligence programs

WRAL, Education, Keenan Willard


Wayne Community College is set to become the second community college in the country to launch an artificial intelligence program for students.

Program leaders hope the cutting-edge track will help WCC students and the Goldsboro community capitalize on the state’s burgeoning tech boom.

“It is literally the future,” incoming AI student Brittini Bryant said. “It’s going to be all around us, and I figured why not throw my hat in the ring?”

Scientists use artificial intelligence to detect gravitational waves

Argonne National Laboratory, Press Release


Recently, computational scientist and lead for translational artificial intelligence (AI), Eliu Huerta of the U.S. Department of Energy’s (DOE) Argonne National Laboratory, in conjunction with collaborators from Argonne, the University of Chicago, the University of Illinois at Urbana-Champaign, NVIDIA and IBM, has developed a new production-scale AI framework that allows for accelerated, scalable and reproducible detection of gravitational waves.

This new framework indicates that AI models could be as sensitive as traditional template matching algorithms, but orders of magnitude faster. Furthermore, these AI algorithms would only require an inexpensive graphics processing unit (GPU), like those found in video gaming systems, to process advanced LIGO data faster than real time.

TikTok parent ByteDance has begun selling the video app’s AI to other clients

The Verge, Kim Lyons


ByteDance, the Beijing-based parent company of video sharing app TikTok, has started selling TikTok’s AI to other companies, the Financial Times reports (may require subscription). The company has a new division called BytePlus, and according to its website, its client list already includes US fashion app Goat, Singapore travel site WeGo, Indonesian shopping app Chilibeli, and India-based social gaming platform GamesApp.

TikTok’s recommendation algorithm is a huge part of what has made it so popular. Essentially, the way you interact with the app affects what videos it will recommend to you. The company explained in a blog post last year how its ForYou feed determines what videos to serve a given user. Its recommendations are based on user interactions including which videos you like, share, comment on, or create; video information, such as captions and hashtags; and device and account settings including what device you’re using, your language preferences, and your location settings.

Clarfeld & Gramling’s Study Describes Tool for Improving Serious Illness Conversations

University of Vermont, Larner School of Medicine


Conversations between seriously ill people, their families and palliative care specialists lead to better quality-of-life. Understanding what happens during these conversations – and particularly how they vary by cultural, clinical, and situational contexts – is essential to guide healthcare communication improvement efforts. To gain true understanding, new methods to study conversations in large, inclusive, and multi-site epidemiological studies are required. A new computer model offers an automated and valid tool for such large-scale scientific analyses.

Research results on this model were published July 1, 2021 in PLOS ONE.

Developed by a team of computer scientists, clinicians and engineers at the University of Vermont, the approach – called CODYM (COnversational DYnamics Model) analysis – uses simple behavioral state-based models (Markov Models) to capture the flow of information during different conversations, based on patterns in the lengths of alternating speaker turns.
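
To give a sense of what a state-based model over turn lengths can look like, here is a minimal sketch. It is not the published CODYM implementation. The three length bins, the word-count thresholds (`short_max`, `medium_max`), and the choice of a first-order model are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def bin_turn(n_words, short_max=5, medium_max=20):
    """Map a turn's word count to a state: S(hort), M(edium) or L(ong).

    The thresholds are hypothetical, chosen only for this example.
    """
    if n_words <= short_max:
        return "S"
    if n_words <= medium_max:
        return "M"
    return "L"


def transition_probabilities(turn_lengths):
    """Estimate first-order Markov transition probabilities between states.

    turn_lengths is the sequence of word counts for alternating speaker turns.
    """
    states = [bin_turn(n) for n in turn_lengths]
    counts = defaultdict(Counter)
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1

    probabilities = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probabilities[prev] = {nxt: c / total for nxt, c in row.items()}
    return probabilities


if __name__ == "__main__":
    # Hypothetical word counts for seven alternating turns.
    lengths = [3, 25, 4, 30, 2, 12, 3]
    for prev, row in sorted(transition_probabilities(lengths).items()):
        print(prev, row)
```

In a table like this, a high probability of a short turn following a long one would indicate a speaker who mostly acknowledges while the other talks at length, which is the kind of pattern a turn-length model can surface without analyzing the words themselves.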

When does mathematical tractability predict human behavior?

Statistical Modeling, Causal Inference, and Social Science, Jessica Hullman


… I wonder: What is the landscape of assumptions behind statements like this, which might seem to suggest that a solution concept is a more valid predictor of human behavior if an algorithm can efficiently solve for an exact solution? As someone who often thinks about how human cognition compares to different kinds of statistical processing, I find this idea intriguing but also kind of murky.

For instance, the first example above reads to me like an assertion that ‘if it’s computable, then it’s likely people will achieve it’ whereas the second reads more like, ‘if it’s computable, then it’s within the realm of possibility that people can achieve it.’ But is this difference an artifact of different writing styles or a real difference in assumptions?

Top 10 Ideas in Statistics That Have Powered the AI Revolution

Columbia University, Columbia News


If you’ve ever called on Siri or Alexa for help, or generated a self-portrait in the style of a Renaissance painter, you have interacted with deep learning, a form of artificial intelligence that extracts patterns from mountains of data to make predictions. Though deep learning and AI have become household terms, the breakthroughs in statistics that have fueled this revolution are less known. In a recent paper, Andrew Gelman, a statistics professor at Columbia, and Aki Vehtari, a computer science professor at Finland’s Aalto University, published a list of the most important statistical ideas in the last 50 years.

Below, Gelman and Vehtari break down the list for those who may have snoozed through Statistics 101. Each idea can be viewed as a stand-in for an entire subfield, they say, with a few caveats: science is incremental; by singling out these works, they do not mean to diminish the importance of similar, related work. They have also chosen to focus on methods in statistics and machine learning, rather than equally important breakthroughs in statistical computing, and computer science and engineering, which have provided the tools and computing power for data analysis and visualization to become everyday practical tools. Finally, they have focused on methods, while recognizing that developments in theory and methods are often motivated by specific applications.

Top 10 Ideas in Statistics That Have Powered the AI Revolution – MORE

Statistical Modeling, Causal Inference, and Social Science blog, Andrew Gelman


Kim Martineau at Columbia, who suggested making [the first] list, also had the idea that youall might have suggestions for other important articles and books; tweet your thoughts at @columbiascience or put them in comments below and we can discuss at a future date.

Each idea below can be viewed as a stand-in for an entire subfield. We make no claim that these are the “best” articles and books in statistics and machine learning, we’re just saying they’re important in themselves and represent important developments. By singling out these works, we do not mean to diminish the importance of similar, related work. We focus on methods in statistics and machine learning, rather than equally important breakthroughs in statistical computing, and computer science and engineering, which have provided the tools and computing power for data analysis and visualization to become everyday practical tools. Finally, we have focused on methods, while recognizing that developments in theory and methods are often motivated by specific applications.

Machine learning tool sorts the nuances of quantum data

Cornell University, Cornell Chronicle


An interdisciplinary team of Cornell and Harvard University researchers developed a machine learning tool to parse quantum matter and make crucial distinctions in the data, an approach that will help scientists unravel the most confounding phenomena in the subatomic realm.

The Cornell-led project’s paper, “Correlator Convolutional Neural Networks as an Interpretable Architecture for Image-like Quantum Matter Data,” published June 23 in Nature Communications. The lead author is doctoral student Cole Miles.

The Cornell team was led by Eun-Ah Kim, professor of physics in the College of Arts and Sciences, who partnered with Kilian Weinberger, associate professor of computing and information science in the Cornell Ann S. Bowers College of Computing and Information Science and director of the TRIPODS Center for Data Science for Improved Decision Making.

New 2020 voter data: How Biden won, how Trump kept the race close, and what it tells us about the future

The Brookings Institution, William A. Galston


As we saw in 2016 and again in 2020, traditional survey research is finding it harder than it once was to assess presidential elections accurately. Pre-election polls systemically misjudge who is likely to vote, and exit polls conducted as voters leave the voting booths get it wrong as well.

Now, using a massive sample of “validated” voters whose participation has been independently verified, the Pew Research Center has published a detailed analysis of the 2020 presidential election. It helps us understand how Joe Biden was able to accomplish what Hillary Clinton did not, and why President Trump came closer to getting reelected than the pre-election surveys had predicted.


We Robot 2021 Preliminary Program

University of Miami School of Law


Miami, FL Sep 23-25. “We Robot is the most exciting interdisciplinary conference on the legal and policy questions relating to robots.” [$$$]


Call for Papers – Turing Conference on Trustworthy Digital Identity

“This conference brings together prominent academics and key players in the field of digital identity from government and industry sectors to focus on opportunities in the assessment and development of trustworthy digital identity systems. Outlined themes and key questions cover the driving influences behind the development of these systems, alongside the characteristics that can attest to whether these systems are deserving of trust.” Deadline for submissions is July 30.



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

The Road to MLOps: Machine Learning as an Engineering Discipline

Built In, Cristiano Breuel

According to reports, “only 22 percent of companies using machine learning have successfully deployed a model.” What makes it so hard? And what do we need to do to improve the situation?

Let’s start by looking at the root causes.

Behind the scenes improvements to the REST API

Crossref, Patrick Polischuk


We’re pleased to announce that we’ll be cutting over to the Elasticsearch-backed version of the REST API over the next few weeks, beginning July 13. This cutover will occur one pool at a time–the public pool will be migrated first, followed by the polite pool on August 3, and the plus pool on August 10 (see ‘etiquette’ link above if you’re unfamiliar with our different pools).
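
For readers unfamiliar with the pools, the sketch below builds a query URL without sending it. It rests on assumptions that are not in the excerpt: that the public endpoint is `https://api.crossref.org/works`, that it accepts `query` and `rows` parameters, and that adding a `mailto` parameter is what routes a request to the polite pool. The contact address is a placeholder, and the helper name `build_works_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the excerpt itself does not give a URL.
BASE_URL = "https://api.crossref.org/works"


def build_works_query(query, rows=5, mailto=None):
    """Return a works-search URL; include mailto to request the polite pool."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Assumption: identifying yourself with a contact address selects the polite pool.
        params["mailto"] = mailto
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_works_query("bicycle networks", rows=2))
    print(build_works_query("bicycle networks", rows=2, mailto="me@example.org"))
```

Keeping the URL construction separate from the request makes it easy to point the same queries at a different host when comparing an old and a new service.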

We’ve thoroughly tested the functionality and performance of the new REST API, and we’d like to invite you to test it out before we move production traffic to the new service. Try out your favorite API queries at
