Data Science newsletter – July 20, 2021

Newsletter features journalism, research papers and tools/software for July 20, 2021


Wall Street’s Math Whizzes Are Racing to Wire Up the Bond Market

Bloomberg Markets, Justina Lee


Quants have been saying that for years of course, only to have their math-based models frustrated by the cumbersome and complex debt world. The difference now is that they may have a market liquid and transparent enough to accommodate their constant churn.

Electronic venues like MarketAxess and Tradeweb accounted for 37% of investment-grade and 26% of high-yield trading in May, 8 percentage points higher than the year before, Coalition Greenwich data show.

That sets up a virtuous circle, where banks roll out more algorithms to price more bonds. Throw in a year of record flows into credit exchange-traded funds, and a broad swath of securities is becoming easier to trade — vital for a cohort which typically holds hundreds of positions and trades more often than an average fund.

From Data Processes to Data Products: Knowledge Infrastructures in Astronomy

Harvard Data Science Review, Christine L. Borgman and Morgan F. Wofford


We explore how astronomers take observational data from telescopes, process them into usable scientific data products, curate them for later use, and reuse data for further inquiry. Astronomers have invested heavily in knowledge infrastructures—robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds. Drawing upon a decade of interviews and ethnography, this article compares how three astronomy groups capture, process, and archive data, and for whom. The Sloan Digital Sky Survey is a mission with a dedicated telescope and instruments, while the Black Hole Group and Integrative Astronomy Group (both pseudonyms) are university-based, investigator-led collaborations. Findings are organized into four themes: (1) how these projects develop and maintain their workflows; (2) how they capture and archive their data; (3) how they maintain and repair knowledge infrastructures; and (4) how they use and reuse data products over time. We found that astronomers encode their research methods in software known as pipelines. Algorithms help to point telescopes at targets, remove artifacts, calibrate instruments, and accomplish myriad validation tasks. Observations may be reprocessed many times to become new data products that serve new scientific purposes. Knowledge production in the form of scientific publications is the primary goal of these projects. They vary in incentives and resources to sustain access to their data products. We conclude that software pipelines are essential components of astronomical knowledge infrastructures, but are fragile, difficult to maintain and repair, and often invisible. Reusing data products is fundamental to the science of astronomy, whether or not those resources are made publicly available. We make recommendations for sustaining access to data products in scientific fields such as astronomy.

Wearable wellness: Fitness trackers keep faculty mindful of well-being

Penn State University, Penn State News


During the height of the COVID-19 pandemic’s winter surge in cases, a handful of physicians at Penn State Health Milton S. Hershey Medical Center put on more than just personal protective equipment to keep themselves healthy. Wearable fitness trackers allowed some of the Department of Medicine faculty at Penn State College of Medicine to be more mindful of their physical wellness.

The faculty wellness initiative was inspired by an ongoing resident research study that is changing the way participating physicians think about their own health. Dr. Alex Hajduczok, a third-year resident in internal medicine and principal investigator for the project, is studying whether wearable fitness devices can measure wellness and predict burnout.

“What we saw very early on is that people liked using the device and used it to make small but meaningful improvements in their lifestyle,” Hajduczok said.

As the COVID-19 pandemic continued to worsen and required an increasingly demanding schedule for inpatient care teams, the COVID-19 Faculty Well-being Mini-Grants from the Office of Faculty and Professional Development surfaced at the perfect time. Hajduczok’s gears began to turn: Would a similar model work just as well for faculty?

Renewables make it into the grid better with AI

Cosmos Magazine (Australia), Ellen Phiddian


In a highly competitive market, all energy generators rely on highly accurate predictions of how much electricity they’ll be able to make. Australian researchers have figured out a way to improve these predictions for wind and solar farms, using artificial intelligence.

The National Energy Market – “the grid” – requires automatic forecasts every five minutes from electricity generators. This ensures that electricity generation meets demand. It can be very costly if those five-minute forecasts prove to be incorrect.

“The market operates on these five-minute windows,” says Dr Christoph Bergmeir, a senior lecturer in data science and AI at Monash University, and lead researcher on a recently completed ARENA project.

“Every five minutes, the generator has to bid for how much they’ll generate.”

Computational scientists look for lessons learned from the COVID-19 pandemic

Chemical & Engineering News, Sam Lemonick


The COVID-19 pandemic was an all-hands-on-deck event for scientists. Researchers of all stripes looked for ways they could help treat or stop the spread of the SARS-CoV-2 virus, and computational molecular scientists were no exception.

Now, after nearly 18 months, the pandemic has entered a new phase in some parts of the world where vaccination rates are rising and the spread of the virus is slowing. Computational scientists are using the opportunity to take stock of their contributions to fighting the virus. They can point to times when the field’s efforts led to breakthroughs and also when those efforts fell short of expectations. But most importantly, they are trying to understand how their field can be more helpful when the next pandemic arrives.

One computational achievement stands out from the rest—models of the virus’s proteins. In February 2020, with the virus spreading rapidly around the world, structural biologist Jason S. McLellan at the University of Texas at Austin and colleagues at the National Institutes of Health used cryo-electron microscopy (cryo-EM) to make detailed structures of SARS-CoV-2’s spike protein. The virus uses the spike protein to attach to and enter human cells. This protein is a major target for drugs and vaccines. Within weeks of McLellan’s team publishing the cryo-EM data, Rommie Amaro’s group at the University of California San Diego used those structures to create the first computer models of the protein using artificial intelligence and other computational techniques.

A doctor, a researcher and an activist: Perspectives on how sharing data is advancing health care

Microsoft, On the Issues blog, Katharine Rooney


“Data science never mattered so much as it does now,” [Lance] Baldo says. “It’s pretty amazing the way that we’ve seen the confluence of biology, biochemistry and data science kind of all coming together, really, arguably, with equal importance in terms of how we’re going to innovate in the future.”

The result was a diagnostic test that determines previous exposure to the virus and that has now been authorized by the U.S. Food and Drug Administration under the Emergency Use Authorization program. Its rapid development underscores how combining biological knowledge, data analysis and machine learning is paving the way for the future of medicine.

Why business and academia need each other for better A.I.

Fortune, Eye on A.I., Jonathan Vanian


The best way to accelerate the needed innovation is cooperation between academics and business, explained Martial Hebert, Carnegie Mellon University’s (CMU) dean of computer science.

Researchers have “200 years of engineering science” to draw from when developing complicated machinery like automobiles. This rich history helps researchers certify and explain how their technologies work. This is important so that when people “take an elevator,” it’s not a mystery to engineers, and by extension, the general public, as to why they are moving up or down, Hebert said.

But with machine learning, “we have basically none of that,” Hebert said, creating a big challenge for companies developing self-driving cars.

“How do you validate the performance of a system whose performance depends not just on the correctness of the code or the hardware, but all of the data it used for training,” Hebert said about proving how A.I. systems work. “How do you do that when the data can evolve over time in ways that you cannot predict ahead of time?”

Facebook stops funding for brain reading computer interface

MIT Technology Review, Antonio Regalado


The spring of 2017 may be remembered as the coming-out party for Big Tech’s campaign to get inside your head. That was when news broke of Elon Musk’s new brain-interface company, Neuralink, which is working on how to stitch thousands of electrodes into people’s brains. Days later, Facebook joined the quest when it announced that its secretive skunkworks, named Building 8, was attempting to build a headset or headband that would allow people to send text messages by thinking—tapping them out at 100 words per minute.

The company’s goal was a hands-free interface anyone could use in virtual reality. “What if you could type directly from your brain?” asked Regina Dugan, a former DARPA officer who was then head of the Building 8 hardware division. “It sounds impossible, but it’s closer than you realize.”

Now the answer is in—and it’s not close at all. Four years after announcing a “crazy amazing” project to build a “silent speech” interface using optical technology to read thoughts, Facebook is shelving the project, saying consumer brain-reading still remains very far off.

Decoding crop genetics with artificial intelligence

Michigan State University, MSU Today


“In terms of plant science, there are major questions that we’re trying to answer: How does a particular gene sequence work? What’s its molecular function? What does it do?” said lead investigator Shin-Han Shiu. Shiu is a professor in the College of Natural Science’s Department of Plant Biology and in Computational Mathematics, Science and Engineering, a department jointly administered by the College of Natural Science and the College of Engineering.

“In the cases where we don’t know, it’s not because we haven’t tried to find out,” Shiu said. “It’s because it becomes harder and harder to find the answers through experiments.”

The Spartan team believes that AI can provide the assistance researchers need to crack those tough cases, which represent a sizable fraction of plant genes.

AI2 spinout MajorBoost uses artificial intelligence to help medical providers deal with insurance calls

GeekWire, Kurt Schlosser


MajorBoost was founded by three tech veterans, including CEO Lekshmi Venu, who has been an entrepreneur in residence at AI2 since January 2020. Venu previously spent six years at Amazon, where she led machine learning for Amazon Payments. She helped start MajorBoost last September. … MajorBoost received $350,000 in pre-seed funding from AI2 and is on the verge of raising a seed round. According to AI2, the startup has several customers in private beta and dozens more on a waitlist. The company is hiring for several roles.

Science Is Political, and We Must Deal with It

ACS Journal of Physical Chemistry Letters, Philip Ball


The issue is not, then, whether and how science can resist being “politicized”, but how the political and ideological dimensions of science can best be managed to make it most effective and beneficial both as an intellectual quest and as a means of, as Bacon put it, relieving (hu)mankind’s estate.

Chemical reactions break free from energy barriers using flyby trajectories

University of Illinois, Illinois News Bureau


A new study shows that it is possible to use mechanical force to deliberately alter chemical reactions and increase chemical selectivity – a grand challenge of the field.

The study led by University of Illinois Urbana-Champaign researcher Jeffrey Moore and Stanford University chemist Todd Martinez demonstrates how external mechanical forces alter atomic motions to manipulate reaction outcomes. The study findings are published in the journal Science.

“We think of chemical reactions as molecules moving on a surface of potential energy in the way hikers follow the contour map of mountains and valleys along a trail,” said lead author Yun Liu, a post-doctoral researcher in Moore’s research group. “A mountain along a reaction path is a barrier that needs to be traversed before the molecules can descend into their final product. Therefore, the relative height of barriers control which path the molecules will most likely choose, allowing chemists to make predictions about what a particular chemical reaction will produce – an outcome called selectivity.”

AED Weighs In on Right to Repair

Construction Equipment magazine


The Associated Equipment Distributors (AED) has reacted to President Biden’s executive order, part of which encourages the Federal Trade Commission (FTC) to “limit equipment manufacturers from restricting people’s ability to use independent repair shops or do DIY repairs.”

Specifically, the executive order urges the FTC Chair to consider working with the rest of the Commission to exercise the FTC’s statutory rulemaking authority to address “unfair anti-competitive restrictions on third-party repair or self-repair of items, such as the restrictions imposed by powerful manufacturers that prevent farmers from repairing their own equipment.”

Following the release of the order, AED’s president & CEO Brian P. McGuire issued the following statement:

“Unfortunately, the right to repair directive is a solution in search of a problem. Currently, consumers of heavy equipment can diagnose and repair their machinery and tractors. They don’t have the ability to alter the safety, security, and environmental protections on the equipment. AED strongly urges the FTC to consider the significant differences between repairing heavy machinery and modifying or tampering with it, as the agency ponders future action.”

“A Real S*** Show”: Soldiers Angrily Speak Out about Being Blocked from Repairing Equipment by Contractors

Substack, BIG newsletter, Matt Stoller


Louis Rossmann is an important YouTube personality who talks about, among other things, the fact that big firms block their customers from repairing equipment so they can extract after-market profits with replacement parts. And he’s very much noticed the Biden executive order, which calls for agencies to curtail this practice (as well as the FTC report on it). Rossmann did a series of videos on this order, one of which focused on the order calling for the Pentagon to stop contracting with firms that block soldiers from being able to repair equipment. He cited Elle Ekman’s New York Times piece from 2019 on the problem.

What’s even more interesting than the video are the comments on it, from soldiers angry that they keep encountering this problem in the field. I pulled some of them and published them here.

Harvard’s Wyss Institute, Brigham and Women’s Hospital launch Diagnostic Accelerator

MobiHealthNews, Nathan Eddy


The Wyss Institute for Biologically Inspired Engineering at Harvard University and Boston’s Brigham and Women’s Hospital announced the launch of a Diagnostic Accelerator that aims to create new diagnostic technologies.

The goal is to compress the time frame for introducing diagnostic technologies specifically developed to solve high-value clinical problems.

The Brigham clinical community will identify these unmet needs, which will then be matched to technologies created by the engineering community at Wyss. The resulting diagnostic assays will be jointly validated and optimized in clinical settings.


Next Generation Genomics 2021

Next Generation Genomics


Online September 27-28. “In this conference, we aim to showcase the ‘next-generation’ of genomic methods and technologies, and in particular to provide a platform for junior faculty to present recent discoveries and innovations. This virtual conference will feature advances in both computational and experimental methods, including single cell analysis, genetic engineering, population genetics, and statistical and deep learning.” [registration required]


Call for Papers on Machine Learning and Earth System Modeling

“To accelerate this important application and communication within the fields of atmosphere, ocean, and land, a new special collection in Journal of Advances in Modeling Earth Systems (JAMES), entitled ‘Machine Learning Application to Earth System Modeling’ aims to bring together new research that uses machine learning to advance Earth system modeling.

The collection is open to manuscripts covering use of new machine learning methodologies developed for advancing Earth system science.”



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

How To Learn Stuff Quickly

Josh Comeau


… If you only follow guided resources, you’ll wind up in tutorial hell. You won’t develop the problem-solving skills needed to succeed as a developer. When you try to build your own project, you won’t know where to start. It will feel like you’ve spent so much time practicing without developing any tangible, practical skills.

On the other hand, if you focus entirely on unguided learning, it’ll take forever. Without an experienced guide, you’ll need to reinvent every wheel, spending days or weeks solving already-solved problems. This is a long and frustrating road. In the worst case, you might wind up quitting altogether, convinced (incorrectly!) that you just aren’t smart enough for this stuff.

We want to walk the tightrope between these two extremes, using guided and unguided learning together. There are lots of ways to do this.



Postdoctoral researcher: human mobility, social networks, and COVID-19

Northeastern University, College of Social Sciences and Humanities; Boston, MA
