Data Science newsletter – February 22, 2022

Newsletter features journalism, research papers and tools/software for February 22, 2022


How the Dutch use big data to keep their country afloat

The Next Web, Alejandro Tauber


What’s challenging here is that bridges and locks are sometimes tendered by the responsible province or municipality, explains Therry van der Burgt, solution architect at Rijkswaterstaat.

“There’s no one-size-fits-all approach to extract data from public infrastructure, so we had to come up with a standardized way to tell if a bridge is open or not.”

Which they did, by attaching internet-connected sensors to the road barriers on those bridges.

The slippery science of Olympic curling: we still don’t know how it works

The Conversation, Shane Keating


Australia’s first ever Olympic curling team scored an historic win but missed the medal podium at the 2022 Winter Olympic Games in Beijing. It was a remarkable performance for a team lacking any dedicated curling facilities at home.

And that’s important, because it is the special properties of curling ice that allow the heavy curling stones to glide and curve in ways that seem to defy physics. In fact, scientists are still not sure what puts the “curl” in curling.

The state of preprints at PLOS

The Official PLOS Blog


In the weeks since announcing our expanded partnership with medRxiv, we’ve received lots of thoughtful questions from our authors, editors, and the scientific community. In this post, we aim to answer some of them.

But first, a quick recap of the role preprints have played at PLOS. While PLOS journals have always welcomed submissions with associated preprints, it wasn’t until 2016 that we formalized our support for preprints in journal policy and began accepting incoming direct transfers from bioRxiv. In 2018, we expanded that partnership and made it a 2-way street, giving authors the option to have their manuscripts forwarded to bioRxiv as part of the submission process at select PLOS journals. In 2019 came direct transfers from medRxiv to PLOS, and just this month, facilitated posting from PLOS to medRxiv.

Big-data approaches lead to an increased understanding of the ecology of animal movement

Science, Ran Nathan et al.


Movement is ubiquitous across the natural world. All organisms move, actively or passively, regularly or during specific life stages, as a result of varied proximate drivers including energetic demands, social interactions, competition or predation. Movement closely interacts with individual fitness, affects a myriad of ecological processes, and is crucial for animals’ ability to cope with human-induced rapid environmental changes. Driven by advances in analytical methods and technologies for tracking mammals, birds, fish, and other free-ranging vertebrates (hereafter, wildlife), movement ecology is rapidly transforming into a data-rich discipline, following previous developments in fields such as genomics and environmental monitoring. This ongoing revolution is facilitated by cost-effective automated high-throughput wildlife tracking systems that generate massive high-resolution datasets across scales relevant to the ecological context in which animals perceive, interact with, and respond to their environment.

Toward data-driven, dynamical complex systems approaches to disaster resilience

Proceedings of the National Academy of Sciences; Takahiro Yabe, P. Suresh C. Rao, Satish V. Ukkusuri, Susan L. Cutter


With rapid urbanization and increasing climate risks, enhancing the resilience of urban systems has never been more important. Despite the availability of massive datasets of human behavior (e.g., mobile phone data, satellite imagery), studies on disaster resilience have been limited to using static measures as proxies for resilience. However, static metrics have significant drawbacks such as their inability to capture the effects of compounding and accumulating disaster shocks; dynamic interdependencies of social, economic, and infrastructure systems; and critical transitions and regime shifts, which are essential components of the complex disaster resilience process. In this article, we argue that the disaster resilience literature needs to take the opportunities of big data and move toward a different research direction, which is to develop data-driven, dynamical complex systems models of disaster resilience. Data-driven complex systems modeling approaches could overcome the drawbacks of static measures and allow us to quantitatively model the dynamic recovery trajectories and intrinsic resilience characteristics of communities in a generic manner by leveraging large-scale and granular observations. This approach brings a paradigm shift in modeling the disaster resilience process and its linkage with the recovery process, paving the way to answering important questions for policy applications via counterfactual analysis and simulations.

Sold on the new machine – Anticipating an eventual proof of concept, drugmakers commit to R&D-wide artificial intelligence

Chemical & Engineering News, Rick Mullin


In December, Roche announced a deal with Recursion, an artificial intelligence technology developer with its own drug discovery program. Roche will access Recursion’s AI-driven drug discovery platform to identify and develop up to 40 new medications for cancer and neurological conditions. Hailed in the business press as a multibillion-dollar AI deal (a perspective heavily influenced by potential milestone payments), the engagement will begin with the payment of $150 million to Recursion.

Within weeks, Sanofi announced a similar deal with Exscientia, another drug-hunting AI technology developer. Focused on as many as 15 novel small-molecule drug candidates in oncology and immunology, the deal carries an up-front payment of $100 million and the potential for $5.2 billion more if all future milestones are hit.

By any measure, these deals are notable for their scope and the level of investment on the part of major pharmaceutical companies in AI-enabled drug discovery and development. Several other deals, smaller in scale, were announced in the first 2 weeks of 2022, seeming to make good on claims by AI platform developers that Big Pharma is ready to go big on AI.

Researchers are using machine learning to understand how brain cells work

University of Wisconsin-Madison, News


For something so small, neurons can be quite complex — not only because there are billions of them in a brain, but because their function can be influenced by many factors, like their shape and genetic makeup.

A research team led by Daifeng Wang, a Waisman Center professor of biostatistics and medical informatics and computer sciences at the University of Wisconsin–Madison, is adapting machine learning and artificial intelligence techniques to better understand how a variety of traits together affect the way neurons work and behave.

Called manifold learning, the approach may help researchers better understand and even predict brain disorders by looking at specific neuronal properties. The Wang lab recently published its findings in two studies.

AI enables strategic hydropower planning across Amazon basin

Cornell University, Cornell Chronicle


The idea was to start with the many proposed hydropower dams in the Marañón River basin, in Peru and Ecuador. At nearly 1,100 miles long, the Marañón is one of the largest free-flowing rivers in South America, and one of the main headwaters of the 4,345-mile-long Amazon River. Weeding through massive portfolios involving the placement of dozens of dams, with each affecting the other as well as the river’s flow, was a gargantuan request.

Gomes’ response? Why not consider the entire Amazon basin – approximately 2.4 million square miles, more than a third of the continent?

“As we got more excited, I thought, ‘Let’s keep pushing, let’s go big,’” said Gomes, the Ronald C. and Antonia V. Nielsen Professor in the Cornell Ann S. Bowers College of Computing and Information Science. “I felt that we could really develop these AI techniques to scale to the entire Amazon basin.”

How many words does it take to make a mistake?

London Review of Books, William Davies


It isn’t difficult to understand why plagiarism happens, given the omnipresence of the internet. The students I see at the hearings are often struggling to keep up with their studies for reasons that may have to do with family, paid work or mental health. Their attendance has been dropping, they’re behind on their reading, when suddenly an essay deadline looms into view and panic sets in. The suspension of face to face teaching for more than a year, combined with limited social interaction with other students and worries about loved ones, appears to have caused many more students than usual to struggle. Loneliness and anxiety combine to produce a form of helplessness.

The experience of lockdown has also made me acutely aware of another not unrelated issue: the difficulty of sustaining traditional humanistic notions of authorship in an online-only environment. It is hardly news that the internet has put a huge strain on analogue conventions of copyright and intellectual property: paranoia about digital piracy and plagiarism is as old as the world wide web. The academic insistence on using bibliographic citation techniques developed for the printing press feels increasingly eccentric now that reading materials and essays exist in a digital (and therefore interconnected) form. The norms concerning what counts as a credible source, or a legitimate quotation or paraphrase, have been under pressure for some time – and never more so than during the last two years, when non-digital avenues for teaching, discussion and reading have been closed.

Maryland Bill Offers Strong Privacy Protections Against Biometric Data Collection

Electronic Frontier Foundation, Hayley Tsukayama


We are encouraged to see Maryland recognize the harms that unconsented collection can inflict on people as they go about their daily lives. And we were particularly encouraged to see Finance chair, Sen. Delores Kelley, and the bill’s sponsor, vice-chair Sen. Brian Feldman, push back on those advocating to eliminate perhaps the most important piece of this bill: the private right of action.

As we said in our testimony, laws are often only as good as their enforcement. This is why it is a top priority for the Electronic Frontier Foundation to include private rights of action in privacy laws, including those that protect biometric privacy. Consumer enforcement is part of EFF’s “bottom-up” approach to public policy. Ordinary technology users should have the power to decide for themselves whether to bring a lawsuit to enforce their statutory privacy rights.

UI to propose two new academic programs to the state Board of Regents

The Daily Iowan, Kate Perez


The University of Iowa will request approval to add two new programs, a doctorate of philosophy in criminology program in the UI Graduate College and a master’s of science in data science program in the Graduate College.

History offers a cautionary tale for biometric covid tracking systems

The Washington Post, Made By History, Michelle Spektor


Places around the world are using biometric technologies for coronavirus contact tracing and surveillance. For example, a municipality in South Korea is expected to roll out a program that uses facial recognition to track infected people. Other countries plan to implement, or have already implemented, similar systems, and some U.S. states are moving in that direction. Companies are developing facial recognition systems equipped with body temperature sensing capabilities, and Seychelles International Airport just implemented such a system for traveler health screening.

Legal experts and advocacy groups have said these systems raise concerns about privacy, data collection without consent, expanding government surveillance and discrimination against marginalized groups. Studies show that facial recognition technologies are more likely to misidentify women, the elderly and Black, Asian and Indigenous people. Biometric systems used for contact tracing could also expand beyond their original purposes. After all, biometric coronavirus tracking systems often rely on preexisting infrastructures, like closed-circuit television (CCTV) systems, originally created for other reasons.

And history indicates that these fears are well-founded.

Information and Credible Sanctions in Curbing Online Cheating Among Undergraduates: a Field Experiment

National Bureau of Economic Research, Bulletin on Entrepreneurship, Daniel L. Dench & Theodore J. Joyce


The rapid increase in online instruction in higher education has heightened concerns about cheating. We use a randomized control design to test whether informing students that we can detect plagiarism reduces cheating. We further test whether informing students they have been caught cheating reduces subsequent cheating. We find informing students about our capability to detect plagiarism has little effect on cheating. Notifying students that they have been caught cheating and are on a watch list reduces subsequent cheating attempts by at least 65 percent depending on the class and sample. We test for peer effects but conclude we cannot credibly identify peer effects distinct from own-cheating propensities.
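
As a rough illustration of how a figure such as "at least 65 percent" can be read, the sketch below computes the relative reduction in an event rate between a control group and a notified group. The function name and all counts are hypothetical and chosen only for the arithmetic; they are not taken from the paper, whose estimates come from its own experimental design.

```python
def relative_reduction(control_events, control_n, treated_events, treated_n):
    """Relative reduction in the event rate of the treated group versus control."""
    control_rate = control_events / control_n
    treated_rate = treated_events / treated_n
    return (control_rate - treated_rate) / control_rate


# Hypothetical counts (not from the study): 40 of 100 control students
# cheat again, versus 14 of 100 students who were notified.
reduction = relative_reduction(40, 100, 14, 100)
print(f"Relative reduction: {reduction:.0%}")
```

A relative reduction compares rates rather than raw counts, so it stays meaningful when the two groups differ in size.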

UCI Invents Fabric That ‘Enables Digital Communication Between Wearers’

University of California, Irvine; New University campus newspaper


The Tseng Research Group of UCI’s Henry Samueli School of Engineering invented a highly flexible, body-motion-tolerant fabric capable of replacing smart devices. This innovative technology allows the wearer to perform various digital tasks, such as making wireless payments, starting a car and opening locks — all with just a wave of a hand.

Advances in near-field communication (NFC), which are sets of communication protocols that enable communication between two electronic devices over short distances, have resulted in useful everyday functions. This technology has allowed for functions such as wireless device charging and powering of battery-free sensors.

One drawback of NFC has been its limited range — the technology only works within a couple of inches. The Peter Tseng group extended the signal reach to more than 4 feet using passive magnetic synthetic composite materials (metamaterials) based on etched foils of copper and aluminum.

Short thread: Strongly-worded editorial from @ScienceMagazine EIC criticizing Biden’s decision to appoint Francis Collins as science adviser rather than Alondra Nelson.

Twitter, Peter Aldhous


“tone deaf move” … “the decision to keep Nelson out of this role reflects a chronic ill in America—inequity at the highest levels of leadership.”

“It’s time … to commit to a vision of the future where … equity for women and people of color is prioritized over protecting processes that reinforce injustices of the scientific enterprise, and where leaders … represent the future, not the past.”

Let’s make our reviews open, starting now

Upstream, Iratxe Puebla


Over the last few years, there has been a push for more transparency in journal peer review and many journals now publish the review reports for published articles (Wolfram et al.). This is a very welcome development, but it is not yet the norm across scholarly journals. Publishing reports with journal articles also means that we only get to see the reviews for those papers that made it to publication, and only the reviews for the submission to that particular journal – any review activity for earlier versions of the work is still lost.

In this context, peer reviews are still very much focused on informing an editorial decision. Most reviewers contribute their time to peer review out of a motivation to give back to the community, to support this scholarly activity as they expect others will do for their own manuscripts. This is a laudable motivation, but the focus on journal gatekeeping means that many authors feel disempowered in the peer review process, where they feel they should bend to every request by reviewers to get the acceptance decision they long for.

The good news is that we do not need to wait for a protracted system change: there are ways in which researchers can already make their reviews both more open and more useful for the community.

Deadlines

Ethics Fellowship

“The Stanford Center for Biomedical Ethics (SCBE) and announce two to three post-doctoral fellowship positions focused on exploring the ethical considerations associated with using artificial intelligence and machine learning (AI/ML) to discover transformational medicines and to deliver improved clinical outcomes.” [rolling deadline]



The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.


Tools & Resources

Are You Still Using Real Data to Train Your AI?

IEEE Spectrum, Eliza Strickland


It may be counterintuitive. But some argue that the key to training AI systems that must work in messy real-world environments, such as self-driving cars and warehouse robots, is not, in fact, real-world data. Instead, some say, synthetic data is what will unlock the true potential of AI. Synthetic data is generated instead of collected, and the consultancy Gartner has estimated that 60 percent of data used to train AI systems will be synthetic by 2024. But its use is controversial, as questions remain about whether synthetic data can accurately mirror real-world data and prepare AI systems for real-world situations.

Nvidia has embraced the synthetic data trend, and is striving to be a leader in the young industry. In November, Nvidia founder and CEO Jensen Huang announced the launch of the Omniverse Replicator, which Nvidia describes as “an engine for generating synthetic data with ground truth for training AI networks.” To find out what that means, IEEE Spectrum spoke with Rev Lebaredian, vice president of simulation technology and Omniverse engineering at Nvidia.
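
To make the phrase "synthetic data with ground truth" concrete, here is a minimal sketch in plain Python. It is not Nvidia's Omniverse Replicator or its API; the function `make_sample`, the image size, and the noise levels are all assumptions made for illustration. The point it shows is that when a sample is generated rather than collected, its label is known exactly because the generator placed the object itself.

```python
import random


def make_sample(rng, size=16):
    """Generate one toy 'image' containing a bright square, plus its exact label."""
    side = rng.randint(3, 6)
    x0 = rng.randint(0, size - side)
    y0 = rng.randint(0, size - side)
    # Background is low-level noise in [0, 0.3); object pixels are set to 1.0.
    image = [[rng.random() * 0.3 for _ in range(size)] for _ in range(size)]
    for y in range(y0, y0 + side):
        for x in range(x0, x0 + side):
            image[y][x] = 1.0
    # Ground truth comes for free: we know where we drew the square.
    box = (x0, y0, side, side)
    return image, box


rng = random.Random(0)  # fixed seed so the dataset is reproducible
dataset = [make_sample(rng) for _ in range(100)]
print(len(dataset), "labeled samples; first box:", dataset[0][1])
```

Whether data produced this way transfers to real-world conditions is exactly the open question the article raises; a toy generator like this one captures none of the complexity of real sensor data.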

Overview of MLOps Definition; As the field of MLOps continues to mature, so does its definition.

Twitter, The Institute for Ethical AI & Machine Learning


This article revisits the term MLOps, covering the sub-themes it encompasses and giving a definition and examples of each.
