Data Science newsletter – March 18, 2021

Newsletter features journalism, research papers and tools/software for March 18, 2021

 

The 10 most innovative companies in data science

Fast Company



Data continues to be a competitive differentiator for companies, and the businesses that amass unique, often proprietary datasets are well poised to win A-list customers and succeed wildly. This year’s most innovative companies in data science come from a wide array of disciplines, from climate analysis to streaming video recommendations to trucking. On top of that are companies like Snowflake and LeapAnalysis, which help businesses get the most from their own datasets.


New method could democratize deep learning-enhanced microscopy – Salk researchers teach AI to improve microscope image quality—by taking high-resolution images and artificially degrading them

The Salk Institute, Salk News



Deep learning is a potential tool for scientists to glean more detail from low-resolution images in microscopy, but it’s often difficult to gather enough baseline data to train computers in the process. Now, a new method developed by scientists at the Salk Institute could make the technology more accessible—by taking high-resolution images, and artificially degrading them.

The new tool, which the researchers call a “crappifier,” could make it significantly easier for scientists to get detailed images of cells or cellular structures that have previously been difficult to observe because they require low-light conditions, such as mitochondria, which can divide when stressed by the lasers used to illuminate them. It could also help democratize microscopy, allowing scientists to capture high-resolution images even if they don’t have access to powerful microscopes. The findings were published March 8, 2021, in the journal Nature Methods.
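As a rough illustration of the idea, the sketch below degrades a high-resolution image to produce a (low-resolution, high-resolution) training pair. The block averaging, the Gaussian noise, the function name `crappify` and all parameter values are assumptions made for illustration; they are not the Salk team's actual degradation method.

```python
import random


def crappify(image, factor=2, noise_sd=0.05, seed=0):
    """Degrade a 2-D grayscale image given as a list of rows with values in [0, 1].

    Hypothetical degradation: average each factor x factor block,
    add Gaussian noise, then clip the result back to [0, 1].
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    low = []
    # Step over whole blocks only; ragged edges are dropped.
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            mean = sum(block) / len(block)
            noisy = mean + rng.gauss(0.0, noise_sd)
            row.append(min(1.0, max(0.0, noisy)))
        low.append(row)
    return low


# A made-up 4x4 "high-resolution" image and its degraded 2x2 counterpart.
high_res = [[(r + c) / 6 for c in range(4)] for r in range(4)]
low_res = crappify(high_res, factor=2)
training_pair = (low_res, high_res)
```

A model trained on many such pairs would learn to map degraded inputs back to their high-resolution originals, which is the part this sketch leaves out.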


Digital proximity tracing on empirical contact networks for pandemic control

Nature Communications; G. Cencetti, G. Santin, A. Longa, E. Pigani, A. Barrat, C. Cattuto, S. Lehmann, M. Salathé & B. Lepri



Digital contact tracing is a relevant tool to control infectious disease outbreaks, including the COVID-19 epidemic. Early work evaluating digital contact tracing omitted important features and heterogeneities of real-world contact patterns influencing contagion dynamics. We fill this gap with a modeling framework informed by empirical high-resolution contact data to analyze the impact of digital contact tracing in the COVID-19 pandemic. We investigate how well contact tracing apps, coupled with the quarantine of identified contacts, can mitigate the spread in real environments. We find that restrictive policies are more effective in containing the epidemic but come at the cost of unnecessary large-scale quarantines. Policy evaluation through their efficiency and cost results in optimized solutions which only consider contacts longer than 15–20 minutes and closer than 2–3 meters to be at risk. Our results show that isolation and tracing can help control re-emerging outbreaks when some conditions are met: (i) a reduction of the reproductive number through masks and physical distance; (ii) a low-delay isolation of infected individuals; (iii) a high compliance. Finally, we observe the inefficacy of a less privacy-preserving tracing involving second order contacts. Our results may inform digital contact tracing efforts currently being implemented across several countries worldwide. [full text]
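The duration and distance rule in the abstract can be sketched as a simple filter. The thresholds below use the lower ends of the reported ranges (15 minutes, 2 meters); the `Contact` record, the `at_risk` function and the sample data are hypothetical and are not taken from the paper's code.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    person_id: str       # hypothetical identifier of the contacted person
    duration_min: float  # how long the contact lasted, in minutes
    distance_m: float    # estimated distance during the contact, in meters


def at_risk(contacts, min_duration_min=15.0, max_distance_m=2.0):
    """Keep only contacts that were both long enough and close enough."""
    return [c for c in contacts
            if c.duration_min > min_duration_min
            and c.distance_m < max_distance_m]


# Made-up contacts for illustration.
contacts = [
    Contact("a", duration_min=30.0, distance_m=1.0),  # long and close
    Contact("b", duration_min=5.0, distance_m=1.0),   # too brief
    Contact("c", duration_min=40.0, distance_m=4.0),  # too far away
]
flagged = [c.person_id for c in at_risk(contacts)]
```

Loosening either threshold flags more contacts for quarantine, which is the trade-off between containment and unnecessary quarantines that the abstract describes.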


Monitoring epidemics: Lessons from measuring population prevalence of the coronavirus

Proceedings of the National Academy of Sciences; Samuel J. Clark and Abigail Norris Turner



For the United States, data available from the Centers for Disease Control and Prevention (CDC) on 22 January 2021 (1–3) describe at least 442,000 additional deaths beyond what was expected in 2020 (Fig. 1). The bulk, roughly 336,000, can be attributed directly to COVID-19, and many of the remainder are related to the general disruption wrought by the pandemic. For a sense of scale, there were 291,000 American battle deaths in World War II (4). Adding to the catastrophic excess deaths, many of the hundreds of thousands of people who have survived COVID-19 require months to recover and suffer ongoing disabilities, and everyone is affected by myriad disruptions to daily life. The cumulative human suffering related to COVID-19 is staggering. [full text]


Facebook’s next big AI project is training its machines on users’ public videos

The Verge, James Vincent



Teaching AI systems to understand what’s happening in videos as completely as a human can is one of the hardest challenges — and biggest potential breakthroughs — in the world of machine learning. Today, Facebook announced a new initiative that it hopes will give it an edge in this consequential work: training its AI on Facebook users’ public videos.

Access to training data is one of the biggest competitive advantages in AI, and by collecting this resource from millions and millions of their users, tech giants like Facebook, Google, and Amazon have been able to forge ahead in various areas. And while Facebook has already trained machine vision models on billions of images collected from Instagram, it hasn’t previously announced projects of similar ambition for video understanding.


Low-Cost Drones Learn Precise Control Over Suspended Loads

IEEE Spectrum, Evan Ackerman



I’ll admit to having been somewhat skeptical about the strategy of dangling payloads on long tethers for drone delivery. I mean, I get why Wing does it— it keeps the drone and all of its spinny bits well away from untrained users while preserving the capability of making deliveries to very specific areas that may have nearby obstacles. But it also seems like you’re adding some risk as well, because once your payload is out on that long tether, it’s more or less out of your control in at least two axes. And you can forget about your drone doing anything while this is going on, because who the heck knows what’s going to happen to your payload if the drone starts moving around?

NYU roboticists, that’s who.


How Dating Apps Changed Our Love Lives, for Better or Worse

Discover Magazine, Anamaria Silic



As described by a scientist at the Kinsey Institute for Research in Sex, Gender, and Reproduction, internet dating is one of the most significant events in the evolution of human reproduction in human history (second only to that time when Homo sapiens became a non-migratory species, something like ten thousand years ago). And according to a Stanford study, in 2017, about 40 percent of heterosexual couples and 60 percent of same-sex couples in the US met online. That makes online dating the most common way that American couples now meet, even before social distancing-related spikes in dating app signups happened.


DNA databases are too white, so genetics doesn’t help everyone. How do we fix that?

Science News, Tina Hesman Saey



It’s been two decades since the Human Genome Project first unveiled a rough draft of our genetic instruction book. The promise of that medical moon shot was that doctors would soon be able to look at an individual’s DNA and prescribe the right medicines for that person’s illness or even prevent certain diseases.

That promise, known as precision medicine, has yet to be fulfilled in any widespread way. True, researchers are getting clues about some genetic variants linked to certain conditions and some that affect how drugs work in the body. But many of those advances have benefited just one group: people whose ancestral roots stem from Europe. In other words, white people.

Instead of a truly human genome that represents everyone, “what we have is essentially a European genome,” says Constance Hilliard, an evolutionary historian at the University of North Texas in Denton. “That data doesn’t work for anybody apart from people of European ancestry.”


How Google’s New Career Certificates Could Disrupt the College Degree (Exclusive)

Inc., Justin Bariso



Get a first look at Google’s new certificate programs and a new feature of Google Search designed to help job seekers everywhere.


Scientists hustle to create new tests, drugs and expand telemedicine to battle COVID-19 over the long term

The San Diego Union-Tribune, Gary Robbins



“We’re seeing the flip side of the dark year we’ve had with COVID-19,” said Larry Smarr, a professor of computer science and engineering at UC San Diego.

“There’s an explosion of new knowledge and technology around the world that’s built on decades of deliberate investment to fight problems like this.”


A technique to track Earth’s subtle movements with orbiting radars is heating up

Science, Megan Cantwell



When you imagine satellite imagery, you probably picture crisp photos of Earth taken from orbit. But another technology used to image the planet, synthetic aperture radar (SAR), is heating up, following a surge in public and private satellite launches. SAR satellites use radars, not cameras, to take snapshots of Earth. Comparing two SAR images taken at different times from the same position, a technique known as interferometric SAR (InSAR), can reveal surface movements of just a few millimeters. Watch the video to learn how InSAR is transforming earth science—and what to expect from the technique in the coming years. [video, 4:28]


Why pirates attack: Geospatial evidence

The Brookings Institution, Raj M. Desai and George E. Shambaugh



International organizations have long argued that poverty and unemployment in coastal communities are underlying causes of piracy. Others are skeptical that problems facing local fisheries are connected to piracy, based on reports that many pirates are actually members of inland nomadic clans or criminal gangs. In new research, we explore these links, not only in the context of Indian Ocean piracy but globally. We segment the world’s oceans into 1 degree-by-1 degree cells, and analyze the spatial links between harmful fishing practices and piracy incidents between 2005 and 2014 (see map). Previous research has tended to focus on country-specific variables (e.g., poverty, per-capita income, conflict, etc.) and has not adequately addressed the location-specific factors that influence piracy. By contrast, our data-driven spatial analysis is based on the geographic locations of actual pirate incidents. What did we find?
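The gridding step can be sketched as follows, assuming each incident is recorded as a (latitude, longitude) pair in decimal degrees. The `cell_of` helper and the sample coordinates are made up for illustration and do not come from the authors' data or code.

```python
import math
from collections import Counter


def cell_of(lat, lon):
    """Return the south-west corner of the 1-degree cell containing a point."""
    return (math.floor(lat), math.floor(lon))


# Made-up incident locations as (latitude, longitude) in decimal degrees.
incidents = [(12.4, 45.9), (12.9, 45.1), (-3.2, 41.7)]

# Count incidents per 1 degree-by-1 degree cell.
counts = Counter(cell_of(lat, lon) for lat, lon in incidents)
```

Using `math.floor` rather than `int` keeps cells consistent south of the equator and west of the prime meridian, where truncation toward zero would merge two neighboring cells into one.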


New Algorithm Could Reduce Complexity Of Big Data

Texas A&M University, Texas A&M Today



Whenever a scientific experiment is conducted, the results are turned into numbers, often producing huge datasets. In order to reduce the size of the data, computer programmers use algorithms that can find and extract the principal features that represent the most salient statistical properties. But many such algorithms cannot be applied directly to these large volumes of data.

Reza Oftadeh, doctoral student in the Department of Computer Science and Engineering at Texas A&M University, advised by Dylan Shell, faculty in the department, developed an algorithm applicable to large datasets that can directly order features from most salient to least.

“There are many ad hoc ways to extract these features using machine learning algorithms, but we now have a fully rigorous theoretical proof that our model can find and extract these prominent features from the data simultaneously, doing so in one pass of the algorithm,” Oftadeh said.


IBM’s AI may lead to new antimicrobials, drugs, and materials

VentureBeat, Kyle Wiggers



In a new study published in the journal Nature Biomedical Engineering, researchers at IBM say they’ve developed an AI model that can assist in the rapid design of antimicrobial peptides — the building blocks of proteins. The researchers say that the model outperforms other AI methods at designing such peptides and increases the success rate of identifying a viable candidate by 10%.

Antibiotics have transformed the world of medicine over the past century or so, but they’ve also been overused, leading to the emergence of bacteria with powerful resistance. According to the Centers for Disease Control and Prevention (CDC), antibiotic resistance is one of the biggest public health challenges of our time. In fact, in the U.S. alone, nearly 3 million people contract antibiotic-resistant infections each year, and more than 35,000 die as a result.


The Internet Archive on the future of the web

Protocol, Anna Kramer



“There are more and more walled gardens where you can’t go. We just have crawlers going at a crazy scale, and they can get blocked just like anybody can get blocked,” said Jefferson Bailey, the Archive’s director of web archiving and data services.

But even still, until someone or something fundamentally changes the rules of the web, the Internet Archive will keep doing what it’s been doing since 1996: preserving every fragment of text you or I are ever likely to read. Tech’s walled gardens might make it harder to get a perfect picture, but the small team of librarians, digital archivists and software engineers at the Internet Archive plan to keep bringing the world the Wayback Machine, the Open Library, the Software Archive, etc., until the end of time. Literally.


Deadlines



Statistical Inference for Network Models

Online June 23. “The Statistical Inference for Network Models symposium is a satellite of the Networks 2021 conference. SINM2021 will be held online on June 23, 2021. SINM will feature a mix of invited and contributed talks.” Deadline for abstract submissions is April 15.

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



A developer’s guide to HIPAA compliance and application development.

GitHub – truevault



… This guide will give you a strong understanding of HIPAA without getting bogged down in the legalese. We’ve tried to keep it straightforward and written in plain language.


VoterFraud2020 – a Twitter Dataset of Election Fraud Claims

Cornell Tech; Anton Abilov, Yiqing Hua, Hana Matatov, Mor Naaman



We are making publicly available VoterFraud2020, a multi-modal Twitter dataset with 7.6M tweets and 25.6M retweets from 2.6M users that includes key phrases and hashtags related to voter fraud claims between October 23rd and December 16th. The dataset also includes the full set of links and links to YouTube videos shared in these tweets, with data about their spread in different Twitter sub-communities. Key takeaways from our initial analysis of the data are listed below.


I made changes to my Git tutorial based on questions and comments that I received at the Portland R User Group meeting on 2021-02-23.

Twitter, Jim Tyhurst



The context for when to use the Git commands should be clearer now.


Careers


Full-time, non-tenured academic positions

Scholarly Assistant Professor



Washington State University, School of Electrical Engineering and Computer Science; Pullman, WA

Full-time positions outside academia

Manager Basketball Data Scientist



New York Knicks; New York, NY
