Data Science newsletter – March 22, 2021

Newsletter features journalism, research papers and tools/software for March 22, 2021

 

Smart mask purifies the air and monitors how you breathe

Innovation Origins, Arnoud Cornelissen


Narvalo wants face masks to transform from a protective product into an informative one, offering important insights into the user’s breathing and the surrounding air quality.


Seattle’s surveillance contractor has history of illegal sales, bribery, worrying privacy advocates

The Seattle Times, Patrick Malone


When a cellphone passes within Wi-Fi range, sensors atop traffic signals detect it and in a flash send its digital fingerprint to the little dish — an antenna. Instantaneously, the data is piped through Ethernet cables to a hub where Seattle city traffic planners can track movements in real-time, sifting massive tranches of data to analyze traffic patterns.

A web of more than 300 of these sensors spans the city, mostly downtown, along the Mercer Street corridor and in South Lake Union. Their data helped Seattle’s traffic planners to cut average delays during peak evening commuting hours through downtown in half between 2017 and late 2019, through simple adjustments to traffic signal times, according to the city.

But the data comes with a cost beyond this year’s $300,000 expense to the city. The data collection raises civil liberties questions for anyone who passes through Seattle’s core with a cellphone, and concerns about the city’s association with a company with a troubling history.

A group of privacy advocates from a Seattle nonprofit who’ve studied the technology worry the city’s contractor, Acyclica, could resell data to law enforcement and federal agencies, exposing individuals’ movements and personal associations.


Smart cities need an audit plan for IoT

Stacy on IoT blog, Stacy Higginbotham


Acyclica shares with the City of Seattle only the time it takes for a vehicle to get from point to point. It does so by using a randomly generated value to replace the leftover bits of the MAC address, so it’s effectively tracking a device in a vehicle, although the city doesn’t have a way to see anything that identifies the device. Moreover, every 12 hours the random value associated with a device is changed.

And after 24 hours, the random value and the entire route that the phone supplying the original MAC address had taken are permanently deleted, according to Tim McDowd, a spokesman for FLIR. This is a good system, no matter what The Seattle Times story might imply.

I was concerned at first, but after digging into the way the monitoring, randomization, and data retention work, I feel pretty comfortable. I suppose one could argue for a shorter retention time, but keeping the random number (which is the weakest link that could perhaps be used to identify someone) for half a day is pretty good. Permanent deletion after a day is even better.
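
To make the scheme described above concrete, here is a minimal Python sketch of rotating pseudonymization with a 24-hour purge. The SHA-256 hashing, the function names, and the exact rotation logic are illustrative assumptions, not Acyclica’s or FLIR’s actual implementation.

```python
import hashlib
import secrets
import time

SALT_ROTATION_S = 12 * 3600   # new random value every 12 hours
RETENTION_S = 24 * 3600       # permanently delete everything after 24 hours

_salt = secrets.token_bytes(16)
_salt_created = time.time()
_observations = []            # (pseudonym, sensor_id, timestamp)

def _current_salt():
    """Rotate the random salt every 12 hours so pseudonyms can't be linked across windows."""
    global _salt, _salt_created
    if time.time() - _salt_created >= SALT_ROTATION_S:
        _salt = secrets.token_bytes(16)
        _salt_created = time.time()
    return _salt

def record_sighting(mac_address: str, sensor_id: str) -> None:
    """Store a salted hash of the MAC address, never the MAC address itself."""
    pseudonym = hashlib.sha256(_current_salt() + mac_address.encode()).hexdigest()
    _observations.append((pseudonym, sensor_id, time.time()))

def purge_expired() -> None:
    """Drop pseudonyms, and therefore routes, older than 24 hours."""
    cutoff = time.time() - RETENTION_S
    _observations[:] = [obs for obs in _observations if obs[2] >= cutoff]

def travel_times(origin: str, destination: str) -> list:
    """Derive point-to-point travel times, the only output shared with the city."""
    first_seen = {p: t for p, s, t in _observations if s == origin}
    return [t - first_seen[p] for p, s, t in _observations
            if s == destination and p in first_seen and t > first_seen[p]]
```

The privacy properties claimed in the article map onto this sketch: the raw MAC address is never stored, pseudonyms become unlinkable once the salt rotates, records older than a day are deleted, and only the derived travel times leave the system.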


DOE Announces $24.5 Million for Manufacturing Innovation to Build a Clean, Resilient Electric Grid

U.S. Department of Energy


The U.S. Department of Energy (DOE) today announced up to $24.5 million to support improvements in domestic manufacturing to build resilient, modern electricity infrastructure and address the climate emergency. The two funding opportunities will back research and development (R&D) for the materials and technologies needed to expand the grid with new, clean-energy sources, deliver affordable electricity to disadvantaged communities, and help reach the Biden Administration’s goal of net-zero carbon emissions by 2050. … “The key to unlocking the full potential of solar and wind energy is to store it for use around the clock,” said U.S. Representative Diana DeGette. “Flow battery technology can help us utilize the full potential of these clean-energy resources, and investing in this important new technology now is vital to our overall effort to combat the climate crisis.”


‘Total game changer’: Pinpointing gene activity in tissues is aiding studies of COVID-19, Alzheimer’s

Science, Elizabeth Pennisi


As with real estate, location matters greatly for cells. Douglas Strand confirmed that truth last year when he used a new technique to map gene activity in bladder cancers. Until recently, scientists wanting to know all the genes at work in a tissue could analyze single cells without knowing their position, or they could measure average activity levels of genes across thousands of cells. Now, an emerging technology called spatial transcriptomics combines precision and breadth, mapping the work of thousands of genes in individual cells at pinpoint locations in tissue. That, Strand says, has been a “total game changer” for his research.

The virtual Advances in Genome Biology and Technology (AGBT) meeting this month was a big coming-out party for the technique, which is revealing whole new landscapes of gene expression.


Michigan epidemiologist: New COVID-19 outbreaks tied to high school and youth sports

Detroit Free Press, Dave Boucher and Kristen Jordan Shamus


School-related COVID-19 outbreaks are rising in Michigan, and many are tied to youth and high school sports, Michigan’s leading epidemiologist said Wednesday.

“The largest number of outbreaks are in K-12 school settings at 162, with 54 new outbreaks reported this week,” said Sarah Lyon-Callo, the director of the Bureau of Epidemiology and Population Health at the state health department.

That comes as the majority of pandemic metrics in the state — including case rates, test positivity rates and hospitalizations — are trending in the wrong direction. Some of these figures mirror rates seen in late October, when trends spiked, prompting the state to issue a ban on indoor contact sports in mid-November.


Angela Radulescu: Can data include personal narrative?

Medium, NYU Center for Data Science


Computational psychiatry brings quantitative tools to bear on the way mental illness is studied, classified and treated. A common aim across different cultures of computational psychiatry is to provide a set of features that, when inferred over time at the individual level, can predict both symptom outcomes, and the effect of different therapeutic interventions. The underlying assumption is that various behavioral and neurophysiological measures (e.g. choices, reaction times, speech patterns, gaze trajectories, etc.) can be summarized as individual data points in a useful feature space. The hope is that the structure of this space will map onto a diagnostic taxonomy; that process models can bridge the gap between behavior and its biological determinants; and that the dynamics of data points in this space can inform therapeutic interventions.

A question I always come back to when thinking about my own work in this area is: how can a computational approach account for the heterogeneity of biological and environmental factors specific to each individual’s story and subjective experience? Engaging with this question in a precise way is quite challenging, as it would require a record of a person’s significant past experiences, psychosocial context, subjective interpretation, and so on. Together, these form a personal narrative, and they are often key pieces of information in clinical assessments.
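
As a toy illustration of the feature-space framing described above, and not a model from the essay, per-session behavioral measures can be summarized as vectors and related to later outcomes. The feature names, values, and logistic-regression readout below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-session behavioral summaries: each row is one
# person-session, each column a candidate feature in the space.
feature_names = ["mean_reaction_time_s", "choice_switch_rate",
                 "speech_rate_wpm", "gaze_entropy"]
X = np.array([
    [0.62, 0.31, 148.0, 2.1],
    [0.71, 0.45, 131.0, 2.6],
    [0.55, 0.22, 156.0, 1.9],
    [0.80, 0.52, 120.0, 2.8],
])
# Invented symptom outcome at follow-up (1 = worsened), for illustration only.
y = np.array([0, 1, 0, 1])

# A simple readout from the feature space to a predicted outcome; note that
# nothing in X encodes the personal narrative the essay asks about.
model = LogisticRegression().fit(X, y)
print(dict(zip(feature_names, model.coef_[0])))
```

The gap the piece highlights is visible here: whatever predictive signal such a matrix carries, the individual’s narrative and subjective experience never enter it.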


Amazon And Microsoft Must Face Claims Over Biometric Privacy

MediaPost, DigitalNewsDaily, Wendy Davis


Illinois residents may proceed with a lawsuit alleging that Amazon and Microsoft violated a state biometric privacy law by amassing a database of fingerprints, a federal judge has ruled.


Google Suffers Another Defeat In Privacy Battle Over Chrome

MediaPost, DigitalNewsDaily, Wendy Davis


Siding against Google, a federal judge has refused to dismiss a privacy lawsuit claiming that the Chrome browser collects personal information from logged-out users — including their IP addresses, identifiers stored on cookies, and data about web-browsing activity.

In a decision issued Wednesday, U.S. District Court Judge Lucy Koh in San Jose rejected the company’s argument that Chrome users consent to the data collection.


How Different Are Americans’ Facebook Feeds?

The Markup, Citizen Browser


Snapshots from the Facebook feeds of our Citizen Browser panelists illuminate how Facebook’s recommendation algorithm siloes information on the platform.


Does “AI” stand for augmenting inequality in the era of covid-19 healthcare?

BMJ, David Leslie


Among the most damaging characteristics of the covid-19 pandemic has been its disproportionate effect on disadvantaged communities. As the outbreak has spread globally, factors such as systemic racism, marginalisation, and structural inequality have created path dependencies that have led to poor health outcomes. These social determinants of infectious disease and vulnerability to disaster have converged to affect already disadvantaged communities with higher levels of economic instability, disease exposure, infection severity, and death. Artificial intelligence (AI) technologies—quantitative models that make statistical inferences from large datasets—are an important part of the health informatics toolkit used to fight contagious disease. AI is well known, however, to be susceptible to algorithmic biases that can entrench and augment existing inequality. Uncritically deploying AI in the fight against covid-19 thus risks amplifying the pandemic’s adverse effects on vulnerable groups, exacerbating health inequity.
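
As a hedged sketch of what checking for this kind of algorithmic bias can look like in practice, and not a method from the BMJ article, one common step is to compare a model’s error rates across demographic groups. The group labels, predictions, and outcomes below are invented for illustration.

```python
import numpy as np

# Invented triage-model outputs: y_true = whether the patient actually
# deteriorated, y_pred = whether the model flagged them for follow-up care.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "A",
                   "B", "B", "B", "B", "B", "B"])

def false_negative_rate(truth, pred):
    """Share of people who deteriorated but were not flagged by the model."""
    positives = truth == 1
    return float(np.mean(pred[positives] == 0)) if positives.any() else float("nan")

# A large gap between groups is the kind of disparity that, if ignored,
# entrenches existing health inequities when the model is deployed.
for g in np.unique(group):
    mask = group == g
    print(g, round(false_negative_rate(y_true[mask], y_pred[mask]), 2))
```

An audit like this is only a starting point; the editorial’s larger argument is that the data and deployment context themselves already reflect structural inequality.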


Bringing Intel Inside

University of California-San Diego, UC San Diego News Center


The Halıcıoğlu Data Science Institute (HDSI) at the University of California San Diego is at the vanguard of developing data science talent for the tech industry and sharing the classroom with original equipment manufacturer (OEM) and independent software vendor (ISV) leaders.

HDSI students and Intel staff members, along with Associate Professor and Data Science Fellow Bradley Voytek, Ph.D., started collaborating on a rich dataset provided by Intel.

“The main goal was to bring a rich data set into the academic domain – safely and securely – and work with faculty and students to gain new insights, as well as provide a new generation of Data Science talent with the domain expertise that can benefit Intel and the computing industry,” says Bijan Arbab, director of system telemetry at Intel.


Google to invest $50M in Texas and shares new details on its upcoming Houston office

InnovationMap, John Egan


Mark it on your calendars: In May, Google is expected to complete the buildout of its first office in Houston.

Google will occupy one floor totaling 11,000 square feet in the One Buffalo Heights building at 3663 Washington Ave. The tech giant announced the Houston office last June.


JPMorgan now has 50 people working in its rarefied AI labs

eFinancialCareers, Sarah Butcher


If you want an artificial intelligence job in an investment bank that combines elements of working in academia, you probably want to work in one of the “labs” that leading banks have set up to push the boundaries of AI applications in finance.

JPMorgan has one such lab, which it founded in 2018. It’s led by Manuela Veloso, who’s also the head of machine learning research at Carnegie Mellon University. In the past year, it’s been adding staff to a new lab in London.

Speaking at this week’s AI & Data Science in Trading and Technology event, Naftali Cohen, a VP in AI research at JPMorgan who is also an adjunct professor at NYU’s Tandon School of Engineering, said the bank’s labs now employ 50 people globally.


What the drive for open science data can learn from the evolving history of open government data

The Conversation; Stefaan G. Verhulst, Andrew J. Zahuranec, Andrew Young


Nineteen years ago, a group of international researchers met in Budapest to discuss a persistent problem. While experts published an enormous amount of scientific and scholarly material, few of these works were accessible. New research remained locked behind paywalls run by academic journals. The result was researchers struggled to learn from one another. They could not build on one another’s findings to achieve new insights. In response to these problems, the group developed the Budapest Open Access Initiative, a declaration calling for free and unrestricted access to scholarly journal literature in all academic fields.

In the years since, open access has become a priority for a growing number of universities, governments, and journals. But while access to scientific literature has increased, access to the scientific data underlying this research remains extremely limited. Researchers can increasingly see what their colleagues are doing but, in an era defined by the replication crisis, they cannot access the data to reproduce the findings or analyze it to produce new findings. In some cases there are good reasons to keep access to the data limited – such as confidentiality or sensitivity concerns – yet in many other cases data hoarding still reigns.

To make scientific research data open to citizens and scientists alike, open science data advocates can learn from open data efforts in other domains. By looking at the evolving history of the open government data movement, scientists can both see the limitations of current approaches and identify ways to move forward from them.


Events



HIVE Supercomputer Symposium

Georgia Institute of Technology, Institute for Data Engineering and Science


Online March 24, starting at 1 p.m. Eastern. “This symposium will introduce the Hive supercomputer and some of the research being done on it, including simulations of how the coronavirus binds to human cells, how to represent atomistic systems for surface science and catalysis with machine learning models, and how weak interactions influence chemical reactions.” [registration required]

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



A growing body of data science and statistics methods can help fill critical race and ethnicity data gaps

Twitter, Alena Stern


a new @urbaninstitute brief outlines 5 ethical risks to consider when using these tools


More Data Science Cheatsheets

KDnuggets, Matthew Mayo


What makes a cheatsheet worthy of being singled out as a particularly good one? It’s difficult to put your finger on precisely what makes a good cheatsheet, but obviously one which conveys essential information concisely — whether that information is of a specific or general nature — is definitely a good start. And that is what makes our candidates today noteworthy. So read on for four curated complementary cheatsheets to assist you in your data science learning or review.


Careers


Full-time, non-tenured academic positions

Associate Research Scholar



Columbia University, Center for Spatial Research; New York, NY
