Data Science newsletter – August 4, 2021

Newsletter features journalism, research papers and tools/software for August 4, 2021

Apple Watch’s data ‘black box’ poses research problems

The Verge, Nicole Wetsman

from July 27, 2021

A Harvard biostatistician is rethinking plans to use Apple Watches as part of a research study after finding inconsistencies in the heart rate variability data collected by the devices. He found that the data collected during the same time period appeared to change without warning.

“These algorithms are what we would call black boxes — they’re not transparent. So it’s impossible to know what’s in them,” JP Onnela, associate professor of biostatistics at the Harvard T.H. Chan School of Public Health and developer of the open-source data platform Beiwe, told The Verge.

We’ve just added built-in citation support to GitHub so researchers and scientists can more easily receive acknowledgments for their contributions to software.

Twitter, Nat Friedman

from July 27, 2021

Just push a CITATION.cff file and we’ll add a handy widget to the repo sidebar for you.

New Purdue research building will offer a world’s first in hypersonic testing, materials development

Purdue University, News

from July 27, 2021

Imagine an aircraft flying 2,800 miles across the United States in only 15 minutes. A state-of the-art building ready for construction at Purdue University will provide the facilities to explore that idea through advanced hypersonic research.

The planned 65,000-square-foot Hypersonic Applied Research Facility (HARF) will house two cutting-edge wind tunnels, enhancing Purdue’s world-leading capabilities in hypersonics evaluation and testing. The $41 million facility will house the only Mach 8 quiet wind tunnel in the world as well as a hypersonic pulse (HYPULSE) shock tunnel.

Elon Musk’s Brain-Computer Startup Neuralink Raises $205 Million

Bloomberg Technology, Sarah McBride

from July 29, 2021

The funds will help Musk’s Austin, Texas-based Neuralink work on its brain implant for quadriplegics, a device which in time could also assist other people with neurological disorders. The technology could help provide patients “digital freedom” by allowing them to interact with computers and phones “in a high bandwidth and natutralistic way,” according to the announcement.

The next big step for Neuralink will be clinical trials in humans of its device, which so far has been shown off in pigs and a Pong-playing monkey.

In some California privacy cases, analytics trackers are in the crosshairs

Digiday, Kate Kaye

from July 27, 2021

As enforcement letters stream out to advertisers, social media sites, data brokers and ad tech firms from the California Attorney General’s office, it is clear that California Consumer Privacy Act enforcement is not just about data breaches. It’s about cookies and tracking technologies — including analytics trackers. And the penalties for violations could be steep.

What Matter means for digital assistants and the smart home

Stacey Higginbotham, Stacey on IoT

from July 26, 2021

I used to say that the smart home was made up of devices and a hub. Now, it’s probably fair to say that a smart home is made up of connected devices and a digital assistant. Most people want a full-blown digital assistant like Alexa, Siri, or Google, not just a cluster of automation software like SmartThings or Hubitat.

In the wake of the Alexa Live conference last week, we now know how Amazon plans to implement the upcoming Matter smart home interoperability protocol. We’re also beginning to understand how the smart home industry will evolve after this protocol — which represents a huge shift for the smart home — is adopted. Here’s what I’m seeing and how the big three players in the smart home are selling their digital assistants.

More than 500 colleges and universities across the country are requiring vaccines for at least some students or employees.

Twitter, Jay Van Bavel

from July 29, 2021

At the same time, only about 29% of adults aged 18-24 have received at least one dose, the lowest rate of any eligible age group

Gordian Partners with APPA to Estimate Higher Education Infrastructure Backlog Need at $112.3B

Gordian

from July 20, 2021

Higher Education institutions in the United States spend about $36.8B annually on facilities operations, maintenance and utilities, and an average of $27.8B annually for the construction of new facilities and/or the renovation of existing buildings. Colleges and universities manage more than 6.2B square feet of floor space in 210,000 buildings. These buildings have an average age approaching 50 years and a Current Replacement Value (CRV) estimated to exceed $2TR (including utilities infrastructure, roads and landscaping).

Wharton is first top MBA program to welcome class with more women than men

Fortune, Education, Sydney Lake

from July 28, 2021

For the first time in its 140-year history, the Wharton School of the University of Pennsylvania will welcome an MBA class composed of more women than men this fall, making it the first M7 school to reach this threshold.

New State Laws Hamstring Public Health Officials

The Pew Charitable Trusts, Stateline, Christine Vestal

from July 29, 2021

Starting last year, mostly GOP state legislators in nearly every state proposed hundreds of bills to either specifically preclude the use of proven public health measures or more broadly tie the hands of state and local health officials, and governors, in a public health crisis.

At least 20 states have approved laws sharply curtailing health agencies’ emergency powers to invoke restrictions such as quarantines, mask wearing, business closures and vaccine mandates.

Many of the new laws also limit the duration of public health emergencies, require legislative approval to declare or continue emergency status, and prohibit state and local governments from issuing public health safeguards without oversight by elected officials.

Now at #ISMBECCB21, @ProfKateJones is talking about the link between environment, climate change and disease.

Twitter, Dr Chloé Azencott

from July 28, 2021

… Species that make us sick thrive in degraded (intensely-used, human-managed, urban) land. Uh-oh. [thread]

The Question Medical AI Can’t Answer

Scientific American, Jason H. Moore

from July 29, 2021

The application of AI in the health care domain has also revealed many of its weaknesses, outlined in a recent guidance document from the World Health Organization (WHO). The document covers a lengthy list of topics, each of which are just as important as the last: responsible, accountable, inclusive, equitable, ethical, unbiased, responsive, sustainable, transparent, trustworthy and explainable AI. These are all vital to the delivery of health care and consistent with how we approach medicine when the best interest of the patient is paramount.

It is not hard to understand how an algorithm might be biased, exclusive, inequitable or unethical. That could be explained by the possibility of its developers not giving it the ability to discern good data from bad, or that they hadn’t been aware of data problems because these often arise from discriminatory human behavior. One example is unconsciously triaging emergency room patients differently based on the color of their skin. Algorithms are good at exploiting these kinds of biases, and making them aware of them can be challenging. As the WHO guidance document suggests, we must weigh the risks of AI carefully with the potential benefits.

Examining the Modeling Framework of Crime Hotspot Models in Predictive Policing

Journal of Humanistic Mathematics, Heidi Goodson and Alanna Hoyer-Leitzel

from July 29, 2021

Predictive policing has its roots in crime hotspot modeling. In this paper we give an example of what goes into mathematical crime hotspot modeling and show that the modeling assumptions perpetuate systemic racism in policing. The goal of this paper is to raise objections to this field of research, not on its mathematical merit, but on the scope of the problem formation. We do not make any suggestions about how to incorporate race into predictive policing models in order to make them less biased.

How Americans spend their free time, in charts

Popular Science, Eleanor Cummins

from July 28, 2021

Is Data Science the New Coding?

EdSurge News, Nadia Tamez-Robledo

from July 28, 2021

There’s been a big push in recent years to get students coding. Before that, the big skill was learning to use word processing software. And before that, typing.

Now there’s a movement to bring the latest futuristic skill into classrooms: data science. It’s one that relies not just on technology but on shaking up math curriculums that have gone unchanged since the Space Race. And advocates argue that schools that teach data science will equip students to navigate a world awash in information like never before.

“We feel like this is a good foundational education that prepares people for life,” says Jeffrey Severts, co-founder and executive director of the nonprofit Center for Radical Innovation for Social Change at the University of Chicago. The center houses Data Science for Everyone, an initiative that provides teaching resources on and promotes data science curriculum in K-12 schools.

Deadlines

FSF-funded call for white papers on philosophical and legal questions around Copilot

The Free Software Foundation has received numerous inquiries about our position on these questions. We can see that Copilot’s use of freely licensed software has many implications for an incredibly large portion of the free software community. Developers want to know whether training a neural network on their software can really be considered fair use. Others who may be interested in using Copilot wonder if the code snippets and other elements copied from GitHub-hosted repositories could result in copyright infringement. And even if everything might be legally copacetic, activists wonder if there isn’t something fundamentally unfair about a proprietary software company building a service off their work.

With all these questions, many of them with legal implications that at first glance may have not been previously tested in a court of law, there aren’t many simple answers. To get the answers the community needs, and to identify the best opportunities for defending user freedom in this space, the FSF is announcing a funded call for white papers to address Copilot, copyright, machine learning, and free software.

Deadline for submissions is August 23.

SPONSORED CONTENT

The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

Tools & Resources

What the Heck is a Data Mesh?!

Chris Riccomini

from June 08, 2021

I got sucked into a data mesh Twitter thread this weekend (it’s worth a read if you haven’t seen it). Data meshes have clearly struck a nerve. Some don’t understand them, while others believe they’re a bad idea. Yet, “Demystifying Data Mesh” and “Putting Data Mesh to Work” articles abound.

To understand the confusion, I re-read Zhamak Dehghani ’s original and follow-on posts. Zhamak is the creator of the data mesh. In her second post she identifies four data mesh principles:

Domain-oriented decentralized data ownership and architecture

Data as a product

Self-serve data infrastructure as a platform

Federated computational governance

Recognizing People in Photos Through Private On-Device Machine Learning

Apple, Machine Learning Research blog

from July 27, 2021

Recognizing people in libraries consists of two interwoven phases. One phase involves constructing a gallery of known individuals progressively as the library evolves. The second phase consists of assigning a new person observation to either a known individual in the gallery or declaring the observation as an unknown individual. The algorithms in both of these phases operate on feature vectors, also called embeddings, that represent a person observation.

[2103.05823] Fast and flexible: Human program induction in abstract reasoning tasks

arXiv, Computer Science > Human-Computer Interaction; Aysja Johnson, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis

from March 10, 2021

The Abstraction and Reasoning Corpus (ARC) is a challenging program induction dataset that was recently proposed by Chollet (2019). Here, we report the first set of results collected from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000). Although this subset of tasks contains considerable variation, our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 80% of tasks solved per participant, and with 65% of tasks being solved by more than 80% of participants. Additionally, we find interesting patterns of behavioral consistency and variability within the action sequences during the generation process, the natural language descriptions to describe the transformations for each task, and the errors people made. Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task to compose a correct solution. Future modeling work could incorporate these findings, potentially by connecting the natural language descriptions we collected here to the underlying semantics of ARC.

Learning To Rank Restaurants

Swiggy, Ashay Tamhane

from May 05, 2021

Millions of customers order food via AI-powered restaurant recommendations shown on the Swiggy homepage. We typically observe a strong correlation between the quality of ranking and conversion rates. It is, therefore, crucial for us to constantly keep improving our ranking models in order to provide the best experience to our customers.

In a previous post, we described the evolution of our ranking models from a simple utility function to a GBT based ranking model. In this blog, we continue our journey towards building Deep Learning models for a personalized ranking of restaurants. While the pointwise GBT based ranking performed well in practice, it had a few limitations that we address in this blog

Sports.BradStenger.com

Data Science newsletter – August 4, 2021

Leave a Comment Cancel reply