Data Science newsletter – August 19, 2021

Newsletter features journalism, research papers and tools/software for August 19, 2021

 

UGA football facilities: The $175 million upgrades

AJC.com, The Atlanta Journal-Constitution, Tim Tucker


The University of Georgia Athletic Association has spent $175 million on more than 350,000 square feet of football facilities projects since Kirby Smart was hired as head coach in December 2015. Here’s a breakdown of what the Bulldogs have gotten for all of that money


Were last season’s @NFL and @NCAA football games associated with substantial county-level COVID-19 cases?

Twitter, Jeep Wrangler Owner, Zach Binney


I have 3 thoughts here that I’ll put in a brief thread:

1. My personal inclination is to think this paper’s conclusions are correct. Limited capacity (10-30%), mostly-outdoor games pre-delta probably didn’t contribute to large amounts of COVID spread. (1/4)


The controversy over the term ‘citizen science’

CBC News, What on Earth?


Recently, however, some large institutions, scientists and community members have proposed replacing the term citizen science with “community science.”

Those in favour of the terminology change — such as eBird, one of the world’s largest biodiversity databases — say they want to avoid using the word citizen. They do so because they want to be “welcoming to any birder or person who wants to learn more about bird watching, regardless of their citizen status,” said Lynn Fuller, an eBird spokesperson, in a news release earlier this year.

Some argue that while the intention is valid, the term community science already holds another definition — namely projects that gather different groups of people around environmental justice focused on social action.


Can Twitter data help in spotting problems early with publications? What retracted COVID-19 papers can teach us about science in the public sphere

Impact of Social Science blog, Lutz Bornmann and Robin Haunschild


The COVID-19 pandemic has brought science into mainstream public and political debates in novel ways, notably through the widespread use of social media to share and discuss new findings. In this post, Robin Haunschild and Lutz Bornmann discuss their recent findings on how retracted papers were talked about on the social media platform Twitter and how this can be mapped onto the eventual retraction notices of these articles. They suggest that exploring how Twitter and other forms of open post-publication peer review highlight potential errors in the scientific record might serve as an early warning mechanism for spotting research errors.


John Deere acquires Bear Flag Robotics to accelerate autonomous farm tech

GPS World, Tracy Cozzens


Deere & Company has signed a definitive agreement to acquire Bear Flag Robotics for $250 million USD. Founded in 2017, Bear Flag is a Silicon Valley-based startup that develops autonomous-driving technology compatible with existing machines.

The deal accelerates the development and delivery of automation and autonomy on the farm and supports John Deere’s long-term strategy to create smarter machines with advanced technology to support individual customer needs.

Deere first started working with Bear Flag in 2019 as part of the company’s Startup Collaborator program, an initiative focused on enhancing work with startup companies whose technology could add value for Deere customers. Since then, Bear Flag has successfully deployed its autonomous solution on a limited number of farms in the United States.


Selected Readings on Data Responsibility and Cash Transfers for Children

New York University, GovLab RD4C, Constanza Vidal Bustamante and Emil Verhulst


Throughout the development and deployment of these programs, and especially given the increased adoption of digital tools for their implementation, key stakeholders such as humanitarian organizations and financial service providers collect, store, use, and share vast amounts of sensitive personal data on children and their families. Data and data analyses can help stakeholders better understand beneficiaries’ needs, enhance program efficiency, and evaluate success. However, the collection of large and sensitive datasets also poses important risks regarding the beneficiaries’ privacy and the potential misuse of data by third parties, which can be all the more sensitive in the context of minors. Irresponsible handling of data can not only erode the trust between program recipients and program leaders and jeopardize the program’s success–it can also threaten children’s safety, well-being, and development.

These selected readings on Data Responsibility and Cash Transfers for Children comprise two main sections: 1) Data responsibility considerations and best practices in cash transfer programs for children, and 2) Leveraging data and data analyses to assess the impact of cash transfers for children.


The Digital Twin Opportunity

MIT Sloan Management Review, Pushkar P. Apte and Costas J. Spanos


Half a century ago, NASA’s moon shot landed Neil Armstrong and Buzz Aldrin on the moon and set fire to our imaginations. Technology innovations from the program went on to seed entire industries, including microelectronics, software, and communications, which now form the backbone of our digital century. Another innovation was that NASA built and maintained a physical twin of the spacecraft on the ground so that it could troubleshoot problems without risk to the mission. This proved crucial during the troubled Apollo 13 mission and helped NASA bring the astronauts home safely. This basic concept has now evolved into the use of digital twins, or DTs — still twins, but built and maintained in the digital rather than physical realm. Fundamentally, a DT is a dynamic model of a physical system that enables fast and creative experimentation at very low cost and risk.

DTs have already been used in specialized, complex applications like observing and modeling the operation of an aircraft engine or manufacturing equipment. These initial DT deployments were tactical, mainly for data visualization and product life cycle management. But now, thanks to a confluence of technological advances, DTs are at an intriguing inflection point — transitioning from that specialized, tactical domain to becoming strategic tools with diverse applications.


The Shadowy Business of International Education

The Walrus (Canada), Nicolas Hune-Brown


Students like Kushandeep have complicated the usual picture of international study. The 2000s-era stereotype of the pampered young foreigner, usually from mainland China, who drives flashy sports cars and shops for Gucci bags between classes was always a caricature, but now it’s entirely divorced from reality. In 2019, 34 percent of the more than 642,000 international students in Canada were from India, well ahead of China’s 22 percent.

Many of these students are from Punjab, and they generally attend small community colleges, not internationally renowned universities. A recent study by Rakshinder Kaur and Kamaljeet Singh, professors of education at Punjabi University Regional Centre, surveyed students attending an English-language training school in preparation to study abroad: 80 percent came from farming families, most from small farms. When asked where they wanted to study, 78 percent said Canada. Mortgaging land to cover tuition has become common, with more and more families literally selling the farm to send their children to community colleges.


Deep learning and citizen science enable automated plant trait predictions from photographs

Nature, Scientific Reports; Christopher Schiller, Sebastian Schmidtlein, Coline Boonman, Alvaro Moreno-Martínez & Teja Kattenborn


Plant functional traits (‘traits’) are essential for assessing biodiversity and ecosystem processes, but cumbersome to measure. To facilitate trait measurements, we test if traits can be predicted through visible morphological features by coupling heterogeneous photographs from citizen science (iNaturalist) with trait observations (TRY database) through Convolutional Neural Networks (CNN). Our results show that image features suffice to predict several traits representing the main axes of plant functioning. The accuracy is enhanced when using CNN ensembles and incorporating prior knowledge on trait plasticity and climate. Our results suggest that these models generalise across growth forms, taxa and biomes around the globe. We highlight the applicability of this approach by producing global trait maps that reflect known macroecological patterns. These findings demonstrate the potential of Big Data derived from professional and citizen science in concert with CNN as powerful tools for an efficient and automated assessment of Earth’s plant functional diversity.


Georgia State Receives $1.3 Million to Study Causal Relationships in the Brain

Georgia State University, Georgia State News Hub


“We will take data collected at different speeds by different modalities and combine it to reveal more about how brain regions influence each other,” said [Sergey] Plis. “For example, we can take slow modality like fMRI and learn causal information at faster neural scale, and then fuse it with what we learn from MEG or EEG. By combining them, one could partially correct the other.”

In addition to providing scientists with a new set of methodological tools, the project will advance scientific knowledge about the neural bases of diseases. The team plans to apply their models to schizophrenia, which is considered a disorder of “disconnectivity.”


Collaboratory at Columbia: An Aspen Grove of Data Science Education

Harvard Data Science Review; Isabelle A. Zaugg, Patricia J. Culligan, Richard Witten, and Tian Zheng


The novelty of the Collaboratory lies in its crowd-sourcing approach to creating new data science pedagogy and its ability to kindle transdisciplinary collaboration in doing so. By offering seed funding, it fosters proactive efforts to embed data science “in context” into more traditional domains through a cohort of compelling, transdisciplinary, crowd-sourced data science education proposals each year. Collaboratory educational offerings are required to be developed through a partnership between two faculty members, a data scientist and a domain expert from another field, or a larger team with complementary expertise. Over the past 5 years, the Collaboratory has supported the development of a wide spectrum of data science pedagogical models spread across more than 40 academic departments, centers, institutes, and professional schools at Columbia University. As a result, the Collaboratory has to date served the learning needs of more than 4,000 students. Furthermore, it has cultivated a thriving ecosystem that includes a funding mechanism and a community-support structure that all contribute to its agility and success. Here, we offer our experience and best practices in developing and managing the Collaboratory, which, we hope, will contribute to a blueprint for data science education leaders everywhere.


I used to teach this finding to my MBA students. It has more than 400 citations on Google Scholar. It’s about dishonesty. Turns out it’s fraudulent.

Twitter, Joe Simmons


“Evidence of Fraud in an Influential Field Experiment About Dishonesty” https://datacolada.org/98


Oceanographer Uses Systems Argo and GO-SHIP to Track Ocean Temperature

EarthCube, Emma Smith


According to UC San Diego Assistant Professor of Physical Oceanography Sarah Purkey, over 90 percent of the total human-caused warming in the Earth system ends up in the ocean.

“The common misconception that global warming only impacts the atmosphere leads many to believe the issue is not as pressing as it really is,” said Purkey. “In reality, the warming that we experience living in the troposphere is only the tip of the iceberg. Most of the heat ends up in the ocean. This warming can impact ecosystems that are essential to the absorption of carbon dioxide from the atmosphere to the ocean, dampening the ocean’s ability to take up carbon dioxide, just causing the cycle to go on and on.”

To address this time-sensitive issue, Purkey recently used the National Science Foundation-funded systems Argo and GO-SHIP to study how drastically the ocean temperature is changing.


Changes in health promoting behavior during COVID-19 physical distancing: Utilizing wearable technology to examine trends in sleep, activity, and cardiovascular indicators of health

PLOS One, Emily R. Capodilupo and Dean J. Miller


The COVID-19 pandemic incited unprecedented restrictions on the behavior of society. The aims of this study were to quantify changes to sleep/wake behavior and exercise behavior, as well as changes in physiological markers of health during COVID-19 physical distancing. A retrospective analysis of 5,436 US-based subscribers to the WHOOP platform (mean age = 40.25 ± 11.33; 1,536 females, 3,900 males) was conducted covering the period from January 1st, 2020 through May 15th, 2020. This time period was separated into a 68-day baseline period and a 67-day physical distancing period. To provide context and allow for potential confounders (e.g., change of season), data were also extracted from the corresponding time periods in 2019. As compared to baseline, during physical distancing, all subjects fell asleep earlier (-0.15 hours), woke up later (0.29 hours), obtained more sleep (+0.21 hours) and reduced social jet lag (-0.13 hours). Contrasting sleep behavior was seen in 2019, with subjects falling asleep and waking up at a similar time (-0.01 hours; -0.03 hours), obtaining less sleep (-0.14 hours) and maintaining social jet lag (+0.06 hours) in corresponding periods. Individuals exercised more intensely during physical distancing by increasing the time spent in high heart rate zones. In 2020, resting heart rate decreased (-0.90 beats per minute) and heart rate variability increased (+0.98 milliseconds) during physical distancing when compared to baseline. However, similar changes were seen in 2019 for RHR (-0.51 beats per minute) and HRV (+2.97 milliseconds), suggesting the variation may not be related to the introduction of physical distancing mandates. The findings suggest that individuals improved health related behavior (i.e., increased exercise intensity and longer sleep duration) during physical distancing restrictions. While positive changes were seen to cardiovascular indicators of health, it is unclear whether these changes were a direct consequence of behavior change.
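
The abstract's comparison against 2019 can be made concrete with a little arithmetic. The sketch below is an illustration only, not the authors' analysis: it takes the changes quoted in the abstract and subtracts the 2019 change from the 2020 change for each metric, a simple difference-in-differences style reading. The dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Changes quoted in the abstract: distancing period minus baseline period,
# for 2020 and for the corresponding calendar periods in 2019.
# Units: hours for sleep metrics, beats per minute for RHR, milliseconds for HRV.
# The metric names are hypothetical labels, not taken from the paper.
REPORTED_CHANGES = {
    "sleep_onset_h":    {"2020": -0.15, "2019": -0.01},
    "wake_time_h":      {"2020": 0.29,  "2019": -0.03},
    "sleep_duration_h": {"2020": 0.21,  "2019": -0.14},
    "social_jet_lag_h": {"2020": -0.13, "2019": 0.06},
    "rhr_bpm":          {"2020": -0.90, "2019": -0.51},
    "hrv_ms":           {"2020": 0.98,  "2019": 2.97},
}


def year_adjusted_change(change_2020, change_2019):
    """Return the 2020 change minus the 2019 change, rounded to 2 decimals."""
    return round(change_2020 - change_2019, 2)


def adjust_all(reported):
    """Apply the subtraction to every metric in the table."""
    return {
        metric: year_adjusted_change(values["2020"], values["2019"])
        for metric, values in reported.items()
    }


if __name__ == "__main__":
    for metric, value in adjust_all(REPORTED_CHANGES).items():
        print(f"{metric}: {value:+.2f}")
```

Read this way, the sleep metrics move further once the 2019 pattern is subtracted (sleep duration goes from +0.21 to +0.35 hours), while the HRV figure turns negative because the 2019 rise was larger than the 2020 one. That is consistent with the abstract's caution about attributing the cardiovascular changes to physical distancing. The subtraction ignores sample variability, so it says nothing about statistical significance.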


Apple Aims to Push More Patient Data to Doctors. But Who Can Gauge Its Impact on Health?

Kaiser Health News, Sarah Kwon


Some see great promise in building “pipes” between a patient’s phone and the health records viewed by their clinicians. Apple is “democratizing the flow of health data” between doctors and patients, said Anil Sethi, a former Apple health director and current CEO of Ciitizen, a startup that manages health data for cancer patients.

But Apple’s announcement was shrouded in ambiguity and short on particulars. The company would not provide a complete list of the data patients can share with doctors and declined to comment for this article. Previous Apple moves to get more data into the hands of doctors have been announced with great fanfare, but questions remain as to how many health care providers are using the data and to what effect, and whether success stories are the norm or outliers. To date, rigorous studies showing clear health benefits from monitoring these types of data remain limited.

Although Apple has built pipes enabling patients to share growing amounts of data with medical professionals, it’s unclear how much data flows through them.

SPONSORED CONTENT


The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



A future for SQL on the web

James Long


I discovered something absurd recently and I’m very excited to tell you about it.

The end result is absurd-sql, and it’s a persistent backend for SQLite on the web. That means it doesn’t have to load the whole db into memory, and writes persist. In this post I will explain the absurdities of the web’s storage APIs (mainly IndexedDB), show how SQLite provides a 10x perf improvement, explain all the cool tricks that make it work, and explain the locking/transactional semantics that make it robust.


SoundStream: An End-to-End Neural Audio Codec

Google AI Blog, Neil Zeghidour and Marco Tagliasacchi


Earlier this year, we released Lyra, a neural audio codec for low-bitrate speech. In “SoundStream: an End-to-End Neural Audio Codec”, we introduce a novel neural audio codec that extends those efforts by providing higher-quality audio and expanding to encode different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. SoundStream is the first neural network codec to work on speech and music, while being able to run in real-time on a smartphone CPU. It is able to deliver state-of-the-art quality over a broad range of bitrates with a single trained model, which represents a significant advance in learnable codecs.


Careers


Tenured Faculty

Assistant Professor, Media Studies



University of Chicago, Department of Cinema and Media Studies; Chicago, IL

Tenured and tenure track faculty positions

Tenure-Track Professor in Sociology



Harvard University, Faculty of Arts and Sciences; Cambridge, MA
