Data Science newsletter – March 23, 2021

Newsletter features journalism, research papers and tools/software for March 23, 2021

 

Hugging Face triples investment in open source machine learning models

VentureBeat, Khari Johnson



Hugging Face launched in 2016 with a chatbot app designed to be your “AI friend.” Now the NLP company has more than 100,000 community members and is planning to triple its efforts and expand beyond language models into fields like computer vision. Developers have used a hub on Hugging Face to share thousands of models, and CEO and cofounder Clement Delangue told VentureBeat Hugging Face wants to become to machine learning what GitHub is to software engineering.

As part of that effort, Hugging Face closed a $40 million series B funding round today.


WHOOP Founder Will Ahmed On the Future of Fitness

InsideHook, Tanner Garrity



… Do you have any macro predictions for the fitness industry over the next decade?

Health improvement will increasingly become personalized. Mass-market health advice is BS. It’s not like everyone should do the keto diet. It’s not like everyone should do the paleo diet. It’s not like everyone should do 10,000 steps. It’s what’s right for you. What’s the recipe of things that’s right for you? That’s what’s pretty exciting about health monitoring. It can help you effectively A-B test these things about your body.


NUS and Johnson Controls embark on S$5 million research on smart building systems

EurekAlert! Science News, National University of Singapore



The National University of Singapore (NUS) and Johnson Controls, the global leader for smart, healthy and sustainable buildings, will embark on a joint collaboration on smart buildings research. Johnson Controls will commit about S$5 million into this research programme, and teams from both organisations will work together to address industry-wide challenges.

“As Singapore advances its whole-of-nation movement on sustainable development, NUS is ready to seize the plethora of opportunities through our digital innovation activities and strategic collaborations with industry to drive the future of urban solutions. This collaboration between NUS and Johnson Controls will combine the University’s expertise in the science of cities and sustainability with Johnson Controls’ industry knowledge to co-create people-centric smart building systems to bring about impactful and radical changes to the lives of people in Singapore and the world,” said Professor Chen Tsuhan, NUS Deputy President (Research & Technology).


NeurIPS2021 will be using @OpenReviewNet to manage submissions.

Twitter, hardmaru



They will still keep the review process private this year, but afterwards, accepted and (opted-in) rejected papers will be made public, along with their anonymous reviews and meta-reviews.


Academic “Ghost-Writing”: The Cheating Scandal No One Will Discuss

Logos, Steward Lawrence



By hiring outside ghost-writers, students seem to have found a way around this barrier. The work written for them is completely original – it’s just not their own.

The extent of student ghost-writing is truly astounding. There are now dozens of web sites that offer paper writing services for students at all levels – undergraduate, graduate and PhD. Some students use these services in a pinch, when they run out of time and can’t meet a paper deadline.

But many others have built ghost-writing into their entire university career, contracting one or more “hired guns” – or “pens” – for the bulk of their classes, and in some cases, having their supportive parents foot the bill.


Pandemic’s reach far outstripped official coronavirus case counts, study suggests

Los Angeles Times, Amina Khan



A new study that looks at the presence of coronavirus antibodies in nearly 62,000 life insurance applicants found that before the devastating holiday surge, the number of asymptomatic or undiagnosed coronavirus infections in the U.S. may have been twice as high as the official tally of cases overall.

The findings, published Tuesday in the journal JAMA Network Open, indicate that there were millions more asymptomatic and undiagnosed people than experts had realized, and suggest the pandemic could have been far more pervasive than it first appeared.

“The overall number of SARS-CoV-2 infections in the U.S. may be substantially higher than estimates based on public health case reporting,” the study authors wrote.


Mozilla leads push for FCC to reinstate net neutrality

CNBC, Lauren Feiner



Firefox-maker Mozilla is leading a push for the Federal Communications Commission to swiftly reinstate net neutrality rules stripped away under the Trump administration.

In a letter to FCC Acting Chair Jessica Rosenworcel Friday, ADT, Dropbox, Eventbrite, Reddit, Vimeo and Wikimedia joined Mozilla in calling net neutrality “critical for preserving the internet as a free and open medium that promotes innovation and spurs economic growth.”


Is “bioinformatics” dead?

PLOS Biology, Philip Bourne



Now that I have your attention, clearly, bioinformatics as a field is very much alive. The name, however, no longer applies to what we actually do in the field. It is not what forward-thinking scientists should be calling themselves in this era of the fourth paradigm of data science [1], where data sharing lies at the core of biology. If you’re asking why anyone should care, let me explain.


As enrollment crashes, universities are reinventing themselves

EdScoop, Colin Wood



A year of pandemic life fueled a transformation of higher education that was already brewing before the health crisis landed. But with enrollment slipping, many institutions are being forced to reinvent themselves on the fly.

Community colleges have been particularly hard-hit, but undergraduate enrollment is sinking across higher education institutions broadly. The exodus has forced many universities and colleges to redesign their approaches to student outreach and recruitment, while rejiggering their courses for an online format. They’re doubling down on student engagement, attempting to keep their students tapped into supportive social circles while anticipating their needs and keeping them on track to graduate.

“It’s a lot more cost effective to try to retain the students you’ve already recruited than spending more money to try to recruit more,” Meredith Coughlin, the interim vice president of institutional technology at Indian River State College in southern Florida, told EdScoop earlier this year. Administrators at Indian River are using a new app, Rah Rah, to keep their students engaged in campus life — a daunting, but increasingly necessary objective in the social-distancing era.


First-of-its-kind study shows carbon emissions across the entire food production and supply chain

Anthropocene, Emma Bryce



Our food systems pump out one-third of global greenhouse emissions – 34% – every year, according to new research published in Nature Food. These results chime with the findings of several previous studies.

But what’s new about this most recent contribution is that it goes into intensive detail to build up a huge dataset that identifies the sources of those emissions across the whole food production and supply chain. For the first time, that’s enabled researchers to reliably pinpoint emissions by sector and country, revealing some surprising facts in the process – such as the rising contribution of refrigeration and plastic packaging to global emissions.

To produce their granular results, the scientists relied on two large, existing datasets: the Emissions Database for Global Atmospheric Research (EDGAR), produced by the European Commission Joint Research Centre; and data from the Food and Agriculture Organization. Their primary reliance on EDGAR gave the new dataset its name: ‘EDGAR-FOOD’. The study covered the period between 1990 and 2015, and drew together estimates on several types of emissions across all forms of food production – crop and livestock farming, aquaculture and fishing. It also measured emissions across the entire food system, from land conversion and food production, to transport, processing, packaging, retail, and consumption beyond the farm gate.


UK Biobank makes data from significant study of telomere length available to researchers

UK Biobank



UK Biobank, the large-scale biomedical database and research resource, has today made available to approved researchers the data from a study into telomere length.

The study, conducted by Dr Veryan Codd, Dr Chris Nelson and Professor Sir Nilesh Samani and their team at the University of Leicester, working with Professor John Danesh at the University of Cambridge, measured telomere length in almost all 500,000 UK Biobank participants to deduce why some older adults succumb to chronic disease while others do not. Telomeres are tiny pieces of DNA found on the end of each chromosome, which play a central role in cell death and are thought to be a good biomarker of biological ageing.


UMass Amherst’s College of Information & Computer Sciences Announces Online Master of Science Program in Computer Science

UMass Amherst, Office of News & Media Relations



The new program, offered through the College of Information & Computer Sciences (CICS) with support from the University Without Walls program, is designed with flexibility in mind to help accommodate the schedules of professionals who are looking to amplify their undergraduate training in computer science or a related field. “The field keeps changing,” says James Allan, professor and chair. “Artificial intelligence, cybersecurity and ethics: These are all extremely hot topics right now, and they weren’t 10 years ago.” The 30-credit program will be entirely online, taught by UMass’s award-winning faculty and marked by the rigor for which CICS is known.


Inflammatory profiles across the spectrum of disease reveal a distinct role for GM-CSF in severe COVID-19

Science Immunology; Malcolm G Semple, J Kenneth Baillie, Peter JM Openshaw et al.



While it is now widely accepted that host inflammatory responses contribute to lung injury, the pathways that drive severity and distinguish coronavirus disease 2019 (COVID-19) from other viral lung diseases remain poorly characterized. We analyzed plasma samples from 471 hospitalized patients recruited through the prospective multicenter ISARIC4C study and 39 outpatients with mild disease, enabling extensive characterization of responses across a full spectrum of COVID-19 severity. Progressive elevation of levels of numerous inflammatory cytokines and chemokines (including IL-6, CXCL10, and GM-CSF) was associated with severity and accompanied by elevated markers of endothelial injury and thrombosis. Principal component and network analyses demonstrated central roles for IL-6 and GM-CSF in COVID-19 pathogenesis. Comparing these profiles to archived samples from patients with fatal influenza, IL-6 was equally elevated in both conditions whereas GM-CSF was prominent only in COVID-19. These findings further identify the key inflammatory, thrombotic, and vascular factors that characterize and distinguish severe and fatal COVID-19.


When will it be over? 3 key numbers scientists are watching to track the pandemic

NBC News, Erika Edwards



A variety of metrics could indicate that the country is getting the pandemic under control, including the number of new daily cases and the number of deaths. But epidemiologists say three specific metrics are required to get a clear understanding of the country’s overall response: the number of Covid-19-related hospitalizations, the virus’s so-called reproduction number and the number of vaccinations administered.

Hospitalizations


What the $28 million Series B means for OctoML and for accelerating ML deployment

Medium, OctoML, Luis Ceze



90% of machine learning models don’t make it to production

While there has been significant progress in techniques for data management and machine learning model creation, there is still a significant gap between building a model and making that model production-ready.

Between rapidly evolving ML models, ML frameworks and a Cambrian explosion of hardware backends, it becomes incredibly difficult to ensure that your model deployment works as expected. It is not easy to make sure your model runs fast enough and to benchmark it across different deployment hardware. Even if your determined machine learning team has hurtled through this gauntlet, they still have to go through a whole different set of challenges to package and deploy it at the edge.


Events



Virtual Public Meeting – Patient-Generated Health Data Throughout the Total Product Life Cycle of Medical Devices

U.S. Food and Drug Administration



Online May 4, starting at 10 a.m. Eastern. “The purpose of this meeting is to discuss patient-generated health data (PGHD) and its potential impact across the healthcare ecosystem.” [registration required]


Equity & AI CAIS Symposium

USC Center for Artificial Intelligence in Society



Online April 15-16. “The CAIS Symposium on Equity and AI is intended to be a forum for learning and collaboration among social scientists, computer and data scientists, engineers, service organizations, and community members. We believe such a multitude of perspectives is needed to begin to understand and ultimately address the ways in which AI systems create, enhance, and perpetuate biases that impact marginalized and underrepresented groups.” [registration required]


NULab Spring Conference: Data and Social Justice

Twitter, NULab



Online April 9, starting at 10:45 a.m. Eastern. “The keynote address will be delivered by Patricia Williams, University Distinguished Professor of Law and Humanities at @Northeastern.” [rsvp required]


Deadlines



Berkeley Computational Social Science Training Program: Call for Fellowship Nominations

“UC Berkeley’s Computational Social Science Training Program (CSSTP) announces its 2021 call for nominations for the second cohort of CSSTP Fellows.” Deadline for nominations is April 15.

The inaugural ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’21)

Online October 5-8. The conference “aims to highlight work where techniques from algorithms, optimization, and mechanism design, along with insights from the social sciences and humanistic studies, can help improve equity and access to opportunity for historically disadvantaged and underserved communities.” Deadline for research track submissions is June 3.

Harvey J. Greenberg Research Award

“The Harvey J Greenberg Research Award is a recently established annual award that honors research excellence in the field of computation and operations research applications, especially those in emerging application fields.” Deadline for nominations is June 15.

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



LEAF: A Learnable Frontend for Audio Classification

Google AI Blog, Neil Zeghidour



In “LEAF, A Fully Learnable Frontend for Audio Classification”, accepted at ICLR 2021, we present an alternative method for crafting learnable spectrograms for audio understanding tasks. LEarnable Audio Frontend (LEAF) is a neural network that can be initialized to approximate mel filterbanks, and then be trained jointly with any audio classifier to adapt to the task at hand, while only adding a handful of parameters to the full model. We show that over a wide range of audio signals and classification tasks, including speech, music and bird songs, LEAF spectrograms improve classification performance over fixed mel filterbanks and over previously proposed learnable systems. We have implemented the code in TensorFlow 2 and released it to the community through our GitHub repository.
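For readers unfamiliar with the fixed mel filterbanks that LEAF is compared against, here is a minimal sketch of how such a filterbank is typically built. It is not LEAF and not Google's released code: the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`) and the parameter values (40 filters, 512-point FFT, 16 kHz audio) are assumptions chosen for illustration, and the widely used 2595 · log10(1 + f/700) mel formula is only one of several conventions.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (one common formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Build fixed triangular mel filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points equally spaced on the mel scale:
    # the outer two are edges, the inner ones are filter centres.
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Centre frequency of each FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(1, n_filters + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (centre - lo)      # ramp up from the left edge
            falling = (hi - f) / (hi - centre)     # ramp down to the right edge
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


# Illustrative settings only: 40 filters, 512-point FFT, 16 kHz audio.
bank = mel_filterbank(n_filters=40, n_fft=512, sample_rate=16000)
```

Each row of `bank` weights the bins of a magnitude spectrum to produce one mel band. These weights are fixed by hand; the excerpt above describes LEAF as replacing them with parameters that are initialized to approximate this behaviour and then trained with the classifier.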


Understanding Open Data – Everything you need to know about making your research data open and FAIR

Faculty of 1000, F1000Research



“On this page, we’ll walk you through the what, why, and how of data sharing, shining a light on how open data can help you and your research community.”
