Data Science newsletter – April 7, 2021

Newsletter features journalism, research papers and tools/software for April 7, 2021

 

Measuring racism and discrimination in economic data

The Brookings Institution, Randall Akee and Marcus Casey



Although researchers in economics are increasingly cognizant that race and ethnicity are key determinants of economic outcomes, credibly assessing potential causes and identifying solutions is often complicated by the lack of high-quality data. The typical economist’s work primarily focuses on proposing relationships and testing for causal mechanisms across a broad set of economic phenomena. The study of race and the consequences of race in market interactions have long been hampered by the relative lack of longitudinal data collected on relevant markers of discrimination, racism, and related long-term outcomes.

In recognition of these limitations, the American Economic Association Committee on Economic Statistics, in collaboration with the National Economic Association, convened an expert panel comprised of academics, statistical agency officials and non-profit organizations at a special joint session at the Allied Social Sciences Associations 2021 meeting in January. Their principal goal was to assess the state of federal statistics data and its ability to document key racial disparities. The participants also discussed challenges to access and collect administrative data, possibly linked across agencies. These types of data may help facilitate new research and identify where the most pressing problems persist and how they can be addressed. The panel also discussed a plan of action to expand resources to this type of data collection and sharing.


What the CPSC Has to Say About Artificial Intelligence

National Law Review, Foley & Lardner, Kristin M. McGaver and Erik K. Swanholt



American households are increasingly connected internally through the use of artificially intelligent appliances. But who regulates the safety of those dishwashers, microwaves, refrigerators, and vacuums powered by artificial intelligence (AI)? On March 2, 2021, at a virtual forum attended by stakeholders across the entire industry, the Consumer Product Safety Commission (CPSC) reminded us all that it has the last say on regulating AI and machine learning consumer product safety.


NUS offers new Masters and PhD programmes in FinTech to boost digital finance research talent and capabilities in Singapore

National University of Singapore, NUS News



The National University of Singapore (NUS) will offer two new graduate programmes in digital financial technology (FinTech) in the new academic year, to help build a robust ecosystem of high-quality research talent and capabilities to support the fast-growing financial industry in Singapore. The new Masters and PhD programmes are under the Asian Institute of Digital Finance (AIDF) at NUS, a university-level institute jointly founded by the Monetary Authority of Singapore (MAS), the National Research Foundation Singapore (NRF) and NUS. The PhD programme, in particular, is Singapore’s first and only doctoral programme in FinTech.


In major copyright battle between tech giants, SCOTUS sides with Google over Oracle, finding that Google didn’t commit copyright infringement when it reused lines of code in its Android operating system. The code came from Oracle’s Java SE platform.

Twitter, SCOTUSblog



The court finds that Google’s use of the lines of code was “fair use” under the copyright laws. Justice Breyer delivered the opinion of the court, joined by the Chief, Sotomayor, Kagan, Gorsuch, and Kavanaugh. Justice Thomas dissented and Alito joined. Barrett didn’t participate.


Tufts research lab aids states with redistricting process

Tufts University, The Tufts Daily student newspaper, Matt Chen



As state legislatures across the country prepare to undergo the decennial process of redrawing congressional districts, the MGGG Redistricting Lab at Tufts University has become an invaluable resource for many states.

The MGGG Redistricting Lab, which is affiliated with the Jonathan M. Tisch College of Civic Life, grew out of an informal research collective called the Metric Geometry and Gerrymandering Group. The lab is led by Moon Duchin, an associate professor in the mathematics department and director of the Science, Technology, and Society program at Tufts.

“As a lab, we have researchers, several with PhDs in different fields, like math, computer science and geography … I would say the lab is devoted to data science for civil rights,” Duchin said.


Ahead of new report, Yale faculty salaries rank low compared to peer institutions

Yale University, Yale Daily News student newspaper, Rose Horowitch



While Yale’s academic departments frequently snag the top spots in national rankings, salaries in the Faculty of Arts and Sciences have fallen behind Yale’s peers. The gap is widest in the science and engineering departments, where Yale has lost considerable ground over the last decade.

Every two years, a Committee on the Economic Status of Faculty assesses how faculty salaries at the University compare to Yale’s peers. CESOF plans to release its report later this spring, but its last report published in 2018 showed that since 2008, Yale’s average full-professor salaries in the biological sciences, physical sciences and engineering had fallen by seven percent relative to the University’s peers, compared to a loss of three and four percent in the humanities and social sciences, respectively. Assistant professors’ salaries had also fallen by about seven percent compared to Yale’s peers.


An artful study of cellular development in leaves

Stanford University, Stanford News



Researchers tracked 20,000 cells to find out what made each one unique and to learn how the cells coordinate to build a leaf. Collaboration with a visual artist added additional dimensions to the work.


Covid has been a catastrophe. Can it also be an opportunity?

Financial Times, Tim Harford



I recently wrote about a brief strike on the London Underground in 2014 which provoked many commuters to find new routes to work. Quite simply, when our old solutions are closed off, we find new ones. Sometimes the new ones would have been better all along.

This is perfectly well understood by computer scientists. Algorithms created to solve problems such as scheduling deliveries or designing computer chips tend to deploy random shocks to what would otherwise be a search for incremental improvements. Without the randomness, the algorithm gets stuck. So do we.

We can all imagine ways in which Covid-19 might prompt the same fresh thinking, most obviously in using the internet — at last! — to replace grinding, costly and time-consuming travel.
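The "random shocks" idea can be made concrete with a small sketch. It uses random restarts as a simple stand-in for the shocks described above; the landscape values, function names and seed are all made up for illustration, and real scheduling or chip-design solvers are far more elaborate.

```python
import random

# Hypothetical 1-D "landscape": higher is better.
# Index 2 is a local peak (height 5); index 8 is the global peak (height 9).
LANDSCAPE = [1, 3, 5, 4, 2, 1, 4, 7, 9, 6, 3, 1]


def climb(start):
    """Greedy search: move to the better neighbour until neither is better."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(LANDSCAPE)]
        best = max(neighbours, key=lambda p: LANDSCAPE[p])
        if LANDSCAPE[best] <= LANDSCAPE[pos]:
            return pos  # no incremental improvement left
        pos = best


def climb_with_shocks(start, shocks=200, seed=0):
    """Same climber, but repeatedly kicked to a random position (assumed scheme)."""
    rng = random.Random(seed)
    best = climb(start)
    for _ in range(shocks):
        candidate = climb(rng.randrange(len(LANDSCAPE)))
        if LANDSCAPE[candidate] > LANDSCAPE[best]:
            best = candidate
    return best


stuck = climb(0)               # incremental search only
shaken = climb_with_shocks(0)  # incremental search plus random kicks
print(LANDSCAPE[stuck], LANDSCAPE[shaken])
```

Starting from the left edge, the purely incremental climber stops on the nearby small peak, while the version that is occasionally thrown somewhere new ends up on the higher one.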


Yale researchers dig into EHR use data – and gender differences

Yale University, YaleNews



While the shift to electronic health records (EHR) in the medical profession was supposed to improve the quality and efficiency of healthcare for doctors and patients alike, many physicians have given the technology low grades.

A new Yale study digs into the data on how physicians are using EHRs, including how time spent using the systems differs by specialty, and what these findings reveal about how the technology can be improved.

One of the key findings is that female physicians spend an average of 30 minutes more per day using EHRs than their male colleagues. The results were published April 5 in the Journal of the American Medical Informatics Association. A related article appeared the same day in JAMA Network.


UCLA Researchers Create All-Optical Diffractive Deep Neural Network That is 3D Printed

Photonics.com, Vision Spectra



Using a 3D printer, a research team at the UCLA Samueli School of Engineering has created an artificial neural network that can analyze large volumes of data and identify objects at the speed of light. Called a diffractive deep neural network (D2NN), the technology uses the light scattering from an object to identify it. The technology is based on a deep learning-based design of passive diffractive layers that work collectively.

The team created a computer-simulated design, then used a 3D printer to create thin, 8 cm-sq polymer wafers. Each wafer was created with uneven surfaces to help diffract light coming from an object.


AI conferences use AI to assign papers to reviewers

Science, Matthew Hutson



This system is under strain: In the past 5 years, submissions to large AI conferences have more than quadrupled, leaving organizers scrambling to keep up. One example of the workload crush: The annual AI Conference on Neural Information Processing Systems (NeurIPS)—the discipline’s largest—received more than 9000 submissions for its December 2020 event, 40% more than the previous year. Organizers had to assign 31,000 reviews to about 7000 reviewers. “It is extremely tiring and stressful,” says Marc’Aurelio Ranzato, general chair of this year’s NeurIPS. “A board member called this a herculean effort, and it really is!”

Fortunately, they had help from AI. Organizers used existing software, called the Toronto Paper Matching System (TPMS), to help assign papers to reviewers. TPMS, which is also used at other conferences, calculates the affinity between submitted papers and reviewers’ expertise by comparing the text in submissions and reviewers’ papers. The sifting is part of a matching system in which reviewers also bid on papers they want to review.

But newer AI software could improve on that approach. One newer affinity-measuring system, developed by the paper-reviewing platform OpenReview, uses a neural network—a machine learning algorithm inspired by the brain’s wiring—to analyze paper titles and abstracts, creating a richer representation of their content.
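To illustrate the general idea of scoring affinity by comparing the text of a submission with a reviewer's own papers, here is a minimal sketch using bag-of-words cosine similarity. This is an assumed stand-in, not the actual method used by TPMS or OpenReview, and the reviewer names and texts are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count each word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reviewer profiles built from their past paper text.
reviewers = {
    "reviewer_a": "convolutional networks for image classification and object detection",
    "reviewer_b": "bandit algorithms and regret bounds for online learning",
}
submission = "a new regret bound for contextual bandit algorithms"

sub_vec = bag_of_words(submission)
scores = {name: cosine(sub_vec, bag_of_words(text)) for name, text in reviewers.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In a real matching system these scores would be only one input, combined with reviewer bids, conflicts of interest and load limits.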


An artificial intelligence algorithm has created “new” Jimi Hendrix, Nirvana songs

Guitar World, Richard Bienstock



We’ve heard AI-generated songs mimic the work of AC/DC, Metallica and more. Now artificial intelligence software has generated “new” Jimi Hendrix and Nirvana tracks, along with other artists and bands with members who died at the age of 27, to help raise awareness for the importance of mental health support amongst musicians and members of the music industry.

The Hendrix song, You’re Gonna Kill Me, and the Nirvana track, Drowned In the Sun, are part of a new project by the Toronto-based organization, Over the Bridge, which has put together a compilation, all created via artificial intelligence, in the style of musicians who died at the age of 27.

The release, titled Lost Tapes of the 27 Club, also features songs in the style of the Doors and Amy Winehouse, all made through Google’s AI program Magenta, which analyses an artist’s previous work in order to learn how to compose like them. An additional AI program was used to create the lyrics.


Artificial intelligence has advanced so much, it wrote this article

MarketWatch, Jurica Dujmovic



I did not write this article.

In fact, it wasn’t written by any person. Every sentence you see after this introduction is the work of OpenAI’s GPT-3, a powerful language-prediction model capable of composing sequences of coherent text. The only thing I did was provide it with topics to write about. I did not even fix its grammar or spelling.

According to OpenAI, more than 300 applications are using GPT-3, which is part of a field called natural language processing. An average of 4.5 billion words are written per day. Some say the quality of GPT-3’s text is as good as that written by humans.


How digital outreach informed Reveal’s 2020 census reporting

Reveal News, Byard Duncan



As the 2020 count’s in-person operation kicked into gear after months of pandemic-related delays, Carless and community engagement producer David Rodriguez heeded her advice. They soon exposed a maddening pattern of technological glitches, equipment shortages and error messages. It all added up to “an atmosphere of desperation and despondency” among enumerators, they wrote.

“We’ve all started calling it ‘The Senseless,’ ” one source explained. “What I have been telling people who I’m training is, ‘You really have to have zero expectations if you want to work here.’ ”

The story struck a nerve with other census workers, ushering in waves of new complaints from across the country. To help capture them, Reveal’s engagement and collaborations team built a web form and shared it across our social media channels, on our podcast and among sources we’d already developed. Many enumerators and supervisors who filled it out described familiar problems with their government-issued technology. But others shared more troubling patterns: poor training, repetitive work and pressure to close cases early.


The Foundations of AI Are Riddled With Errors

WIRED, Business, Will Knight



The current boom in artificial intelligence can be traced back to 2012 and a breakthrough during a competition built around ImageNet, a set of 14 million labeled images.

In the competition, a method called deep learning, which involves feeding examples to a giant simulated neural network, proved dramatically better at identifying objects in images than other approaches. That kick-started interest in using AI to solve different problems.

But research revealed this week shows that ImageNet and nine other key AI data sets contain many errors. Researchers at MIT compared how an AI algorithm trained on the data interprets an image with the label that was applied to it. If, for instance, an algorithm decides that an image is 70 percent likely to be a cat but the label says “spoon,” then it’s likely that the image is wrongly labeled and actually shows a cat. To check, where the algorithm and the label disagreed, researchers showed the image to more people.
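The cat-versus-spoon check can be sketched as a simple rule: flag an image when the model's top prediction disagrees with the given label and the model is at least 70 percent confident. This is a simplified illustration, not the researchers' actual procedure; the data, field names and threshold handling are assumptions.

```python
def flag_suspect_labels(examples, threshold=0.7):
    """Return (id, given_label, predicted_label) for likely labeling errors."""
    suspects = []
    for ex in examples:
        # The class the model considers most likely, and how confident it is.
        predicted = max(ex["probs"], key=ex["probs"].get)
        confidence = ex["probs"][predicted]
        if predicted != ex["label"] and confidence >= threshold:
            suspects.append((ex["id"], ex["label"], predicted))
    return suspects


# Made-up model outputs for three images.
examples = [
    {"id": "img_1", "label": "spoon", "probs": {"cat": 0.70, "spoon": 0.30}},
    {"id": "img_2", "label": "cat", "probs": {"cat": 0.90, "spoon": 0.10}},
    {"id": "img_3", "label": "spoon", "probs": {"cat": 0.55, "spoon": 0.45}},
]

suspects = flag_suspect_labels(examples)
print(suspects)
```

Flagged images are only candidates; as the excerpt notes, the disagreements still went to human reviewers for a final check.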


Events



The University of Pittsburgh will be hosting the 50th Anniversary Decentralization Conference with the theme of Mechanism Design for Vulnerable Populations. The conference is open to all!

University of Pittsburgh



Online April 15-17. “Local Organizer @seralinardi has put together a fabulous array of stand alone sessions – you can selectively register for each.”


Deadlines



CRA/CCC Announces CIFellows 2021 Program

“The Computing Research Association (CRA) and Computing Community Consortium (CCC) are pleased to announce a new Computing Innovation Fellows (CIFellows) cohort for 2021. This program recognizes the continued disruption to hiring in academic institutions due to the COVID-19 pandemic.” … “We anticipate applications being due around early May 2021.”

SPONSORED CONTENT





The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications due 2/15 – learn more and apply here. DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles

Data Science Journal, Research Data Alliance Results; Jens Klump, Lesley Wyborn, Mingfang Wu, Julia Martin, Robert R. Downs, Ari Asmi



Although the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available. In this work, we analysed 39 use cases and current practices of data versioning across 33 organisations. We noticed that the term ‘version’ was used in a very general sense, extending beyond the more common understanding of ‘version’ to refer primarily to revisions and replacements. Using concepts developed in software versioning and the Functional Requirements for Bibliographic Records (FRBR) as a conceptual framework, we developed six foundational principles for versioning of datasets: Revision, Release, Granularity, Manifestation, Provenance and Citation. These six principles provide a high-level framework for guiding the consistent practice of data versioning and can also serve as guidance for data centres or data providers when setting up their own data revision and version protocols and procedures.


Careers


Full-time, non-tenured academic positions

Visiting Assistant Professor of Computer Science



Haverford College, Department of Computer Science; Haverford, PA

Postdocs

Postdoctoral Fellow in Motion Tracking, Wearable Sensors, Estimation, Machine Learning



Harvard University, Harvard Biodesign Lab; Cambridge, MA
