Data Science newsletter – May 21, 2021

Newsletter features journalism, research papers and tools/software for May 21, 2021

 

Board of Trustees approve new projects, lower student health insurance costs

from The Daily Illini student newspaper, Aliza Majid

The University of Illinois Board of Trustees approved a $52 million budget for the University’s Altgeld Hall renovation and Illini Hall replacement projects. The overall cost of the Altgeld and Illini Hall work is $192 million, of which the State of Illinois will contribute $140 million.

These two projects are part of the Illinois Innovation Network (IIN), a University network of industry-based hubs throughout the state working toward innovation and economic development.

Altgeld Hall opened in 1897 and was last renovated in 1956; it now requires reconstruction, along with a replacement for Illini Hall, to accommodate the mathematics and statistics departments. The new building replacing Illini Hall will include a data science center that will serve as a new hub for the IIN.


New data science minor to launch in fall 2021

from University of North Carolina-Chapel Hill, News and Updates

The department of statistics and operations research, better known as STOR, will introduce a data science minor in fall 2021 that is designed to appeal to students majoring in a broad array of disciplines. The minor is an important component of the soon-to-launch data science initiative, a pan-University effort.


Machine Learning Accelerates Cosmological Simulations

from Carnegie Mellon University, News

Cosmological simulations are an essential part of teasing out the many mysteries of the universe, including those of dark matter and dark energy. But until now, researchers faced the common conundrum of not being able to have it all — simulations could focus on a small area at high resolution, or they could encompass a large volume of the universe at low resolution.

Carnegie Mellon University Physics Professors Tiziana Di Matteo and Rupert Croft, Flatiron Institute Research Fellow Yin Li, Carnegie Mellon Ph.D. candidate Yueying Ni, University of California Riverside Professor of Physics and Astronomy Simeon Bird and University of California Berkeley’s Yu Feng surmounted this problem by teaching a machine learning algorithm based on neural networks to upgrade a simulation from low resolution to super resolution.
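
The team’s actual model is more sophisticated than anything that fits in a newsletter, but for readers curious about the general idea, here is a minimal PyTorch sketch of a convolutional network that upsamples a low-resolution 3D density field. Every name and layer size is an illustrative assumption, not the authors’ architecture.

```python
# Illustrative sketch only: a tiny 3D convolutional upsampler of the kind
# used to super-resolve simulation volumes. Not the authors' model.
import torch
import torch.nn as nn

class Upsampler3D(nn.Module):
    def __init__(self, channels=1, hidden=32, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=scale, mode="trilinear", align_corners=False),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, low_res):
        # low_res: (batch, channels, depth, height, width) density field
        return self.net(low_res)

model = Upsampler3D()
low_res = torch.randn(1, 1, 16, 16, 16)  # a coarse simulation patch
high_res = model(low_res)                # -> (1, 1, 32, 32, 32)
print(high_res.shape)
```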


Detecting drones in cities is tough – Duke engineers say machine learning can help

from WRAL TechWire, Ken Kingery

As drones become more popular and more worrisome from a security standpoint, many projects have sought to engineer systems to spot them. During his time as a Defense Advanced Research Projects Agency (DARPA) program manager, Jeffrey Krolik, professor of electrical and computer engineering at Duke University, launched one such project called “Aerial Dragnet.” Using a network of drones hovering above a cityscape or other large, developed area in need of defense, multiple types of sensors would peer down into the city’s canyons and pick out any drones. The project recently concluded successfully with an urban test in Rosslyn, Virginia, but challenges remain in discriminating drones from urban “clutter.”

Using a fleet of friendly drones to find enemy drones makes sense in a setting for a military unit that is trying to secure a wide urban area. However, in settings where protection of a fixed asset such as an embassy, hospital or encampment is the goal, a system that can maintain a perimeter from a safe stand-off distance is required. Once again funded by DARPA, Krolik is turning to radar, machine learning and specialized hardware to make a drone surveillance system with sufficient range to allow drones to be detected and stopped before they reach a protected area in a city.


Creative Destruction Lab Joins University of Washington’s Foster School of Business, Establishing CDL-Seattle

from Business Wire, Creative Destruction Lab

Creative Destruction Lab (CDL), a nonprofit organization for massively scalable, seed-stage, science- and technology-based companies, based at the Rotman School of Management, University of Toronto, will launch its third U.S.-based location, CDL-Seattle, this fall. Based at the UW’s Foster School of Business, CDL-Seattle will be a partnership with Microsoft Corporation, the UW College of Engineering, the Paul G. Allen School of Computer Science & Engineering, and CoMotion, UW’s collaborative innovation hub. The area of focus for CDL-Seattle’s first stream is computational health.


CMU, R.K. Mellon Foundation Announce Historic Partnership

from Carnegie Mellon University, News

Carnegie Mellon University and the Richard King Mellon Foundation today announced that the two long-time partners will together make a transformational investment in science and technology leadership at the university; in a more vibrant future for the Hazelwood neighborhood; and in Pittsburgh’s ongoing economic renaissance.

The Foundation has approved a $150 million grant to CMU — the largest single grant in the Foundation’s 74-year history. The first $75 million is the lead gift for a new cutting-edge science building on the CMU campus in the Oakland neighborhood of Pittsburgh. The second $75 million will seed a new robotics innovation center and an institute focused on advanced materials and manufacturing at Hazelwood Green — a former riverside steel mill site that will attract the jobs and industries of the new economy, while enabling development of a true mixed-use destination that will be a positive and inclusive part of Hazelwood.


CDS Professor Develops New Course Accessible Online to All: Mathematical Tools for Data Science

from Medium, NYU Center for Data Science

CDS Assistant Professor of Data Science and Mathematics Carlos Fernandez-Granda has developed the course Mathematical Tools for Data Science. Contributors to the course include CDS visiting assistant professor Brett Bernstein and CDS PhD students Aakash Kaku, Sheng Liu, and Sreyas Mohan. The course introduces tools from several areas of mathematics that are useful in data science, including linear algebra, Fourier analysis, probability theory, and convex optimization. Topics include covariance matrices, principal component analysis, linear regression, regularization, sparse regression, frequency representations, the short-time Fourier transform, wavelets, Wiener filtering, and convolutional neural networks. Course materials such as notes, slides, videos, and code are fully available on the CDS website.
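
As a taste of the linear-algebra portion of the syllabus, here is a small numpy sketch of principal component analysis via the singular value decomposition. It is our illustration, not code from the course materials.

```python
# Minimal PCA via SVD; illustrates one course topic, not course code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # 200 samples, 5 features
X_centered = X - X.mean(axis=0)    # center each feature

# Rows of Vt are the principal directions; singular values give the spread.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)

k = 2
scores = X_centered @ Vt[:k].T     # project onto the top-k components
print(explained_variance[:k], scores.shape)
```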


Artificial intelligence has been of little use for diagnosing covid-19

from New Scientist magazine, Comment, Michael Roberts

My colleagues and I looked at every such paper that was published between 1 January 2020 and 3 October 2020 and found that none of them produced tools that would be good enough to use in a clinical setting (Nature Machine Intelligence, doi.org/gjkjvw). Something has gone seriously wrong when more than 300 papers are published that have no practical benefit.

Our review found that there were often issues at every stage of the development of the tools mentioned in the literature. The papers themselves often didn’t include enough detail to reproduce their results.

Another issue was that many of the papers introduced significant biases through the data collection method, the development of the machine-learning system or the analysis of the results. For example, a significant proportion of systems designed to diagnose covid-19 from chest X-rays were trained on adults with covid-19 and children without it, so their algorithms were more likely to be detecting whether an X-ray came from an adult or a child than whether that person had covid-19.
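
A confound like that often shows up in the metadata before any model is trained. Here is a minimal pandas sketch of the kind of sanity check that would catch it; the column names are hypothetical.

```python
# Hypothetical sketch: cross-tabulate the label against a suspect attribute.
# If covid status and age group align almost perfectly, a model can score
# well by learning age rather than disease.
import pandas as pd

meta = pd.DataFrame({
    "covid":     [1, 1, 1, 1, 0, 0, 0, 0],
    "age_group": ["adult", "adult", "adult", "adult",
                  "child", "child", "child", "child"],
})
print(pd.crosstab(meta["covid"], meta["age_group"], normalize="index"))
# A crosstab this skewed means the label is confounded with age.
```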


NIST Proposes Method for Evaluating User Trust in Artificial Intelligence Systems

from Homeland Security Today

Every time you speak to a virtual assistant on your smartphone, you are talking to an artificial intelligence — an AI that can, for example, learn your taste in music and make song recommendations that improve based on your interactions. However, AI also assists us with more risk-fraught activities, such as helping doctors diagnose cancer. These are two very different scenarios, but the same issue permeates both: How do we humans decide whether or not to trust a machine’s recommendations?

This is the question that a new draft publication from the National Institute of Standards and Technology (NIST) poses, with the goal of stimulating a discussion about how humans trust AI systems. The document, Artificial Intelligence and User Trust (NISTIR 8332), is open for public comment until July 30, 2021.


Using AI to help find answers to common skin conditions

from Google, The Keyword blog, Peggy Bui and Yuan Liu

Artificial intelligence (AI) has the potential to help clinicians care for patients and treat disease — from improving the screening process for breast cancer to helping detect tuberculosis more efficiently. When we combine these advances in AI with other technologies, like smartphone cameras, we can unlock new ways for people to stay better informed about their health, too.

Today at I/O, we shared a preview of an AI-powered dermatology assist tool that helps you understand what’s going on with issues related to your body’s largest organ: your skin, hair and nails. Using many of the same techniques that detect diabetic eye disease or lung cancer in CT scans, this tool gets you closer to identifying dermatologic issues — like a rash on your arm that’s bugging you — using your phone’s camera.


COVID-19 Point-of-Care Diagnostics: Present and Future

from ACS Nano, Enrique Valera et al.

Point-of-care (POC) detection technologies that enable decentralized, rapid, sensitive, low-cost diagnostics of COVID-19 infection are urgently needed around the world. With many technologies approved for commercialization in the past 10 months, the field of COVID-19 POC diagnostics is rapidly evolving. In this Perspective, we analyze the current state of POC technologies for the diagnosis and monitoring of COVID-19 infection and discuss future challenges in COVID-19 diagnostics. As the COVID-19 pandemic becomes endemic, the advances gained during this past year will likely also be utilized for future prediction of emerging outbreaks and pandemics.


Controversial forestry experiment will be largest-ever in United States

from Nature, News, Jeff Tollefson

If the project — proposed by DeLuca and other researchers at OSU — launches successfully, the newly created Elliott State Research Forest in southwestern Oregon would occupy a roughly 33,000-hectare parcel of land. This would be divided into more than 40 sections, in which scientists would test several forest-management strategies, some including extensive logging. The advisory committee for the project, which comprises environmentalists, hunters, loggers and members of local Indigenous tribes, approved the latest research proposal on 22 April.

The plan comes as US President Joe Biden and other international leaders are strengthening commitments to conserve land and biodiversity before a meeting of the United Nations Convention on Biological Diversity later this year. In time, the Elliott research forest could help policymakers to determine how best to define and implement those pledges, says DeLuca.


Making Mobile Applications Accessible with Machine Learning

from Apple Machine Learning Research

We conducted a study to explore inferring accessibility for mobile apps from their onscreen pixels. We trained a robust, fast, memory-efficient, on-device model to detect UI elements using a dataset of manually collected and annotated screens from iPhone apps. To further improve UI detection and add semantic information, we introduced heuristics and additional models. For example, we grouped relevant UI elements together to save navigation time for screen reader users, and recognized whether a UI element is clickable.
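
Apple does not publish the grouping code, but one plausible proximity heuristic, sketched below with entirely hypothetical names and thresholds, is to merge detected boxes whose vertical gap is small, so a screen reader can step through groups rather than individual elements.

```python
# Hypothetical sketch of a proximity-grouping heuristic for detected UI
# element boxes; not Apple's implementation.
def group_by_rows(boxes, max_gap=8):
    """boxes: list of (x, y, width, height) tuples, y increasing downward."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if groups:
            last = groups[-1][-1]
            if box[1] - (last[1] + last[3]) <= max_gap:
                groups[-1].append(box)  # close enough: same group
                continue
        groups.append([box])            # otherwise start a new group
    return groups

elements = [(10, 10, 100, 20), (10, 34, 100, 20), (10, 120, 100, 20)]
print(group_by_rows(elements))  # first two boxes merge; third stands alone
```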

From this study, we built the Screen Recognition feature, which we shared with the public in iOS 14. Screen Recognition uses machine learning and computer vision to automatically detect and provide content readable by VoiceOver for apps that would otherwise not be accessible.


How the Covid pandemic ends: Scientists look to the past to see the future

from STAT, Helen Branswell

But there were no flu vaccines in 1918, when the world didn’t yet know that the great influenza was caused by a virus, H1N1. In 1957, when the H2N2 pandemic swept the world, flu vaccine was mainly a tool of the military. In the pandemic of 1968, which brought us H3N2, the United States produced nearly 22 million doses of vaccine, but by the time it was ready the worst of the pandemic had passed, and demand subsided. That “too little and too late” phenomenon played out again in 2009, when the world finally had the capacity to make hundreds of millions of doses of H1N1 vaccine; some countries canceled large portions of their orders because they ended up not needing them.

How did those pandemics end? The viruses didn’t go away; a descendant of the Spanish flu virus, the modern H1N1, circulates to this day, as does H3N2. Humans didn’t develop herd immunity to them, either. That’s a phenomenon by which a pathogen stops spreading because so many people are protected against it, having already been infected or vaccinated.


The social predictors of coronavirus vaccination rates

from Axios, Caitlin Owens

Where you live, how educated you are, whether you have health insurance and whether you have access to the internet are all correlated with how likely you are to be vaccinated against the coronavirus.

Why it matters: None of these factors has anything to do directly with an individual’s risk. Instead, this emphasizes, yet again, the powerful role played by social determinants of health.


Deadlines



Overhead Geopose Challenge

“In this challenge, your goal is to make satellite imagery taken from a significant angle more useful for time-sensitive applications like disaster and emergency response. To take on the challenge, you will transform RGB images taken from a satellite to more accurately determine each object’s real-world structure or ‘geocentric pose’.”

Deadline for submissions is July 19.
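
The geometry behind the task is compact enough to sketch: under a simplifying flat-terrain, affine-camera assumption, a point at height h above the ground appears displaced in an oblique image by roughly h·tan(θ), where θ is the off-nadir angle. The challenge’s actual parameterization may differ.

```python
# Simplified sketch of oblique-view displacement; assumes flat terrain and
# an affine camera, which the challenge's real data may not satisfy.
import math

def apparent_offset(height_m, off_nadir_deg):
    """Ground-plane displacement (meters) of a point height_m above ground."""
    return height_m * math.tan(math.radians(off_nadir_deg))

print(apparent_offset(30.0, 40.0))  # a 30 m rooftop shifts ~25 m at 40 degrees
```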

NIH Common Fund’s Bridge to Artificial Intelligence (Bridge2AI) program

“By bringing technological and biomedical experts together with social scientists and humanists, the Bridge2AI program will help bring solutions to this deficit by:

Generating new flagship biomedical and behavioral data sets that are ethically sourced, trustworthy, well-defined, and accessible [and]
Developing software and standards to unify data attributes across multiple data sources and across data types”

Deadline for letters of intent is July 20.

SPONSORED CONTENT

The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15 – learn more and apply here.

DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

 


Tools & Resources



reflow: A language and runtime for distributed, incremental data processing in the cloud

from GitHub – grailbio

Reflow is a system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs. Reflow then evaluates these programs in a cloud environment, transparently parallelizing work and memoizing results. Reflow was created at GRAIL to manage our NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses.
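
Reflow programs are written in Reflow’s own domain-specific language, which we won’t reproduce here. Purely as a conceptual analogue of the memoization Reflow performs, here is a Python sketch that caches a step’s output keyed by a hash of its inputs; none of this is Reflow’s API.

```python
# Conceptual analogue only, not Reflow's API: memoize a pipeline step by a
# hash of its inputs, so re-running the pipeline skips unchanged work.
import hashlib
import json

_cache = {}

def memoized_step(name, inputs, fn):
    key = hashlib.sha256(
        json.dumps([name, inputs], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:      # compute only on a cache miss
        _cache[key] = fn(inputs)
    return _cache[key]

first = memoized_step("align", {"sample": "s1.fastq"},
                      lambda x: f"aligned:{x['sample']}")
again = memoized_step("align", {"sample": "s1.fastq"},
                      lambda x: "never recomputed")
print(first, again)  # both values come from the first computation
```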


Know Your Data: a new tool to explore datasets

from Medium, People + AI Research

We are excited to announce the beta release of Know Your Data, a new tool to help researchers and product teams better understand datasets, improve data quality and mitigate bias issues.

We hope this tool can help the ML community explore, discuss and improve datasets and, ultimately, the ML models trained on that data. Low data quality stems from a range of issues, from incorrect labels to imbalance across attributes. It contributes to machine learning bias and fairness issues, and it can lead to cascading failures.
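
Know Your Data itself is an interactive browser tool, but the kind of attribute imbalance it surfaces can be spot-checked directly, as in this small pandas sketch with a hypothetical column name.

```python
# Hypothetical sketch of an attribute-imbalance check of the sort Know Your
# Data surfaces interactively; "skin_tone" is an illustrative column name.
import pandas as pd

labels = pd.DataFrame({"skin_tone": ["light"] * 90 + ["dark"] * 10})
shares = labels["skin_tone"].value_counts(normalize=True)
print(shares)  # light 0.9, dark 0.1
if shares.min() < 0.2:
    print("warning: attribute imbalance may bias downstream models")
```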


Meet the redesigned People + AI Guidebook

from Medium, People + AI Research

Let’s start from the beginning: what is the People + AI Guidebook?

Rebecca: The People + AI Guidebook is a set of best practices and methods for designing products that use AI.

It’s something that grew out of our own work at PAIR designing AI products. A few years ago, we realized it would be really helpful to have a compendium of best practices for AI design that we could reference. When we couldn’t find anything out there that met our needs, we decided to make one and share it for people to use.


Google Cloud unveils Vertex AI, one platform, every ML tool you need

from Google Cloud, Craig Wiley

Today at Google I/O, we announced the general availability of Vertex AI, a managed machine learning (ML) platform that allows companies to accelerate the deployment and maintenance of artificial intelligence (AI) models. Vertex AI requires nearly 80% fewer lines of code to train a model versus competitive platforms, enabling data scientists and ML engineers at all levels of expertise to implement Machine Learning Operations (MLOps) and efficiently build and manage ML projects throughout the entire development lifecycle.
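
For the curious, a hedged sketch of what training looks like with the google-cloud-aiplatform Python SDK. The project, bucket, and column names are placeholders, and the exact API surface may have changed since this announcement.

```python
# Sketch using the google-cloud-aiplatform SDK; all resource names below
# are placeholders, and API details may differ across SDK versions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="churn-data",
    gcs_source="gs://my-bucket/churn.csv",
)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)

model = job.run(dataset=dataset, target_column="churned")
endpoint = model.deploy(machine_type="n1-standard-4")
```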
