Researchers are eager to tap into the steadily expanding pool of health information collected from users by products like Fitbit, Clue, and the Apple Watch. But while these datasets could be a scientific treasure trove for scientists, they also pose logistical and ethical challenges that need to be addressed.
“There are huge opportunities. I think that’s the attraction,” says Ida Sim, director of digital health for the Division of General Internal Medicine at the University of California, San Francisco. Sim explains that part of the draw for scientists is that the apps and tech are designed to appeal to the general public. A commercial, attractive interface is primed for long-term use by far more people than can usually be included in a research study, and people actually use them. “As opposed to a clunky research wristband, which is ugly, and people won’t wear it,” she says.
In a new publication in Nature, University of Utah chemists Jolene Reid and Matthew Sigman show how analyzing previously published chemical reaction data can predict how hypothetical reactions may proceed, narrowing the range of conditions chemists need to explore. Their algorithmic prediction process, which includes aspects of machine learning, can save valuable time and resources in chemical research.
“We try to find the best combination of parameters,” Reid says. “Once we have that we can adjust features of any reaction and actually predict how that adjustment will affect it.”
Stony Brook scientists have received major NIH funding to investigate a pioneering approach to fetal monitoring that could improve outcomes in the delivery room.
A multi-disciplinary team led by the College of Engineering and Applied Sciences (CEAS) has received $3.2 million under the National Institutes of Health (NIH) Research Project Grant Program (R01) to investigate machine learning methods for classification of intrapartum signals (FHR and uterine activity) that have the potential to significantly outperform the accuracy of contemporary methods.
In 2014, researchers from the University of Washington announced that pairing Google StreetView with a cluster of “smart” surveillance cameras allowed them to create “a self-organized and scalable multiple-camera tracking system that tracks humans across the cameras.”
In so many words, they showed that it was possible to build a dynamic, near real-time visualization of pedestrians and traffic flows, projected onto a 360-degree map of the world. A bit of machine-learning software helped erase any seams. This was an early proof of concept in an urban setting of a technological model now known as a “digital twin.”
“Digital twin” is a creepy-sounding phrase, conjuring visions of pixelated doppelgangers haunting your every step. It doesn’t necessarily describe an all-out surveillance state, though: In some ways, this is an extension of the 3-D computer models that architects and engineers use to help plan a building, or maneuver the inner workings of a car engine before they hit the factory.
Orlando’s two-phase pilot with Amazon to try out real-time facial recognition software ended Thursday, capping 15 months of technical lags, bandwidth issues and uncertainty over whether the controversial face-scanning technology actually works.
The city’s second pilot phase with the software, called Rekognition, began in October 2018 using four cameras at the police department’s headquarters, three in downtown and one outside a community recreation center. The technology is designed to automatically identify and track suspects in real-time using facial recognition algorithms.
“At this time, the city was not able to dedicate the resources to the pilot to enable us to make any noticeable progress toward completing the needed configuration and testing,” Orlando’s Chief Administrative Office said in a memo to City Council, adding that the city has “no immediate plans regarding future pilots to explore this type of facial recognition technology.”
Having trawled more than half a million compounds, an algorithm has identified 569 molecules that could drive a circular economy, in which chemical waste becomes feedstock for new materials. Called strategic molecules, they are key players in pathways that lead from biowaste – such as terpene mixtures produced by the paper industry – to valuable compounds such as drugs.
‘At the moment, if you have a waste stream and you want to produce a high value end product, there is a lot of uncertainty over what to make and by what routes,’ explains Jana Marie Weber from the University of Cambridge, UK. ‘People normally focus on very specific waste feedstocks and then see what they can make from it. Or they go from the product, step by step, and relate high value end product to the biowaste that they have.’ However, both of these approaches take an extraordinary amount of both chemical expertise and computational power.
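As a rough illustration of the pathway-finding idea, a breadth-first search over a reaction network can recover a shortest route from a biowaste feedstock to a target product. This is a toy sketch with invented compound names and edges, not the study's actual algorithm or data:

```python
from collections import deque

# Toy reaction network: each edge points from a starting material to a
# product it can be converted into. All entries here are invented for
# illustration; the real network links hundreds of thousands of compounds.
reactions = {
    "terpene mixture": ["limonene"],
    "limonene": ["carvone", "terpineol"],
    "carvone": ["drug precursor A"],
}

def find_route(network, start, target):
    """Breadth-first search for a shortest reaction pathway
    from a feedstock to a target compound; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in network.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = find_route(reactions, "terpene mixture", "drug precursor A")
```

Molecules that appear on many such routes between feedstocks and high-value products are, loosely, what the study would flag as "strategic."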
The idea of algorithms designing buildings may sound exciting to some and concerning to others. But after decades of attempts to harness this process, companies like The Living may have cracked the problem of automatically generating building designs. The New York-based research studio draws on computation, sustainability practices, and biology to explore variations on building prototypes.
In a recent news release from the studio, the team shares their new collaborative project with Princeton University. “The Embodied Computation Lab is a building that includes new sustainability and low-carbon features, and the facade involves the use of custom algorithms trained to detect knots in wood—bringing the power of machine learning to the physical world.”
Belmont University will offer two new majors in the fall: data science, and business systems and analytics.
The announcement follows the news earlier this month that the university would offer a tourism management major.
The university cited high entry-level salaries for new graduates in the data science field, as well as the forthcoming arrival of companies like Amazon, AllianceBernstein and possibly Oracle in the city.
Sometimes even Bill Gates’ crystal ball is cloudy.
The Microsoft co-founder became one of the most successful tech moguls of all time by foreseeing and capitalizing on world-changing trends, but he acknowledged this week that it’s difficult to predict when the coming tide of automation and artificial intelligence will have the most impact on our economy and labor market.
The biggest implications could still be decades away, Gates said in a conversation with Microsoft Research Labs director Eric Horvitz on stage at the Microsoft Research Faculty Summit in Redmond this week.
“We have many decades to get this right, but it is a fairly dramatic thing that you want to get society broadly involved in helping you think about with plenty of lead time,” he said.
The New England Journal of Medicine (NEJM) announced new statistical reporting guidelines for authors yesterday. The ASA describes the change as “in response to the ASA Statement on P-values and Statistical Significance and subsequent The American Statistician special issue on statistical inference” (ASA I and II, in my abbreviation). If so, it seems to have backfired. I don’t know all the differences in the new guidelines, but those explicitly noted appear to me to move in the reverse direction from where the ASA I and II guidelines were heading.
The most notable point is that the NEJM highlights the need for error control, especially for constraining the Type I error probability, and pays a lot of attention to adjusting P-values for multiple testing and post hoc subgroups. ASA I included an important principle (#4) that P-values are altered and may be invalidated by multiple testing, but they do not call for adjustments for multiplicity, nor do I find a discussion of Type I or II error probabilities in the ASA documents. NEJM gives strict requirements for controlling family-wise error rate or false discovery rates (understood as the Benjamini and Hochberg frequentist adjustments).
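For readers unfamiliar with it, the Benjamini–Hochberg false discovery rate adjustment mentioned above can be sketched as a simple step-up procedure. This is a minimal illustration, not NEJM's prescribed implementation:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: sort the p-values,
    find the largest rank k with p_(k) <= (k/m) * q, and reject
    the k hypotheses with the smallest p-values. Returns the
    indices (into the input list) of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank  # keep the largest qualifying rank
    return sorted(order[:k])

# Four tests: only the smallest p-value survives the stepped threshold.
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.2], q=0.05)
```

Unlike a family-wise correction such as Bonferroni, this controls the expected proportion of false discoveries among the rejections rather than the probability of any false rejection.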
Once a patient leaves the hospital or doctor’s office, Pattern Health ensures they are not gone forever. As a digital health platform-as-a-service (PaaS) company, Durham’s Pattern Health allows clinicians and researchers to create advanced apps for specific medical conditions to increase adherence to medical treatment and directions while elevating overall well-being for patients.
Chief Growth Officer Tim Horan described Pattern Health as a Wix for digital health instead of website creation. At the beginning of this year, the startup launched a new service that allows partners to take the digital health programs they developed on Pattern Health’s platform and create a marketplace for sharing and licensing those programs to other health systems and employers.
from University of California-Santa Barbara, The UCSB Current
Tens of millions of people face malnutrition the world over. Fortunately, there are systems in place to help anticipate famines and coordinate life-saving aid. Among them are the Famine Early Warning Systems Network, or FEWS NET, a cutting-edge drought early warning system, created some 30 years ago and run by scientists at UC Santa Barbara and several partner institutions.
FEWS NET identifies the location, severity and causes of food insecurity and issues alerts to humanitarian NGOs and government agencies. It achieves this by taking advantage of satellite observations, in-situ measurements, Earth systems models and field scientists’ observations. FEWS NET covers Africa, Central America and parts of Central Asia and the Caribbean.
Miami, FL September 10-11. “This intimate event is the only one of its kind and curates data science sessions to bring specialists in finance, hospitality, and healthcare face-to-face to educate each other on innovative new solutions in artificial intelligence, machine learning, predictive analytics and acceptance around best practices.” [$$$]
from Harvard University, Center for Geographic Analysis
Cambridge, MA November 7, starting at 12 p.m., Harvard Center for Geographic Analysis. Speaker: Dr. Alex Liu, Chief Data Scientist, IBM & Director, RMDS Lab. [rsvp required]
New York, NY July 30, starting at 5:30 p.m. “Hear from DataKind volunteers Anasuya Das, Lina Vourgidou, and Stuart Lynn about their work with Community Based Care of Central Florida, Solutions Journalism Network, and Clean Coalition. There will be a facilitated Q&A after each project showcase as well as the chance to mix and mingle with fellow data do-gooders.” [sold out, wait list only]
Brooklyn, NY August 6, starting at 5:30 p.m., Capgemini AIE (79 5th Ave). “In the latest edition of Focus AI, the NYU Tandon Future Labs partner with Capgemini’s Applied Innovation Exchange to explore the stages of understanding and preventing data bias — framing the problem, collecting the data, and preparing the data.” [rsvp required]
Data & Policy is a new, peer-reviewed open-access journal dedicated to the potential of data science to address important policy challenges. It aims to promote a deeper understanding of what the opening editorial calls “policy-data interactions” by publishing research that takes a “consistent, symmetrical approach to consideration of systems of policy and data, [and] how they interact with one another.”
This survey includes questions regarding personal and potentially identifiable information. We’re asking these questions because the R community is actively working to improve the inclusiveness and openness of our online and offline spaces. Taking snapshots of demographic data allows us to see how we’re doing over time, and adjust accordingly if the results suggest our events are excluding certain demographics.
Our team’s new stylist selection algorithm has shown excellent performance in helping stylists pick the best items for our clients. To feed this model with input features, we need to extract information from many sources, including Fix Notes. A common place to start with NLP is the bag-of-words approach, but this doesn’t capture the contextual relationship between words, nor does it take into account word order. For example, “Give me jeans not shoes” and “Give me shoes not jeans” would receive the same representation with this approach. Our goal is to come up with features that bear more information about the requests so that our styling model can do a better job at clothing item selection. This is where we get help from attention-based NLP models. Throughout this post, we will discuss the basic operations behind BERT in the hopes of familiarizing data scientists and other interested readers with these models and their potential applications in addressing similar problems.
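To make the bag-of-words limitation concrete, here is a minimal plain-Python sketch (using the example sentences above) showing that two requests with opposite meanings receive identical representations once word order is discarded:

```python
from collections import Counter

def bag_of_words(sentence):
    # Lowercase and split on whitespace, then count occurrences of each
    # word. All context and word-order information is thrown away.
    return Counter(sentence.lower().split())

a = bag_of_words("Give me jeans not shoes")
b = bag_of_words("Give me shoes not jeans")
# The two bags are identical, even though the requests are opposites;
# this is exactly the failure mode that attention-based models avoid.
```

A model like BERT instead produces contextual embeddings, so the representation of "jeans" differs depending on whether it appears before or after "not."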
Algorithmic decision making is both increasingly common and increasingly controversial. Critics worry that algorithmic tools are not transparent, accountable or fair. Assessing the fairness of these tools has been especially fraught as it requires that we agree about what fairness is and what it entails. Unfortunately, we do not. The technological literature is now littered with a multitude of measures, each purporting to assess fairness along some dimension. Two types of measures stand out. According to one, algorithmic fairness requires that the score an algorithm produces should be equally accurate for members of legally protected groups, blacks and whites for example. According to the other, algorithmic fairness requires that the algorithm produces the same percentage of false positives or false negatives for each of the groups at issue. Unfortunately, there is often no way to achieve parity in both these dimensions. This fact has led to a pressing question. Which type of measure should we prioritize and why?
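The second type of measure, error-rate parity, can be sketched in a few lines: compute false positive and false negative rates separately per group and compare. The toy predictions and labels below are invented for illustration and are not from the Article:

```python
def group_rates(preds, labels):
    """False positive rate and false negative rate for one group.
    Assumes the group has at least one positive and one negative label."""
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives

# Invented toy data for two groups.
preds_a, labels_a = [1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]
preds_b, labels_b = [1, 0, 0, 1], [1, 1, 0, 0]

fpr_a, fnr_a = group_rates(preds_a, labels_a)
fpr_b, fnr_b = group_rates(preds_b, labels_b)
# Error-rate parity would require fpr_a == fpr_b and fnr_a == fnr_b.
```

The impossibility result alluded to above is that when the groups have different base rates, a non-perfect algorithm cannot simultaneously achieve this parity and equal predictive accuracy of scores across groups.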
This Article makes three contributions to the debate about how best to measure algorithmic fairness: one conceptual, one normative, and one legal. Equal predictive accuracy ensures that a score means the same thing for each group at issue. As such, it relates to what one ought to believe about a scored individual. Because questions of fairness usually relate to action not belief, this measure is ill-suited as a measure of fairness. This is the Article’s conceptual contribution. Second, this Article argues that parity in the ratio of false positives to false negatives is a normatively significant measure. While a lack of parity in this dimension is not constitutive of unfairness, this measure provides important reasons to suspect that unfairness exists. This is the Article’s normative contribution. Interestingly, improving the accuracy of algorithms overall will lessen this unfairness. Unfortunately, a common assumption that antidiscrimination law prohibits the use of racial and other protected classifications in all contexts is inhibiting those who design algorithms from making them as fair and accurate as possible. This Article’s third contribution is to show that the law poses less of a barrier than many assume.
Access to data and AI for social impact is #Solvable. Jacob Weisberg talks to Jake Porway about how organizations aimed at social good can leverage the benefits of AI and data science. [audio, 33:53]