Researchers are eager to tap into the steadily expanding pool of health information collected from users by products like Fitbit, Clue, and the Apple Watch. But while these datasets could be a scientific treasure trove for scientists, they also pose logistical and ethical challenges that need to be addressed.
“There are huge opportunities. I think that’s the attraction,” says Ida Sim, director of digital health for the Division of General Internal Medicine at the University of California, San Francisco. Sim explains that part of the draw for scientists is that the apps and tech are designed to appeal to the general public. A commercial, attractive interface is primed for long-term use by far more people than can usually be included in a research study, and people actually use them. “As opposed to a clunky research wristband, which is ugly, and people won’t wear it,” she says.
In a new publication in Nature, University of Utah chemists Jolene Reid and Matthew Sigman show how analyzing previously published chemical reaction data can predict how hypothetical reactions may proceed, narrowing the range of conditions chemists need to explore. Their algorithmic prediction process, which includes aspects of machine learning, can save valuable time and resources in chemical research.
“We try to find the best combination of parameters,” Reid says. “Once we have that we can adjust features of any reaction and actually predict how that adjustment will affect it.”
Stony Brook scientists have received major NIH funding to investigate a pioneering approach to fetal monitoring that could improve outcomes in the delivery room.
A multi-disciplinary team led by the College of Engineering and Applied Sciences (CEAS) has received $3.2 million under the National Institutes of Health (NIH) Research Project Grant Program (R01) to investigate machine learning methods for classification of intrapartum signals (FHR and uterine activity) that have the potential to significantly outperform the accuracy of contemporary methods.
In 2014, researchers from the University of Washington announced that pairing Google StreetView with a cluster of “smart” surveillance cameras allowed them to create “a self-organized and scalable multiple-camera tracking system that tracks humans across the cameras.”
In so many words, they showed that it was possible to build a dynamic, near real-time visualization of pedestrians and traffic flows, projected onto a 360-degree map of the world. A bit of machine-learning software helped erase any seams. This was an early proof of concept in an urban setting of a technological model now known as a “digital twin.”
“Digital twin” is a creepy-sounding phrase, conjuring visions of pixelated doppelgangers haunting your every step. It doesn’t necessarily describe an all-out surveillance state, though: In some ways, this is an extension of the 3-D computer models that architects and engineers use to help plan a building, or maneuver the inner workings of a car engine before they hit the factory.
Orlando’s two-phase pilot with Amazon to try out real-time facial recognition software ended Thursday, capping 15 months of technical lags, bandwidth issues and uncertainty over whether the controversial face-scanning technology actually works.
The city’s second pilot phase with the software, called Rekognition, began in October 2018 using four cameras at the police department’s headquarters, three in downtown and one outside a community recreation center. The technology is designed to automatically identify and track suspects in real-time using facial recognition algorithms.
“At this time, the city was not able to dedicate the resources to the pilot to enable us to make any noticeable progress toward completing the needed configuration and testing,” Orlando’s Chief Administrative Office said in a memo to City Council, adding that the city has “no immediate plans regarding future pilots to explore this type of facial recognition technology.”
Having trawled more than half a million compounds, an algorithm has identified 569 molecules that could drive a circular economy, in which chemical waste becomes feedstock for new materials. Called strategic molecules, they are key players in pathways that lead from biowaste – such as terpene mixtures produced by the paper industry – to valuable compounds such as drugs.
‘At the moment, if you have a waste stream and you want to produce a high value end product, there is a lot of uncertainty over what to make and by what routes,’ explains Jana Marie Weber from the University of Cambridge, UK. ‘People normally focus on very specific waste feedstocks and then see what they can make from it. Or they go from the product, step by step, and relate high value end product to the biowaste that they have.’ However, both of these approaches take an extraordinary amount of both chemical expertise and computational power.
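As a rough illustration of the pathway-finding idea, a breadth-first search over a reaction network can recover a shortest route from a biowaste feedstock to a target product. This is a toy sketch with invented compound names and edges, not the study's actual algorithm or data:

```python
from collections import deque

# Toy reaction network: each edge points from a starting material to a
# product it can be converted into. All entries here are invented for
# illustration; the real network links hundreds of thousands of compounds.
reactions = {
    "terpene mixture": ["limonene"],
    "limonene": ["carvone", "terpineol"],
    "carvone": ["drug precursor A"],
}

def find_route(network, start, target):
    """Breadth-first search for a shortest reaction pathway
    from a feedstock to a target compound; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in network.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = find_route(reactions, "terpene mixture", "drug precursor A")
```

Molecules that appear on many such routes between feedstocks and high-value products are, loosely, what the study would flag as "strategic."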
The idea of algorithms designing buildings may sound exciting to some and concerning to others. But after decades of attempts to harness this process, companies like The Living may have cracked the problem of automatically generating building designs. The New York-based research studio draws on computation, sustainability practices, and biology to explore variations on building prototypes.
In a recent news release from the studio, the team shares their new collaborative project with Princeton University. “The Embodied Computation Lab is a building that includes new sustainability and low-carbon features, and the facade involves the use of custom algorithms trained to detect knots in wood—bringing the power of machine learning to the physical world.”
Belmont University will offer two new majors in the fall: data science, and business systems and analytics.
The announcement follows the news earlier this month that the university would offer a tourism management major.
The university cited high entry-level salaries for new graduates in the data science field, as well as the forthcoming arrival of companies like Amazon, AllianceBernstein and possibly Oracle in the city.
Sometimes even Bill Gates’ crystal ball is cloudy.
The Microsoft co-founder became one of the most successful tech moguls of all time by foreseeing and capitalizing on world-changing trends, but he acknowledged this week that it’s difficult to predict when the coming tide of automation and artificial intelligence will have the most impact on our economy and labor market.
The biggest implications could still be decades away, Gates said in a conversation with Microsoft Research Labs director Eric Horvitz on stage at the Microsoft Research Faculty Summit in Redmond this week.
“We have many decades to get this right, but it is a fairly dramatic thing that you want to get society broadly involved in helping you think about with plenty of lead time,” he said.
The New England Journal of Medicine (NEJM) announced new statistical reporting guidelines for authors yesterday. The ASA describes the change as “in response to the ASA Statement on P-values and Statistical Significance and subsequent The American Statistician special issue on statistical inference” (ASA I and II, in my abbreviation). If so, it seems to have backfired. I don’t know all the differences in the new guidelines, but those explicitly noted appear to me to move in the reverse direction from where the ASA I and II guidelines were heading.
The most notable point is that the NEJM highlights the need for error control, especially for constraining the Type I error probability, and pays a lot of attention to adjusting P-values for multiple testing and post hoc subgroups. ASA I included an important principle (#4) that P-values are altered and may be invalidated by multiple testing, but they do not call for adjustments for multiplicity, nor do I find a discussion of Type I or II error probabilities in the ASA documents. NEJM gives strict requirements for controlling family-wise error rate or false discovery rates (understood as the Benjamini and Hochberg frequentist adjustments).
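For readers unfamiliar with it, the Benjamini–Hochberg false discovery rate adjustment mentioned above can be sketched as a simple step-up procedure. This is a minimal illustration, not NEJM's prescribed implementation:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: sort the p-values,
    find the largest rank k with p_(k) <= (k/m) * q, and reject
    the k hypotheses with the smallest p-values. Returns the
    indices (into the input list) of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank  # keep the largest qualifying rank
    return sorted(order[:k])

# Four tests: only the smallest p-value survives the stepped threshold.
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.2], q=0.05)
```

Unlike a family-wise correction such as Bonferroni, this controls the expected proportion of false discoveries among the rejections rather than the probability of any false rejection.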
Once a patient leaves the hospital or doctor’s office, Pattern Health ensures they are not gone forever. As a digital health platform-as-a-service (PaaS) company, Durham’s Pattern Health allows clinicians and researchers to create advanced apps for specific medical conditions to increase adherence to medical treatment and directions while elevating overall well-being for patients.
Chief Growth Officer Tim Horan described Pattern Health as a Wix for digital health instead of website creation. At the beginning of this year, the startup launched a new service that allows partners to take the digital health programs they developed on Pattern Health’s platform and create a marketplace for sharing and licensing those programs to other health systems and employers.
from University of California-Santa Barbara, The UCSB Current
Tens of millions of people face malnutrition the world over. Fortunately, there are systems in place to help anticipate famines and coordinate life-saving aid. Among them are the Famine Early Warning Systems Network, or FEWS NET, a cutting-edge drought early warning system, created some 30 years ago and run by scientists at UC Santa Barbara and several partner institutions.
FEWS NET identifies the location, severity and causes of food insecurity and issues alerts to humanitarian NGOs and government agencies. It achieves this by taking advantage of satellite observations, in-situ measurements, Earth systems models and field scientists’ observations. FEWS NET covers Africa, Central America and parts of Central Asia and the Caribbean.
Miami, FL September 10-11. “This intimate event is the only one of its kind and curates data science sessions to bring specialists in finance, hospitality, and healthcare face-to-face to educate each other on innovative new solutions in artificial intelligence, machine learning, predictive analytics and acceptance around best practices.” [$$$]
from Harvard University, Center for Geographic Analysis
Cambridge, MA November 7, starting at 12 p.m., Harvard Center for Geographic Analysis. Speaker: Dr. Alex Liu, Chief Data Scientist, IBM & Director, RMDS Lab. [rsvp required]
New York, NY July 30, starting at 5:30 p.m. “Hear from DataKind volunteers Anasuya Das, Lina Vourgidou, and Stuart Lynn about their work with Community Based Care of Central Florida, Solutions Journalism Network, and Clean Coalition. There will be a facilitated Q&A after each project showcase as well as the chance to mix and mingle with fellow data do-gooders.” [sold out, wait list only]
Brooklyn, NY August 6, starting at 5:30 p.m., Capgemini AIE (79 5th Ave). “In the latest edition of Focus AI, the NYU Tandon Future Labs partner with Capgemini’s Applied Innovation Exchange to explore the stages of understanding and preventing data bias — framing the problem, collecting the data, and preparing the data.” [rsvp required]
Data & Policy is a new, peer-reviewed open-access journal dedicated to the potential of data science to address important policy challenges. It aims to promote a deeper understanding of what the opening editorial calls “policy-data interactions” by publishing research that takes a “consistent, symmetrical approach to consideration of systems of policy and data, [and] how they interact with one another.”
This survey includes questions regarding personal and potentially identifiable information. We’re asking these questions because the R community is actively working to improve the inclusiveness and openness of our online and offline spaces. Taking snapshots of demographic data allows us to see how we’re doing over time, and adjust accordingly if the results suggest our events are excluding certain demographics.
Our team’s new stylist selection algorithm has shown excellent performance in helping stylists pick the best items for our clients. To feed this model with input features, we need to extract information from many sources, including Fix Notes. A common place to start with NLP is the bag-of-words approach, but this doesn’t capture the contextual relationship between words, nor does it take into account word order. For example, “Give me jeans not shoes” and “Give me shoes not jeans” would receive the same representation with this approach. Our goal is to come up with features that bear more information about the requests so that our styling model can do a better job at clothing item selection. This is where we get help from attention-based NLP models. Throughout this post, we will discuss the basic operations behind BERT in the hopes of familiarizing data scientists and other interested readers with these models and their potential applications in addressing similar problems.
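To make the bag-of-words limitation concrete, here is a minimal plain-Python sketch (using the example sentences above) showing that two requests with opposite meanings receive identical representations once word order is discarded:

```python
from collections import Counter

def bag_of_words(sentence):
    # Lowercase and split on whitespace, then count occurrences of each
    # word. All context and word-order information is thrown away.
    return Counter(sentence.lower().split())

a = bag_of_words("Give me jeans not shoes")
b = bag_of_words("Give me shoes not jeans")
# The two bags are identical, even though the requests are opposites;
# this is exactly the failure mode that attention-based models avoid.
```

A model like BERT instead produces contextual embeddings, so the representation of "jeans" differs depending on whether it appears before or after "not."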
Algorithmic decision making is both increasingly common and increasingly controversial. Critics worry that algorithmic tools are not transparent, accountable or fair. Assessing the fairness of these tools has been especially fraught as it requires that we agree about what fairness is and what it entails. Unfortunately, we do not. The technological literature is now littered with a multitude of measures, each purporting to assess fairness along some dimension. Two types of measures stand out. According to one, algorithmic fairness requires that the score an algorithm produces should be equally accurate for members of legally protected groups, blacks and whites for example. According to the other, algorithmic fairness requires that the algorithm produces the same percentage of false positives or false negatives for each of the groups at issue. Unfortunately, there is often no way to achieve parity in both these dimensions. This fact has led to a pressing question. Which type of measure should we prioritize and why?
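The second type of measure, error-rate parity, can be sketched in a few lines: compute false positive and false negative rates separately per group and compare. The toy predictions and labels below are invented for illustration and are not from the Article:

```python
def group_rates(preds, labels):
    """False positive rate and false negative rate for one group.
    Assumes the group has at least one positive and one negative label."""
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives

# Invented toy data for two groups.
preds_a, labels_a = [1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]
preds_b, labels_b = [1, 0, 0, 1], [1, 1, 0, 0]

fpr_a, fnr_a = group_rates(preds_a, labels_a)
fpr_b, fnr_b = group_rates(preds_b, labels_b)
# Error-rate parity would require fpr_a == fpr_b and fnr_a == fnr_b.
```

The impossibility result alluded to above is that when the groups have different base rates, a non-perfect algorithm cannot simultaneously achieve this parity and equal predictive accuracy of scores across groups.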
This Article makes three contributions to the debate about how best to measure algorithmic fairness: one conceptual, one normative, and one legal. Equal predictive accuracy ensures that a score means the same thing for each group at issue. As such, it relates to what one ought to believe about a scored individual. Because questions of fairness usually relate to action not belief, this measure is ill-suited as a measure of fairness. This is the Article’s conceptual contribution. Second, this Article argues that parity in the ratio of false positives to false negatives is a normatively significant measure. While a lack of parity in this dimension is not constitutive of unfairness, this measure provides important reasons to suspect that unfairness exists. This is the Article’s normative contribution. Interestingly, improving the accuracy of algorithms overall will lessen this unfairness. Unfortunately, a common assumption that antidiscrimination law prohibits the use of racial and other protected classifications in all contexts is inhibiting those who design algorithms from making them as fair and accurate as possible. This Article’s third contribution is to show that the law poses less of a barrier than many assume.
Access to data and AI for social impact is #Solvable. Jacob Weisberg talks to Jake Porway about how organizations aimed at social good can leverage the benefits of AI and data science. [audio, 33:53]