Data Science newsletter – June 13, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for June 13, 2019


Data Science News

A Just and Comprehensive Strategy for Using NLP to Address Online Abuse

arXiv, Computer Science > Social and Information Networks; David Jurgens, Eshwar Chandrasekharan, Libby Hemphill


Online abusive behavior affects millions and the NLP community has attempted to mitigate this problem by developing technologies to detect abuse. However, current methods have largely focused on a narrow definition of abuse to the detriment of victims who seek both validation and solutions. In this position paper, we argue that the community needs to make three substantive changes: (1) expanding our scope of problems to tackle both more subtle and more serious forms of abuse, (2) developing proactive technologies that counter or inhibit abuse before it harms, and (3) reframing our effort within a framework of justice to promote healthy communities.

This is so cool: @zotero now flags retracted articles in its latest beta, in collaboration with @RetractionWatch

Twitter, Sebastian Karcher


Facebook launches PyTorch Hub for reproducing AI model results

VentureBeat, Khari Johnson


Reproducibility puts the “science” in the computer science of AI: it’s how researchers can prove their AI systems are robust and reliable. To support reproducibility for AI models, Facebook today announced the beta release of PyTorch Hub, an API and workflow for research reproducibility and support.

PyTorch Hub lets researchers quickly publish pretrained models from a GitHub repository by adding a hubconf.py file and submitting a GitHub pull request. PyTorch Hub comes with support for models in Google Colab and PapersWithCode.
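
For readers curious what such a file looks like, here is a minimal sketch of a hubconf.py. It is an illustration only, not taken from the article: the entrypoint name tiny_classifier and its parameters are hypothetical, and the model is a small untrained network rather than a real pretrained one. The sketch assumes the usual convention of a module-level dependencies list plus one callable per published model.

```python
# hubconf.py -- hypothetical example; names and model are illustrative only.

# Packages that must be importable before an entrypoint is called.
dependencies = ["torch"]


def tiny_classifier(num_inputs=4, num_classes=2):
    """Hypothetical entrypoint: returns a small, untrained torch.nn.Module."""
    # Imported lazily so this file can be read without torch installed.
    import torch.nn as nn

    return nn.Sequential(
        nn.Linear(num_inputs, 8),
        nn.ReLU(),
        nn.Linear(8, num_classes),
    )
```

With a file like this at the root of a public repository, a user with PyTorch installed could load the model with torch.hub.load("OWNER/REPO", "tiny_classifier"), where OWNER/REPO is a placeholder for the repository name.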

Mary Meeker’s most important trends on the internet

Vox, Rani Molla


The general partner at venture capital firm Bond Capital delivered a rapid-fire 333-page slideshow that looked back at every important internet trend in the last year and looked forward about what these trends tell us to expect in the year ahead. The “Queen of the Internet” and former Kleiner Perkins Caufield & Byers partner touched on everything from accelerating internet ad spend in the US to the growth of digital delivery services in Latin America.

But 333 pages is a lot of data to wade through. So Recode has pulled out some of the significant and most interesting trends in Meeker’s report. (You can find the full slide deck at the bottom of this story.)

  • Some 51 percent of the world — 3.8 billion people — were internet users last year, up from 49 percent (3.6 billion) in 2017. Growth slowed to about 6 percent in 2018 because so many people have come online that new users are harder to come by. Sales of smartphones — which are the primary internet access point for many people across the globe — are declining as much of the world that is going to be online already is.

    Data Visualization of the Week

    The New York Times, Kevin Litman-Navarro


    h/t to Gina Helfrich of NumFocus for sending this in to us.

    Report: Digital health innovation stymied by poor data sharing

    MobiHealthNews, Dave Muoio


    Patients and providers alike are interested in, and see major benefits from, increased access to digital health records, according to a new report from Royal Philips.

    How much NASA, DOD, and NOAA are spending on AI research

    Quartz, Amanda Shendruk


    Government research in machine learning is ramping up. Since 2003, the US government has spent nearly a billion dollars on AI-focused or facilitated research grants. Half of it was awarded in the past four years. The money has been used to fund research into everything from bird migration to astronaut assessments, according to a trove of federal assistance data compiled by Quartz.

    Here’s what that explosion in funding looks like. These are all the grants involving machine learning since 2016 that provide at least $500,000.

    Risky partner: Top U.S. universities took funds from Chinese firm tied to Xinjiang security

    Reuters, Alexandra Harney


    The Massachusetts Institute of Technology and at least one other university have research partnerships with a Chinese artificial intelligence company that has business ties with police in China’s Xinjiang region, where a sweeping crackdown on Uighurs has drawn international condemnation.

    Energy Department to Invest $32 Million in Computer Design of Materials

    U.S. Department of Energy


    The U.S. Department of Energy announced today that it will invest $32 million over the next four years to accelerate the design of new materials through use of supercomputers.

    Seven projects will be supported, three led by teams at DOE National Laboratories and four by Universities. The teams are led by Argonne National Laboratory (ANL), Brookhaven National Laboratory (BNL) and Lawrence Livermore National Laboratory (LLNL) as well as the University of Illinois, the Pennsylvania State University, the University of Texas and the University of Southern California.

    A Duke study recorded thousands of students’ faces. Now they’re being used all over the world

    Duke University, The Chronicle student newspaper, Jake Satisky


    The data set and the project’s Duke website were taken down in April, after Microsoft came under fire for its facial recognition database that had more than 10 million images of roughly 100,000 people. The company’s database was exposed in a Financial Times investigation, in which the data set from Duke was also mentioned as “one of the most popular pedestrian recognition training sets.”

    Microsoft has since pulled its data set, and Stanford University also removed one of its public data sets.

    The takedown of the website from Duke came after media reports spurred an investigation by the Institutional Review Board, wrote Michael Schoenfeld, vice president for public affairs and government relations, in an email to The Chronicle.

    World’s largest plant survey reveals alarming extinction rate

    Nature, News, Heidi Ledford


    The world’s seed-bearing plants have been disappearing at a rate of nearly 3 species a year since 1900 ― which is up to 500 times higher than would be expected as a result of natural forces alone, according to the largest survey yet of plant extinctions.

    The project looked at more than 330,000 species and found that plants on islands and in the tropics were the most likely to be declared extinct. Trees, shrubs and other woody perennials had the highest probability of disappearing regardless of where they were located. The results were published on 10 June in Nature Ecology & Evolution.

    Tackling bias in artificial intelligence (and in humans)

    McKinsey, Jake Silberg and James Manyika


    In Notes from the AI frontier: Tackling bias in AI (and in humans) (PDF–120KB), we provide an overview of where algorithms can help reduce disparities caused by human biases, and of where more human vigilance is needed to critically analyze the unfair biases that can become baked in and scaled by AI systems. This article, a shorter version of that piece, also highlights some of the research underway to address the challenges of bias in AI and suggests six pragmatic ways forward.

    Two opportunities present themselves in the debate. The first is the opportunity to use AI to identify and reduce the effect of human biases. The second is the opportunity to improve AI systems themselves, from how they leverage data to how they are developed, deployed, and used, to prevent them from perpetuating human and societal biases or creating bias and related challenges of their own. Realizing these opportunities will require collaboration across disciplines to further develop and implement technical improvements, operational practices, and ethical standards.

    Text-based Editing of Talking-head Video

    Ohad Fried et al.


    Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.

    Consumer Reports Launches New Digital Lab

    Consumer Reports, Allen St. John


    In 2018 CR applied the Digital Standard to its television testing, discovering that certain smart TVs were vulnerable to hacking. That led Samsung to fix a security flaw in its sets. Similar work is underway with routers, printers, password managers, and a number of other connected devices and services. The Digital Standard will remain a key benchmark for product testing in the Digital Lab.

    “Our digital testing has already showed how products and services we use every day can expose us to many new and potential harms,” says Marta Tellado, CR’s CEO and president. “Consumer Reports’ new Digital Lab will reveal precisely how and where our rights are undermined by the unchecked influence of technology. Armed with that knowledge, consumers can make more secure choices that protect our privacy and hold today’s digital giants to account.”

    Google and NOAA partner on Pattern Radio: Whale Songs

    Google AI Blog, Julie Cattiau


    Back in the 1960s, scientists first discovered that humpback whales actually sing songs, which evolve over time. But there’s still so much we don’t understand. Why do humpbacks sing? What is the meaning of the patterns within their songs?

    Scientists sift through an ocean of sound to find answers to these questions. But what if anyone could help make discoveries?

    For the past year, Google AI has been partnering with NOAA’s Pacific Islands Fisheries Science Center to train an artificial intelligence model on their vast collection of underwater recordings. This project is helping scientists better understand whales’ behavioral and migratory patterns, so scientists can better protect whales. The effort fits into Google’s AI for Social Good program, applying the latest in machine learning to the world’s biggest humanitarian and environmental challenges.


    Great AI Debate Series

    NYU Tandon School of Engineering, the Future Labs


    Brooklyn, NY June 18, starting at 6 p.m., Pfizer Auditorium (5 Metrotech Center). [$$]


    Email newsletters survey

    Luminary Labs is crowdsourcing a list of your must-read email newsletters.
    Tools & Resources

    Downloading Fitbit Data Histories with R

    R-bloggers, Method Matters blog


    In this post, we will see how to download personal Fitbit data histories for step counts, heart rate, and sleep via the Fitbit API. We will use a combination of existing R packages and custom calls to the Fitbit API to get all of the data we are interested in.

    This post won’t focus on data analysis per se, but rather data collection.




    “GrapheneOS is a privacy / security research and engineering project that has been under way for over 5 years. It recently became rebranded as GrapheneOS and is taking a different direction based on obtaining funding for the research and development work as a non-profit open source project rather than being a company. The reborn project is still in a very early stage and lots of the past work on privacy and security has not yet been restored for the new incarnation of the OS.”

    2019 Data Release landing page | IRIS2019 Data Release

    University of Michigan, Institute for Research on Innovation & Science


    IRIS UMETRICS 2019 Data Release

    ICML 2019 | Google, ETH Zurich, MPI-IS, Cambridge & Share Best Paper Honours



    “Conference organizers have announced the recipients of the ICML 2019 Best Paper Awards: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations from Google Research, ETH Zurich, and Max Planck Institute for Intelligent Systems (MPI-IS); and Rates of Convergence for Sparse Variational Gaussian Process Regression from the University of Cambridge and



    New postdoc position in advancing trait-based ecology

    University of Arizona, Ecology and Evolutionary Biology (EEB) Department; Tucson, AZ

    Postdoctoral Fellowship in NeuroImaging

    Northwestern University, Department of Physical Therapy and Human Movement Sciences; Chicago, IL

    Postdoctoral Research Associate

    Brown University, Data Science Initiative; Providence, RI
    Full-time positions outside academia

    Data Scientist

    New York Stem Cell Foundation (NYSCF) Research Institute; New York, NY


    DataKind; New York, NY

    Interdisciplinary (Regional Water Control Data System Manager) Job

    Department of the Army; Portland, OR

    AI/ML Engineer

    Verisk; Jersey City, NJ

    Data Scientist

    Scale; San Francisco, CA

    Journalist, Shared Data Unit

    BBC; Birmingham, England

    NLP/Linguistics Machine Learning

    Proxi AI; Berkeley, CA

    Data Engineer, Baseball Operations

    Detroit Tigers; Detroit, MI
    Full-time, non-tenured academic positions

    Research Librarian for Engineering & Biotechnology

    North Carolina State University, University Libraries; Raleigh, NC

    Assoc Data Sci Anlyst/Engineer

    NYU School of Medicine; NYU Langone Comprehensive Program on Obesity; New York, NY

    Data Manager

    New York University, Research Alliance for New York City Schools; New York, NY

    Associate Director

    University of Pennsylvania, Wharton School, Behavior Change for Good Initiative; Philadelphia, PA
    Internships and other temporary positions

    Internship - Masters and PhD students

    Verisk; Jersey City, NJ
