Data Science newsletter – March 1, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for March 1, 2020


Data Science News

Why faces don’t always tell the truth about feelings

Nature, News Feature, Douglas Heaven


Although AI companies market software for recognizing emotions in faces, psychologists debate whether expressions can be read so easily.

New Undergraduate Cornell Clinic Researches Automated Scoring Systems

The Cornell Daily Sun, Catherine Cai and Feifei Hu


The proliferation of rating or ranking systems in everyday life often leads to complaints of online misrepresentation.

Cornell’s Due Process Clinic, an undergraduate “clinical” course designed to understand automated scoring systems such as credit scores and search engine rankings, started sending student researchers to collect qualitative data and build their own case studies on these systems.

Clinic director, Prof. Malte Ziewitz, science and technology studies, founded the clinic because he wanted to use legal frameworks to research non-legal situations.

Artificial Intelligence Will Soon Be Used To Predict Water Quality In The Anacostia River

DCist, Jacob Fenston


Every Wednesday during the warmer months, volunteers test the water for bacteria at sites up and down the Anacostia. But what if you want to go paddleboarding on Saturday? The results from three days earlier could be completely irrelevant, especially if there’s been a big rainstorm, possibly causing sewage to overflow into the river.

The environmental group Anacostia Riverkeeper is trying to fill in those water quality gaps, partnering with the artificial intelligence company DataRobot to predict bacteria levels, based on known current conditions and past information.

“It’s kind of similar to how they do climate change models or weather forecasting,” says Robbie O’Donnell, with Anacostia Riverkeeper. There is a long list of factors that can impact bacteria levels on any given day, he says. “There’s rain, there’s the turbidity at that moment, there’s the water level, there’s the tide, there’s the pH of the water, how sunny it is outside,” O’Donnell says.
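The article does not say what kind of model DataRobot builds. As a minimal sketch of the general idea, the code below regresses a bacteria reading on the factors O'Donnell lists. It assumes a plain linear model and synthetic data, and both choices are ours, not the project's. The feature names, coefficients and helper functions are hypothetical.

```python
import numpy as np

# Hypothetical feature order, taken from the factors quoted in the article.
FEATURES = ["rain", "turbidity", "water_level", "tide", "ph", "sunshine"]

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, weights)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict(intercept, weights, x_today):
    """Predicted reading for one day's (standardized) conditions."""
    return intercept + np.asarray(x_today) @ weights

# Synthetic stand-in for historical measurements (NOT real Anacostia data):
# made-up coefficients, standardized features, small Gaussian noise.
rng = np.random.default_rng(0)
true_w = np.array([0.8, 0.5, 0.1, -0.2, -0.3, -0.6])
X_hist = rng.normal(size=(200, len(FEATURES)))
y_hist = 2.0 + X_hist @ true_w + rng.normal(scale=0.1, size=200)

b0, w = fit_linear(X_hist, y_hist)
print(dict(zip(FEATURES, np.round(w, 2))))
# Hypothetical standardized conditions for a rainy, cloudy day.
print(predict(b0, w, [1.5, 1.0, 0.0, 0.2, 0.0, -0.5]))
```

A linear fit is only the simplest baseline. Bacteria counts span orders of magnitude, so a real model would more plausibly target the logarithm of the count and be checked against held-out sampling days.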

Musicians Algorithmically Generate Every Possible Melody, Release Them to Public Domain

VICE, Motherboard, Samantha Cole


Damien Riehl and Noah Rubin generated and saved every possible melody to a hard drive, then released them to the public domain.
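An exhaustive melody generator is, at its core, a Cartesian product: every sequence of a fixed length drawn from a fixed set of pitches. The minimal sketch below assumes a hypothetical eight-pitch alphabet and a melody length of 12. These are illustrative parameters we chose, not necessarily the ones Riehl and Rubin used.

```python
from itertools import islice, product

# Hypothetical pitch alphabet: one octave of a diatonic scale (illustrative only).
PITCHES = ["C", "D", "E", "F", "G", "A", "B", "C'"]

def all_melodies(pitches, length):
    """Lazily yield every sequence of `length` notes from `pitches` (repeats allowed)."""
    return product(pitches, repeat=length)

def melody_count(num_pitches, length):
    """Number of distinct sequences: num_pitches choices at each of `length` positions."""
    return num_pitches ** length

print(melody_count(len(PITCHES), 12))
# Peek at the first two melodies without materializing the whole space.
print(list(islice(all_melodies(PITCHES, 12), 2)))
```

The count grows as `num_pitches ** length`, which is why the enumeration is a lazy generator rather than a list. Writing every sequence to disk is then a matter of streaming this iterator.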

Artificial intelligence uncovers outrageous employee expense reports

TechRepublic, Esther Shein


Guns and bullets. Bulletproof vests. Nail salon excursions, jewelry and hangover kits. These are among the expenses employees bill to their companies, according to AppZen, a provider of financial software.

You’ve got to hand it to them: employees are certainly creative in their perceptions of what constitutes business expenses. Businesses process an average of 100,000 expense reports and 700,000 invoices a year, and that means suspect expenses are falling through the cracks, according to AppZen’s The State of Business Spend report for Q4 2019.

How National 5G Policy Became Chaotic

POLITICO, John Hendel


Trump administration officials have been pushing their own ideas for how America can win the race to build superfast 5G service, seemingly with no coordination at all. Economic adviser Larry Kudlow talked up a new “virtualized” network to counter Chinese hardware dominance. Attorney General William Barr suggested instead that the U.S. should buy a controlling stake in one of Huawei’s European competitors. The idea was unprecedented, and apparently unsupported: a day later, Vice President Mike Pence seemed to walk it back.

Democrats in Congress haven’t missed the chance to pounce on the White House’s disarray on such a fundamental new technology. Sen. Brian Schatz of Hawaii, the top Democrat on the Commerce Committee’s telecommunications subcommittee, called the White House’s 5G planning “haphazard in the extreme.”

It’s not just the White House, though: If the strategy is ever finalized, Washington is poised for even more confusion, and the reason dates back long before Trump.

me, after spending countless hours arguing how my department needs to recruit and admit more Black grad students, to then see that this year they gave an offer to ONE Black person who wasn’t already in the department. ONE?! out of 30. nah.

Twitter, Kaneesha R. Johnson


*spending countless hours alongside some dope Black grad students who put in SO. MUCH. WORK. trying to make our workplace just a little more inclusive and comfortable for Black students

Coronavirus “infodemic” threatens world’s health institutions

Axios, Eileen Drage O'Reilly


The spread of the novel coronavirus outbreak is being matched, or even outrun, by the spread on social media of both unintentional misinformation about it and vociferous campaigns of malicious disinformation, experts tell Axios.

Why it matters: The tide of bad information is undermining trust in governments, global health organizations, nonprofits and scientists — the very institutions that many believe are needed to organize a global response to what may be turning into a pandemic.

Apple, Johnson & Johnson study to predict stroke risk with Apple Watch

CNBC, Todd Haselton


  • Apple and Johnson & Johnson announced a new study called Heartline on Tuesday to research atrial fibrillation and early detection of stroke using the Apple Watch.
  • Apple and Stanford published the results of a different health study in November, and the findings are still up for debate among doctors.
  • The new Heartline study is open to people aged 65 and older and open enrollment begins today.

    Swinging the Vote? Google’s black box algorithm controls which political emails land in your main inbox. For 2020 presidential candidates, the differences are stark.

    The Markup; Adrianne Jeffries, Leon Yin, and Surya Mattu


    Pete Buttigieg is leading at 63 percent. Andrew Yang came in second at 46 percent. And Elizabeth Warren looks like she’s in trouble with 0 percent.

    These aren’t poll numbers for the U.S. 2020 Democratic presidential contest. Instead, they reflect which candidates were able to consistently land in Gmail’s primary inbox in a simple test.

    How archaeologists can dig deeper with big data

    Futurity, Brown University


    Centuries of archaeological research on the Inca Empire has netted a veritable library of knowledge. But new digital and data-driven projects are proving that there is much more to discover about precolonial life in the Andes.

    In a new study in the Journal of Field Archaeology, researchers detail new work they conducted in the former Inca Empire in South America using drones, satellite imagery, and proprietary online databases.

    “Some questions are really hard to answer when you’re looking at only one small story—say, digging a trench in the remains of a single house,” says Parker VanValkenburgh, assistant professor of anthropology at Brown University. “Those big questions about the entire Inca Empire—those are very hard to answer with just a pick and a trowel.”

    FARE Expands Partnership with Northwestern University, Announces Data Commons

    Food Industry Executive


    FARE (Food Allergy Research & Education), the world’s leading food allergy research, advocacy and education organization, today announced a data commons partnership with Northwestern University, home to the Center for Food Allergy and Asthma Research (CFAAR). CFAAR seeks to reduce the burden of disease and improve health equity by developing, evaluating and disseminating asthma and allergy interventions, and by conducting work to inform local, state, national and international health policy.

    Due to constant innovation in the field of health and biomedical informatics, it is now possible for researchers across the globe to share data and collaborate on secure and efficient platforms. These platforms, known as data commons, are collections of standardized data available in a common interface for download and/or analysis.

    First-year dean shares his vision for Carey Business School

    Johns Hopkins University, Hub


    Starting this fall, the two-year, full-time MBA program will offer two pathways for academic concentration: one focused on in-demand analytics and data skills and one that leverages Hopkins’ expertise in biotechnology and medicine to train students for health-related fields. With the support of a new $25 million gift from the W. P. Carey Foundation announced this week, the school will also expand its career development services to better prepare students for a competitive job market. … Carey alum Jack Hirsch reached out to Dean Triantis for a wide-ranging conversation about the changes to the MBA curriculum and the new dean’s vision for the school. A member of the school’s inaugural full-time MBA Class of 2012, Hirsch is senior director of product management at Box, a cloud content management and file sharing company based in Silicon Valley.

    10 Breakthrough Technologies 2020

    MIT Technology Review


  • Unhackable internet
  • Hyper-personalized medicine
  • Digital money
  • Anti-aging drugs
  • AI-discovered molecules
  • Satellite mega-constellations
  • Quantum supremacy
  • Tiny AI
  • Differential privacy
  • Climate change attribution
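Differential privacy, one of the technologies listed above, adds calibrated random noise to query answers so that no single person's record changes the output distribution by much. The minimal sketch below shows the classic Laplace mechanism for a counting query, which has sensitivity 1. The dataset and function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical records: ages of survey respondents.
ages = [23, 35, 41, 52, 67, 29, 71, 38]
print(private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=random.Random(42)))
```

A smaller epsilon means a larger noise scale (`1 / epsilon`) and stronger privacy. The noise has mean zero, so averaging many independent releases would approach the true count. This is why repeated queries consume a privacy budget.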

    eScience takes on 7 projects in the 2020 Winter Incubator program

    University of Washington, eScience Institute


    The goal of the Winter Data Science Incubator is to enable new research discoveries by bringing together data scientists and domain scientists to work on focused, intensive, collaborative projects. Our team of data scientists provides expertise in state-of-the-art technology and methods in large-scale data manipulation and analytics, cloud and cluster computing, statistics and machine learning, and visualization to help researchers extract knowledge from large, complex, and noisy datasets. Projects also frequently, but not exclusively, involve a non-trivial software engineering component. Each fall we release a call for proposals to participate in the Winter Incubator program. Selected projects identify a project lead who spends Tuesdays and Thursdays throughout winter quarter working one-on-one with a data scientist in the Data Science Studio. Projects also benefit from crosstalk during weekly full group meetings where all projects present updates and pressing challenges.

    This year, in our 7th iteration of this program, we selected 7 projects from 6 different departments/units on campus.


    The Implications of the Growing Role of Genetic Testing—Lecture by Anthropologist Rayna Rapp, March 25

    New York University, Bentson Dean's Lectures


    New York, NY March 25, starting at 5:30 p.m., NYU’s Jurow Lecture Hall. “How are we to make sense of the increase in testing in reproduction? Rapp will consider the rapidly escalating role of genetic testing in reproduction and will highlight the benefits and burdens of intensified testing in family-making and their gendered implications.” [free]

    libigl Hackathon 2020

    Alec Jacobson and Daniele Panozzo


    Toronto, ON, Canada March 20-21 at The Fields Institute for Research in Mathematical Sciences. “The libigl hackathon is a two-day event of intense collaborative programming. We estimate the attendance will be 30–50 people. The targeted audience will be users or future users of the libigl software library. Most participants will have a background in the field of geometry processing.” [application required]

    Pfizer/ASA/Columbia University Symposium on Risks and Opportunities of AI in Clinical Drug Development

    American Statistical Association, Pfizer, Columbia University


    New York, NY May 18, starting at 8:30 a.m., Columbia University. “This forum is, therefore, expected to serve as a platform for distinguished statisticians, data scientists, regulators, and other professionals to address the challenges and opportunities of AI in pharmaceutical medicine; to foster collaboration among industry, academia, regulatory agencies, and professional associations; and to propose recommendations with policy implications for proper implementation of AI in promoting public health.” [$$$]

    Boston Area Meeting on Computational Social Science

    Alex Pentland, David Lazer, Esteban Moro


    Cambridge, MA March 14 at MIT Media Lab. “We are inviting all the Computational Social Science research groups in the area to meet on the top floor of the Media Lab building, where we have space for posters from all projects (well, up to about 100), and have four quick highlight talks from each group, lunch (free!) and lots of meeting and discussing.” [registration required]


    CRCS Summer Fellowship

    “CRCS brings together computer scientists and scholars from a broad range of fields to make advances in computational research that serve public interest. We are currently collaborating on interdisciplinary projects in areas such as public health, poverty and inequality, and conservation. CRCS is informed by a deep knowledge of the societal issues at stake and a commitment to advancing the human condition through cutting edge collaborations between computational and social science.” Deadline for applications is March 4.

    ASA Data Visualization Poster Competition for Grades K–12

    “The ASA/NCTM Joint Committee on Curriculum in Statistics and Probability and the ASA’s education department encourage students and their advisers to participate in its annual Data Visualization Poster Competition.” Deadline for poster submissions is April 1.

    Tools & Resources

    YC’s New Guide to Raising a Series A

    Y Combinator, Aaron Harris and Janelle Tam


    This guide collects nearly everything we’ve learned about Series A’s. It is designed to help founders tactically and strategically plan for fundraises. We explain every step leading up to a successful raise, as well as the steps taken throughout the process; we also explain the logic behind every step. Along the way, we outline the expectations, incentives, and rough thought process of the investors involved, so founders feel less mystified.

    Use Data to Revolutionize Project Planning

    Harvard Business Review, Yael Grushka-Cockayne


    Today, changing attitudes toward data collection, data-driven prediction, and decision-making offer unprecedented opportunities in the field of project planning. With data, firms can now operationalize Kahneman and Tversky’s ideas, going beyond their original vision. Using historical data on projects’ initial forecasted completion dates and total costs, in addition to realized or actual expenditures and durations, accuracy estimates can be established. Such estimates can then be used when forecasting and setting new projects’ goals. Here are some examples of places where data-based prediction is occurring, as well as some publicly available resources you can use.

    In the U.K., data on project performance has been collected for over a decade now. The HM Treasury’s Green Book provides guidance on how project proposals should be appraised before significant public funds are committed. The appraisal procedure includes an explicit adjustment to account for systematic optimism, sometimes referred to as “optimism bias,” which is the overstatement of benefits and the understatement of durations and costs.
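Kahneman and Tversky's "outside view" can be put into practice roughly as the paragraphs above describe. First, compute how far past projects overran their forecasts. Then scale a new forecast by an overrun ratio drawn from that history. The minimal sketch below uses hypothetical project data and function names of our own. It illustrates the general reference-class idea and is not the Green Book's prescribed optimism-bias procedure.

```python
import math

def overrun_ratios(history):
    """history: (forecast, actual) pairs for completed projects; returns sorted actual/forecast ratios."""
    return sorted(actual / forecast for forecast, actual in history)

def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    rank = math.ceil(q * len(sorted_values) / 100)
    return sorted_values[rank - 1]

def adjusted_forecast(new_forecast, history, confidence=80):
    """Scale a new forecast by the overrun ratio at the given percentile of past projects."""
    uplift = percentile(overrun_ratios(history), confidence)
    return new_forecast * uplift

# Hypothetical reference class: (forecast cost, actual cost) in the same units.
history = [(100, 120), (200, 210), (50, 80), (120, 118), (300, 390)]
print(adjusted_forecast(1000, history, confidence=80))
```

Here the 80th-percentile ratio is 1.3, so a naive forecast of 1000 becomes roughly 1300. Only one of the five hypothetical past projects overran by more than that. Raising `confidence` buys a larger safety margin at the cost of a more conservative plan.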

    A student guide for navigating ethical issues in the tech industry

    Mozilla Foundation


    The ethical issues facing the tech industry are abundant — military contracts, invasive data mining, biased algorithms, inhumane warehouse conditions, racist facial recognition software, and more. Addressing ethical issues in tech can be overwhelming for students interested in working in tech. But change in the industry is not impossible. And it is, increasingly, necessary.

    The first step is recognizing that those who build tech have always faced ethical challenges and that all tech workers, from warehouse workers to software developers, have always been at the forefront of organizing for change. By learning from the history of tech worker organizing and understanding the tactics used by tech workers today, students entering the tech industry can be empowered to improve the future of tech and the future of society.

    When to use CPUs vs GPUs vs TPUs in a Kaggle Competition?

    Towards Data Science, Paul Mooney


    Behind every machine learning algorithm is hardware crunching away at multiple gigahertz. You may have noticed several processor options when setting up Kaggle notebooks, but which one is best for you? In this blog post, we compare the relative advantages and disadvantages of using CPUs (Intel Xeon) vs GPUs (Nvidia Tesla P100) vs TPUs (Google TPU v3) for training machine learning models that were written using tf.keras. We’re hoping this will help you make sense of the options and select the right choice for your project.

    What you can learn from the most persuasive adverts

    BBC Worklife, David Robson


    Let’s start in a surprising place: the folk tale Three Little Pigs. A pig builds a straw house, only for the Big Bad Wolf to blow it down. Next up is a house made of wood – the wolf blows that down too, and gobbles up the pig inside. Finally, the wolf comes to a house made of bricks – but the wolf now fails to blow it down, despite all his huffing and puffing. So he tries to climb down the chimney, straight into a steaming cauldron. Now it’s the pig that has the tasty dinner.

    What you might not realise is that this familiar childhood story uses a powerful persuasive tool: the “repeat-break” plot structure. There is a basic idea – the wolf blowing the house down – which is repeated once to create an expectation, and which is then violated, producing a sense of surprise.


    Full-time, non-tenured academic positions

    Sr. Research Data Scientist

    University of California-Davis, DataLab: Data Science & Informatics; Davis, CA

    Application for Data Scientist – Schmidt DataX Project

    Princeton University, Center for Information Technology Policy; Princeton, NJ

    Internships and other temporary positions

    Seasonal field research assistants, Great Basin

    University of California-Davis, Earth Research Institute; Nevada and eastern California

    Project Manager for Data Science for Social Good

    Alan Turing Institute, University of Warwick; Coventry, England

    Postdoc in Music Cultures & AI

    KTH Royal Institute of Technology, Department of Intelligent Systems; Stockholm, Sweden
