Data Science newsletter – October 19, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for October 19, 2018


Data Science News

How deep learning helps archaeologists rediscover the past

TechRepublic, Tony Pacheco


With the wide use of remote sensor data in archaeology, deep learning allows the archaeologist to use the machine for much of the grunt work humans previously did. With the image recognition abilities of machine learning, this means more time can be spent on verification than on identification.

“I found that the way that it’s been done before could probably be improved with the use of deep learning,” said Iris Kramer, a computer science Ph.D. student at the University of Southampton in the United Kingdom.

From Paywall to Datawall

The Scholarly Kitchen, Lisa Janicke Hinchliffe


I found myself thinking about this as I clicked the link in a tweet to read a review of the movie Paywall: The Business of Scholarship. From the tweet I saw, I was expecting the text of the review to appear. Instead, given the review was published in The Lancet, there was a demand to create an account (or use my existing one if I had one). Given the topic of the movie, I couldn’t resist tweeting out “LOL. Review isn’t OA.” More than one reply came back observing: yes, but it’s free.

True, Elsevier tells you this straightforwardly on the page you encounter: “This article is available free of charge. Simply log in to access the full article, or register for free if you do not yet have a username and password.”

Well, sure, the article is “free” in the sense there is no monetary transaction. But, not free in the sense that I must trade my time and my personal information in exchange for the access. And, I must consent to account terms — e.g., data tracking, analysis, reporting — that I have no mechanism for negotiating. Instead of a paywall, I face a datawall.

IU will lead $2 million partnership to expand access to research data

Indiana University, News at IU


IU Libraries and IU Network Science Institute are leading a public-private partnership to create the Shared BigData Gateway for Research Libraries

University of Pittsburgh Scientist Receives $1.2 Million NSF Grant to Find Big Data Solutions for Complications from Anesthesia

University of Pittsburgh, Swanson Engineering


The National Science Foundation awarded $1,182,305 to the University of Pittsburgh Swanson School of Engineering to support research into using machine learning and Big Data to analyze electronic anesthesia records and prevent postoperative complications and death.

Heng Huang, John A. Jurenko Professor in Computer Engineering at Pitt, is principal investigator on the study titled “SCH: INT: New Machine Learning Framework to Conduct Anesthesia Risk Stratification and Decision Support for Precision Health” (Award No. 1838627). Dr. Huang will analyze more than two million cases of anesthesia data taken from 303 UPMC clinics and treatment centers.

“A human doctor uses guidelines from manuals in combination with subjective experience to determine patients’ risk factors and needs,” says Dr. Huang. “We are using artificial intelligence and machine learning to develop an objective way to predict surgical outcomes based on historical patient data.”

From NASA To Jerusalem: Israeli Scientist Develops Handheld Monitor To Detect Early Breast Cancer

No Camels, Angelica Kouperman


Dr. Yehudit Abrams, a former scientist at NASA now living in Jerusalem, came up with an idea for a revolutionary new device that will allow women to test for tumors and other changes in their breast in the comfort of their own homes.

The handheld device, called MonitHer, is set to bring new hope to cancer monitoring and takes the guesswork and uncertainty out of manual self-examinations that check for abnormalities and changes in the breast. It may well put women and men on a path toward regular, safe, reliable tracking of their breast health.

Palantir, Peter Thiel’s All-Seeing Eye, Looks to a $41 Billion I.P.O.

Vanity Fair, The Hive blog, Maya Kosoff


Going public could more than double Thiel’s net worth—and prove that, even in Palo Alto, a close relationship with the Trump administration can pay off.

[D] ML is losing some of its luster for me. How do you like your ML career? : MachineLearning


I feel like the effort I put into rigorous and reasoned application of ML is wasted and makes me less competitive – management wants the “deep learning” solution and they are satisfied by someone reading a blog post, throwing half-baked training data and Keras at the problem and calling it solved. I’m not sure I can do ML in an environment like that, and it’s difficult to push back against the seductive hype of “cheap and easy” deep learning (ironically a simple random forest would be much easier and often quite effective, but that isn’t sexy. I’ve seen pressure to use neural networks even when something else makes much more sense to use). I love ML and like seeing others learn and be excited about it, but the low barrier to entry makes it easy for people to sell bad modeling to those who don’t know any better.

How are you all enjoying your ML career? I’m considering moving away from ML and going back into software engineering, but maybe I just need to switch companies. Perhaps I’m just a curmudgeon or an idealist. Does anyone else have similar thoughts? [114 comments on October 19]

In Admissions, Harvard Favors Those Who Fund It, Internal Emails Show

The Harvard Crimson, Delano R. Franklin and Samuel W. Zwickel


Getting into Harvard is hard. But it’s a lot less hard if your family promises to pay for a new building, according to internal emails presented in court on the third day of the Harvard admissions trial.

John M. Hughes, a lawyer for Students for Fair Admissions, the anti-affirmative action group suing the College over its race-conscious admissions policies, introduced the emails in a bid to prove Harvard unfairly prefers the wealthy and well-connected. Hughes read each message aloud before grilling the College’s long-serving Dean of Admissions and Financial Aid William R. Fitzsimmons ’67 on their contents.

Computer Stories: A.I. Is Beginning to Assist Novelists

The New York Times, David Streitfeld


Robin Sloan has a collaborator on his new novel: a computer.

The idea that a novelist is someone struggling alone in a room, equipped with nothing more than determination and inspiration, could soon be obsolete. Mr. Sloan is writing his book with the help of home-brewed software that finishes his sentences with the push of a tab key.

It’s probably too early to add “novelist” to the long list of jobs that artificial intelligence will eliminate. But if you watch Mr. Sloan at work, it is quickly clear that programming is on the verge of redefining creativity.

A neuroscientist explains the limits and possibilities of using technology to read our thoughts

The Verge, Angela Chen


[Russell] Poldrack is a neuroscientist at Stanford University and the author of The New Mind Readers: What Neuroimaging Can and Cannot Reveal about Our Thoughts (out now from Princeton University Press). His research focuses on what we can learn from brain imaging techniques such as fMRI, which measures blood activity in the brain as a proxy for brain activity. And one of the clearest conclusions, he writes, is that activity in a particular brain region doesn’t actually tell us what the person is experiencing.

The Verge spoke to Poldrack about the limits and possibilities of fMRI, the fallacies that people commit in interpreting its results, and the limits of its widespread use. This interview has been lightly edited for clarity.

Artificial intelligence crowdsources data to speed up drug discovery

Science News, Maria Temming


A new cryptographic system could allow pharmaceutical companies and academic labs to work together to develop new medications more quickly — without revealing any confidential data to their competitors.

The centerpiece of this computing system is an artificial intelligence program known as a neural network. The AI studies information about which drugs interact with various proteins in the human body to predict new drug-protein interactions.

More training data beget a smarter AI, which was a challenge in the past because drug developers generally don’t share data due to intellectual property concerns. The new system allows an AI to crowdsource data while keeping that information private, which could encourage partnerships for speedier drug development, researchers report in the Oct. 19 Science.

Data science, the ‘new Latin’ for students, in demand in Silicon Valley

San Francisco Chronicle, Melia Russell


David Wagner had a question for his students. Standing in front of one of the fastest-growing classes at UC Berkeley, the professor explained lines of code used to construct a graph comparing pressure in footballs used by the New England Patriots during the 2015 AFC Championship — the subject of a hashtag-ready scandal known as “Deflategate.”

“You’re the consultant reporting back to the NFL on what you’ve found. Do you buy their story?” Wagner said, prompting boos from the lecture hall as the students — like the NFL — concluded that the Patriots had deflated their footballs.

In spring 2016, UC Berkeley’s first foundations of data science course attracted around 300 students. This semester, about 1,300 have enrolled, and as of this month, most eligible undergraduates can choose data science as their major.

These New Tricks Can Outsmart Deepfake Videos—for Now

WIRED, Science, Sarah Scoles


For weeks, computer scientist Siwei Lyu had watched his team’s deepfake videos with a gnawing sense of unease. Created by a machine learning algorithm, these falsified films showed celebrities doing things they’d never done. They felt eerie to him, and not just because he knew they’d been ginned up. “They don’t look right,” he recalls thinking, “but it’s very hard to pinpoint where that feeling comes from.”

Finally, one day, a childhood memory bubbled up into his brain. He, like many kids, had held staring contests with his open-eyed peers. “I always lost those games,” he says, “because when I watch their faces and they don’t blink, it makes me very uncomfortable.”

These lab-spun deepfakes, he realized, were needling him with the same discomfort: He was losing the staring contest with these film stars, who didn’t open and close their eyes at the rates typical of actual humans.

To find out why, Lyu, a professor at the University at Albany, and his team dug into every step in the software, called DeepFake, that had created them.

Will Compression Be Machine Learning’s Killer App?

Pete Warden's blog


When I talk to people about machine learning on phones and devices I often get asked “What’s the killer application?” I have a lot of different answers, everything from voice interfaces to entirely new ways of using sensor data, but the one I’m most excited about in the near-term is compression. Despite being fairly well-known in the research community, this seems to surprise a lot of people, so I wanted to share some of my personal thoughts on why I see compression as so promising.

I was reminded of this whole area when I came across an OSDI paper on “Neural Adaptive Content-aware Internet Video Delivery”. The summary is that by using neural networks they’re able to improve a quality-of-experience metric by 43% if they keep the bandwidth the same, or alternatively reduce the bandwidth by 17% while preserving the perceived quality. There have also been other papers in a similar vein, such as this one on generative compression, or adaptive image compression. They all show impressive results, so why don’t we hear more about compression as a machine learning application?


Exponential Medicine

Singularity University


San Diego, CA November 4-7. “Intensive four-day experience that gathers world-class faculty, innovators and organizations from across the biomedical and technology spectrum to explore and leverage the convergence of fast moving technologies in the reinvention of health and medicine.” [$$$$, application required]

AIR Seminar: “‘Does This Vehicle Belong to You?’ Processing the Language of Policing for Improving Police-Community Relations” Dan Jurafsky (Stanford)

Boston University, Rafik Hariri Institute for Computing and Computational Science & Engineering


Boston, MA October 22, starting at 10:30 a.m., Boston University Kilachand Center (610 Commonwealth Avenue). [free]

Tools & Resources

‘Defining the universe’: An essential in writing about survey data

Pew Research Center, John Gramlich


At Pew Research Center, we conduct surveys in the United States and dozens of other countries on topics ranging from politics and religion to science and technology. Given the wide range of people we speak to for our polls – and the many issues we ask them about – it’s important to be as clear as possible in our writing about exactly who says what.

In research circles, this practice is sometimes called “defining the universe” – that is, clearly identifying the population whose attitudes we’re studying, whether those people are police officers in the U.S., Christians in Western Europe or some other specific group. This kind of clarification can go a long way toward ensuring that readers interpret survey results correctly.

Hypothetical Outcome Plots (HOPs) Help Users Separate Signal from Noise

Medium, UW Interactive Data Lab


In daily life, we often find ourselves trying to separate signal from noise. For example, does the monthly jobs report suggest a growth trend, or that the jobs rate is steady? In a pair of experiments, we found that hypothetical outcome plots (HOPs) — animated samples of possible outcomes — can help people to make this judgment with greater accuracy.
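The idea of animating samples of possible outcomes can be illustrated with a minimal Python sketch. It is not the UW Interactive Data Lab's implementation: the function names (`hop_frames`, `share_b_higher`), the normal distributions, and all of the numbers are hypothetical, chosen only to show how each frame of such a plot is a single draw rather than a summary with error bars.

```python
import random


def hop_frames(mean_a, mean_b, sd, n_frames, seed=0):
    """Return one (draw_a, draw_b) pair per animation frame.

    Assumption: each quantity is modeled as a normal distribution with a
    shared standard deviation. A real plot would draw from whatever
    uncertainty model fits the data.
    """
    rng = random.Random(seed)  # seeded so the sequence of frames is repeatable
    return [(rng.gauss(mean_a, sd), rng.gauss(mean_b, sd)) for _ in range(n_frames)]


def share_b_higher(frames):
    """Fraction of frames in which the draw for B exceeds the draw for A."""
    return sum(b > a for a, b in frames) / len(frames)


# Hypothetical example values, not taken from the article.
frames = hop_frames(mean_a=100.0, mean_b=105.0, sd=10.0, n_frames=200)

# A viewer would see these one at a time as an animation; here we print a few.
for a, b in frames[:5]:
    print(f"A={a:6.1f}  B={b:6.1f}")

print(f"B exceeded A in {share_b_higher(frames):.0%} of frames")
```

Watching the frames in sequence gives a viewer roughly the information that `share_b_higher` computes: how often one outcome beats the other across plausible draws.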

Graph Nets library

GitHub – deepmind


“Graph Nets is DeepMind’s library for building graph networks in Tensorflow and Sonnet.”

BAIR Open-Sources Popular DeepMimic Project



The DeepMimic paper’s first author, Berkeley PhD student Xue Bin Peng, has now open-sourced the project’s code, data, and frameworks. Moreover, Peng’s new research demonstrates that DeepMimic’s simulated characters can also learn to perform highly dynamic movements by using regular video clips of human examples as input data.

Sutton & Barto Book: Reinforcement Learning: An Introduction

Richard S. Sutton and Andrew G. Barto


Second Edition in progress, online draft


Full-time, non-tenured academic positions

Technology Policy Clinic Lead

Princeton University, Center for Information Technology Policy; Princeton, NJ

Tenured and tenure track faculty positions

Professorship in Human and Crowd Computing

University of Zurich, Department of Informatics; Zurich, Switzerland
