Data Science newsletter – August 9, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 9, 2018


Data Science News

How a Computer Learns To Dribble: Practice, Practice, Practice

Carnegie Mellon University, News


Basketball players need lots of practice before they master the dribble, and it turns out that’s true for computer-animated players as well. By using deep reinforcement learning, players in video basketball games can glean insights from motion capture data to sharpen their dribbling skills.

Researchers at Carnegie Mellon University and DeepMotion Inc., a California company that develops smart avatars, have for the first time developed a physics-based, real-time method for controlling animated characters that can learn dribbling skills from experience. In this case, the system learns from motion capture of the movements performed by people dribbling basketballs.

This trial-and-error learning process is time consuming, requiring millions of trials, but the results are arm movements that are closely coordinated with physically plausible ball movement. Players learn to dribble between their legs, dribble behind their backs and do crossover moves, as well as how to transition from one skill to another.

“Once the skills are learned, new motions can be simulated much faster than real-time,” said Jessica Hodgins, Carnegie Mellon professor of computer science and robotics.

Together, big data, bench science and genome-wide diagnostics predict genomic instability that can lead to disease

Baylor College of Medicine


They are the most common repeated elements in the human genome; more than a million copies are scattered among and between our genes. Called Alu elements, these relatively short (approximately 300 Watson-Crick base pairs), repetitive non-coding sequences of DNA have been implicated in the rapid evolution of humans and non-human primate species. Unfortunately, these repeats also cause genomic structural variation that can lead to disease.

Disease-causing Alu elements do not work alone. To cause structural variations, pairs of elements (Alu/Alu) mediate genomic rearrangements that result in either gene copy number gains or losses, and these changes can have profound consequences for an individual’s health.

Women and men get research grants at equal rates — if women apply in the first place

Science News, Bethany Brookshire


Women face an uphill battle in biomedical science, on many fronts. There is bias in hiring and in how other scientists view their research. Fewer women are chosen to review scientific papers. Men still outnumber women at the ivory tower’s highest floors, and of course, women in science face harassment based on their gender. But once the top of the hill is in sight — once a female scientist gets a coveted major research grant — the playing field levels out, a new study shows. Women who get major grants stay funded and head their labs just as long as men. The hitch? Women must reach the top of the academic hill and apply for those grants in the first place.

“We’ve known from the data that’s publicly available that women are getting approximately 50 percent or more of the biomedical Ph.D.s, but when the time comes to apply for grants, the number drops precipitously,” says Judith Greenberg, the deputy director of the National Institute of General Medical Sciences in Bethesda, Md. Less than one-third of first-time applicants for the big grants from the National Institutes of Health are women.

The Foundry at the Heart of DARPA’s Plan to Let Old Fabs Beat New Ones

IEEE Spectrum, Samuel K. Moore


Bloomington, Minn.-based SkyWater makes chips using 90-nanometer process technology on equipment designed to handle 200-millimeter wafers of silicon. But why would DARPA invest so much in a foundry using a process technology that was cutting edge in 2004 on a size of silicon wafer that big players started to abandon more than 15 years ago?

Listening to the arguments of SkyWater president Thomas Sonderman makes it seem that this foundry is probably not just the best choice but that it might have been the only one.

Because it’s such a reliable process and one that’s widely available for making relatively small volumes of chips, “a lot of the government work is done around 90 nm,” says Sonderman. “So having a technology that can be evolved to higher performance but still follows 90-nm flow is very important.”

Tokyo Olympics to use facial recognition system as part of security efforts

ESPN, Associated Press


A facial recognition system will be used across an Olympics for the first time as Tokyo organizers work to keep security tight and efficient at dozens of venues during the 2020 Games.

The NeoFace technology developed by NEC Corp. will be customized to monitor every accredited person — including athletes, officials, staff and media — at more than 40 venues, games villages and media centers, Olympic and company officials said Tuesday.

Olympic officials said Tokyo will be the first Olympic host to introduce the face recognition technology at all venues. The system is expected to effectively eliminate entry with forged IDs, reduce congestion at accredited waiting lines and reduce athletes’ stress under hot weather.

Facebook Wanted Us to Kill This Investigative Tool

Gizmodo, Kashmir Hill and Surya Mattu


Last year, we launched an investigation into how Facebook’s People You May Know tool makes its creepily accurate recommendations. By November, we had it mostly figured out: Facebook has nearly limitless access to all the phone numbers, email addresses, home addresses, and social media handles most people on Earth have ever used. That, plus its deep mining of people’s messaging behavior on Android, means it can make surprisingly insightful observations about who you know in real life—even if it’s wrong about your desire to be “friends” with them on Facebook.

In order to help conduct this investigation, we built a tool to keep track of the people Facebook thinks you know. Called the PYMK Inspector, it captures every recommendation made to a user for however long they want to run the tool. It’s how one of us discovered Facebook had linked us with an unknown relative. In January, after hiring a third party to do a security review of the tool, we released it publicly on GitHub for users who wanted to study their own People You May Know recommendations. Volunteers who downloaded the tool helped us explore whether you’ll show up in someone’s People You May Know after you look at their profile. (Good news for Facebook stalkers: Our experiment found you won’t be recommended as a friend just based on looking at someone’s profile.)

Facebook wasn’t happy about the tool.

As microbiome testing firms proliferate, so do questions about their claims

STAT, Leah Shaffer



Microbiome testing companies have become a thing, offering consumers a chance to see a snapshot of the billions of microbes that reside in their bodies. Some promise even more from a swab: personalized advice on how to improve your health.

“Take control of your gut bacteria to help with weight management, fitness, skin health and more!” says Thryve’s website. Meanwhile, Viome, launched with $21 million in startup funding last year, promises to help people “discover what’s happening inside your gut and get a personalized action plan to fix it,” using technology “developed for National Security.” Their competitor uBiome has a vaguer sales pitch, saying customers will “learn” and “discover” more about their microbiome — and it may have good reason to keep it simple.

There’s no doubt that the microbiome, the community of trillions of bacteria and viruses that live in a person’s body, has a profound impact on human health. But our understanding of the microbiome isn’t advanced enough, nor are the commercial tests precise enough, to guide customized health recommendations, experts told STAT.

[1804.08872] Assessment of Deep Convolutional Neural Networks for Road Surface Classification

arXiv, Computer Science > Computer Vision and Pattern Recognition; Marcus Nolte, Nikita Kister, Markus Maurer


When parameterizing vehicle control algorithms for stability or trajectory control, the road-tire friction coefficient is an essential model parameter when it comes to control performance. One major impact on the friction coefficient is the condition of the road surface. A camera-based, forward-looking classification of the road surface helps enable an early parametrization of vehicle control algorithms. In this paper, we train and compare two different Deep Convolutional Neural Network models regarding their application for road friction estimation, and describe the challenges for training the classifier in terms of available training data and the construction of suitable datasets.

As Russians hack the US grid, a look at what’s needed to protect it

The Conversation, Manimaran Govindarasu and Adam Hahn


The U.S. electricity grid is hard to defend because of its enormous size and heavy dependency on digital communication and computerized control software. The number of potential targets is growing as “internet of things” devices, such as smart meters, solar arrays and household batteries, connect to smart grid systems.

As researchers of grid security, we believe that current security standards mandated by federal regulations provide sufficient protection against observed threats. But recent incidents demonstrate the ongoing challenge of ensuring everyone follows the guidelines, which themselves must change over time to keep up with technological shifts.

The threat is real: In late 2015 and again in 2016, Russian hackers shut down parts of Ukraine’s power grid. In March 2018, federal officials warned that Russians had penetrated the computers of multiple U.S. electric utilities and were able to gain access to critical control systems. Four months later, the Wall Street Journal reported that the hackers’ access had included privileges that were sufficient to cause power outages.

Dell Medical School Launches Data Hub to Accelerate Biomedical Research, Advance Health

The University of Texas at Austin, UT News


Dell Medical School at The University of Texas at Austin is accelerating innovation and research by creating a Biomedical Data Science Hub to help solve complex research and clinical problems.

Imagine having a complicated scientific question: How do we predict who will be diagnosed with Type 2 diabetes based not only on clinical and family history, but also on lifestyle, community factors, work life and medical history?

The answer could improve the lives of millions because it could lead to early, simple, preventive interventions. That’s the power of big data analytics in health care: It uses huge amounts of a population’s data and state-of-the-art analysis methods to boil it down to a small core of information to potentially help prevent individual illnesses and large-scale epidemics, cure disease, personalize medical care, reduce expenses and more.

The Defense Department has produced the first tools for catching deepfakes

MIT Technology Review, Will Knight


The tools for catching deepfakes were developed through a program—run by the US Defense Advanced Research Projects Agency (DARPA)—called Media Forensics. The program was created to automate existing forensics tools, but has recently turned its attention to AI-made forgery.

“We’ve discovered subtle cues in current GAN-manipulated images and videos that allow us to detect the presence of alterations,” says Matthew Turek, who runs the Media Forensics program.

Metro Health-University of Michigan Health Partners With Eon to Improve Lung Cancer Care

Imaging Technology News


Metro Health-University of Michigan Health announced it is partnering with healthcare data company Eon to improve the management of lung cancer screening and pulmonary incidental nodule patients with EonDirect. EonDirect is the first cloud-based software powered by deep learning and machine automation to seamlessly interface with hospitals’ electronic medical records (EMR) and picture archiving and communication systems (PACS) to deliver meaningful, actionable data to healthcare providers.

Metlife partners with mortgage tech startup

Digital Insurance, Bloomberg News, Julie Verhage and Katherine Chiglinsky


Blend Labs Inc., a startup that first became known by helping big banks make it easier for homebuyers to get a mortgage online, is branching out into a new industry: insurance.

The San Francisco-based company is launching Blend Insurance Agency. Much like its mortgage platform, the firm will leverage partnerships with large insurers to scale its business by pairing the offerings. One of those initial partners will be MetLife Inc., a U.S. insurer with more than $700 billion in assets.

“Blend is the type of partner that we are looking for because our visions are very aligned,” said Kevin Chean, vice president of group auto and home at MetLife, adding the company is looking to strengthen its digital presence to meet changing consumer preferences. “They are used to doing more and more of their shopping online.”

Wall Street on West Peachtree

Georgia Tech, News Center


Georgia Tech was one of the first colleges in the country to build a Wall Street-style trading floor on campus, and inside we’re creating the next generation of investment bankers, stock traders, and financial experts.

During the regular semester, you can find undergraduate students building Excel models and master’s students from Georgia Tech’s Quantitative and Computational Finance program hard at work in this 2,000-square-foot facility. But when summer rolls around, the high-tech haven is open to high school students who learn the ins and outs of personal finance and investment.

O’Reilly Survey: The State of Enterprise Machine Learning Adoption

Business Wire, O'Reilly Media


O’Reilly, the premier source for insight-driven learning on technology and business, today announced its latest survey findings in the report, “The State of Machine Learning Adoption in the Enterprise.” As machine learning has become more widely adopted across industries, O’Reilly set out to learn more about how companies approach this work.


Data for Good Exchange 2018



New York, NY Sunday, September 16. “The overall theme for this year’s gathering will be ‘Our Data for Good?’ or how data scientists, corporations, policy makers and researchers can collaborate on data science projects that result in positive social outcomes.” [free, registration required]

Tools & Resources

DataViz Cheatsheet

Jon Schwabish, PolicyViz blog


“When it comes to learning how to best visualize your data, there is a plethora of great books, websites, blogs, and podcasts. In my Core Principles classes and workshops, I provide this cheat sheet as a summary of some of the core lessons. Obviously, this doesn’t cover everything, but I think it serves as a good reminder of best practices and core things to keep in mind.”

Model-Based Machine Learning (Early Access): an online book

John Winn and Christopher Bishop with Thomas Diethe


This is an early access version of the book, made available so we can get feedback on the book as we write it. The completed book will have additional chapters, along with source code for every chapter. Please do send feedback, since it will help us to shape and improve the book!

Terence Parr


“Deep explanations of machine learning and related topics.”

7 Benefits of Using Search Engine Tools for Data Analysis

Oracle, Grant Ingersoll


Mention the word “search” to most laypeople and it conjures images of Google and Bing. Mention it to most data scientists and it usually conjures notions of keywords and text retrieval, and maybe a passing reference to open source projects like Elasticsearch, Apache Solr, or, if they are particularly well-versed, Apache Lucene. However, many of the data scientists I work with don’t fully understand the breadth and depth of what a modern search engine can do for them, not just on text problems but on almost every data problem they encounter.

Most data scientists are quite comfortable discussing things like k-nearest neighbors, collaborative filtering, and k-means clustering—as well as the basics of linear algebra and statistics—but don’t seem to realize that the same techniques and tools that power their everyday work are the same ones that power search. So they’re often left either reinventing the wheel (most often poorly) or working less efficiently than they should be. By reframing many of their approaches to data exploration and analysis, data scientists can often see a significant increase in their productivity and their ability to analyze more types of data at scale and with less overhead.

I know, it sounds too good to be true, but I’ve seen it happen in my own career as well as with countless other data scientists and engineers. In this article (and future articles), I will walk you through how search technology can transform your data analysis—and also show you the places where it can fall short.
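The overlap the excerpt describes between k-nearest neighbors and search can be made concrete with a small sketch. The code below is illustrative only and is not taken from the article or from any search engine's API: the helper names `vectorize`, `cosine` and `search` are hypothetical, and the bag-of-words scoring is a deliberately simplified stand-in for what a real engine does. It ranks documents by cosine similarity to a query, which is the same operation as a k-nearest-neighbors lookup.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, k=2):
    """Return the k documents nearest to the query (a k-NN lookup)."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d)), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "k-means clustering groups similar points",
    "search engines rank documents by relevance",
    "collaborative filtering recommends items to users",
]
top = search("rank documents", docs, k=1)
```

Here `top` holds the single document that shares the most terms with the query. Production search engines typically replace the linear scan with an inverted index and the raw term counts with a weighting scheme, but the ranking step remains a nearest-neighbor comparison.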


Full-time, non-tenured academic positions

Full stack programmer

Harvard Medical School, Undiagnosed Disease Network; Boston, MA

Postdoctoral Position in Hologenomics

Norwegian University of Science and Technology; Trondheim, Norway

Tenured and tenure track faculty positions

Assistant/Associate Professor, Nonhuman Primate Behavioral Neuroscience

Emory University, Department of Psychology; Atlanta, GA
