Data Science newsletter – October 25, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 25, 2017


Data Science News

Rice expert: Be concerned about how apps collect, share health data

Rice University


As of 2016 there were more than 165,000 health and wellness apps available through the Apple App Store alone. According to Rice University medical media expert Kirsten Ostherr, the Food and Drug Administration (FDA) regulates only a fraction of those. Americans should be concerned about how these apps collect, save and share their personal health data, she said.

Professor Lisa Dierker – Falling in Love with Statistics: Shaping Students’ Relationships with Data


Statistical data analysis is a cornerstone of the sciences and operates as a shared language across disparate fields, from neuroscience to astronomy. However, current curricula often result in disengaged and stressed students who struggle to connect the concepts of statistics to the real world. Professor Lisa Dierker and her team at Wesleyan University have developed a novel approach to teaching statistics and data analysis that empowers students from diverse educational backgrounds. Her program, Passion-Driven Statistics, offers a multidisciplinary project-based approach that is both supportive and engaging for students at all levels of statistical mastery.

Meeting Titans of Open Data

Ari Lamstein


The recent Association of Public Data Users (APDU) Conference gave me the opportunity to meet some people who have made tremendous contributions to the world of Open Data.

Boeing Invests in Drone Startup in Push for Automated Technology

Bloomberg Technology, Dana Hull


Boeing Co. is investing in Pittsburgh-based Near Earth Autonomy, a self-guided drone startup, marking its first financial backing for a company specializing in autonomous technology since establishing the HorizonX venture fund in April.

Boeing, the world’s largest aerospace company, declined to specify the size of the investment. Near Earth Autonomy authorized the sale of about $10 million in equity, which would value the company at $40 million to $50 million, according to an estimate by private stock market operator Equidate. HorizonX typically makes investments “that span the single millions up to the low teens,” a spokeswoman said.

At MIT and Georgia Tech, MOOCs Are Showing Up On Campus

Class Central, Dhawal Shah


Recently, students at Georgia Tech and MIT in certain courses were given a choice: enroll in the traditional on-campus course, or sign up for a parallel version of the class that would be completely online.

These courses are available on edX as MOOCs. In fact, the residential students had to enroll in these courses on edX, in the same version of the course that is open to the rest of the world for free. There are many instances of MOOCs being offered for credit to learners who are not enrolled in any of the corresponding university’s programs. But this is the first time on-campus students can earn credit from a MOOC.

The courses being offered simultaneously online and on-campus include MIT’s Circuits and Electronics and Georgia Tech’s Introduction to Computing using Python.

A Berkeley View of Systems Challenges for AI

EECS at UC Berkeley


With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies.

The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore’s Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI’s potential to improve lives and society.

On Rewarding ‘Bullshit’: Algorithms Should Not Be Grading Essays

Undark magazine, Kai Riemer


By using algorithms to grade papers, we risk encouraging writing that follows a script but essentially says nothing of worth.

NYU Abu Dhabi to host space research in a new data center

DatacenterDynamics, Sebastian Moss


New York University Abu Dhabi will build a data center in the UAE for archiving and processing scientific datasets obtained during space missions.

Work on the ‘National Data Center’ will start next year, with the facility designed to have “relevant capacity” for facilitating pre-launch studies associated with Transiting Exoplanet Survey Satellite (launch in 2018), Solar Orbiter (launch 2019) and the Emirates Mars Mission (launch 2020).

Are you getting sick? Predicting influenza-like symptoms using human mobility behaviors

EPJ Data Science; Gianni Barlacchi, Christos Perentis, Abhinav Mehrotra, Mirco Musolesi and Bruno Lepri


Understanding and modeling the mobility of individuals is of paramount importance for public health. In particular, mobility characterization is key to predict the spatial and temporal diffusion of human-transmitted infections. However, the mobility behavior of a person can also reveal relevant information about her/his health conditions. In this paper, we study the impact of people mobility behaviors for predicting the future presence of flu-like and cold symptoms (i.e. fever, sore throat, cough, shortness of breath, headache, muscle pain, malaise, and cold). To this end, we use the mobility traces from mobile phones and the daily self-reported flu-like and cold symptoms of 29 individuals from February 20, 2013 to March 21, 2013. First of all, we demonstrate that daily symptoms of an individual can be predicted by using his/her mobility trace characteristics (e.g. total displacement, radius of gyration, number of unique visited places, etc.). Then, we present and validate models that are able to successfully predict the future presence of symptoms by analyzing the mobility patterns of our individuals. The proposed methodology could have a societal impact opening the way to customized mobile phone applications, which may detect and suggest to the user specific actions in order to prevent disease spreading and minimize the risk of contagion. [full text]
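For readers unfamiliar with the mobility features named in the abstract, here is a minimal sketch of how three of them could be computed. It is not taken from the paper: the function name `mobility_features`, the planar (x, y) coordinates in metres, and the rule that identical coordinates count as the same place are all assumptions made for illustration, and the authors' exact definitions may differ.

```python
import math


def mobility_features(trace):
    """Compute simple mobility features from a time-ordered trace.

    `trace` is a hypothetical input format: a list of (x, y) positions
    in metres, already snapped to places (an assumption, not the paper's format).
    """
    n = len(trace)
    if n == 0:
        raise ValueError("trace must contain at least one point")

    # Total displacement: summed distance between consecutive points.
    total_displacement = sum(math.dist(a, b) for a, b in zip(trace, trace[1:]))

    # Radius of gyration: root-mean-square distance from the centre of mass.
    cx = sum(x for x, _ in trace) / n
    cy = sum(y for _, y in trace) / n
    radius_of_gyration = math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in trace) / n
    )

    # Unique visited places: distinct coordinates in the trace.
    unique_places = len(set(trace))

    return {
        "total_displacement": total_displacement,
        "radius_of_gyration": radius_of_gyration,
        "unique_places": unique_places,
    }
```

Features like these, computed per person per day, would be the kind of input a symptom-prediction model could use.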

7 Recommendations for Data Sci

Ben Weber, Gamasutra:Ben's Blog


After five years of working on different data teams in the game industry, I recently decided to join a startup in the finance industry. While many data teams were allocated large amounts of resources, it was usually challenging to actually put findings into production and have customer facing impact. Based on my experience at Twitch, Electronic Arts, and Daybreak Games, I have the following recommendations for leaders of data science teams:

1. Set realistic expectations for the Data Scientist role

The rise of big data policing

TechCrunch, Andrew Guthrie Ferguson


Real-time crime data comes in. Real-time police deployments go out. This high-tech command center in downtown Los Angeles forecasts the future of policing in America.

Welcome to the Los Angeles Police Department’s Real-Time Analysis Critical Response (RACR) Division. The RACR Division, in partnership with Palantir—a private technology company that began developing social network software to track terrorists—has jumped head first into the big data age of policing.

Ruggedized, High-Performance Storage Cards Target New Use Cases

Design News, Charles Murray


Western Digital Corp. has rolled out new high-capacity data storage cards with faster read- and write-speeds to meet new use cases in automotive, as well as in drones, surveillance cameras, and myriad industrial applications.

The new SanDisk Automotive and SanDisk Industrial cards are targeted at a growing cadre of applications that require local intelligence. “We’re moving from storage being mainly read from, to applications where storage is also being written to,” said Oded Sagee, director of product marketing for embedded and integrated solutions at Western Digital Corp. “It’s not just a matter of more data – it’s about changes in the use cases.”

Knowledge for Precision Medicine – Mechanistic Reasoning and Methodological Pluralism

JAMA, The JAMA Network, Viewpoint; Mark R. Tonelli, MD, MA and Brian H. Shirts, MD, PhD


Precision medicine (PM) describes prevention, diagnosis, and treatment strategies that take individual variability into account. While PM aims to incorporate individual variability in genes, environment, and lifestyle, the emphasis in current practice is on personalized genetic profiling for diagnosis and risk assessment.

As genetic testing and interpretation advance, PM stands to move medicine away from the population-based knowledge that grounds evidence-based medicine (EBM) to the treatment of patients “based on a deep understanding of health and disease attributes unique to each individual.” Such understanding requires a different and broader concept of medical knowledge, the development of new methods for generating such knowledge, and approaches for incorporation into clinical practice. As PM advances, for some decisions it will replace the population-based “best evidence” of EBM with specific and detailed understanding of what makes an individual patient different from others. To practice PM, clinicians should reconsider current notions regarding the relative value of evidence, as case-based reasoning and understanding of mechanisms will figure more prominently.

Teaching Machines How to Learn: An Interview with Animashree Anandkumar | Caltech



New Caltech faculty member Animashree (Anima) Anandkumar is researching ways to make machine learning fast and practical for real-world use. A Bren Professor of Computing and Mathematical Sciences in the Division of Engineering and Applied Science, she develops efficient techniques to speed up optimization algorithms that underpin machine-learning systems. Born in Mysore, India, Anandkumar received her B.S. in electrical engineering from the Indian Institute of Technology Madras and her PhD from Cornell University. She was a postdoctoral researcher at MIT from 2009 to 2010 and an assistant professor at UC Irvine from 2010 to 2016, as well as a visiting researcher at Microsoft Research New England in 2012 and 2014. Since 2016, she has been a principal scientist at Amazon Web Services, working on the practical aspects of deploying machine learning at scale using the cloud infrastructure. Recently, Anandkumar answered a few questions about her research and the future of machine learning at Caltech.

Big data meets Big Brother as China moves to rate its citizens

Wired UK, Rachel Botsman


The Chinese government plans to launch its Social Credit System in 2020. The aim? To judge the trustworthiness – or otherwise – of its 1.3 billion residents.

The Worst Tweeter In Politics Isn’t Trump

FiveThirtyEight; Oliver Roeder, Dhrumil Mehta and Gus Wezerek


Twitter has become the de facto public podium for President Trump. So what blend of retweets, likes and replies characterizes the response to tweets from the most powerful public figure in the United States? And how does it compare to the way people react to the tweets of his powerful governing colleagues, friendly and rival both, in the U.S. Senate?


Lynford Lecture | NYU Tandon School of Engineering

NYU Tandon School of Engineering


Brooklyn, NY Thursday, November 30, starting at 4 p.m., Dibner Building, Pfizer Auditorium (5 MetroTech Center). Title: LIGO and the Dawn of Gravitational Wave Astronomy. Lecturer: Peter Fritschel (MIT Kavli Institute for Astrophysics and Space Research). [free, rsvp required]

MinneMUDAC 2017 – Fall Student Challenge



Eden Prairie, MN November 4. “MinneAnalytics is proud to host this second-annual analytics event inviting teams of graduates and undergraduates to explore real-world data while enhancing and showcasing their skills.” [Sold out, waiting list available]

DATAPALOOZA 2017 — University of Virginia, Data Science Institute

University of Virginia, Data Science Institute


Charlottesville, VA November 9-10. [separate registrations for 11/9 and for 11/10]

NYU Computer Science Department – Numerical Analysis and Scientific Computing Seminar

NYU, Courant Institute


New York, NY November 10, starting at 10 a.m., Warren Weaver Hall 1302. Speaker: Lars Ruthotto (Emory University) will present “An Optimal Control Framework for Efficient Training of Deep Neural Networks” [free]


AI4ALL Bay Area Mentor apps open!

“AI4ALL connects high school students with Bay Area AI professionals for a 3-month mentorship experience where mentors and mentees collaborate on projects that use AI for good.”

NYU J-Term Startup Sprint

“A two-week intensive program for NYU undergrad, graduate and postdocs to experiment, learn from customers, and receive expert coaching on their ventures along the way.” Deadline to apply is November 20.

IBM Watson AI XPrize Adds Wild-Card Round

“The $5 million IBM Watson AI XPrize competition, which kicked off last year and will end in 2020, was the first of the XPrize contests (14 since 1995) to have a contestant-defined “open” goal rather than a predetermined objective. Now it is also the first XPrize to add a wild card, giving new contestants until Dec. 1 to join the 147 teams that made the first-year cut.”

Tools & Resources

[1710.06068] Data analysis recipes: Using Markov Chain Monte Carlo

arXiv, Astrophysics > Instrumentation and Methods for Astrophysics; David W. Hogg and Daniel Foreman-Mackey


“Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties.”
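To give a flavor of what a basic MCMC method looks like, here is a minimal sketch of a one-dimensional random-walk Metropolis sampler. It is not code from the paper: the function name `metropolis`, its parameters, and the Gaussian proposal are illustrative assumptions, and the paper should be consulted for the authors' own presentation and tuning advice.

```python
import math
import random


def metropolis(log_prob, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler in one dimension (illustrative sketch).

    log_prob  : function returning the log of the (unnormalized) target density
    x0        : starting position of the chain
    step_size : standard deviation of the Gaussian proposal (a tuning assumption)
    Returns the chain as a list and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
        # On rejection the current position is repeated in the chain.
        chain.append(x)
    return chain, accepted / n_steps


# Usage: sample a standard normal, whose log density is -x^2/2 up to a constant.
samples, acceptance = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
```

The acceptance fraction returned alongside the chain is one simple quantity to watch when adjusting `step_size`.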

TigerGraph Builds a Bigger Graph Database

The New Stack, Susan Hall


“Five years in the making, TigerGraph came out earlier this month with its graph database platform featuring parallel processing and analytics.”

“Its native parallel graph technology (NPG) powers real-time deep link analytics for enterprises trying to graph and process really Big Data. It’s touting it as the only system on the market to unify real-time analytics with large-scale offline data processing for graphs.”

From academia to co-founding a startup

Insight Data Science


Kari Goodman is a Data Scientist and Co-founder at AnimalBiome. … Here, she shares her experience in transitioning from academia to life in a startup.

Open Access Week 2017: Launch of LIS Scholarship Archive

April Hathcock, At The Intersection blog


“This Open Access Week, I’m proud to help spread the word about a new platform for sharing library and archive work, the newly launching (as of October 25) LIS Scholarship Archive, or LISSA.”

Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework

Uber, Engineering, Alex Sergeev & Mike Del Balso


“Uber Engineering introduced Michelangelo, an internal ML-as-a-service platform that democratizes machine learning and makes it easy to build and deploy these systems at scale. In this article, we pull back the curtain on Horovod, an open source component of Michelangelo’s deep learning toolkit which makes it easier to start—and speed up—distributed deep learning projects with TensorFlow.”

Suite of free, open-source tools to help even non-experts monitor large-scale land use change

Mongabay, Wildtech, Sue Palminteri


Collect Earth is a free, open-source tool built on Google Earth that enables non-experts to assess deforestation and other land cover change through point sampling.


Full-time positions outside academia

Research Scientist, Aristo

Allen Institute for Artificial Intelligence; Seattle, WA
