Data Science newsletter – August 27, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 27, 2018


Data Science News

UMD’s computer science major will become a Limited Enrollment Program in fall 2019

University of Maryland, The Diamondback, Leah Brennan


Students planning to declare a computer science major at the University of Maryland will have to successfully complete a specific set of “gateway” requirements, beginning in fall 2019.

The major is set to become a Limited Enrollment Program that semester, according to a university site listing the programs. Students will have to pass CMSC131: Object-Oriented Programming I, CMSC132: Object-Oriented Programming II and MATH 140: Calculus I with a minimum grade of C-. For students who enter as freshmen already declared in computer science, these grades would be reviewed once they reach the 45-credit mark.

Deep Multi-Output Forecasting: Learning to Accurately Predict Blood Glucose Trajectories

arXiv, Computer Science > Machine Learning; Ian Fox, Lynn Ang, Mamta Jaiswal, Rodica Pop-Busui, Jenna Wiens


In many forecasting applications, it is valuable to predict not only the value of a signal at a certain time point in the future, but also the values leading up to that point. This is especially true in clinical applications, where the future state of the patient can be less important than the patient’s overall trajectory. This requires multi-step forecasting, a forecasting variant where one aims to predict multiple values in the future simultaneously. Standard methods to accomplish this can propagate error from prediction to prediction, reducing quality over the long term. In light of these challenges, we propose multi-output deep architectures for multi-step forecasting in which we explicitly model the distribution of future values of the signal over a prediction horizon. We apply these techniques to the challenging and clinically relevant task of blood glucose forecasting. Through a series of experiments on a real-world dataset consisting of 550K blood glucose measurements, we demonstrate the effectiveness of our proposed approaches in capturing the underlying signal dynamics. Compared to existing shallow and deep methods, we find that our proposed approaches improve performance individually and capture complementary information, leading to a large improvement over the baseline when combined (4.87 vs. 5.31 absolute percentage error (APE)). Overall, the results suggest the efficacy of our proposed approach in predicting blood glucose level and multi-step forecasting more generally.
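To make the distinction in the abstract concrete, here is a minimal sketch (not the authors' deep architecture) that contrasts recursive one-step forecasting, where each prediction is fed back in as an input, with a direct multi-output forecaster that emits the whole horizon in one call. The toy linear-extrapolation models, the sample values, and all function names are illustrative assumptions. APE is computed as 100 × |y - ŷ| / |y| averaged over points, which is one common convention and may differ from the paper's exact aggregation.

```python
from typing import Callable, List, Sequence

OneStep = Callable[[Sequence[float]], float]
MultiStep = Callable[[Sequence[float], int], List[float]]


def recursive_forecast(history: Sequence[float], one_step: OneStep, horizon: int) -> List[float]:
    """Iterate a one-step model, feeding each prediction back as input.
    An error made at step 1 becomes part of the input for step 2, and so on."""
    window = list(history)
    preds: List[float] = []
    for _ in range(horizon):
        nxt = one_step(window)
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward using the prediction
    return preds


def multi_output_forecast(history: Sequence[float], multi_step: MultiStep, horizon: int) -> List[float]:
    """Predict the whole horizon at once, conditioning only on observed data."""
    preds = multi_step(history, horizon)
    if len(preds) != horizon:
        raise ValueError("multi-output model must return one value per step")
    return preds


def mean_ape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes equal lengths, no zero actuals)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical toy models: linear extrapolation from the last two observations.
def trend_one_step(w: Sequence[float]) -> float:
    return w[-1] + (w[-1] - w[-2])


def trend_multi_step(w: Sequence[float], horizon: int) -> List[float]:
    slope = w[-1] - w[-2]
    return [w[-1] + h * slope for h in range(1, horizon + 1)]


if __name__ == "__main__":
    history = [100, 110]  # made-up glucose readings
    print(recursive_forecast(history, trend_one_step, 3))       # [120, 130, 140]
    print(multi_output_forecast(history, trend_multi_step, 3))  # [120, 130, 140]
    print(mean_ape([100, 200], [110, 190]))                     # approximately 7.5
```

With these exact linear toy models the two strategies agree; in general they differ, because the recursive version conditions later steps on its own earlier predictions, which is the error-propagation issue the abstract describes.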

Transforming Psychiatry into Data-Driven Medicine with Digital Measurement Tools

npj Digital Medicine; Honor Hsin et al.


Psychiatry has been limited by historically rooted practices centered primarily on subjective observation. Fields such as oncology have progressed toward data-driven clinical decision-making that combines subjective clinical assessment of symptoms and preferences with biological measures such as genetics, biomarkers, imaging, and integrative physiology to derive quantitative risk scores and decision support. In contrast, psychiatry has just begun to scratch the surface of measurement-based care with validated clinical questionnaires. An opportunity exists to improve modern psychiatric care with novel data streams from digital sensors combined with clinical observation and subjective self-report. The prospect of integrating this complex information with modern computational and analytical methods could advance the field, both in research and clinical practice. Here we discuss this possibility and propose some key priorities to enable these innovations toward improving clinical outcomes in the future. [full text]

AI and machine learning take centre stage at Microsoft’s student developer competition

ZDNet, Mary Branscombe


Microsoft’s annual Imagine Cup student developer competition would be easy to mistake for a commercial app accelerator. These aren’t science fair presentations or even 24-hour hackathon projects; in almost all the cases, they were near-professional prototypes, and many of the finalists had interest from investors, government departments and universities to carry on working on their projects.

That applied not just to the three grand finalists, all of whom have investors, partners, medical facilities or government departments ready to work with them, but also to many of the 49 teams from around the world who made it to the final. That includes projects like Hachy, which used a smartphone camera and Azure’s custom image recognition service instead of a $4,000 spectrometer to check the progress of fertilized chicken eggs on their way to hatching. The Dairy Association of China wants to start using that app.

Boomerang, a project for geotagging asthma inhalers with Azure IoT and Wi-Fi to help sufferers (or their parents) find where they last left the inhaler, is in a 12-month, 200-user survey with Johns Hopkins University to validate clinical outcomes (if you can find your inhaler, you’ll use it more often).

Leaving the house – where preprints go

Crossref, Jennifer Lin and Karthik Ram


“Pre-prints” are sometimes neither Pre nor Print, but they do go on to be published in journals. While researchers may have different motivations for posting a preprint, such as establishing a record of priority or seeking rapid feedback, the primary motivation appears to be timely sharing of results prior to journal publication.

So where in fact do preprints get published?

Although this is a simple question, we have not had an easy way to answer how this varies across disciplines, preprint repositories and journals. Until now. Crossref metadata provides not only an open and easy way to do so, but also up-to-date data for the latest results.
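As a rough illustration of how such an analysis could start, the sketch below reads Crossref work records and counts preprints whose metadata links them to a published version. It assumes the REST API's JSON layout, in which a posted-content record lists its published counterpart under `relation` then `is-preprint-of`, as objects with `id-type` and `id` keys. Records could be retrieved from `https://api.crossref.org/works` with a `type:posted-content` filter, but that fetching step is omitted here. The sample records and DOIs are made-up placeholders, not real data.

```python
from collections import Counter
from typing import Dict, List


def published_dois(work: Dict) -> List[str]:
    """DOIs that a Crossref work record says it is a preprint of.
    Assumes work["relation"]["is-preprint-of"] = [{"id-type": "doi", "id": ...}, ...]."""
    relations = work.get("relation", {}).get("is-preprint-of", [])
    return [r["id"].lower() for r in relations if r.get("id-type") == "doi"]


def summarize(works: List[Dict]) -> Dict:
    """Share of preprints with a linked published version, plus counts by DOI prefix
    (the registrant part before the slash, only a rough proxy for the publisher)."""
    prefixes: Counter = Counter()
    linked = 0
    for work in works:
        dois = published_dois(work)
        if dois:
            linked += 1
        for doi in dois:
            prefixes[doi.split("/", 1)[0]] += 1
    share = linked / len(works) if works else 0.0
    return {"share_published": share, "by_prefix": prefixes}


# Made-up sample records for illustration only.
sample = [
    {"DOI": "10.5555/preprint-a", "relation": {"is-preprint-of": [{"id-type": "doi", "id": "10.5555/JOURNAL-X.1"}]}},
    {"DOI": "10.5555/preprint-b", "relation": {"is-preprint-of": [{"id-type": "doi", "id": "10.6666/journal-y.2"}]}},
    {"DOI": "10.5555/preprint-c", "relation": {}},
]

if __name__ == "__main__":
    print(summarize(sample))
```

Mapping each published DOI to its journal would need a further lookup of that work's `container-title` field, which is why the sketch stops at DOI prefixes.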

How AI can be a force for good

Science, Perspective, Mariarosaria Taddeo and Luciano Floridi


The ethical debate on AI as a new form of agency dates to the 1960s (2, 4). Since then, many of the relevant problems have concerned delegation and responsibility. As AI is used in ever more contexts, from recruitment to health care, understanding which tasks and decisions to entrust (delegate) to AI and how to ascribe responsibility for its performance are pressing ethical problems. At the same time, as AI becomes invisibly ubiquitous, new ethical challenges emerge. The protection of human self-determination is one of the most relevant and must be addressed urgently. The application of AI to profile users for targeted advertising, as in the case of online service providers, and in political campaigns, as unveiled by the Cambridge Analytica case, offers clear examples of the potential of AI to capture users’ preferences and characteristics and hence shape their goals and nudge their behavior to an extent that may undermine their self-determination.

Artificial Intelligence Is Now a Pentagon Priority. Will Silicon Valley Help?

The New York Times, Cade Metz


In a May memo to President Trump, Defense Secretary Jim Mattis implored him to create a national strategy for artificial intelligence.

Mr. Mattis argued that the United States was not keeping pace with the ambitious plans of China and other countries. With a final flourish, he quoted a recent magazine article by Henry A. Kissinger, the former secretary of state, and called for a presidential commission capable of “inspiring a whole of country effort that will ensure the U.S. is a leader not just in matters of defense but in the broader ‘transformation of the human condition.’” Mr. Mattis included a copy of Mr. Kissinger’s article with his four-paragraph note.

Mr. Mattis’s memo, which has not been reported before and was viewed by The New York Times, reflected a growing sense of urgency among defense officials about artificial intelligence. The consultants and planners who try to forecast threats think A.I. could be the next technological game changer in warfare.

The Enterprise Journey to Artificial General Intelligence

Wall St Journal, Deloitte, Oren Etzioni


In March 2016, the American Association for Artificial Intelligence and I asked 193 AI researchers how long it would be until we achieve artificial superintelligence (ASI), defined as an intellect that is smarter than the best human in practically every field. Of the 80 respondents, 67.5 percent said it could take a quarter century or more, and 25 percent said it would likely never happen.
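The survey figures can be checked with a little arithmetic: the response rate, the head counts behind each percentage, and what is left over. The remaining 7.5 percent is derived here rather than quoted; the excerpt does not say what those respondents answered.

```latex
\begin{aligned}
\text{response rate} &= \tfrac{80}{193} \approx 41.5\% \\
0.675 \times 80 &= 54 \text{ respondents (a quarter century or more)} \\
0.25 \times 80 &= 20 \text{ respondents (likely never)} \\
80 - 54 - 20 &= 6 \text{ respondents} = 7.5\% \text{ (other answers)}
\end{aligned}
```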

Given the sheer number of “AI is coming to take your job” articles appearing across media, these survey findings may come as a surprise to some. Yet they are grounded in certain realities. While psychometrics measure human IQ fairly reliably, AI psychometrics are not nearly as mature. Ill-formed problems are vague and fuzzy, and wrestling them to the ground is difficult.

Few interactions in life have clearly defined rules, goals, and objectives, and the expectations of artificial general intelligence (AGI)—with capabilities that could largely match human “gut instincts” on such areas as language communications—are squishy. How can one person tell whether another has understood a sentence properly? Improving speech recognition doesn’t necessarily enhance language understanding, since even simple communication can quickly get complicated—consider the numerous ways to order coffee, for instance. Successfully creating AGI that matches human intellectual capabilities—or ASI that surpasses them—will require dramatic improvements over the current state of AI.

Discovering the Art of the Possible for Machine Learning and Disaster Risk Management



When a natural disaster strikes, knowing where people and buildings are is of the utmost importance for saving lives. The World Bank Global Facility for Disaster Reduction and Recovery (GFDRR) and DataKind teamed up to understand how satellite imagery and machine learning might aid disaster risk management and improve resilience for vulnerable communities. DataKind’s DataCorps team of pro bono data scientists completed an assessment of current literature on how satellite imagery could improve disaster relief efforts using image analysis and convolutional neural networks, and then made recommendations for how the GFDRR could scale efforts using its imagery in Sri Lanka to build and implement detection models.
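For readers new to the technique, the core operation a convolutional neural network applies to an image tile is a sliding-window weighted sum. Below is a minimal pure-Python sketch of it (stride 1, no padding) followed by a ReLU. The 4x4 "rooftop" tile and the hand-set edge kernel are made-up illustrations; a real detection model would learn many such kernels from labeled satellite imagery, and this is not the model DataKind built.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2D cross-correlation (what deep learning libraries call convolution):
    slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Grid = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            row.append(sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def relu(fmap: Grid) -> Grid:
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


# Made-up 4x4 grayscale tile: 1 = bright roof pixel, 0 = ground.
tile = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hand-set kernel that responds to a dark-to-bright transition from left to right.
left_edge = [
    [-1, 1],
    [-1, 1],
]

if __name__ == "__main__":
    print(relu(conv2d_valid(tile, left_edge)))  # [[1, 0, 0], [2, 0, 0], [1, 0, 0]]
```

The positive responses line up with the roof's left edge; in a trained network, low-level features like this are what later layers combine into building detections.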

DARPA Expert Warns Autonomous Car Firms Aren’t Teaming Up on Safe A.I.

Inverse, Mike Brown


Autonomous car development has a cooperation problem. That’s according to Hava Siegelmann, a program manager at the Defense Advanced Research Projects Agency (DARPA), who claimed that the companies she’s spoken to are lukewarm about working together on making A.I. safer.

“The next program that I’ve been working on is about safety of A.I. […] the companies that I’ve contacted that are working on drones and self-driving cars, said they are actually not interested to work together on safety of A.I.,” Siegelmann said in a panel discussion at the Human-Level Artificial Intelligence conference in Prague, Czech Republic, on Friday. The comments come as automakers ramp up their autonomy efforts, with Waymo reaching eight million miles on public roads this summer, but questions have lingered around the best way to approach safety. In March, 49-year-old Elaine Herzberg became the first pedestrian killed by an autonomous car. But while automakers have shown enthusiasm for on-road communications technologies like 5G, they’ve shown less interest in communicating about development.

‘NASA Selfies’ and TRAPPIST-1 VR Apps Now Available

NASA Jet Propulsion Laboratory


The universe is at your fingertips with two new digital products from NASA.

The NASA Selfies app and NASA’s Exoplanet Excursions virtual reality app were created to celebrate the 15th anniversary of the launch of NASA’s Spitzer Space Telescope. Spitzer’s incredible discoveries and amazing images are at the center of these new products.

Tweet of the Week

Twitter, Antti Lipponen


Data Visualization of the Week

Twitter, Blakely Hoffman Payne



Events

HPML 2018 – High Performance Machine Learning Workshop



Lyon, France September 24, held in conjunction with IEEE SBAC-PAD 2018. [$$$]

Data-Pop Alliance Workshop at MIT Media Lab – Leveraging Big Data for Sustainable Development

Data-Pop Alliance


Cambridge, MA October 15-17, “part of our Global Professional Training Program on Big Data for Sustainable Development developed and delivered in partnership with UNSSC and financial support from the Hewlett Foundation.” [$$$$]


Relational Representation Learning | NIPS 2018 Workshop

Montreal, QC, Canada. The workshop is December 8; the paper submission deadline is October 19.

Tools & Resources

Build botnet detectors using machine learning algorithms in Python [Tutorial]

Packt Hub, Melisha Dsouza


Botnets are networks of connected computers, typically compromised by attackers, that perform repetitive tasks at scale. Connected devices play an important role in modern life. From smart home appliances, computers, coffee machines, and cameras to connected cars, this huge shift in our lifestyles has made our lives easier. Unfortunately, these exposed devices can easily be targeted by attackers and cybercriminals, who may later use them to enable larger-scale attacks. Security vendors provide many solutions and products to defend against botnets, but in this tutorial, we are going to learn how to build novel botnet detection systems with Python and machine learning techniques.
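The tutorial's own code is not reproduced here. As a hedged stand-in, the sketch below shows the basic shape of a supervised botnet detector: per-host traffic features, a model fit on labeled examples, and a prediction for new hosts. The two features (connections per minute and distinct destination ports), the toy training data, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free; a practical detector would use real labeled traffic and a stronger model.

```python
from math import dist
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[Callable[[Vector], List[float]], Dict[str, List[float]]]


def fit(X: List[Vector], y: List[str]) -> Model:
    """Standardize each feature, then store one centroid per class."""
    columns = list(zip(*X))
    mu = [mean(col) for col in columns]
    sd = [pstdev(col) or 1.0 for col in columns]  # guard against constant features

    def scale(x: Vector) -> List[float]:
        return [(v - m) / s for v, m, s in zip(x, mu, sd)]

    centroids: Dict[str, List[float]] = {}
    for label in sorted(set(y)):
        rows = [scale(x) for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return scale, centroids


def predict(model: Model, x: Vector) -> str:
    """Assign the class whose centroid is nearest in standardized feature space."""
    scale, centroids = model
    z = scale(x)
    return min(centroids, key=lambda label: dist(z, centroids[label]))


# Hypothetical per-host features: [connections per minute, distinct destination ports].
X_train = [[5, 3], [8, 4], [6, 2], [120, 40], [150, 55], [130, 45]]
y_train = ["normal", "normal", "normal", "botnet", "botnet", "botnet"]

if __name__ == "__main__":
    model = fit(X_train, y_train)
    print(predict(model, [7, 3]))     # normal
    print(predict(model, [140, 50]))  # botnet
```

Standardizing first matters here because the raw features sit on very different scales; without it, the connections-per-minute column would dominate the distance.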

One Data Science Job Doesn’t Fit All

LinkedIn, Elena Grewal


One of the fun things about being a leader at a hyper-growth company is that you don’t just have the opportunity to change things — you must drive change to keep up. And working in the new and rapidly evolving field of Data Science (DS) entails another level of rapid change. We are evolving within the company and as an industry in parallel.

At Airbnb, we think of data as the voice of our users at scale. Our goal is for data scientists to maximize their impact and to look forward to coming to work. Achieving this goal is a work in progress, and we’re continually looking for ways to improve. We recently established a role-defining framework as a part of this evolution. My hope is that what we’ve learned along the way can help other companies be strategic in defining data science roles.

The main takeaway I will share is that companies should consider three tracks of data science work to meet the needs of their business: Analytics, Inference, and Algorithms. Below I’ll describe how we arrived at these three tracks of work and how they help.

Binder 2.0 – Reproducible, interactive, sharable environments for science at scale

SciPy 2018


Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources. Binder works with pre-existing workflows in the analytics community, aiming to create interactive versions of repositories that exist on sites like GitHub with minimal extra effort needed. This paper details several of the design decisions and goals that went into the development of the current generation of Binder.
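As a small illustration of the "minimal extra effort" workflow, the sketch below builds a mybinder.org launch link for a GitHub repository and lists which environment files at the repository root Binder's build tool, repo2docker, would recognize. The URL pattern `https://mybinder.org/v2/gh/<org>/<repo>/<ref>` follows the public Binder convention; the set of file names is a partial list (repo2docker also checks a `binder/` or `.binder/` folder), the default ref is an assumption, and the organization and repository names in the example are hypothetical.

```python
from typing import Iterable, List

# Partial list of configuration files repo2docker understands; see its documentation
# for the full set and for precedence rules.
KNOWN_CONFIG_FILES = {
    "environment.yml",
    "requirements.txt",
    "runtime.txt",
    "apt.txt",
    "postBuild",
    "start",
    "install.R",
    "Project.toml",
    "Dockerfile",
}


def binder_url(org: str, repo: str, ref: str = "HEAD") -> str:
    """Launch link for a GitHub repository on the public mybinder.org deployment."""
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"


def root_config_files(paths: Iterable[str]) -> List[str]:
    """Recognized config files at the repository root (paths with a folder never match)."""
    return sorted(p for p in paths if p in KNOWN_CONFIG_FILES)


if __name__ == "__main__":
    files = ["README.md", "requirements.txt", "analysis.ipynb", "data/raw.csv", "postBuild"]
    print(binder_url("example-org", "example-repo"))
    print(root_config_files(files))  # ['postBuild', 'requirements.txt']
```

Because Binder reads the same dependency files that analysts already keep in their repositories, the link is often the only addition needed.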

Commit to Clean Data: Collaborative Curation

Trifacta, Adam Wilson


At Trifacta, we’re committed to clean data. That’s why we created the Clean Data Manifesto, which is focused on this commitment and anchored by our five tenets of proper data preparation. I’ve walked through the first two tenets in my last two blog posts—the first, focused on prioritizing and setting targets, and the second, about diving into the data to make sure that you’re able to identify issues early and often.

As is the case with so much critical work, data preparation is a team effort that requires the seamless orchestration of a lot of moving parts in order to curate data quality. That’s why our third tenet is collaborative curation.



Jobs

Postdoctoral Associate (2)

New York University, Center for Data Science; New York, NY

Postdoc Position

Indiana University, Department of Psychological and Brain Sciences; Bloomington, IN
