Data Science newsletter – November 30, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for November 30, 2016


Data Science News

Hacking Epidemics in a Hyper-Connected World

NYU Tandon School of Engineering

from November 28, 2016

At the first hints of a disease outbreak, epidemiologists, health care providers, policy makers, and scientists turn to sophisticated predictive models to determine how an illness is spreading and what should be done to minimize contagion. A research collaboration between the New York University Tandon School of Engineering and Politecnico di Torino in Italy is upending the traditional modeling process, yielding predictions that are both simpler to calculate and more attuned to a hyper-connected world.

All predictive models correlate the movement of an illness through a population over time, but current simulations fail to account for a seemingly obvious idea: that mobility and activity vary among people, and that these variations affect the likelihood of contracting or spreading an illness.

A new paradigm was explained in a paper published in Physical Review Letters by Maurizio Porfiri, a professor of mechanical and aerospace engineering at NYU Tandon, Alessandro Rizzo, a visiting professor at NYU Tandon and an associate professor of control engineering at Politecnico, and Lorenzo Zino, a Politecnico doctoral student in pure and applied mathematics.

New Measurement System Can Help Navy Conserve Energy

SIGNAL Magazine

from November 29, 2016

Engineers at the Massachusetts Institute of Technology (MIT), with support from the Office of Naval Research (ONR), have developed a portable measurement system to precisely and inexpensively monitor the amount of electricity used by individual household appliances, lighting fixtures and electronic devices.

The system was developed by MIT professor Dr. Steven Leeb and one of his graduate students, Dr. John Donnal, a former U.S. Army captain.

MapD Tweetmap


from November 29, 2016

MapD is a GPU (Graphics Processing Unit)-powered database and visualization platform designed for lightning-fast, immersive data exploration that eliminates the disconnect between analyst and data. By bringing the power of GPU supercomputing to data analytics, MapD can query and visualize billions of records in milliseconds. Nothing in this demo is pre-computed or pre-rendered.

Embracing Technology and Changing the Culture at the NYPD

Data-Smart City Solutions, Laura Adler

from November 28, 2016

New York's streets are safer than they have been in years, and yet tensions between police officers and the communities in which they work have continued to mount. The challenge facing the New York Police Department (NYPD) today is to maintain safe streets while ushering in a new era of mutual respect between officers and local communities.

Over the last two years, William Bratton served a second term as Commissioner of the New York Police Department. In that time, he led the NYPD away from the aggressive tactics it had used to crack down on street-level crime, embracing instead a strategy that balances crime prevention with community engagement. In Bratton’s words, his second term was focused on moving the NYPD from a “warrior” to a “guardian” policing mindset.

This attempt at cultural change involved new policies regarding training and recruiting, neighborhood policing, and, perhaps most visibly, the adoption of social media platforms throughout the NYPD. A recently-published case study, co-authored by Harvard Law School Professor Susan Crawford and me, chronicles these organizational changes, drawing from dozens of interviews conducted by Professor Crawford starting in 2015.

Precision Medicine Inspires HPC

NCI BioMedical Informatics Blog, Eric Stahlberg

from November 29, 2016

The recent weeks have been momentous as the high-performance computing (HPC) community embraced the challenge of precision medicine. The theme of this year’s leading international supercomputing conference, SC16, was “HPC Matters” and it was evident that HPC matters to precision medicine and that precision medicine matters to the high-performance computing community.

The National Cancer Institute (NCI) recognizes that high-performance computing is a critical key to success when facing challenges in data, computation, and collaboration for the many efforts within and across organizations as well as within and across nations. With the involvement of the high-performance computing community in precision medicine, insights will be more quickly reached, limits overcome, and results shared, accelerating the critical innovations needed in precision oncology to help those touched by cancer.

Whether inspired by the Cancer Moonshot, the National Strategic Computing Initiative, the Precision Medicine Initiative, or personal interest, the response of the community to the opportunity in precision medicine was compelling throughout the conference.

University Data Science News

  • University of British Columbia launches a 10-month, interdisciplinary data science program to teach programming, databases, and algorithms, as well as collaborative development, ethics, and data/cyber security.
  • Clemson ecologists release GFlow, wildlife-mapping software that runs 170 times faster than current tools; its main developer was subsequently hired by BMW.
  • New PhD Program in Technology Management at UC Santa Barbara looking for students “interested in technology-related work regardless of whether you have a background in the social sciences, physical sciences, management or engineering”
  • In two reports, the University of Massachusetts Amherst and Google examined how easing the transfer from community college can improve diversity in university computer science: a qualitative report on California’s community college students and a national longitudinal report on CS-degree seekers who start in community college.
  • Andrew Moore, Dean of the School of Computer Science at Carnegie Mellon, testified at a U.S. Senate subcommittee hearing on artificial intelligence, informing legislators of the need to educate high school students for technical careers in AI.
  • The University of Washington launched MusicNet for music data researchers. It is the first publicly available large-scale classical music dataset with curated fine-level annotations.

    The Promise and Challenge of Big Data for Pharma

    Harvard Business Review, Ryan Copping and Michael Li

    from November 29, 2016

    The emergence of big data, as well as advancements in data science approaches and technology, is providing pharmaceutical companies with an opportunity to gain novel insights that can enhance and accelerate drug development. It will increasingly help government health agencies, payers, and providers to make decisions about such issues as drug discovery, patient access, and marketing. From our unique vantage points at Genentech, a leading biotechnology company with a major data science practice, and The Data Incubator, a data-science education company that places and trains data scientists, we have seen how the pharmaceuticals industry has leveraged big data for some potentially revolutionary advances and the challenges it has faced along the way.

    For the industry, the biggest challenge by far has been talent: upgrading skill sets from those sufficient to analyze relatively small amounts of clinical trial data to those required to gain insights from the vast amount of real-world data, including unstructured data such as physicians’ notes, scans and images, and pathology reports. The pharmaceuticals industry has seen an explosion in the amount of available data beyond that collected from traditional, tightly controlled clinical trial environments. To be sure, anonymized insurance-claims data and electronic health record (EHR) data has been accessed and analyzed for many years. But in the past, EHR data was often limited to a single research institution or provider network, and obtaining the data needed to help answer a specific research question usually involved a tedious and inefficient process. While much still needs to be done to create standardized methods for sharing and making sense of anonymized EHR and genomic data across providers, it is now possible to link different data sources, which allows complex research questions to be addressed.

    An AI Ophthalmologist Shows How Machine Learning May Transform Medicine

    MIT Technology Review, Will Knight

    from November 29, 2016

    Google researchers got an eye-scanning algorithm to figure out on its own how to detect a common form of blindness, showing the potential for artificial intelligence to transform medicine remarkably soon.

    The algorithm can look at retinal images and detect diabetic retinopathy—which affects almost a third of diabetes patients—as well as a highly trained ophthalmologist can. It makes use of the same machine-learning technique that Google uses to label millions of Web images.

    Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

    Proceedings of the National Academy of Sciences; Carlos Baldassi et al.

    from November 15, 2016

    Artificial neural networks are some of the most widely used tools in data science. Learning is, in principle, a hard problem in these systems, but in practice heuristic algorithms often find solutions with good generalization properties. We propose an explanation of this good performance in terms of a nonequilibrium statistical physics framework: We show that there are regions of the optimization landscape that are both robust and accessible and that their existence is crucial to achieve good performance on a class of particularly difficult learning problems. Building on these results, we introduce a basic algorithmic scheme that improves existing optimization algorithms and provides a framework for further research on learning in neural networks. [full text]
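The core idea the abstract describes, biasing the search toward wide, accessible regions of the landscape by coupling several replicas of the system toward their common center, can be sketched in a few lines. This is an illustrative toy (a quadratic loss, hand-picked step size and coupling constant), not the paper's exact algorithmic scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    """Gradient of a simple quadratic loss f(x) = 0.5 * x**2 (toy stand-in)."""
    return x

# Several replicas of the same parameter, started at different points.
replicas = rng.normal(0.0, 5.0, size=8)
lr, gamma = 0.1, 0.05  # step size and replica-coupling strength (illustrative)

for _ in range(500):
    center = replicas.mean()
    # Each replica follows its own gradient plus an elastic pull toward
    # the ensemble center, so the ensemble settles where many nearby
    # configurations are also good.
    replicas -= lr * (grad(replicas) + gamma * (replicas - center))

print(replicas.round(4))  # all replicas end up near the minimum at 0
```

On a real neural network the quadratic loss would be replaced by the training loss and each replica would run its own (stochastic) gradient descent; only the elastic coupling term is specific to the replicated scheme.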

    The Dawn of Artificial Intelligence – Hearings

    U.S. Senate Committee On Commerce, Science, & Transportation

    from November 30, 2016

    U.S. Sen. Ted Cruz (R-Texas), chairman of the Subcommittee on Space, Science, and Competitiveness, will convene a hearing on Wednesday, November 30, 2016, at 2:30 p.m. on “The Dawn of Artificial Intelligence.” The hearing will provide a broad overview of the state of artificial intelligence, including policy implications and effects on commerce.

    Laboratory Sharing to Improve Rock Deformation Research

    Eos; A. Kronenberg, G. Hirth, N. Lapusta, J. Chester, A. Freed, C. Marone, and T. Tullis

    from November 29, 2016

    Developing new instruments for testing the physical properties of geomaterials is key to scientific advances, and similar instrument development and learning experiences are occurring in other rock deformation laboratories throughout the world. Unfortunately, many such instruments are rarely produced beyond the prototype because of limited resources and the relatively small number of rock deformation laboratories. What’s more, experimental, observational, and theoretical studies are not often integrated, as each approach requires a unique set of methods, preparation, and specialization.

    How do we solve these problems? To foster new integrated research of tectonically significant processes, experimental scientists, field researchers, and modelers need shared access to working laboratories and technical support. A new group, born from discussions at meetings and workshops [Tullis et al., 2013], aims to facilitate this access.

    Called the Deformation Experimentation at the Frontier of Rock and Mineral research (DEFORM), the group seeks to integrate rock deformation labs from across the United States into one shared national facility with access to visiting scientists and students.

    ISB Q&A: Ruth West

    Institute for Systems Biology

    from November 28, 2016

    Ruth West is a creative catalyst and veteran in the field of art+science integration. Her groundbreaking work Atlas in silico is a physically interactive and immersive virtual reality art+science installation driven by data from the Global Ocean Sampling Expedition, a survey of marine microbial communities led by the J. Craig Venter Institute. Atlas in silico offers visitors a dream-like, aesthetic experience of the pioneering science of metagenomics. West came to Seattle to set up her immersive VR work for a short-term exhibition held in October 2016 at King Street Station called “9 evenings 2.” ISB’s Consilience Program Director Allison Kudla invited West to present at ISB. We captured a few of the viewpoints she shared in an interview.

    The Subtle Ways Your Digital Assistant Might Manipulate You

    WIRED, Gear, Maurice E. Stucke

    from November 29, 2016

    Yet, despite the promise of digital assistants, they also carry significant social, political, and economic concerns. The leading platforms’ plans, the Guardian reports, are clear: They envision “a future where humans do less thinking when it comes to the small decisions that make up daily life.” To work well, the digital butler will likely operate from an existing platform and tap into the vast personal data and services that platform offers. Four super-platforms—Apple, Amazon, Facebook, and Alphabet—dominate today’s online world. Not surprisingly, each is aiming for its digital assistant (Apple’s Siri, Amazon’s Alexa and Echo, Facebook’s M, and Google’s Assistant and Home) to become our head butler.

    Google Earth update shows how climate change has morphed our planet

    Popular Science, Peter Hess

    from November 29, 2016

    In 2013, Google released Google Earth Timelapse, an interactive viewer that lets users see satellite images of Earth from 1984 to the present, giving a rapid timelapse look at how human development and climate change have shaped our planet. Also in 2013, Popular Science reported that the scientific consensus surrounding human-driven climate change was stronger than ever. With Google’s new update to Timelapse, users can see the effects of global climate change even more sharply, including melting glaciers, rising sea levels, and receding forests.


    Events

    Next Databite: Security and Privacy in a Hyper-connected World

    New York, NY. Our December 7 speaker is Bruce Schneier, presenting Security and Privacy in a Hyper-connected World. 4 p.m., 36 West 20th Street, 11th Floor. [free, rsvp required]

    Creating Impact with Open Data in Agriculture and Nutrition

    The Hague, Netherlands. February 13-15 at Ministerie van Economische Zaken (73 Bezuidenhoutseweg).


    Virtual Reality in Education Study

    You are invited to participate in a study investigating the viability of virtual reality environments in education. You will be asked to attend a classroom-style lecture in VR. The experiment is an hour-long session, including the lecture, two short questionnaires, and a test assessing what you have learned. Signup ends on Monday, December 19.

    Call for Papers – First Workshop on Ethics in Natural Language Processing

    Valencia, Spain. Workshop takes place on April 4, co-located with EACL 2017. Deadline for submissions is Monday, January 16.

    NYU Center for Data Science News

    Where do CDS Alumni go? – NYU Center for Data Science

    NYU Center for Data Science

    from November 29, 2016

    With our MS Open House for prospective students coming up on December 14, one might be wondering: where exactly could this degree take you?

    During our exciting Alumni Student Networking event on November 4, our alumni gathered with current CDS students to share how they found success in several industries like finance, media, and technology after graduating.

    Tools & Resources

    Deep Learning Made Easy in Azure

    Microsoft, Cortana Intelligence and Machine Learning Blog

    from November 28, 2016

    Deep learning is an exciting new space for predictive modeling and machine learning, and I’ve written about a variety of models and tools in previous blog posts. However, it can be intimidating to get started and frustrating to get everything installed and configured. In this blog, we’ll show how you can jump-start your deep learning process with Jupyter Notebooks in Azure.
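Once a notebook is running, a first deep-learning experiment needs nothing beyond numpy. The sketch below, a minimal two-layer network trained on XOR with plain gradient descent, is a generic warm-up you could paste into any Jupyter notebook (Azure-hosted or otherwise); it is not tied to any specific Azure service or to the blog's own setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR: the classic problem a single linear layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 tanh units, sigmoid output.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 0.1
for _ in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass (gradient of binary cross-entropy w.r.t. each weight).
    dp = p - y
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * (1 - h**2)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Full-batch gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(pred.ravel())
```

Frameworks such as CNTK or TensorFlow replace the hand-written backward pass with automatic differentiation, but the loop above is exactly what they automate.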

    [1611.09326] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

    Computer Science > Computer Vision and Pattern Recognition; Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio

    from November 28, 2016

    State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.
    Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.
    In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
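The dense-connectivity pattern the abstract describes, where every layer receives the concatenation of the block input and all preceding layer outputs, reduces to a short loop. The numpy "layers" below are random linear maps chosen purely to make the channel bookkeeping visible; a real DenseNet layer is batch-norm, ReLU, and a 3x3 convolution:

```python
import numpy as np

def dense_block(x, num_layers, growth_rate):
    """Dense connectivity: each layer consumes the concatenation of the
    block input and every previous layer's output, and contributes
    `growth_rate` new channels. Layers here are stand-in linear maps."""
    rng = np.random.default_rng(0)
    features = [x]  # running list of feature maps, starting with the input
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)       # all prior outputs
        W = rng.normal(0, 0.1, (inp.shape[-1], growth_rate))
        features.append(np.maximum(inp @ W, 0))       # stand-in "conv" + ReLU
    return np.concatenate(features, axis=-1)

x = np.ones((4, 4, 16))  # toy H x W x C feature map
out = dense_block(x, num_layers=4, growth_rate=12)
print(out.shape)         # channels grow to 16 + 4 * 12 = 64
```

This makes concrete why DenseNets stay small: each layer adds only `growth_rate` channels, while still seeing every earlier feature map directly.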

    elasticsearchr – a Lightweight Elasticsearch Client for R

    Alex Ioannides, When Localhost Isn't Enough blog

    from November 28, 2016

    Elasticsearch is a distributed NoSQL document store, search engine, and column-oriented database whose fast (near real-time) reads and powerful aggregation engine make it an excellent choice as an ‘analytics database’ for R&D, production use, or both. Installation is simple, it ships with sensible default settings that allow it to work effectively out of the box, and all interaction happens via a set of intuitive and extremely well documented RESTful APIs. I’ve been using it for two years now and I am evangelical.
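Because all interaction with Elasticsearch happens over its REST API, the requests that any client (elasticsearchr included) ultimately sends are plain JSON. A hedged sketch of such a request body, with hypothetical index and field names, assuming a local cluster on the default port 9200:

```python
import json
# import requests  # uncomment to actually send the query to a live cluster

# Match documents mentioning "r", then bucket them by year with a
# date_histogram aggregation. "my-index", "message", and "timestamp"
# are hypothetical names, not part of elasticsearchr itself.
query = {
    "query": {"match": {"message": "r"}},
    "aggs": {
        "docs_per_year": {
            "date_histogram": {"field": "timestamp", "interval": "year"}
        }
    },
    "size": 0,  # aggregation results only; skip returning individual hits
}

# resp = requests.get("http://localhost:9200/my-index/_search",
#                     headers={"Content-Type": "application/json"},
#                     data=json.dumps(query))
print(json.dumps(query, indent=2))
```

A client library like elasticsearchr wraps this JSON construction in idiomatic function calls, but inspecting the raw bodies it produces is often the quickest way to debug a query.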
