NYU Data Science newsletter – October 19, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for October 19, 2016


Data Science News


The sunny happy land of Open Science

By far the biggest announcement in open science and/or open software this week was the launch of the Artificial Intelligence Open Network which publishes rigorous, meaningful questions in the science of AI, invites “anyone” to contribute, and will then publish all resulting code publicly.

My colleague Josh Greenberg at the Alfred P. Sloan Foundation tipped me off to a great website full of interviews with Open Science contributors.

The Linux Foundation announced it is providing an institutional home for the JS Foundation (JavaScript and server-side apps, for those who are unfamiliar). It’s extremely important to have a stable institutional home with enough administrative support to keep the lights on and the grants processed.

In her piece on Data.gov’s current and future usefulness, Fedscoop writer Samantha Ehlinger noted a mix of kick-in-the-butt motivation to get more functional data formats and applause for trying to open government data in the first place.


White House To Announce Big Push For Cancer Blood Tests

Forbes, Matthew Herper

from October 17, 2016

The White House, as part of the Cancer Moonshot effort being run by Vice President Joe Biden, is announcing a major push to develop blood tests that can detect and monitor cancer, a push that aims to unite makers of diagnostic tests, drugs, and other cancer-related products.

“If you think about somebody who is at risk of cancer, or has been diagnosed with cancer, through the rest of life there is this question: Where is the cancer, what does the cancer look like? And what is going to happen next?” says Peter Kuhn, the Dean’s Professor of Life Sciences at the University of Southern California, which is participating in the effort.

The project, called the Blood Cancer Atlas (updated: they went with “Blood Profiling Atlas”), will try to jumpstart efforts to develop such blood tests by convincing companies to share data. “This effort is about strongly encouraging individuals who might not otherwise be caught dead together to work together for the greater good,” says Michael Pellini, the chief executive of Foundation Medicine, which is participating.


The West Is Burning, And Climate Change Is Partly To Blame

FiveThirtyEight, Cally Carswell

from October 19, 2016

So far this year, wildfires have scorched nearly 5 million acres in the U.S. … In the Proceedings of the National Academy of Sciences this month, two researchers took on the tricky task of apportioning blame. “People have sort of thrown conjecture out there, saying that the big fire seasons we’ve had since 2000 are attributable to climate change,” said John Abatzoglou, the lead author of the new study and a climatologist at the University of Idaho. “But we wanted to go out and make an effort to try to quantify it.” How much of the recent uptick in fire activity is due to climate change versus other factors, like the natural drought cycle?


Quantifying Everything About Urban Life

CityLab, Eillie Anzilotti

from October 14, 2016

Anyone who lives in a city is affected and influenced by an evolving network of environmental, economic, behavioral, and biological factors.

It’s an overwhelming concept, but that’s kind of the point for Paul Glimcher, an NYU professor and the director of the Kavli HUMAN Project—a first-of-its-kind study that, once it launches next summer, will track the lives of 10,000 New Yorkers in 4,000 families over the span of decades.


Apple hires deep learning expert to make Siri smarter

Cult of Mac

from October 17, 2016

Siri is about to get a lot smarter thanks to Carnegie Mellon researcher Russ Salakhutdinov, who announced today that he is joining Apple to lead the company’s artificial intelligence efforts.


Deep learning takes on GIFs, fashion, doodles and more at ACM Multimedia


from October 18, 2016

This week is the Association for Computing Machinery’s Multimedia conference in Amsterdam, and the theme this year beyond a doubt is machine learning. Many of the papers and presentations in some way or another leverage neural networks and deep learning, and the purposes to which they put these increasingly useful tools are by turns fascinating, practical, and whimsical. [slide show]


EPA using data science to save environment, support agencies

Federal News Radio

from October 18, 2016

The Chesapeake Bay produces 500 million pounds of seafood per year, supports thousands of fish and birds, and sustains just as many plants and underwater grasses.

It also falls under the purview of 1,800 local government entities. So who gets a phone call when there’s a problem with the water?

“Think about the difficulty in trying to manage an ecosystem that has this much impact on the environment, where you have to deal with 1,800 different entities for the government,” said Robin Thottungal, chief data scientist and director of analytics at the Environmental Protection Agency. “If any of you have dealt with one agency, you know how difficult it is to make a change. Think about the challenge we have here.”


How is the world searching for the US Presidential Election? A Data Visualisation from the Google News Lab.

Simon Rogers

from October 18, 2016

The US Presidential election is not just being watched in America. From Indiana to India, each development is being intently followed by people across the globe.

Those people are also searching for the election online – and that’s where worldpotus.com comes in.

Along with Accurat, a design group headed by Gabriele Rossi, Simone Quadri and Giorgia Lupi (of Dear Data fame), we built a data visualisation that shows search interest in the U.S. Presidential candidates and top political issues. The data viz is part of a series of experiments to work with great designers to find new ways of using Google data in newsrooms.


Open Compute Project accepts Facebook’s 100Gbps Wedge switch; Accton starts selling it

VentureBeat, Jordan Novet

from October 18, 2016

Facebook is announcing today that the Open Compute Project (OCP) has accepted its contribution of the Wedge 100 top-of-rack Ethernet switch that transmits data at 100 Gigabits per second (Gbps).

This in itself isn’t very surprising, given that Facebook has often shared technology with the wider community through the OCP, and given that the organization was formed by Facebook. But it’s interesting because it shows that Facebook doesn’t want to keep to itself the infrastructure that enables it to more economically handle data-heavy applications at great scale.


Building The Phonic Web

Medium, Suman Deb Roy

from October 18, 2016

The Digg experience comes in many flavors, such as the Homepage curated by editors, the Channel pages (e.g., Entertainment, Election2016, Sports) driven by algorithms and editorial oversight, Digg Deeper – an algorithmic stream that plucks the best links from your Twitter feed, Digg Reader – an RSS reader aggregating more than 10 million feeds, and Digg Messaging powered with trending news and keyword subscriptions.

Almost all these products are primarily designed for a visual experience. Unsurprisingly, visual-inspired design is the defining characteristic of web media. Headlines, descriptions, and body content of news articles are largely produced for consumption through visual feeds.

With the advancement of Text to Speech (TTS) technologies and voice agents, consumers are now free to choose an alternative dimension for news. This alternative dimension is News via Personalized Audio Feeds.


OpenMS – from mass spectra to knowledge

Software Sustainability Institute, Timo Sachsenberg and Oliver Kohlbacher

from October 18, 2016

High-throughput mass spectrometry has become a versatile technique to tackle a large range of questions in the life sciences. Being able to quantify diverse classes of biomolecules opens the way for improved disease diagnostics, elucidation of molecular structure and investigation of cellular pathways. In an interplay with other open-source software, OpenMS enables powerful workflows to transform biological data into meaningful knowledge.


Will usability issues hinder Google Assistant, the star of Google’s new Pixel phone?

Marketing Land, Danny Sullivan

from October 18, 2016

Why would you choose the new Google Pixel smartphone over one of the many competitors out there, including the iPhone? Google’s betting big that the new “Google Assistant” feature in the Pixel will make it a compelling choice. That’s a bet which might not pay off due to usability issues.

I’ve been using the Pixel for about four days now. If you’re looking for a review of the phone overall versus other phones, I’d suggest visiting Techmeme after reading this. There are any number of critiques out now that dissect camera quality, look-and-feel, battery life and more.

My focus is on the Google Assistant part of the phone. Google Assistant is an evolution of the Google Search that so many have used and depended on over the years. Google Assistant wants to interact with you, offering suggestions and potentially taking actions on your behalf. It wants to be a smarter Siri, if you will.


WIMP2: Creating Music with AI: Highlights of Current Research — François Pachet

YouTube, aesuksection

from October 05, 2016

Keynote lecture given by François Pachet for the 2nd AES Workshop on Intelligent Music Production at the Centre for Digital Music at Queen Mary University of London on 13th September 2016.


Using, losing and abusing top talent: Tales from academic data science

Another top professor left Carnegie Mellon for industry last week when Russ Salakhutdinov joined Apple’s AI group. This comes in the wake of 40 CMU researchers leaving in 2015 (6 PIs, 34 engineers) to work for Uber. CMU’s Dean Andrew Moore argues that these departures are good because they allow professors to make enough to stop worrying about “sending their kids to college.” In fact, in an interview with TechCrunch, Moore said “I and the department heads here have actually begun encouraging them to do a startup for a few years, or else work for big companies, and come back.”

At the schools in the Moore-Sloan Data-Science Environment where I work, Moore’s attitude is not shared. My colleagues believe that it is important to try to retain top talent in the academy in order to teach the next generation of technologists and retain complete freedom over research agendas. Admittedly, though, many of my colleagues could make twice as much if they took jobs in industry, a fact that is less about greed and more about regaining their financial balance after spending over a decade in underpaid grad school and postdoc roles.

Another way to retain professors when it’s impossible to pay them more is to let them design new courses, like this interdisciplinary course on the history and social context of data science taught by Matt Jones and Chris Wiggins at Columbia University.

My current (as yet unpublished) project is finding that the promise of intellectual freedom in academia is hampered by overcommitment among academics who are also tasked with managing their departments, establishing computational resources, and in some cases more intensive student mentoring. New research from Christine Borgman’s lab at UCLA substantiates the notion that academic scientists are struggling to find time and money for “help with the software … and computational side of” research infrastructure. In this context, the academic libraries – traditional hubs of information storage and retrieval – are taking on more responsibility for data storage and management.

We are still in the chaotic gold rush era of data science research and development. It remains to be seen if and how universities will retain and nurture top research talent without squeezing postdocs and faculty so hard they pop off to development positions in industry that are better paid and less management-intensive.


Data Management in the Long Tail: Science, Software, and Service

International Journal of Digital Curation; Christine L. Borgman, Milena S. Golshan, Ashley E. Sands, Jillian C. Wallis, Rebekah L. Cummings, Peter T. Darch, Bernadette M. Randles

from October 16, 2016

Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer term the research project, the more likely that scientists are to have resources and dedicated staff to manage their technology and data, leaving those scientists whose work is based on smaller and shorter term projects at a disadvantage. The volume and variety of data to be managed varies by many factors, only two of which are the number of collaborators and length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. These are much more challenging knowledge infrastructure requirements that require long-term investments by multiple stakeholders.


Understanding and Sustaining the Role of Academic Libraries in Research Data Management

Institute of Museum and Library Services, Ashley Sands

from October 17, 2016

IMLS recently announced 41 awards made through the National Leadership Grants for Libraries program (NLG), the Laura Bush 21st Century Librarian program (LB21), and Sparks! Ignition Grants for Libraries program (Sparks). Among these awards, we are pleased to support a number of projects seeking to understand and sustain the role of academic libraries in research data management. The three projects highlighted in this post represent a total investment of nearly $200,000. Click the log number for each grant to access more information, including selected documents from each grant proposal.


AI•ON: Artificial Intelligence Open Network

François Chollet

from October 18, 2016

AI•ON is an open community dedicated to advancing Artificial Intelligence by:

  • Drawing attention to important yet under-appreciated research problems.
  • Connecting researchers and encouraging open scientific collaboration.
  • Providing a learning environment for students looking to gain machine learning experience.

    Genetic fossil-hunters dig through HIV’s long history for clues to treatments

    STAT, Shayla Love

    from October 18, 2016

    The Malayan flying lemur is a small brown animal with buggy eyes. It has extra skin stretching from its neck to its toes that it uses to glide between the treetops in Southeast Asia. In August of this year, a group of Czech scientists discovered another trait of this mammal: Deep within its genome is DNA of the oldest extinct virus related to HIV. They dated it to be around 60 million years old — meaning it was circulating when Europe and Greenland were still connected.

    A lot of virus-fighting happens in the “now” — developing drugs, predicting where viruses will spread, and quarantining those infected.

    But a relatively new field of research called paleovirology is taking a broader view: considering viruses on an evolutionary scale.


    Historic Achievement: Microsoft researchers reach human parity in conversational speech recognition

    Microsoft, Next blog

    from October 18, 2016

    Microsoft has made a major breakthrough in speech recognition, creating a technology that recognizes the words in a conversation as well as a person does.

    In a paper published Monday, a team of researchers and engineers in Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists. The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month.

    The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task.
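    The WER figures quoted above are the standard word-level edit-distance metric: (substitutions + insertions + deletions) divided by the number of words in the reference transcript. A minimal sketch (the function and examples are ours for illustration, not Microsoft’s code):

    ```python
    def wer(reference, hypothesis):
        """Word error rate via word-level Levenshtein distance:
        (substitutions + insertions + deletions) / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    # One deleted word out of six reference words: WER = 1/6
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    ```

    On the Switchboard task, both the system and the human transcribers are scored this same way against a shared reference, which is what makes the 5.9 percent comparison meaningful.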


    Ask The Chefs: What’s Your Favorite ‘Dirty Little Secret’ About Scholarly Publishing?

    The Scholarly Kitchen

    from October 18, 2016

    … Two “dirty little secrets” come immediately to mind. The first is that not all libraries actually activate their subscriptions to content, and usually renew these same subscriptions despite this. I’ve seen this at multiple publishers, and it’s always surprising. They buy the content, but never activate their access, which means they aren’t measuring the value, delivering the value, or assessing the value.

    The other “dirty little secret” is that despite being brilliant individuals, most scientists and physicians are very poor businesspeople.


    Design and UX for an AI-first world

    Clarifai Blog

    from October 14, 2016

    Remember when people were writing about designing mobile-first products? Now, at the frontier of one of the possible next platforms for our digital interactions, we discuss how we at Clarifai think about designing AI-first products.


    Can We Predict the Future of Ocean Carbon Dioxide Uptake?

    Eos, Shannon Hall

    from October 18, 2016

    A new understanding of uncertainties in climate change models allows scientists to decide which source to tackle first in order to better forecast our planet’s changing climate.


    Interview: Jim McHugh, GM of the Deep Learning Group at NVIDIA


    from October 18, 2016

    … recently caught up with Jim McHugh, GM of the Deep Learning Group for NVIDIA, from the floor of Strata + Hadoop World 2016 NYC to discuss analytics, the overall AI evolution and how there’s a lot of synergy between the two. Jim leads the teams who are responsible for the NVIDIA DGX-1, the world’s first AI supercomputer in a box. His responsibilities include product management, product marketing and partner solutions. Jim is focused on executing strategies to deliver GPU-based computing solutions for the data center. Jim has more than 25 years of experience as a marketing and business executive with many technology leaders including Apple, Cisco, and Sun. He has a deep understanding of business drivers, market/customer dynamics, and technology-centered products for AI, enterprise and data center applications.


    VIS 2016 30 Second Videos on Vimeo

    251 short videos. VIS 2016 is October 23-28 in Baltimore.

    PROCJAM Talks 2016

    Online An afternoon of talks by experts in all sorts of art, design, games and technology, all talking about procedural generation and generative software.

    Privacy and Security Forum

    Washington, DC Monday-Wednesday, Oct 24-26, The Marvin Center
    (800 21st Street Northwest).

    IEEE Collaboration and Internet Computing 2016

    Pittsburgh, PA Nov 1-3, 2016. [$$$]

    SC16 – The International Conference for High Performance Computing, Networking, Storage and Analysis

    Salt Lake City, UT November 13-18 [$$$]

    Data Transparency Lab Hack Day

    New York, NY Saturday, 19 November 2016; Columbia University. A 12 hour hack day to build tools that help educate users on privacy and transparency.

    Utah Symposium for the Digital Humanities (DHU2)

    Salt Lake City, UT February 10-11, 2017 at the University of Utah, J. Willard Marriott Library. [$$]

    The Atlanta CAA 2017 CFP is now on!

    deadline: Conference

    Atlanta, GA The conference will be held between March 14th and 16th, 2017 at Georgia State University. Deadline for paper submissions is Friday, October 28.

    Provide Input on the Future of Data Science in Agriculture

    deadline: Survey

    NIFA is seeking input from the agricultural and broader data science communities through 5 p.m. EDT Oct. 31 via an online Ideas Engine.

    Workshop: Work, Labor and Automation

    deadline: Conference

    New York, NY On January 23, 2017 Data & Society will host a workshop on the intersection of technology and work/labor. Participation in this event is limited. Those who are interested in participating should apply by 1 November 2016.

    NYC Media Lab’s Verizon Start-Up Challenge 2016-2017, Connected Futures II

    deadline: Contest/Award

    New York, NY NYC university teams are invited to apply to Connected Futures II. Accepted teams will be awarded up to $25,000 to develop their prototype. Deadline for applications is Thursday, November 10.

    Call for rstudio::conf lightning talks

    deadline: Conference

    Kissimmee, FL rstudio::conf will be January 11-14. Lightning talks are short (5 minute) high energy presentations about an interesting project in R. Deadline for submissions is Thursday, 1 December 2016.

    NYU Center for Data Science News


    Foster Provost’s book tops the list of Best 30 Data Science Books

    A.I. Optify, AI & Optimization blog

    from October 15, 2016

    1. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking by Foster Provost and Tom Fawcett

    “Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.”

    Tools & Resources

    Best Practices for Conducting Risky Research and Protecting Yourself from Online Harassment

    Data & Society, Alice E. Marwick, Lindsay Blackwell, Katherine Lo

    from October 18, 2016

    “A set of best practices for researchers – especially junior researchers – who do research that may make them susceptible to online harassment.”

    NextDraft vs REDEF vs Digg: The Greatest Panel On Earth at the Priceonomics Conference

    Priceonomics blog

    from October 18, 2016

    The Priceonomics Content Conference is just two weeks away! … One of the parts of the conference we are most excited about is the panel we put together to answer a question: what is interesting? We invited the people who run the biggest and most influential content curation sites to talk about how they identify and feature interesting content.

    NSF Big Data Hub newsletters

    from October 18, 2016

  • South Hub
  • Midwest Hub
  • Northeast Hub
  • West Hub

  • Convert REST APIs to subscribable streams

    GitHub – beaucronin

    from October 17, 2016

    I want a system that has the following properties:

  • I don’t want to worry about maintaining processes; it should use mature, declarative devops tooling
  • Implementing a new stream does not require infrastructure effort beyond what is particular to that stream – just a bit of config, and the API-specific code. And API-specific logic should be stateless and as simple as possible; preferably just transforming some JSON
  • Multiple connections can be made to a stream without increasing load on the backing REST API; this is both a matter of behaving properly toward the underlying API providers and a necessity for staying within the usage limits typically associated with a given API key
  • Speaking of which, API keys are securely managed by a central store
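    The fan-out property in the list above – many subscribers, one upstream request – is the crux of the design. A minimal sketch of that idea (this `RestPoller` class, its names, and the example URL are ours, not code from the repository):

    ```python
    import json
    import queue
    import threading
    import time
    import urllib.request

    class RestPoller:
        """Polls a REST endpoint once per interval and fans each result out
        to every subscriber queue, so adding subscribers never adds load on
        the backing API."""

        def __init__(self, url, transform=lambda raw: raw, interval=60.0):
            self.url = url
            self.transform = transform  # stateless, API-specific JSON transform
            self.interval = interval
            self._subscribers = []
            self._lock = threading.Lock()

        def subscribe(self):
            """Return a new queue that will receive every future item."""
            q = queue.Queue()
            with self._lock:
                self._subscribers.append(q)
            return q

        def publish(self, item):
            """Deliver one item to all current subscribers."""
            with self._lock:
                for q in self._subscribers:
                    q.put(item)

        def run_forever(self):
            """Single upstream request per interval, regardless of fan-out."""
            while True:
                raw = json.load(urllib.request.urlopen(self.url))
                self.publish(self.transform(raw))
                time.sleep(self.interval)
    ```

    The API-specific part stays a stateless `transform` over JSON, matching the second bullet; key management and declarative deployment would live outside this class.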

  • Welcoming OpenAPI Spec v3.0

    CapitalOne DevExchange, Dennis Brennan

    from October 14, 2016

    Capital One engineers have been working hard to continuously improve and evolve the way we define, build, and document our REST APIs. The OpenAPI Specification (OAS), and related tooling, has been front and center in that evolution. The upcoming 3.0 release of the OAS will enable us, and all who have adopted the open format, to continue improving both our API development practices and platform.

    A Story in Every Direction

    IDEO Labs

    from October 18, 2016

    Thanks to native support on Facebook and YouTube, virtual reality and immersive 360° video have reached a tipping point in which production, distribution, and consumption are now available to almost everyone. From enabling new forms of empathy to exploring social issues in an on/off-line world, we are more empowered than ever to experiment with new ways to convey stories and ideas. There’s one problem though: Many creators are still struggling to find the best way to tell stories with this medium.

    In this post, I’ll be going into a few of the storytelling and technical challenges of 360° video, and share some production methods I’ve been prototyping for this new medium. Even better, you won’t need a 360° camera to try it for yourself.

    Microsoft introduces two new data science utilities on GitHub


    from October 18, 2016

    Microsoft introduced two new data science utilities on GitHub to help boost productivity: Interactive Data Exploration, Analysis, and Reporting (IDEAR) and Automated Modeling and Reporting (AMAR). Both IDEAR and AMAR run in CRAN-R and are accessible via GitHub.


    Full-time positions outside academia

    Data Scientist, modeling analyst, or data analyst

    Knowledge is Power Program (KIPP); Chicago, IL

    71 Marie Skłodowska-Curie Fellowships in Ireland

    Trinity College Dublin; Dublin, Ireland
    Tenured and tenure track faculty positions

    Assistant Professor; 3D form, Fabrication, and/or Critical Making

    UCLA Design Media Arts; Los Angeles, CA
