NYU Data Science newsletter – April 8, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 8, 2016

Data Science News

First, design for data sharing

Nature Biotechnology; John Wilbanks & Stephen H Friend

from April 07, 2016

To upend current barriers to sharing clinical data and insights, we need a framework that not only accounts for choices made by trial participants but also qualifies researchers wishing to access and analyze the data.

Website Seeks to Make Government Data Easier to Sift Through

The New York Times

from April 04, 2016

For years, the federal government, states and some cities have enthusiastically made vast troves of data open to the public. … A project coming out of the M.I.T. Media Lab on Monday seeks to ease that challenge and to make the value of government data available to a wider audience. The project, called Data USA, bills itself as “the most comprehensive visualization of U.S. public data.” It is free, and its software code is open source, meaning that developers can build custom applications by adding other data.

What social media data could tell us about the future

news @ Northeastern

from April 07, 2016

Northeastern’s Alessandro Vespignani, Sternberg Family Distinguished University Professor of physics, computer science, and health sciences, has teamed up with an interdisciplinary group of scientists to develop an innovative method to map how tweets about large-scale social events spread. Using massive Twitter datasets and sophisticated quantitative measures, it tracks how information about political protests, large business acquisitions, and other “collective phenomena” gathers momentum, peaks, and falls over time, from city to city, and where the impetus comes from for that trajectory.

Hot Commodity

MIT Technology Review

from March 28, 2016

For Andrej Karpathy, a Stanford PhD candidate who makes software that can imitate us, landing a job after graduation was easy.

Stanford CS224N/Ling284 Final Projects 2012-2013

Stanford University, Christopher Manning

from April 06, 2016

This course introduces the fundamental concepts and ideas in natural language processing (NLP), otherwise known as computational linguistics. … We consider word-level, syntactic, and semantic processing from both a linguistic and an algorithmic perspective, aiming to get up to speed with current research in the area. The course focuses on modern quantitative techniques in NLP — using large corpora, statistical models for acquisition, disambiguation, and parsing — and the construction of representative systems. (course website)

Is Fashion Ready for the AI Revolution?

Business of Fashion

from April 07, 2016

If artificial intelligence has its way, discounting could disappear, thanks to software that tells retailers exactly what and how many products to buy, and when to put them on sale to sell them at full price. Online shopping could become a conversation, where the shopper describes the dress of their dreams, and, in seconds, an AI-powered search engine tracks down the closest match. Designers, merchandisers and buyers could all work alongside AI, to predict what customers want to wear, before they even know themselves.

Events

DS Industry Speaker Series — April 12, Dave Thomas, Kx Labs — The Future of Database – Revolution or Evolution?

In the last decade there have been significant improvements in computer processors, storage and networking. … In this talk we briefly explain the essential features, similarities and differences between these different database perspectives. We examine the evolution of Kx Kdb+ given the data challenges of our customers and the market. We then discuss the impact of modern hardware on database architecture and data languages.

Tuesday, April 12, starting at 12:30 p.m. at the Center for Data Science, located at 726 Broadway, 7th Floor

NYU Computer Science Department Colloquium — Marginalization is not Marginal: Non-Convex, Bayesian-Inspired Algorithms for Sparse and Low-Rank Estimation

Speaker: David Wipf, Microsoft Research Beijing

Many practical applications of sparsity and low rank matrices do not benefit from this luxury; rather, because of intrinsic correlations in the signal dictionary (or related structure in rank minimization problems), convex algorithms must be used in regimes where theoretical support no longer holds. Moreover, in some situations it has been shown that convex relaxations are in fact provably bad. Consequently, non-convex algorithms, while perhaps theoretically less accommodating, may nonetheless produce superior results. Here we will examine non-convex estimation algorithms, many of which originate from Bayesian machine learning ideas, that thrive in environments where more popular convex alternatives fail. In all cases, theoretical model justification will be provided independent of any assumed prior distributions.

Wednesday, April 13, starting at 11:30 a.m., in Warren Weaver Hall 1302

Deadlines

ICML 2016 Workshops: #Data4Good: Machine Learning in Social Good Applications

The goal of this workshop is to bring together experts from different fields of machine learning, statistics, data science, social sciences and social activism to explore the opportunities for machine learning in applications with social impact. The workshop will consist of: 1) invited presentations from the leading practitioners in the field, and 2) a series of 20 minute presentations on research that fits the theme of machine learning for social good; broadly construed, this could be machine learning related social good applications, or new machine learning methods or theory of particular interest for social good applications.

Deadline for submissions is Sunday, May 1.

Also, 2016 Workshop on Human Interpretability in Machine Learning at ICML 2016 (same May 1 deadline)

SBP-BRiMS 2016 Grand Data Challenge

Fundamental research problems exist in how to fuse data, how to identify the relevant portions of the data, how to assess change in the data, how to sample the data, and how to visualize the data. These issues must be met to advance social theorizing and improve policy analysis. This year’s SBP-BRiMS challenge problem invites you to take part in addressing one or more of these challenges.

Using at least one of four political event datasets (GDELT, KEDS, ICEWS, Phoenix) and one other data set (which may be a second one of these event datasets, or any other relevant dataset), this year’s challenge problem asks participants to address any issue of interest to you or your team that involves events and their distribution over time or space. All entries must have both a strong social theory, political theory or policy perspective and a strong methodology perspective.

Deadline for abstract submissions is Sunday, May 1.

Tools & Resources

Deep Learning for Chatbots, Part 1 – Introduction – WildML

Denny Britz, WildML blog

from April 06, 2016

In this series I want to go over some of the Deep Learning techniques that are used to build conversational agents, starting off by explaining where we are right now, what’s possible, and what will stay nearly impossible for at least a little while. This post will serve as an introduction, and we’ll get into the implementation details in upcoming posts.

ForeCache: Raising the Bar in Big Data Visual Exploration

MIT CSAIL, Intel Science & Technology Center for Big Data

from April 06, 2016

To push beyond the limitations of current DBMSs and support interactivity, we developed the ForeCache visual exploration system (see Figure 1). ForeCache uses a client-server architecture: The user interacts with a visualization interface running on the client machine (i.e., the user’s laptop), and the client retrieves the corresponding data by issuing requests to a DBMS running on a remote server. For its extensive support for scientific analysis operations, we use the array-based DBMS SciDB as our back-end. To further boost back-end performance, ForeCache includes a server-side middleware layer inserted in front of the DBMS, which pre-fetches data into a main memory cache in anticipation of the user’s future interactions.
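
The middleware idea described above can be illustrated with a minimal sketch. This is not ForeCache's actual code: the class name PrefetchingTileCache, the (x, y) tile keys, the slow_backend stand-in for a DBMS query, and the "keep moving in the same direction" prediction rule are all assumptions made for illustration. The excerpt does not say how ForeCache predicts the user's next interaction.

```python
from collections import OrderedDict


class PrefetchingTileCache:
    """Hypothetical middleware layer: an LRU cache that sits in front of a
    slow back-end and prefetches the tile the user is likely to ask for next."""

    def __init__(self, fetch_tile, capacity=8):
        self.fetch_tile = fetch_tile  # stand-in for a query to the DBMS
        self.capacity = capacity      # maximum number of tiles held in memory
        self.cache = OrderedDict()    # tile key -> data, oldest entry first
        self.last_key = None          # previously requested tile

    def _load(self, key):
        """Return (data, was_cached), fetching from the back-end on a miss."""
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            return self.cache[key], True
        data = self.fetch_tile(key)
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used tile
        return data, False

    def request(self, key):
        """Serve a tile, then prefetch the next tile along the same direction.

        Assumed prediction rule: the user keeps panning the way they just did.
        """
        data, was_cached = self._load(key)
        if self.last_key is not None:
            dx = key[0] - self.last_key[0]
            dy = key[1] - self.last_key[1]
            if (dx, dy) != (0, 0):
                self._load((key[0] + dx, key[1] + dy))
        self.last_key = key
        return data, was_cached


backend_calls = []  # records every tile actually fetched from the back-end


def slow_backend(key):
    """Hypothetical stand-in for an expensive array-database query."""
    backend_calls.append(key)
    return f"tile{key}"


cache = PrefetchingTileCache(slow_backend)
first = cache.request((0, 0))   # cold miss, no history yet
second = cache.request((1, 0))  # miss, then prefetches (2, 0)
third = cache.request((2, 0))   # served from the cache
```

In this toy run the third request is answered from memory because the second request already pulled the tile in. A real system would prefetch asynchronously and would likely use a richer prediction model than straight-line panning.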

api-packages.Rmd

GitHub – hadley/httr (Hadley Wickham)

from April 07, 2016

So you want to write an R client for a web API? This document walks through the key issues involved in writing API wrappers in R. If you’re new to working with web APIs, you may want to start by reading “An introduction to APIs” by zapier.

dbpatterns – create, share, explore database patterns

Fatih Erikli

from April 01, 2013

Dbpatterns is a service that allows you to create, share, explore database models on the web. GitHub: https://github.com/fatiherikli/dbpatterns.

NASA, Japan Make ASTER Earth Data Available At No Cost

NASA

from April 01, 2016

Beginning today, all Earth imagery from a prolific Japanese remote sensing instrument operating aboard NASA’s Terra spacecraft since late 1999 is now available to users everywhere at no cost.

The public will have unlimited access to the complete 16-plus-year database for Japan’s Ministry of Economy, Trade and Industry (METI) Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) instrument, which images Earth to map and monitor the changing surface of our planet. ASTER’s database currently consists of more than 2.95 million individual scenes.

Building Geodata Block by Block

SIGNAL Magazine

from April 01, 2016

Researchers are preparing to release technology designed to overcome the challenges of coping with large amounts of geospatial data. The Web-based system makes it easier to layer blocks of information, allowing a wide variety of users to quickly understand and share complex data sets.

The National Science Foundation’s (NSF’s) Geospatial Building Blocks (GABBs) project is creating a system for hosting, processing, analyzing and sharing geospatial information. The system is built on HUBzero, an open source platform developed at Purdue University that lets individuals build feature-rich websites to advance research and education.

Careers

The next hot job in Silicon Valley is for poets

The Washington Post

Fast Forward Labs — Client-focused Data Scientist

Fast Forward Labs

Postdoctoral Fellow in Center for Complex Networks and Systems

IU Bloomington School of Informatics and Computing
