|
|
Data Science News
|
Introducing Seeq – We want to make genomics inexpensive, interactive, and fun.
|
Medium, The Seeq Blog, Joe Pickrell
from September 06, 2016
If a 30x, clinical-grade whole genome sequence is a Hummer, what type of genome sequencing is the Smart car in this analogy? That is, could you imagine a genome sequence with just a few important features, but with a dramatically lower price point? And could getting this genome sequence even be…fun?
The outcome of this line of thinking is our latest project, Seeq. We’ve stripped genome sequencing of a bunch of features, leaving only the things we find fun: ancestry analysis, microbiome profiling, and research into the quirks and traits that make you unique. In doing so, we’ve gotten the price to a point we’re fairly happy with, currently $50.
|
|
Building a Team from the Inside Out: Alok Gupta on the Evolution of Data Science at Airbnb
|
Kaggle, No Free Hunch blog, Alok Gupta
from September 06, 2016
In an interview with Alok Gupta, a Data Science Manager at Airbnb and former algorithmic trader, I learned about the introspective efforts the company has made to scale its rapidly growing data science team into what it is today and how they (and other data teams) face the future.
While the evolution of the team’s organizational structure has permitted Airbnb’s data scientists to flourish, the company’s level of accomplishment derives from a “laser focus” on two things: truly caring for their employees and making highly intentional data-driven decisions. Whether it’s developing open-source tools for reproducible research or striving to improve the status of diversity in data science, Alok makes it clear that Airbnb pursues efforts which converge on these two guiding principles.
|
|
[1609.01704] Hierarchical Multiscale Recurrent Neural Networks
|
arXiv, Computer Science > Learning; Junyoung Chung, Sungjin Ahn, Yoshua Bengio
from September 06, 2016
“Learning both hierarchical and temporal representation has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural networks, which can capture the latent hierarchical structure in the sequence by encoding the temporal dependencies with different timescales using a novel update mechanism.”
|
|
The Future of AI: CCC’s Response to OSTP RFI
|
Computing Community Consortium, CCC Blog, Greg Hager
from September 07, 2016
In June, the White House Office of Science and Technology Policy (OSTP) announced a new Request for Information (RFI) on Artificial Intelligence (AI), to solicit feedback on how the United States can best prepare for the future of AI. According to the OSTP Blog, they “received 161 responses from a range of stakeholders, including individuals, academics and researchers, non-profit organizations, and industry.” All of the responses are now public and can be found here.
The Computing Community Consortium (CCC) submitted a response, which can be found here, after co-sponsoring the Artificial Intelligence for Social Good workshop with OSTP and the Association for the Advancement of Artificial Intelligence (AAAI) in June 2016.
|
|
Smart medicine is coming of age, but will doctors bite?
|
New Scientist, Technology News
from August 31, 2016
There are just too many health conditions and the literature is changing too rapidly for a primary care physician to retain it all, says Herbert Chase, who works on biomedical informatics at Columbia University in New York City. “We’ve exceeded where it’s humanly possible for doctors to know what they need to know,” he says. “There are dozens of conditions that are being missed that could easily be diagnosed by a machine.”
|
|
Cornell Tech Startup Builds a Plug-and-Play Robot Intelligence
|
Cornell Tech, News & Views
from September 06, 2016
Robots aren’t the stuff of science fiction any more — they’re vacuuming homes and helping people take aerial selfies. But they’d be a whole lot more useful if they were just a little bit smarter.
While robot hardware is constantly improving — from Boston Dynamics’ Atlas to Sphero’s SPRK and Parrot drones — software remains a thorny challenge, particularly for autonomous or intelligent functions.
This realization led Wilson Pulling, Yang Hu, Kuowei Tseng, and Inna Kim — all Masters in Computer Science ’16 — to develop Aatonomy (formerly known as RoboTC), a plug-and-play solution that can give instant autonomous capabilities to any kind of robot, from Roombas to drones.
|
|
New Lab is launching an incubator for civic-minded hardware companies
|
Technical.ly Brooklyn
from September 01, 2016
Meet the Urban Tech Hub, a partnership between the Navy Yard’s massive shared workspace New Lab and the New York City Economic Development Corporation (NYCEDC) that aims to support hardware companies addressing challenges such as transportation, energy and air quality.
Run by Varun Adibhatla, a programmer analyst who left Wall Street for civic tech, it’s a year-long program in which participants get dedicated space within New Lab’s 84,000 square-foot facility and access to all its resources, including a 3D printing lab, an electronics lab and a wood and metal lab. (Members are eligible for New Lab office space at a lower cost, said New Lab spokeswoman Molly Erman. New Lab declined to share prices for office space.) The companies will also receive mentorship from experts in industry, academia, and government, as well as assistance in building their teams and raising additional capital.
|
|
Precision medicine: Power to the patients?
|
Nature, Nature Research, Katherine Bourzac
from September 07, 2016
“For precision medicine to live up to its potential, millions of people must share their genomic data, their health records, and their experiences. To researchers, all of it is precious. The richer the databases, the better patient care will become.” But who holds power in these systems? The patients may benefit from machine learning medical research, but will they be able to speak up about which questions to ask and how their data are managed?
More precision medicine and computational genomics:
Introducing Seeq – We want to make genomics inexpensive, interactive, and fun. (September 06, Medium, The Seeq Blog, Joe Pickrell)
Smart medicine is coming of age, but will doctors bite? (August 31, New Scientist, Technology News, Aviva Rutkin)
Reproducible Science: Cancer Researchers Embrace Containers in the Cloud (September 06, Google Research Blog, Nicole Deflaux and Jonathan Bingham)
|
|
Audio data – Noise pollution research takes off
|
National Science Foundation
from September 06, 2016
NSF awarded $4.6 million over five years to the Sounds of New York City (SONYC) project, based at NYU and The Ohio State University.
More audio data news and tools:
How Can Data Collection Be Used to Map City Sounds? (September 07, NYU Center for Data Science, research by Tae Hong Park, Claudio Silva and Juan Bello)
Urban Sound Classification (September 03, a howto by Aaqib Saeed)
WaveNet: A Generative Model for Raw Audio (September 08, Google DeepMind, Aaron van den Oord, Heiga Zen, Sander Dieleman)
|
|
A wealth of data on wealth management
|
McKinsey & Company
from September 07, 2016
We’re delighted to announce the acquisition of PriceMetrix, a Toronto-based company which helps wealth management firms harness huge amounts of data to improve client service, drive growth, and reduce risk.
Founded 16 years ago, PriceMetrix today holds data on 60,000 financial advisers across North America—everything from the types of clients they serve to the financial products they recommend and the commissions and fees they receive. Sliced, diced, and presented in smart ways, this enables companies to better understand and manage their advisory networks.
|
|
Technology Design and Consumer Privacy
|
Berkeley Law
from September 07, 2016
Chris Hoofnagle explains how the intentional design decisions of technology companies can erode the privacy of users. Advertisers’ appetite for data can lead companies to use even harder-to-detect means of collecting data. In this environment, Hoofnagle concludes, the rule of law and an awareness of how technology can erode privacy are more important than ever. [video, 2:48]
|
|
We Need Data to Re-Build Trust in Our Police Departments
|
Pacific Standard, Eric Liu
from September 07, 2016
The lack of data undermines the ability of the community to review the activities of their police and ultimately widens the trust gap between citizens and police. The paucity of police data extends well beyond Baltimore and runs deep throughout the nation.
According to the Federal Bureau of Investigation, only 3 percent of the United States’ 18,000 police departments report use of force data to the public. FBI Director James Comey calls the lack of data “ridiculous and embarrassing.” Stephen Morris, head of the FBI’s data collection efforts, acknowledges the “widespread frustration over lack of reliable data on incidents” and that “it is now the highest priority.”
|
|
Big data’s deluge in higher ed
|
Stanford News
from September 07, 2016
As college students click, swipe and tap through their daily lives – both in the classroom and outside of it – they’re creating a digital footprint of how they think, learn and behave that boggles the mind.
“We’re standing under a waterfall, feasting on information that’s never existed before,” said Mitchell Stevens, a sociologist and associate professor at Stanford Graduate School of Education (GSE). “All of this data has the power to redefine higher education.”
To Stevens and others, this massive data is full of promise – but also peril. The researchers talk excitedly about big data helping higher education discover its Holy Grail: learning that is so deeply personalized that it both keeps struggling students from dropping out and pushes star performers to excel.
|
|
Events
|
2016 RecSys – ACM Recommender Systems
Boston, MA The ACM Recommender Systems conference (RecSys). 15-19 September 2016.
|
|
Data for Good Exchange 2016
New York, NY The theme is “better governance.” Sunday, September 25 [free]
|
|
Deadlines
|
Increasing Openness and Transparency in Research
|
deadline: Contest/Award
|
RWJF wants to use these grants to help increase our understanding of reproducible research workflows, as well as the research community’s understanding of the potential for these approaches to accelerate the discovery process and expand access to findings.
Deadline for letters of intent is Wednesday, 5 October 2016.
|
|
CFP: Neural Abstract Machines & Program Induction workshop @ NIPS 2016
|
deadline: Conference
|
Barcelona, Spain The aim of the NAMPI workshop is to bring together researchers and practitioners from both academia and industry, in the areas of deep learning, program synthesis, probabilistic programming, inductive programming and reinforcement learning, to exchange ideas on the future of program induction with a special focus on neural network models and abstract machines. Through this workshop, we look to identify common challenges, exchange ideas and lessons learned from the different fields, as well as establish a (set of) standard evaluation benchmark(s) for approaches that learn with abstraction and/or reason with induced programs.
Deadline for paper submissions is Friday, October 14.
|
|
The Digital Mammography DREAM Challenge
|
deadline: Contest/Award
|
The Digital Mammography DREAM Challenge will attempt to improve the predictive accuracy of digital mammography for the early detection of breast cancer.
Deadline for entries is Monday, March 13, 2017.
|
|
NYU Center for Data Science News
|
Summer’s notable departures and arrivals at NYU Center for Data Science
- Foster Provost stepped down from the Moore-Sloan steering committee and handed over the CDS Interim Directorship to Claudio Silva.
- Roy Lowrance, CDS Managing Director, moved into industry.
- Dan Cervone, Moore-Sloan Fellow, took a job with the LA Dodgers starting in October.
- Pablo Barberá, a Moore-Sloan Fellow, started his tenure-track position at USC in July.
- Andrea Rooy-Jones joined Moore-Sloan in a research and outreach capacity.
- Michael Gill joined us as a Moore-Sloan Fellow.
- Our top-notch administrator David Clark started an MBA program this fall; we welcome Kathryn Angeles in his position.
|
|
How Can Data Collection Be Used to Map City Sounds?
|
NYU Center for Data Science
from September 07, 2016
If you want to avoid traffic in a city or congested area, there are applications such as Google Maps and Waze to help you get around. But what about avoiding noise and distracting sounds? The Citygram-Sound Project (a collaboration between NYU Steinhardt, NYU’s Center for Urban Science and Progress, and CalArts) is trying to close this gap in available data by mapping the acoustic soundscapes of cities to better understand how noise pollution affects urban dwellers. The project includes two faculty members from the Center for Data Science, Claudio Silva and Juan Bello.
Citygram was started in 2011, when Tae Hong Park, an Associate Professor of Music Technology and the Director of Music Composition at NYU Steinhardt, began mapping spatio-acoustic energy through a network of sensors spread throughout New York, Los Angeles, and other major cities.
|
|
Tools & Resources
|
What is auto-sklearn? — AutoSklearn 0.0.1 documentation
|
Machine Learning for Automated Algorithm Design group at the University of Freiburg
from May 08, 2016
auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction. Learn more about the technology behind auto-sklearn by reading this paper published at NIPS 2015.
|
|
Awesome Network Analysis
|
GitHub – briatte
from September 06, 2016
An awesome list of resources to construct, analyze and visualize network data.
|
|
Urban Sound Classification
|
Aaqib Saeed
from September 03, 2016
We are all exposed to different sounds every day, such as car horns, sirens, and music. How about teaching a computer to classify such sounds automatically into categories?
In this blog post, we will learn techniques to classify urban sounds into categories using machine learning. Earlier blog posts covered classification problems where the data can be easily expressed in vector form. For example, in a textual dataset, each word in the corpus becomes a feature and its tf-idf score becomes the value. Likewise, in the anomaly detection dataset we saw two features, “throughput” and “latency,” that were fed into a classifier. But when it comes to sound, feature extraction is not quite so straightforward. Today, we will first see what features can be extracted from sound data and how easy it is to extract such features in Python using an open-source library called Librosa.
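To give a rough sense of what "extracting features from sound" means, here is a minimal sketch using only the Python standard library. It does not use Librosa and is not the code from the post: the two features (zero-crossing rate and RMS energy), the helper names, the frame sizes, and the synthetic sine-wave "recordings" are all assumptions chosen for illustration. The idea is simply that a variable-length waveform gets summarized as a short, fixed-length vector that a classifier could consume.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def extract_features(samples, frame_len=1024, hop=512):
    """Summarize a waveform as [mean zero-crossing rate, mean RMS energy]."""
    frames = frame_signal(samples, frame_len, hop)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [rms_energy(f) for f in frames]
    return [sum(zcr) / len(zcr), sum(rms) / len(rms)]

def sine(freq, sr=8000, seconds=1.0, amp=1.0):
    """Synthetic stand-in for a recording: a pure tone sampled at sr Hz."""
    n = int(sr * seconds)
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]

# A low hum and a high-pitched tone yield clearly different feature vectors.
low = extract_features(sine(100))
high = extract_features(sine(1000))
print("100 Hz tone:", low)
print("1000 Hz tone:", high)
```

The higher-pitched tone crosses zero far more often, so its first feature is much larger, while both tones have similar energy. Real urban sounds would need richer spectral features, which is where a dedicated audio library comes in.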
|
|
MoonMail: Shoot billions of emails using AWS – SES – Lambda. Serverless Framework powered project
|
GitHub – microapps
from September 07, 2016
Send email marketing campaigns with Amazon SES. Let AWS Lambda compose email by email and literally scale it to infinity.
With MoonMail you can create and edit lists of recipients (email addresses) and store them in DynamoDB, and create and edit HTML email marketing campaigns, send them, and track their opens and clicks.
|
|
Careers
|
Full-time, non-tenured academic positions |
Astronomer – Space Telescope Science Institute
Johns Hopkins University; Baltimore, MD
|
|
Postdocs |
Postdoc – State of Alaska’s Salmon & People Synthesis
University of California, Santa Barbara’s National Center for Ecological Analysis and Synthesis; Santa Barbara, CA
|
|
Postdoc – rOpenSci
rOpenSci Project at University of California, Berkeley; Berkeley, CA
|
|
|