NYU Data Science newsletter – September 8, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 8, 2016


 
Data Science News



Introducing Seeq – We want to make genomics inexpensive, interactive, and fun.

Medium, The Seeq Blog, Joe Pickrell


from September 06, 2016

If a 30x, clinical-grade whole genome sequence is a Hummer, what type of genome sequencing is the Smart car in this analogy? That is, could you imagine a genome sequence with just a few important features, but with a dramatically lower price point? And could getting this genome sequence even be…fun?

The outcome of this line of thinking is our latest project, Seeq. We’ve stripped genome sequencing of a bunch of features, leaving only the things we find fun: ancestry analysis, microbiome profiling, and research into the quirks and traits that make you unique. In doing so, we’ve gotten the price to a point we’re fairly happy with, currently $50.

 

Building a Team from the Inside Out: Alok Gupta on the Evolution of Data Science at Airbnb

Kaggle, No Free Hunch blog, Alok Gupta


from September 06, 2016

In an interview with Alok Gupta, a Data Science Manager at Airbnb and former algorithmic trader, I learned about the introspective efforts the company has made to scale its rapidly growing data science team into what it is today and how they (and other data teams) face the future.

While the evolution of the team’s organizational structure has permitted Airbnb’s data scientists to flourish, the company’s level of accomplishment derives from a “laser focus” on two things: truly caring for their employees and making highly intentional data-driven decisions. Whether it’s developing open-source tools for reproducible research or striving to improve the status of diversity in data science, Alok makes it clear that Airbnb pursues efforts which converge on these two guiding principles.

 

[1609.01704] Hierarchical Multiscale Recurrent Neural Networks

arXiv, Computer Science > Learning; Junyoung Chung, Sungjin Ahn, Yoshua Bengio


from September 06, 2016

“Learning both hierarchical and temporal representation has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural networks, which can capture the latent hierarchical structure in the sequence by encoding the temporal dependencies with different timescales using a novel update mechanism.”

 

The Future of AI: CCC’s Response to OSTP RFI

Computing Community Consortium, CCC Blog, Greg Hager


from September 07, 2016

In June, the White House Office of Science and Technology Policy (OSTP) announced a new Request for Information (RFI) on Artificial Intelligence (AI), to solicit feedback on how the United States can best prepare for the future of AI. According to the OSTP Blog, they “received 161 responses from a range of stakeholders, including individuals, academics and researchers, non-profit organizations, and industry.” All of the responses are now public and can be found here.

The Computing Community Consortium (CCC) submitted a response, which can be found here, after co-sponsoring the Artificial Intelligence for Social Good workshop with OSTP and the Association for the Advancement of Artificial Intelligence (AAAI) in June 2016.

 

Smart medicine is coming of age, but will doctors bite?

New Scientist, Technology News


from August 31, 2016

There are just too many health conditions and the literature is changing too rapidly for a primary care physician to retain it all, says Herbert Chase, who works on biomedical informatics at Columbia University in New York City. “We’ve exceeded where it’s humanly possible for doctors to know what they need to know,” he says. “There are dozens of conditions that are being missed that could easily be diagnosed by a machine.”

 

Cornell Tech Startup Builds a Plug-and-Play Robot Intelligence

Cornell Tech, News & Views


from September 06, 2016

Robots aren’t the stuff of science fiction any more — they’re vacuuming homes and helping people take aerial selfies. But they’d be a whole lot more useful if they were just a little bit smarter.

While robot hardware is constantly improving — from Boston Dynamics’ Atlas to Sphero’s SPRK and Parrot drones — software remains a thorny challenge, particularly for autonomous or intelligent functions.

This realization led Wilson Pulling, Yang Hu, Kuowei Tseng, and Inna Kim — all Masters in Computer Science ’16 — to develop Aatonomy (formerly known as RoboTC), a plug-and-play solution that can give instant autonomous capabilities to any kind of robot, from Roombas to drones.

 

New Lab is launching an incubator for civic-minded hardware companies

Technical.ly Brooklyn


from September 01, 2016

Meet the Urban Tech Hub, a partnership between the Navy Yard’s massive shared workspace New Lab and the New York City Economic Development Corporation (NYCEDC) that aims to support hardware companies addressing challenges such as transportation, energy and air quality.

Run by Varun Adibhatla, a programmer analyst who left Wall Street for civic tech, it’s a year-long program in which participants get dedicated space within New Lab’s 84,000 square-foot facility and access to all its resources, including a 3D printing lab, an electronics lab and a wood and metal lab. (Members are eligible for New Lab office space at a lower cost, said New Lab spokeswoman Molly Erman. New Lab declined to share prices for office space.) The companies will also receive mentorship from experts in industry, academia, and government, as well as assistance in building their teams and raising additional capital.

 

Precision medicine: Power to the patients?

Nature, Nature Research, Katherine Bourzac


from September 07, 2016

“For precision medicine to live up to its potential, millions of people must share their genomic data, their health records, and their experiences. To researchers, all of it is precious. The richer the databases, the better patient care will become.” But who holds power in these systems? The patients may benefit from machine learning medical research, but will they be able to speak up about which questions to ask and how their data are managed?

More precision medicine and computational genomics:

  • Introducing Seeq – We want to make genomics inexpensive, interactive, and fun. (September 06, Medium, The Seeq Blog, Joe Pickrell)
  • Smart medicine is coming of age, but will doctors bite? (August 31, New Scientist, Technology News, Aviva Rutkin)
  • Reproducible Science: Cancer Researchers Embrace Containers in the Cloud (September 06, Google Research Blog, Nicole Deflaux and Jonathan Bingham)

    Audio data – Noise pollution research takes off

    National Science Foundation


    from September 06, 2016

    NSF awarded $4.6 million over five years to the Sounds of New York City (SONYC) project based at NYU and The Ohio State University.

    More audio data news and tools:

  • How Can Data Collection Be Used to Map City Sounds? (September 07, NYU Center for Data Science, research by Tae Hong Park, Claudio Silva and Juan Bello)
  • Urban Sound Classification (September 03, a howto by Aaqib Saeed)
  • WaveNet: A Generative Model for Raw Audio (September 08, Google DeepMind, Aaron van den Oord, Heiga Zen, Sander Dieleman)

    A wealth of data on wealth management

    McKinsey & Company


    from September 07, 2016

    We’re delighted to announce the acquisition of PriceMetrix, a Toronto-based company which helps wealth management firms harness huge amounts of data to improve client service, drive growth, and reduce risk.

    Founded 16 years ago, PriceMetrix today holds data on 60,000 financial advisers across North America—everything from the types of clients they serve to the financial products they recommend and the commissions and fees they receive. Sliced, diced, and presented in smart ways, this enables companies to better understand and manage their advisory networks.

     

    Technology Design and Consumer Privacy

    Berkeley Law


    from September 07, 2016

    Chris Hoofnagle explains how the intentional design decisions of technology companies can erode the privacy of users. Advertisers’ appetite for data can lead companies to use even harder-to-detect means of collecting data. In this environment, Hoofnagle concludes, the rule of law and an awareness of how technology can erode privacy are more important than ever. [video, 2:48]

     

    We Need Data to Re-Build Trust in Our Police Departments

    Pacific Standard, Eric Liu


    from September 07, 2016

    The lack of data undermines the ability of the community to review the activities of their police and ultimately widens the trust gap between citizens and police. The paucity of police data extends well beyond Baltimore and runs deep throughout the nation.

    According to the Federal Bureau of Investigation, only 3 percent of the United States’ 18,000 police departments report use of force data to the public. FBI Director James Comey calls the lack of data “ridiculous and embarrassing.” Stephen Morris, head of the FBI’s data collection efforts, acknowledges the “widespread frustration over lack of reliable data on incidents” and that “it is now the highest priority.”

     

    Big data’s deluge in higher ed

    Stanford News


    from September 07, 2016

    As college students click, swipe and tap through their daily lives – both in the classroom and outside of it – they’re creating a digital footprint of how they think, learn and behave that boggles the mind.

    “We’re standing under a waterfall, feasting on information that’s never existed before,” said Mitchell Stevens, a sociologist and associate professor at Stanford Graduate School of Education (GSE). “All of this data has the power to redefine higher education.”

    To Stevens and others, this massive data is full of promise – but also peril. The researchers talk excitedly about big data helping higher education discover its Holy Grail: learning that is so deeply personalized that it both keeps struggling students from dropping out and pushes star performers to excel.

     
    Events



    2016 RecSys – ACM Recommender Systems



    Boston, MA The ACM Recommender Systems conference (RecSys). 15-19 September 2016.
     

    Data for Good Exchange 2016



    New York, NY The theme is “better governance.” Sunday, September 25 [free]
     
    Deadlines



    Increasing Openness and Transparency in Research

    deadline: Contest/Award

    RWJF wants to use these grants to help increase our understanding of reproducible research workflows, and the research community’s understanding of the potential for these approaches to accelerate the discovery process and expand access to findings.

    Deadline for letters of intent is Wednesday, 5 October 2016.

     

    CFP: Neural Abstract Machines & Program Induction workshop @ NIPS 2016

    deadline: Conference

    Barcelona, Spain The aim of the NAMPI workshop is to bring together researchers and practitioners from both academia and industry, in the areas of deep learning, program synthesis, probabilistic programming, inductive programming and reinforcement learning, to exchange ideas on the future of program induction with a special focus on neural network models and abstract machines. Through this workshop, we look to identify common challenges, exchange ideas and lessons learned from the different fields, as well as establish a (set of) standard evaluation benchmark(s) for approaches that learn with abstraction and/or reason with induced programs.

    Deadline for paper submissions is Friday, October 14.

     

    The Digital Mammography DREAM Challenge

    deadline: Contest/Award

    The Digital Mammography DREAM Challenge will attempt to improve the predictive accuracy of digital mammography for the early detection of breast cancer.

    Deadline for entries is Monday, March 13, 2017.

     
    NYU Center for Data Science News


      Summer’s notable departures and arrivals at NYU Center for Data Science

    • Foster Provost stepped down from the Moore-Sloan steering committee and handed over the CDS Interim Directorship to Claudio Silva.
    • Roy Lowrance, CDS Managing Director, moved into industry.
    • Dan Cervone, Moore-Sloan Fellow, took a job with the LA Dodgers starting in October.
    • Pablo Barberá, a Moore-Sloan Fellow, started his tenure-track position at USC in July.
    • Andrea Rooy-Jones joined Moore-Sloan in a research and outreach capacity.
    • Michael Gill joined us as a Moore-Sloan Fellow.
    • Our top-notch administrator David Clark started an MBA program this fall; we welcome Kathryn Angeles in his position.
     

    How Can Data Collection Be Used to Map City Sounds?

    NYU Center for Data Science


    from September 07, 2016

    If you want to avoid traffic in a city or congested area, there are applications such as Google Maps and Waze to help you get around. But what about avoiding noise and distracting sounds? The Citygram-Sound Project, a collaboration between NYU Steinhardt, NYU’s Center for Urban Studies and Progress, and CalArts, is trying to fill this gap in available data by mapping the acoustic soundscapes of cities to better understand how noise pollution affects urban dwellers. The project includes two faculty members from the Center for Data Science, Claudio Silva and Juan Bello.

    Citygram was started in 2011, when Tae Hong Park, an Associate Professor of Music Technology and the Director of Music Composition at NYU Steinhardt, began mapping spatio-acoustic energy through a network of sensors spread throughout New York, Los Angeles, and other major cities.

     
    Tools & Resources



    What is auto-sklearn? — AutoSklearn 0.0.1 documentation

    Machine Learning for Automated Algorithm Design group at the University of Freiburg


    from May 08, 2016

    auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction. Learn more about the technology behind auto-sklearn by reading this paper published at NIPS 2015.

     

    Awesome Network Analysis

    GitHub – briatte


    from September 06, 2016

    An awesome list of resources to construct, analyze and visualize network data.

     

    Urban Sound Classification

    Aaqib Saeed


    from September 03, 2016

    We are all exposed to different sounds every day, such as car horns, sirens and music. How about teaching a computer to classify such sounds automatically into categories?

    In this blog post, we will learn techniques to classify urban sounds into categories using machine learning. Earlier blog posts covered classification problems where the data can be easily expressed in vector form. For example, in the textual dataset, each word in the corpus becomes a feature and its tf-idf score becomes the feature’s value. Likewise, in the anomaly detection dataset we saw two features, “throughput” and “latency”, that fed into a classifier. But when it comes to sound, feature extraction is not quite so straightforward. Today, we will first see what features can be extracted from sound data and how easy it is to extract such features in Python using an open source library called Librosa.
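The text-to-vector step mentioned in the excerpt (each word becomes a feature, its tf-idf score becomes the value) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not code from the original post: the function name `tfidf_vectors`, the whitespace tokenizer, and the unsmoothed weighting `tf * log(N / df)` are all assumptions, and libraries such as scikit-learn use slightly different (smoothed) variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors); each vector has one tf-idf score per word.

    Hypothetical helper for illustration. Assumes whitespace tokenization
    and the plain weighting tf * log(N / df), with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (share of the document) times inverse document frequency.
        vector = [
            (counts[word] / total) * math.log(n_docs / df[word]) if total else 0.0
            for word in vocab
        ]
        vectors.append(vector)
    return vocab, vectors


docs = ["car horn car horn", "siren and music", "car siren"]
vocab, vectors = tfidf_vectors(docs)
for doc, vector in zip(docs, vectors):
    print(doc, [round(score, 3) for score in vector])
```

Under this weighting, a word that occurs in every document scores zero, which is why very common words contribute little to the resulting feature vectors.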

     

    MoonMail: Shoot billions of emails using AWS – SES – Lambda. Serverless Framework powered project

    GitHub – microapps


    from September 07, 2016

    Send email marketing campaigns with Amazon SES. Let Amazon Lambda compose email by email and literally scale it to infinity.

    With MoonMail you can: create & edit lists of recipients (email addresses) and store them within DynamoDB. Create & edit HTML email marketing campaigns, send them and track their opens and clicks.

     
    Careers


    Full-time, non-tenured academic positions

    Astronomer – Space Telescope Science Institute



    Johns Hopkins University; Baltimore, MD
     
    Postdocs

    Postdoc – State of Alaska’s Salmon & People Synthesis



    University of California, Santa Barbara’s National Center for Ecological Analysis and Synthesis; Santa Barbara, CA
     

    Postdoc – rOpenSci



    rOpenSci Project at University of California, Berkeley; Berkeley
     
