NYU Data Science newsletter – April 22, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 22, 2016

 
Data Science News



The Real Story of How Amazon Built the Echo

Bloomberg Business


from April 19, 2016

The talking speaker started as part of a secret augmented-reality project and ended up as a surprise hit.

 

New Project Aims To Help Cities Finally Get Health Data They Can Use

WNPR News


from April 18, 2016

A new initiative is working to create a data dashboard that almost any city could use to get a handle on the health of its citizens. City-level health data can be critical when it comes to measures like reducing smoking or deciding where to build new parks and health clinics. Yet most health data is collected at the county, not the city level. That means city leaders looking to improve residents’ health lack a baseline of information to work from.

The Municipal Health Data for American Cities Initiative proposes to solve this. The initiative is a collaboration between New York University and the National Resource Network, a White House and HUD-funded agency that assists cities facing economic hardship.

 

Science AMA Series: I’m George Church, professor at Harvard and MIT, founder of PersonalGenomes.org.

reddit.com/r/science


from April 18, 2016

Hi Reddit! I’m George Church and my lab is developing technologies for genome sequencing, gene editing, and DNA nanotechnology (bio).

One area that has attracted a lot of attention recently is the CRISPR technology for editing the genetic information in living cells, a sort of ‘nano-surgery’ that can be used to treat genetic disease at the root cause.

 

Interacting with Robots? Mais Oui!

NYU Tandon School of Engineering


from April 19, 2016

The French-American Doctoral Exchange Seminar (FADEx), a program organized by the Office for Science & Technology at the Embassy of France in the United States, is aimed at encouraging collaboration among American and French doctoral students with similar research interests and forging what could turn into long-lasting scientific partnerships.

The theme of FADEx 2016 is cyber-physical systems, which integrate computation, networking, and physical processes and are used in the fields of transportation, manufacturing, healthcare, and more.

Also, in University data science:

  • Advancing ingenuity (April 27, Harvard John A. Paulson School of Engineering and Applied Sciences)
  • Is Data Science a Liberal Art? (April 25, SmartData Collective, George Mount)
  • Data Science Across Disciplines and the program’s Final Poster Session (April 26, University of Illinois, University Library)

    Are Algorithms Ruining How We Discover Music?

    FiveThirtyEight


    from April 21, 2016

    On this week’s episode of our podcast What’s The Point, New York Times jazz and pop critic Ben Ratliff discusses his new book Every Song Ever and how the everything-at-our-fingertips era is changing the way we listen to music. [audio, 36:04]

    Also, this August in NYC:

  • 17th International Society for Music Information Retrieval Conference (August 7-11, organized by NYU and Columbia University)

    How Officials Distorted Flint’s Water Testing

    The New York Times


    from April 21, 2016


    Local and state officials claimed for months that tests showed that Flint’s water had safe levels of lead. But the officials used flawed testing methods, making the levels of lead in the water supply appear far less dangerous than they were.

    Three of those officials were charged with crimes on Wednesday, accused of covering up glaring deficiencies in two rounds of lead testing conducted in 2014 and 2015.

     

    Amazon Doesn’t Consider the Race of Its Customers. Should It?

    Bloomberg


    from April 21, 2016


    As Amazon has expanded rapidly to become “the everything store,” it’s offered the promise of an egalitarian shopping experience. On Amazon and other online retailers, a black customer isn’t viewed with suspicion, much less followed around by store security. Most of Amazon’s services are available to almost every address in the U.S. “We don’t know what you look like when you come into our store, which is vastly different than physical retail,” says Craig Berman, Amazon’s vice president for global communications. “We are ridiculously prideful about that. We offer every customer the same price. It doesn’t matter where you live.”

     

    Information Gaps in Our Understanding of Threats to Global Biodiversity

    UC Santa Barbara, National Center for Ecological Analysis and Synthesis


    from April 21, 2016

    In the age of information, where massive and widespread collections of data are being digitized and shared, one would expect that there would be extensive information on the threats to global biodiversity. Having such information would lead to better understanding of how to reduce biodiversity loss and achieve environmental goals. However, a Policy Forum piece recently published in Science found that threat data is not readily available, making it difficult to fully understand the threats to biodiversity and difficult to identify the steps necessary to alleviate those threats.

     
    Events



    BIDS Spring 2016 Data Science Faire



    This year’s Data Science Faire will feature a wide variety of poster/demo exhibits from students and researchers on campus as well as a series of data science–related lightning talks from BIDS fellows. The event will culminate with a keynote address from Lucas Merrill Brown, Data Scientist and Digital Expert at the US Digital Service at the White House.

    Berkeley, CA. Tuesday, May 3, starting at 1:30 p.m., 190 Doe Library.

     

    Google I/O 2016



    Google I/O is for developers—the creative coders who are building what’s next. Together we’ll explore the latest in tech, mobile, and beyond.

    Mountain View, CA. Wednesday-Friday, May 18-20, at the Shoreline Amphitheatre near Google Headquarters.

     

    Get ready for R/Finance 2016



    R/Finance 2016 is less than a month away and, as always, I am very much looking forward to it. In past years, I have elaborated on what puts it among my favorite conferences even though I am not a finance guy. R/Finance is small, single track and intense with almost no fluff. And scattered among the esoterica of finance and trading there has, so far, always been a rich mix of mathematics, time series applications, R programming, stimulating conversation and attitude. When it comes down to it, it’s the people, the organizers and participants who make a conference. Looking over the agenda for this year, I am sure that once again, for two days at least, Chicago will be the center of the R world.

    Chicago, IL. Friday-Saturday, May 20-21, at the University of Illinois at Chicago.

     

    17th International Society for Music Information Retrieval Conference



    The annual Conference of the International Society for Music Information Retrieval (ISMIR) is the world’s leading research forum on processing, searching, organizing and accessing music-related data. The ISMIR Conference provides a meeting place for the discussion of MIR-related research. Its main goal is to foster the exchange of ideas across disciplines by bringing together researchers, developers, educators, librarians, students, and professional users.

    Sunday-Thursday, August 7-11, in New York City.

     
    Deadlines



    Submitting Papers, CSCW 2017

    This year in particular CSCW would like to invite papers that make a contribution to building CSCW systems including (but not limited to) technical enablers for CSCW applications; methods and techniques for new CSCW services and applications; and evaluation of fully-built CSCW systems in lab and field settings. Authors will be able to direct such submissions to a dedicated subcommittee.

    Deadline for submissions is Friday, May 27.

     

    BoFs | SciPy 2016

    In an effort to increase community building, SciPy emphasizes the birds of a feather (BoFs) sessions to discuss primary or tangentially related topics in an interactive setting. These sessions usually include short presentations by a moderator and panel followed by an open discussion with everyone in attendance.

    Austin, TX. Deadline for BoF submissions is Monday, June 27. SciPy 2016 is Wednesday-Friday, July 13-15.

     
    Tools & Resources



    The Numenta Anomaly Benchmark

    GitHub – numenta


    from March 22, 2016

    Welcome. This repository contains the data and scripts comprising the Numenta Anomaly Benchmark (NAB). NAB is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications.

    Included are the tools to allow you to easily run NAB on your own anomaly detection algorithms; see the NAB entry points info. Competitive results tied to open source code will be posted in the wiki on the Scoreboard. Let us know about your work by emailing us at nab@numenta.org or submitting a pull request.

     

    How-to: Use Impala and Kudu Together for Analytic Workloads

    Cloudera Engineering Blog; David Alves


    from April 20, 2016

    Apache Kudu (incubating) is the newest addition to the set of storage engines that integrate with the Apache Hadoop ecosystem. The promise of Kudu is to deliver high-scan performance, targeting analytical workloads, while allowing users to concurrently insert, update, and delete records. With these properties, Kudu becomes a viable alternative to existing combinations of HDFS and/or Apache HBase to achieve similar results with less complicated ETL pipelines, comparable performance, and a lighter maintenance burden.

    In this blog post, we want to describe how Apache Impala (incubating) integrates with Kudu for analytic SQL queries on Hadoop and how it takes full advantage of the distinct properties of Kudu.

     

    NYC subway math

    Erik Bernhardsson


    from April 04, 2016

    Apparently MTA (the company running the NYC subway) has a real-time API… I started tracking all subway trains one day and completely forgot about it. Several weeks later I had a 3GB large data dump full of all the arrivals for 1, 2, 3, 4, 5, 6, L, SI and GC (the latter two being Staten Island railway and Grand Central Shuttle).

     
