NYU Data Science newsletter – July 13, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 13, 2016


 
Data Science News



The Multiworld Testing Decision Service

John Langford, Machine Learning (Theory) blog


from July 11, 2016

We need a system that explores over appropriate choices with logging of features, actions, probabilities of actions, and outcomes. These must then be fed into an appropriate learning algorithm which trains a policy and then deploys the policy at the point of decision. Naturally, this is what we’ve done and now it can be used by anyone. This drops the barrier to use down to: “Do you have permissions? And do you have a reasonable idea of what a good feature is?”

A key foundational idea is Multiworld Testing: the capability to evaluate large numbers of policies mapping features to action in a manner exponentially more efficient than standard A/B testing. This is used pervasively in the Contextual Bandit literature and you can see it in action for the system we’ve made at Microsoft Research.
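The loop described above (explore, log features, actions, probabilities and outcomes, then evaluate policies from the log) can be sketched in a few lines of Python. This is a minimal illustration, not the Decision Service's API: the function names `log_with_exploration` and `ips_value`, the epsilon-greedy exploration scheme, and the inverse propensity scoring estimator are assumptions chosen as a common approach from the contextual bandit literature.

```python
import random


def log_with_exploration(contexts, policy, actions, reward_fn, epsilon=0.2, seed=0):
    """Act epsilon-greedily and log (features, action, probability, outcome).

    All names here are hypothetical; reward_fn stands in for the real-world
    outcome observed after an action is taken.
    """
    rng = random.Random(seed)
    log = []
    for x in contexts:
        greedy = policy(x)
        # With probability epsilon pick uniformly at random, else follow the policy.
        action = rng.choice(actions) if rng.random() < epsilon else greedy
        # Probability that this scheme selects `action` for context x.
        prob = epsilon / len(actions)
        if action == greedy:
            prob += 1.0 - epsilon
        log.append((x, action, prob, reward_fn(x, action)))
    return log


def ips_value(log, candidate_policy):
    """Inverse propensity score estimate of a candidate policy's average reward."""
    total = 0.0
    for x, action, prob, reward in log:
        # Only records where the candidate agrees with the logged action count,
        # reweighted by how likely that action was to be logged.
        if candidate_policy(x) == action:
            total += reward / prob
    return total / len(log)
```

Because the logged probabilities make the estimate unbiased for any policy, one exploration log can be reused to score many candidate policies offline, which is the efficiency gain over running a separate A/B test per policy.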

 

Rice wins interdisciplinary ‘big data’ grant

Rice University News & Media


from July 12, 2016

The three-year program will serve as a point of contact for six graduate students, two postdoctoral researchers and several undergraduates as they pursue statistics and computer science projects in the Rice research groups to which they’re assigned.

More university data science:

  • U-M launches two specializations for new generation of data scientists (July 11, University of Michigan, The University Record)
  • NYU Steinhardt and StartEd Launch NY Edtech Accelerator and Incubator (July 06, NYU Steinhardt)

    Data-Driven Discovery of Models (D3M) presolicitation

    Federal Business Opportunities


    from July 07, 2016

    DARPA is soliciting innovative research proposals in the area of semi-automated discovery of machine learning and statistical models and processing pipelines. Proposed research should investigate innovative approaches that enable revolutionary advances in science, devices, or systems. Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice.

     

    NYU Steinhardt and StartEd Launch NY Edtech Accelerator and Incubator | At a Glance

    NYU Steinhardt


    from July 06, 2016

    NYU Steinhardt and StartEd Companies, Inc., a public-benefit corporation, announced a collaboration to galvanize the education innovation ecosystem and provide programming in the fields of education entrepreneurship and education technologies. The centerpiece of this collaboration will be an EdTech Accelerator and Incubator on NYU’s Washington Square campus.

     

    U-M launches two specializations for new generation of data scientists

    University of Michigan, The University Record


    from July 11, 2016

    Recognizing that new career pathways require new approaches to education and training, the University of Michigan is launching two new series of courses on Coursera as part of its commitment to developing curricula and lifelong learning opportunities for a new generation of data science students.

    “Learners who engage in these skills-based specializations will become data storytellers,” said James DeVaney, associate vice provost for digital education & innovation. “An increasing number and range of organizations across sectors of the global economy want to engage talented individuals who bring structure to complex problems and see possibilities in a world of messy data.”

    These specializations, Applied Data Science with Python and Data Collection and Analysis, will help learners thrive in data science roles by equipping them with the skills to collect, mine and analyze big data.

     

    Combining Machine Learning With Expert Human Judgement // Eric Colson, Stitch Fix

    CreativeAI, YouTube, Data Driven NYC


    from March 18, 2016

    Eric Colson, Chief Algorithms Officer at Stitch Fix, presented at FirstMark’s Data Driven NYC on March 16, 2016. Colson discussed the benefits of combining machine learning and humans for better recommendations.

    Stitch Fix is a personal styling platform that delivers curated and personalized apparel and accessory items of perfect fit.

     

    Dispatch: The White House’s and NYU’s Artificial Intelligence Workshop #AINow

    LinkedIn, Khurram Nasir Gore


    from July 12, 2016

    While we often fantasize about the fallout from the coming robot apocalypse, that is simply not today’s challenge. Today, we need to focus on the near-term impact of smart-er automation systems on labor and social structures. That’s exactly what the workshop’s incredible hosts Kate Crawford (Microsoft Research and NYU) and Meredith Whittaker (Google Open Research) did exceedingly well.

    Here are my highlights and takeaways.

     

    The genetic architecture of type 2 diabetes

    Nature; Christian Fuchsberger et al.


    from July 11, 2016

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects.

     

    Developing SocArXiv — a new open archive of the social sciences to challenge the outdated journal system.

    London School of Economics, Impact of Social Sciences blog


    from July 11, 2016

    Philip Cohen argues a cultural shift is taking place in the social sciences. He introduces SocArxiv, a fast, free, open paper server to encourage wider open scholarship in the social sciences.

    Another new Open publishing platform:

  • The ReScience Journal (July 14, Tiziano Zito)

    Tech coalition targets financial startups’ regulatory hurdles

    TheHill


    from July 11, 2016

    A coalition of major technology companies is drawing attention to the regulatory challenges faced by emerging financial services startups ahead of a key congressional hearing.

    Financial Innovation Now released a report Monday evening detailing how new financial technology (“FinTech”) companies struggle with a patchwork system of state laws and federal laws geared toward traditional institutions.

    Those regulations don’t account for the unique reach and services FinTech companies provide, says the coalition.

     

    [1607.03057] Learning from the News: Predicting Entity Popularity on Twitter

    arXiv, Computer Science > Social and Information Networks; Pedro Saleiro, Carlos Soares


    from July 11, 2016

    In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We run several experiments on six different entities in a dataset of over 150M tweets and 5M news and obtained F1 scores over 0.70. Error analysis indicates that news perform better on predicting entity popularity on Twitter when they are the primary information source of the event, in opposition to events such as live TV broadcasts, political debates or football matches.

     

    R in the data journalism workflow at FiveThirtyEight | FlowingData

    Nathan Yau, Flowing Data blog, and Andrew Flowers, Five Thirty Eight


    from July 12, 2016

    R is used in every step of the data journalism process: for cleaning and processing data, for exploratory graphing and statistical analysis, for deploying models in real time, and for creating publishable data visualizations. We write R code to underpin several of our popular interactives, as well, like the Facebook Primary and our historical Elo ratings of NBA and NFL teams. Heck, we’ve even styled a custom ggplot2 theme. We even use R code on long-term investigative projects. [video, 22:12]

     

    77 | Polygraph and The Journalist Engineer Matt Daniels – Data Stories

    Data Stories; Enrico Bertini, Moritz Stefaner and guest, Matt Daniels


    from July 01, 2016

    We have Matt Daniels on the show, the “journalist engineer” behind Polygraph, a blog featuring beautiful journalistic pieces based on data. If you are not familiar with the site, stop now and take a look. [audio, 51:57]

     
    CDS News



    JuliaCon 2016 | Julia 1.0 | Stefan Karpinski

    YouTube, JuliaLanguage


    from July 11, 2016

    Karpinski discusses the Julia 1.0 roadmap.

     
    Tools & Resources



    Network Repository | The First Interactive Data Repository with Visual Analytics for Understanding Data Easily

    Network Repository


    from July 12, 2016

    The first interactive data and network repository with real-time analytics. Network Repository is not only the first interactive repository, but also the largest network and graph data repository, with 500+ donations. This large comprehensive collection of network graph data is useful for making significant research findings as well as benchmark data sets for a wide variety of applications and domains (e.g., network science, bioinformatics, machine learning, data mining, physics, and social science) and includes relational, attributed, heterogeneous, streaming, spatial, and time series data as well as non-relational machine learning data.

     

    Kaggle Progression System & Profile Redesign Launch

    Kaggle, no free hunch blog


    from July 11, 2016

    Kaggle was founded on the principles of meritocracy, and our community has thrived as a place where anyone—regardless of background or degree—can come to earn accolades for their performance in machine learning competitions. Today, we’re excited to announce the launch of the new Kaggle Progression System and profile design. It uses the same core value of meritocracy to expand our recognition and rewards to include contributions to the community through valuable comments and code. (It does not make any changes to the existing competitions points system.) We believe the Progression System and updated profile design provide a more holistic view of the quality and quantity of a data scientist’s work on Kaggle.

     
    Careers



    The New Data Scientist Venn Diagram | What’s The Big Data?
     

    Gil Press, What is the Big Data? blog
     

    Civic Analytics Postgraduate Fellowship Program
     

    NYU Center for Urban Science and Progress
     
