NYU Data Science newsletter – August 19, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 19, 2016

 
Data Science News



Another AI startup wants to replace hedge funds

Recode


from August 07, 2016

A machine intelligence system, dubbed Emma AI, is starting a fund that hopes to outsmart the humans and computers that make a living trading stocks. It’s part of a wave of tech startups aiming advanced machine learning at financial markets.

 

How schools that obsess about standardized tests ruin them as measures of success

Vox, Mark Palko and Andrew Gelman


from August 16, 2016

As statisticians with experience teaching at the high school and college level, we recognize a familiar problem: A test that overshadows the ultimate outcomes it is intended to measure turns into an invalid test.

Back in the old Soviet Union, factories would produce masses of unusable products as a result of competition to meet unrealistic production quotas. Analogously, many charter schools, under pressure to deliver unrealistic gains in test scores, are contorting themselves to get the numbers they’ve promised. They’re being rewarded for doing so. But that monomaniacal focus on test scores undermines the correlation between test scores and academic accomplishment that originally existed.

 

Ian Goodfellow – Quora

Quora, Ian Goodfellow


from August 12, 2016

Example question: What is the difference between Neocognitron and Convolutional neural network?

The main difference is that the Neocognitron wasn’t trained with backprop.

 

University of Akron puts the brakes on data science center

cleveland.com


from August 17, 2016

The University of Akron, facing financial difficulties, has put the brakes on a highly promoted data science center.

 

Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features

Nature Communications; Michael Snyder et al.


from August 16, 2016

Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients’ prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs.

 

10 Rules Of Thumb For Hockey Analysts

Hockey Graphs, Jack Han


from August 16, 2016

“1. The point of hockey is to create goal differential. The point of hockey analysis is to find ways to improve it.”

Also, upcoming Hockey Analysis Conferences:

  • RIT Hockey Analytics Conference (on Saturday, September 10)
  • Babson College Hockey Analytics Conference “Analytics on Ice: The Long Change” (on Saturday, October 1)

    The Women Changing The Face Of AI

    Fast Company


    from August 18, 2016

    The idea was born in a hotel room.

    In 2005, Hanna Wallach, a machine-learning researcher, found herself bunking with colleagues to attend the Neural Information Processing Systems (NIPS) conference. Wallach had been working in the field since 2001 and had attended numerous conferences, but this was the first time she had roomed with other women who specialized in machine learning, a branch of artificial intelligence that researches how computer programs can learn and grow.

     

    Germany joins ELIXIR

    ELIXIR – A distributed infrastructure for life-science information


    from August 19, 2016

    Germany today became the latest country to join ELIXIR, following the ratification of the ELIXIR Consortium Agreement by the Federal Ministry for Education and Research (BMBF). … The ELIXIR Node in Germany will be run by de.NBI, the German Network for Bioinformatics Infrastructure, which has been established to coordinate bioinformatics service provision across Germany.

     

    A Genomic Map of the Effects of Linked Selection in Drosophila

    PLOS Genetics; Eyal Elyashiv, Shmuel Sattath, Tina T. Hu, Alon Strutsovsky, Graham McVicker, Peter Andolfatto, Graham Coop, Guy Sella


    from August 18, 2016

    Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of “linked selection” on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTR). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in untranslated regions (UTRs). Our inference further suggests a substantial effect of other modes of linked selection and of adaptation in particular. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.

     

    Uber Will Start Driverless Service in Pittsburgh—This Month

    IEEE Spectrum


    from August 18, 2016

    Driverless taxis are coming—right now. Later this month Uber will offer the world’s first driverless ride-hailing service in Pittsburgh, using a test fleet of 100 Volvo XC90 SUVs.

     
    Events



    AoIR 2016 – BrowseSessions



    Berlin, Germany “The theme for this year’s conference is Internet Rules!”

    Wednesday-Saturday, October 5-8, hosted by Alexander von Humboldt Institute for Internet and Society and the Hans Bredow Institute for Media Research. [$$$]

     

    Machine Learning Unconference



    San Francisco, CA “Please join us [OpenAI] for our first Machine Learning Unconference, an experimental gathering driven by its participants rather than an organizing committee.”

    Friday-Saturday, October 7-8, at OpenAI.

     
    CDS News



    Pablo Barberá Wins the 2016 Franklin L. Burdette/Pi Sigma Alpha Award

    Political Science Now


    from August 18, 2016

    Examining individuals using Twitter accounts in Spain, Germany, and the U.S., Barberá constructs a dynamic measure of the political ideology of Twitter users based on who they follow. His analysis shows that users joined by weak ties are incidentally exposed to diverse political opinions and become more moderate over time.

     
    Tools & Resources



    Colorgorical

    Connor Gramazio


    from August 18, 2016

    “To generate a palette with n colors, just enter the number of colors you want and click Generate. Bigger palettes will take longer than smaller palettes to make. Results will automatically appear when ready.”

    “For greater detail, please consult our paper or the source code.”

     

    Bit By Bit: Social Research in the Digital Age

    Matthew Salganik


    from August 18, 2016

    “The central premise of this book is that the digital age creates new opportunities for social research. Researchers can now observe behavior, ask questions, run experiments, and collaborate in ways that were simply impossible in the quite recent past. Along with these new opportunities also come new risks; researchers can now harm people in ways that were impossible in the quite recent past. The source of these opportunities and risks is the transition from the analog age to the digital age. This transition did not happen all at once—like a light-switch turning on—and, in fact, the transition is not yet complete. But, by this point we’ve seen enough to know that something big is happening.”

     

    Building a Data Pipeline with Airflow

    Mark Litwintschik


    from August 01, 2016

    “When I first began using Airflow I was relieved to see that at its core is a plain and simple Flask project. I was able to read through its Python codebase in a morning and have confidence that I could work my way through its architecture.”

    “In this blog post I’ll set up a data pipeline that takes currency exchange rates, stores them in PostgreSQL and then caches the latest exchange rates in Redis.”
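
For readers who want a feel for the fetch, store, cache shape the post describes, here is a minimal sketch. It is not the author's Airflow code. To stay self-contained it makes several assumptions: an in-memory SQLite database stands in for PostgreSQL, a plain dict stands in for Redis, and `fetch_rates` is a hypothetical stub that returns made-up rates instead of calling a real exchange-rate service. In an actual Airflow deployment, each function would typically become its own task.

```python
import sqlite3
from datetime import datetime, timezone


def fetch_rates():
    """Hypothetical stub: a real pipeline would call an exchange-rate API here."""
    return {"EUR": 0.88, "GBP": 0.76}  # made-up illustrative values


def store_rates(conn, rates, fetched_at):
    """Append one row per currency (SQLite stands in for PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rates "
        "(currency TEXT, rate REAL, fetched_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO rates VALUES (?, ?, ?)",
        [(currency, rate, fetched_at) for currency, rate in rates.items()],
    )
    conn.commit()


def cache_latest(conn, cache):
    """Copy the most recent batch of rates into the cache (a dict stands in for Redis)."""
    rows = conn.execute(
        "SELECT currency, rate FROM rates "
        "WHERE fetched_at = (SELECT MAX(fetched_at) FROM rates)"
    )
    for currency, rate in rows:
        cache[currency] = rate


def run_pipeline(conn, cache):
    """Run the three steps in order: fetch, store, cache."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    store_rates(conn, fetch_rates(), fetched_at)
    cache_latest(conn, cache)
    return cache
```

Calling `run_pipeline(sqlite3.connect(":memory:"), {})` returns a dict holding the latest rate for each currency. Keeping the three steps as separate functions mirrors how a scheduler can retry or rerun one stage without repeating the others.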

     

    NASA’S DATA PORTAL

    NASA


    from August 19, 2016

    This site is a continually growing catalog of publicly available NASA datasets, APIs, visualizations, and more.

     
    Careers


    Full-time, non-tenured academic positions

    Senior Informatics Researcher – Renaissance Computing Institute
     

    University of North Carolina-Chapel Hill; Chapel Hill, NC
     
