NYU Data Science newsletter – May 24, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 24, 2016


 
Data Science News



Josh Wills Keynote @ DataEngConf SF16

YouTube, Hakka Labs


from May 17, 2016

Josh Wills, Dir of Data Engineering @ Slack, gives an awesome keynote called ‘Bridging the Gap Between Data Science and Data Engineering’ at DataEngConf SF16 @ Galvanize.

 

Visualization of Publication Impact

arXiv, Computer Science > Digital Libraries; Eamonn Maguire, Javier Martin Montull, Gilles Louppe


from May 20, 2016

Measuring scholarly impact has been a topic of much interest in recent years. While many use the citation count as a primary indicator of a publication's impact, the quality and impact of those citations will vary. Additionally, it is often difficult to see where a paper sits among other papers in the same research area. Questions we wished to answer through this visualization were: is a publication cited less than publications in the field?; is a publication cited by high or low impact publications?; and can we visually compare the impact of publications across a result set?

 

“We don’t know nearly enough about migration”

Science, Max Planck Institute for Demographic Research


from May 20, 2016

In the current issue of the scientific journal “Science”, which was published today, Frans Willekens of the Max Planck Institute for Demographic Research and three further experts on migration summarize in a review article the current state of knowledge. The article appears shortly before the first World Humanitarian Summit, which has been organized by the United Nations and will take place May 23–24, 2016 in Istanbul, Turkey. The summary paints a rather sobering picture of the state of knowledge in the area of migration research. On the one hand, it demonstrates that the knowledge base for the management and governance of migration flows is limited because the necessary data are lacking. On the other hand, it makes clear that this lack of data has allowed a false perception of the extent of current migrations to arise.

 

A few words about Google’s TPU announcement this week.

Facebook, Bryan Catanzaro


from May 19, 2016

I’m glad Google is investing in systems for deploying deep learning at scale, which is a problem that’s near and dear to my heart, and also one that we care deeply about at Baidu. I’d love to see more granular data about exactly what the TPU does, and what advantages it has.

It’s well-known that custom chips can be more efficient than more general programmable processors when applied to narrow problem domains. However, deep learning is not a narrow problem domain.

 

Embrace Your Fallibility

Nick Eubank


from May 18, 2016

Two years ago, I wrote a piece about my experiences over two years testing the code for papers being published in the Quarterly Journal of Political Science, which found problems in the code of many papers. The piece was first published in The Political Methodologist, and later in PS: Politics and Political Science. This piece is an extension of that article based on conversations that article sparked and my own experiences over the past two years.

 

Recommendations for open data science

BioMed Central, GigaScience; Melissa Gymrek and Yossi Farjoun


from May 18, 2016

Life science research increasingly relies on large-scale computational analyses. However, the code and data used for these analyses are often lacking in publications. To maximize scientific impact, reproducibility, and reuse, it is crucial that these resources are made publicly available and are fully transparent. We provide recommendations for improving the openness of data-driven studies in life sciences.

 

NASA Helps Launch Data Science Grad Program

datanami


from May 23, 2016

Another U.S. university is adding a data science specialization to its curriculum, this one as part of an online Masters of Science degree in engineering. The University of California at Riverside said the data science track was developed in collaboration with NASA’s Jet Propulsion Laboratory science staff.

Also, in university data science:

  • Artificial Intelligence Authority Named Jacobs Technion-Cornell Institute Director (May 25, The Cornell Daily Sun)
  • Why I Chose Cornell Tech and Jacobs (May 25, Cornell Tech, News & Views; Ron Brachman)

    Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.

    ProPublica


    from May 23, 2016

    … something odd happened when Borden and Prater were booked into jail: A computer program spat out a score predicting the likelihood of each committing a future crime. Borden — who is black — was rated a high risk. Prater — who is white — was rated a low risk.

    Two years later, we know the computer algorithm got it exactly backward. Borden has not been charged with any new crimes. Prater is serving an eight-year prison term for subsequently breaking into a warehouse and stealing thousands of dollars’ worth of electronics.

     

    Administration Issues Strategic Plan for Big Data Research and Development

    The White House, Keith Marzullo


    from May 23, 2016

    Summary:
    This Plan is an important milestone in the Administration’s Big Data Research and Development (R&D) Initiative.

     

    Scientists Are Just as Confused About the Ethics of Big-Data Research as You

    WIRED, Science


    from May 20, 2016

    When a rogue researcher last week released 70,000 OkCupid profiles, complete with usernames and sexual preferences, people were pissed. When Facebook researchers manipulated how stories appear in News Feeds for a mood contagion study in 2014, people were really pissed. OkCupid filed a copyright claim to take down the dataset; the journal that published Facebook’s study issued an “expression of concern.” Outrage has a way of shaping ethical boundaries. We learn from mistakes.

    Shockingly, though, the researchers behind both of those big data blowups never anticipated public outrage.

     

    HIPAA doesn’t apply to Precision Medicine Initiative, sparking privacy concerns

    Becker's Health IT and CIO Review


    from May 20, 2016

    A central concern to genomics is the aggregation of personal health information in one place. A report from the World Privacy Forum expounds upon this concern, and others, and suggests the federal government’s Precision Medicine Initiative is too ambiguous and lax on its privacy guidelines.

    Chief among the concerns is that medical record data and biospecimen data contributed to the initiative are not covered under HIPAA.

    Also, in Precision Medicine:

  • White House releases final Precision Medicine Initiative data security framework (May 26, Healthcare IT News)
    Events



    Salon: Big Data—Power, Potential, and Perils



    We live in the era of Big Data. Its algorithms pervade our lives—shaping our purchases, our finances, our health care, our education, our communities, our public policy. Armed with phones, computers, and countless other devices, society has produced more data in the past two years—a zettabyte—than the prior span of human civilization. Yet the promise of Big Data lies not only in quantity, but in the quality of our analyses and the foresight of our applications. Is Big Data the future of scientific inquiry? Are we giving too much power to algorithms, seeking large-scale patterns, with the risk of losing the core of our humanity? Join us to explore the potential and perils of Big Data.

    New York, NY Friday, June 3, starting at 6 p.m., NYU Global Center Grand Hall (part of the World Science Festival)

     

    Moral Math of Robots: Can Life and Death Decisions be Coded?



    A self-driving car has a split second to decide whether to turn into oncoming traffic or hit a child who has lost control of her bicycle. An autonomous drone needs to decide whether to risk the lives of a busload of civilians or lose a long-sought terrorist. How does a machine make an ethical decision? Can it “learn” to choose in situations that would strain human decision making? Can morality be programmed? We will tackle these questions and more as the leading AI experts, roboticists, neuroscientists, and legal experts debate the ethics and morality of thinking machines.

    New York, NY Saturday, June 4, starting at 4 p.m., Gerald W. Lynch Theater at John Jay College (part of the World Science Festival)

     
    Tools & Resources



    How predictive APIs are used at Upwork, Microsoft and BigML (and how they could be standardized)

    Papis.io, PAPIs stories


    from May 23, 2016

    Predictive Application Programming Interfaces (APIs) are receiving a lot of interest in the industry as they accelerate the development of predictive applications, by making it easier for developers to use predictive models in production settings. They are a means of exposing predictive models to other programs, and they can also expose model learning capability, in which case one may speak of Machine Learning (ML) APIs. They exist in commercial offerings, such as Microsoft Azure ML, BigML, Amazon ML, Datagami and Google Prediction API, where ML algorithms run on cloud platforms and are accessed “as a service” (MLaaS). Predictive APIs can also be created from open-source or custom frameworks and be self-hosted, as presented by Upwork and PSI (see below), as well as Seldon and PredictionIO (recently acquired by Salesforce).
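As a rough illustration of the distinction drawn above between exposing a trained model and exposing model learning, here is a minimal sketch in Python. Everything in it is hypothetical: the `/train` and `/predict` paths, the JSON field names, and the toy `MeanModel` are assumptions made for this example and do not correspond to the API of any service named in the article.

```python
import json


class MeanModel:
    """Toy stand-in for a real model: always predicts the mean training target."""

    def __init__(self):
        self.mean = 0.0

    def fit(self, targets):
        self.mean = sum(targets) / len(targets)

    def predict(self, features):
        # A real model would use the features; this toy one ignores them.
        return self.mean


# In-memory registry of trained models, keyed by a caller-chosen id (hypothetical).
MODELS = {}


def handle(path, body):
    """Dispatch a JSON request body to a hypothetical endpoint and return JSON."""
    payload = json.loads(body)
    if path == "/train":
        # "ML API" side: the service learns a model from the caller's data.
        model = MeanModel()
        model.fit(payload["targets"])
        MODELS[payload["model_id"]] = model
        return json.dumps({"model_id": payload["model_id"], "status": "trained"})
    if path == "/predict":
        # "Predictive API" side: the service only exposes an existing model.
        model = MODELS.get(payload["model_id"])
        if model is None:
            return json.dumps({"error": "unknown model"})
        return json.dumps({"prediction": model.predict(payload.get("features", {}))})
    return json.dumps({"error": "unknown endpoint"})


handle("/train", json.dumps({"model_id": "m1", "targets": [1, 2, 3]}))
response = json.loads(handle("/predict", json.dumps({"model_id": "m1", "features": {}})))
```

The calling program only ever sees JSON in and JSON out, which is what lets the same client code work whether the model is hosted by a vendor or self-hosted.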

     

    Using Network Dynamical Influence to Drive Consensus

    Nature, Scientific Reports; Giuliano Punzo et al.


    from May 23, 2016

    Consensus and decision-making are often analysed in the context of networks, with many studies focusing attention on ranking the nodes of a network depending on their relative importance to information routing. Dynamical influence ranks the nodes with respect to their ability to influence the evolution of the associated network dynamical system. In this study it is shown that dynamical influence not only ranks the nodes, but also provides a naturally optimised distribution of effort to steer a network from one state to another. An example is provided where the “steering” refers to the physical change in velocity of self-propelled agents interacting through a network. Distinct from other works on this subject, this study looks at directed and hence more general graphs. The findings are presented with a theoretical angle, without targeting particular applications or networked systems; however, the framework and results offer parallels with biological flocks and swarms and opportunities for design of technological networks.
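To make the idea of ranking nodes by influence on a directed network more concrete, here is a minimal Python sketch. It rests on two assumptions that are not stated in the abstract: that dynamical influence can be taken as the leading left eigenvector of the coupling matrix, and that the dynamics are a simple discrete-time linear consensus update rather than the velocity-steering model studied in the paper. The three-node matrix `W` is made up for illustration.

```python
def left_eigenvector(W, iters=500, tol=1e-12):
    """Power iteration for the dominant left eigenvector of W, scaled to sum to 1."""
    n = len(W)
    v = [1.0 / n] * n
    for _ in range(iters):
        # Left multiplication: nxt[j] = sum_i v[i] * W[i][j]
        nxt = [sum(v[i] * W[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


def consensus_step(W, x):
    """One update: each node moves to a weighted average of the states it listens to."""
    n = len(W)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


# Hypothetical directed network (rows sum to 1). W[i][j] is the weight node i
# places on node j's state. Node 0 mostly listens to itself.
W = [
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

influence = left_eigenvector(W)  # approximately [5/7, 1/7, 1/7]

# The influence-weighted average of the initial states predicts the consensus value.
x = [1.0, 0.0, 0.0]
predicted = sum(p * xi for p, xi in zip(influence, x))
for _ in range(200):
    x = consensus_step(W, x)
```

In this toy setting the node that is hardest to sway (node 0) carries the largest influence weight, and after enough steps every entry of `x` agrees with `predicted`.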

     

    Feather: fast, interoperable data import/export for R

    Microsoft, Revolutions


    from May 23, 2016

    Unlike most other statistical software packages, R doesn’t have a native data file format. You can certainly import and export data in any number of formats, but there’s no native “R data file format”. The closest equivalent is the saveRDS/readRDS function pair, which allows you to serialize an R object to a file and then load it back into a later R session. But these files don’t hew to a standardized format (it’s essentially a dump of R’s in-memory representation of the object), and so you can’t read the data with any software other than R.

    The goal of the feather project, a collaboration of Wes McKinney and Hadley Wickham, is to create a standard data file format that can be used for data exchange by and between R,

     

    Deploying Elasticsearch at Scale for Social Media Analytics

    Spinn3r blog


    from May 23, 2016

    Earlier today we launched a major new release of Spinn3r. This has been in development for about a year so it’s really great to get it over the fence and released and in front of customers.

    I finally wanted to take some time and write up some details of our Elasticsearch infrastructure which I think would be interesting to other startups and companies in the space.

     
