NYU Data Science newsletter – July 26, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 26, 2016

 
Data Science News



search.bioPreprint: a discovery tool for cutting edge, preprint biomedical research articles

F1000Research, Ansuman Chattopadhyay et al.


from July 20, 2016

The time it takes for a completed manuscript to be published traditionally can be extremely lengthy. Article publication delay, which occurs in part due to constraints associated with peer review, can prevent the timely dissemination of critical and actionable data associated with new information on rare diseases or developing health concerns such as Zika virus. Preprint servers are open access online repositories housing preprint research articles that enable authors (1) to make their research immediately and freely available and (2) to receive commentary and peer review prior to journal submission. There is a growing movement of preprint advocates aiming to change the current journal publication and peer review system, proposing that preprints catalyze biomedical discovery, support career advancement, and improve scientific communication. While the number of articles submitted to and hosted by preprint servers is gradually increasing, there has been no simple way to identify biomedical research published in a preprint format, because preprints are not typically indexed and are only discoverable by directly searching the specific preprint server websites. To address this issue, we created a search engine that quickly compiles preprints from disparate host repositories and provides a one-stop search solution. Additionally, we developed a web application that bolsters the discovery of preprints by enabling each and every word or phrase appearing on any web site to be integrated with articles from preprint servers. This tool, search.bioPreprint, is publicly available at http://www.hsls.pitt.edu/resources/preprint.

 


DSI launch research collaboration with Thomson Reuters

Imperial College London


from July 20, 2016

The Data Science Institute, Imperial College London and Thomson Reuters are working together to bring data science expertise from both corporate and academic worlds to bear on global big data challenges impacting financial and legal professionals. The initiative aims to drive faster innovation in FinTech and RegTech industries through collaborative research projects.

The projects will bring together academics from across Imperial’s departments to do cutting edge research informed by real industry problems. The collaboration will utilise the KPMG Data Observatory to visualise data, enabling novel exploration of data and unearthing fresh insights.

 

Rules of the game

Science


from July 22, 2016

[David] Baker’s team and collaborators report making cages that assemble themselves from as many as 120 designer proteins, which could open the door to a new generation of molecular machines.

If the ability to read and write DNA spawned the revolution of molecular biology, the ability to design novel proteins could transform just about everything else. “Nobody knows the implications,” because it has the potential to impact dozens of different disciplines, says John Moult, a protein-folding expert at the University of Maryland, College Park. “It’s going to be totally revolutionary.”

 

The Ethics of Artificial Intelligence in Intelligence Agencies

The National Interest, Cortney Weinbaum


from July 18, 2016

In today’s intelligence community, an analyst would likely write a paper, spend days (or hours, in a truly urgent scenario) vetting the assessment with colleagues and senior officers, and then share the assessment with policymakers. Yet the value of such algorithms resides in their ability to initiate immediate responses. Once an AI system detects an imminent attack, its value would reside in the ability to respond before the attack could take place.

Like stock market speed traders looking for their competitive edge, adversaries of the United States can be expected to relentlessly seek ways to use the nation’s algorithms against it. AI could provide strategic advantages but also create enormous vulnerabilities when the United States negotiates new treaties and trade agreements or tries to peacefully resolve a conflict.

 

Better Screening Using Big Data

Journal of Oncology Practice, Debra Patt


from July 05, 2016

The American Cancer Society has estimated that 53,070 Americans will be diagnosed with pancreatic cancer in 2016, and a majority of these patients will die as a result of their disease. Most patients with pancreatic cancer present at a late stage because of the timing of the onset of symptoms; therapies effective at treating advanced disease are limited, and consequently, the average survival remains quite poor.

At present, there is no evidence-based asymptomatic screening for pancreatic cancer. The utility of any tool to screen for a cancer is dependent on many factors: identification of patients at risk, which contributes to the pretest probability a cancer will occur; sensitivity and specificity of a screening test; timing and frequency of use of a screening tool; and ability to alter the outcome of disease or death when a disease is identified early. Receiver operating characteristic curves are often used to aggregate some of these factors in optimizing implementation of a screening tool.
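To illustrate how pretest probability, sensitivity, and specificity interact, here is a minimal sketch that computes the positive predictive value of a screening test with Bayes' rule. The function name and the numbers in the usage line are hypothetical and are not taken from the article.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive test actually has the disease.

    All three inputs are fractions between 0 and 1. `prevalence` plays the
    role of the pretest probability in the screened population.
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical illustration: a fairly accurate test applied to a rare disease.
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(f"Positive predictive value: {ppv:.3f}")
```

With these made-up inputs, most positive results are false positives, which is why identifying a higher-risk group (raising the pretest probability) matters so much for a rare cancer.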

 

Meet 6 Artists Who Have Swept Data Art into the Digital Age

Medium, Infogram


from July 25, 2016

The ability to see patterns, make sense of data, and also create beautiful works of art is a skill. Data visualizations have officially become common in every facet of our lives, even the art world.

Meet 6 artists who have found the beauty in numbers, using spreadsheets, datasets, and computers as their paint and canvas.

 

How Twitter data is combined with AI and machine learning to give hedge funds an edge

International Business Times


from July 18, 2016

Extracting value from a universe of data, analysing sentiment around company names (equities) or about anything else (macro), is a complex journey and we are only about 5% down that road.

The parameters by which an ever-expanding data set (including the likes of Twitter, pictures, text, and video) is processed are evolving: relying on experts versus the wisdom of the crowd; sentiment derived from a “bag of words” as opposed to structured linguistic analysis.

Last week’s Unicom conference, AI, Machine Learning and Sentiment Analysis Applied to Finance (July 14), brought together a group of experts in this area. Professor Gautam Mitra of OptiRisk Systems introduced Elijah DePalma and James Cantarella, Thomson Reuters; Pierce Crosby, StockTwits; Anders Bally, Sentifi; Peter Hafez, RavenPack; Stephen Morse, Twitter.
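For readers unfamiliar with the “bag of words” idea mentioned in the excerpt, the sketch below shows one very simple form of it: word order is discarded, words are counted, and the counts are matched against small positive and negative word lists. The word lists and function names are hypothetical and chosen only for illustration; they do not represent any vendor's method.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; real systems use far larger, curated word lists.
POSITIVE = {"beat", "gain", "strong", "upgrade"}
NEGATIVE = {"miss", "loss", "weak", "downgrade"}


def bag_of_words(text):
    """Lowercase the text, split it into words, and count them (order is lost)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    counts = bag_of_words(text)
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


print(sentiment_score("Strong quarter, earnings beat estimates"))
print(sentiment_score("Weak guidance and a downgrade, but a small gain"))
```

Because only counts are kept, "not strong" and "strong" score the same, which is the limitation that structured linguistic analysis tries to address.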

 

Welcoming BIDS 2016 Data Science Fellows

Berkeley Institute for Data Science


from July 25, 2016

We are thrilled to introduce our Berkeley BIDS 2016 cohort of data science fellows!

  • Rebecca Barter, Statistics
  • Orianna DeMasi, Electrical Engineering & Computer Science
  • Chris Holdgraf, Neuroscience
  • Dmitriy Morozov, Computational Research, Lawrence Berkeley National Lab
  • Laura Nelson, Digital Humanities & Sociology
  • Alexandra Paxton, Cognitive and Brain Science
  • Lauren Ponisio, Environmental Science, Policy, and Management
  • Nelle Varoquaux, Statistics

    Offline: The crisis in scientific publishing

    The Lancet, Richard Horton


    from July 23, 2016

    The state of scientific publishing today has never been more precarious. And publishers (and editors) have few solutions. Instead, they are preoccupied by a host of anxieties. Will innovative start-ups, such as ResearchGate, eliminate the need for journals? Will predatory open access destroy public trust in science? Is copyright dead? Editors too are lost. Is peer review meeting the needs of modern multidisciplinary science? Does the threat of research misconduct mean we should be endemically suspicious of authors? Is lack of reproducibility in some categories of research proof of fatal flaws in the scientific method? These are important questions. But they are peripheral to the main danger. The more serious question (that should be) gnawing away at the soul of the modern publisher/editor is: what am I here for?

     

    The Genomics Inflection Point: Implications for Healthcare

    Rock Health; Lauren Devos, Teresa Wang, Sandya Iyer


    from July 25, 2016

    The genomics industry has tremendous potential to move the needle in health. Delivering on the promise of genomics depends on three main factors, many of which are within the purview of digital health. Through our survey of one thousand consumers, we provide novel data and insights on adoption and willingness to pay for specific use cases, and explore concerns around privacy and ownership.

     
    Deadlines



    Call for Participation — Visualization in Data Science workshop

    Baltimore, MD. The workshop [preceding IEEE VIS 2016 on Monday, October 26] will feature a series of keynote presentations by leading data scientists covering visualization in data science. In addition, there will be presentations by speakers selected from submitted abstracts.

    Deadline for submissions is Friday, August 5.

     

    CFP ICDM-2016 Workshop on Data Mining Systems and their Applications on the Cloud: CLOUDMINE

    Barcelona, Spain. The CLOUDMINE workshop aims to bring together researchers and practitioners working on cloud-based data mining systems and applications. Part of ICDM 2016.

    Deadline for workshop papers is Friday, August 12.

     
    Tools & Resources



    Quickly Write and Deploy an R API Client

    Medium, Alexandra Johnson


    from July 25, 2016

    There are many different definitions of a data scientist, but there are two very common languages: Python and R. At SigOpt I work primarily in Python, so I had no trouble writing code for our Python API Client. However, it soon became apparent that an R API Client would be very useful for many of our customers! The only problem was that I did not know R. Luckily, R has an avid open source community that has written lots of tools and tutorials. Here are some articles and step-by-step guides that I found extremely helpful while developing SigOpt’s R API Client.

     

    Language modeling a billion words

    Torch, Nicholas Leonard


    from July 25, 2016

    In this Torch blog post, we use noise contrastive estimation (NCE) [2] to train a multi-GPU recurrent neural network language model (RNNLM) on the Google billion words (GBW) dataset [7]. The work presented here is the result of many months of on-and-off work. The enormity of the dataset caused us to contribute some novel open-source Torch modules, criteria and even a multi-GPU tensor. We also provide scripts so that you can train and evaluate your own language models.

     

    Approaching (Almost) Any Machine Learning Problem

    Kaggle, No Free Hunch blog; Abhishek Thakur


    from July 21, 2016

    An average data scientist deals with loads of data daily. Some say over 60-70% of the time is spent on data cleaning, munging, and bringing data into a suitable format so that machine learning models can be applied to it. This post focuses on the second part, i.e., applying machine learning models, including the preprocessing steps. The pipelines discussed in this post come as a result of over a hundred machine learning competitions that I’ve taken part in. It must be noted that the discussion here is very general but very useful, and that far more complicated methods also exist and are practised by professionals.

     

    Introduction to Zipline: A Trading Library for Python

    QuantInsti


    from July 18, 2016

    Zipline is a Python library for trading applications that powers the Quantopian service. It is an event-driven system that supports both backtesting and live-trading. We demo how to install Zipline, how to implement a Moving Average Crossover strategy, and how to calculate P&L, portfolio value, etc.
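As a rough idea of what a moving average crossover strategy does, the sketch below computes the signals in plain Python. It deliberately does not use Zipline's API; the function names, window sizes, and price series are hypothetical, and the linked tutorial should be consulted for the actual Zipline implementation.

```python
def simple_moving_average(prices, window):
    """Return the trailing mean of the last `window` prices at each step.

    Entries are None until `window` prices have been seen.
    """
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def crossover_signals(prices, short_window, long_window):
    """Return +1 when the short average crosses above the long one, -1 when it
    crosses below, and 0 otherwise."""
    short_avg = simple_moving_average(prices, short_window)
    long_avg = simple_moving_average(prices, long_window)
    signals = []
    previous_diff = None
    for short_value, long_value in zip(short_avg, long_avg):
        if short_value is None or long_value is None:
            signals.append(0)
            continue
        diff = short_value - long_value
        if previous_diff is not None and previous_diff <= 0 < diff:
            signals.append(1)   # buy signal: short average moved above long average
        elif previous_diff is not None and previous_diff >= 0 > diff:
            signals.append(-1)  # sell signal: short average moved below long average
        else:
            signals.append(0)
        previous_diff = diff
    return signals


# Hypothetical price series, for illustration only.
prices = [10, 9, 8, 7, 8, 9, 10, 9, 8, 7]
print(crossover_signals(prices, short_window=2, long_window=3))
```

In a backtest, a +1 signal would typically open a position and a -1 signal would close it, with P&L computed from the prices at those steps.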

     
    Careers



    Fast Forward Labs: Fall Internships: Research Engineering and Prototype + Data Visualization
     

    Fast Forward Labs
     

    Assistant Professor or Associate Professor, Department of Communication
     

    University of California-Davis; Davis, CA
     
