NYU Data Science newsletter – March 15, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 15, 2016


 
Data Science News



Steve Ruffenach: Electronic health records—time for machines to start sharing

BMJ Blogs: The BMJ


from March 10, 2016

Las Vegas hosted the Healthcare Information and Management Systems Society (HIMSS) annual meeting again last week. With more than 45,000 people in attendance, it is at once intense and unwieldy. It is also the meeting where every company and organization that controls, distributes, or touches medical information of any sort or in any way shows up to hawk its wares, brag about its effectiveness, and deliver promises of healthcare information management nirvana.

While such information is vitally important to patients’ lives, it is typically cloistered by proprietary software in electronic health record (EHR) silos unavailable to patients or healthcare workers outside of the originating system. And although concerns of privacy and challenges of software “interoperability” (i.e. incompatibility) were often cited as the reasons for the information lockup, US health officials have long suspected otherwise. Indeed, in a report to Congress last year, the Office of Health Information Technology held that “persons or entities knowingly and unreasonably interfere with the exchange or use of electronic health information” for their own financial benefit. When interviewed in a recent New York Times article, a former Health and Human Services privacy officer explained, “It may be contrary to the financial interests of health care providers to give patients broad access to their medical records. Once patients have that information, they can share it with competing health care providers.”

Well, the lockup may soon be over. It seems there is a new sheriff in town. His name is the US government and he is taking names and kicking passes. Specifically, the U.S. Department of Health and Human Services flew into the big Vegas meeting and brought new federal rules giving them the ability to revoke certification of established medical information platforms if they are not compliant with projected standards of transparency and accountability.

 

Health Apps Routinely Sell Users’ Data With Little Notice

Bloomberg BNA


from March 09, 2016

Mobile health apps commonly collect and sell users’ data to third parties with little or no notice, two health policy researchers told me recently.

While the use of mobile health applications continues to grow, Americans generally know little about how the companies that create these apps are using their personal and health data, Sarah Blenner, a project manager in the Public Health Training Program on Population Health Advocacy at the University of California, Los Angeles, told me. Many people falsely believe that federal health laws require app developers to keep their health information private, she said.

“People generally have the idea that their information is being kept private, but that’s not really the case,” Blenner said.

 

“A lot of the data at rest will also move to data in motion as this technology matures.”

ReadWrite


from March 10, 2016

Confluent co-founder Neha Narkhede talks about the value of real-time data processing, how data at rest may be data in motion without your knowing it, and how that may benefit us all.

 

The Scientific Impact of Positive and Negative Phase 3 Cancer Clinical Trials

JAMA Oncology


from March 10, 2016

Phase 3 trials provide the highest level of evidence for showing the efficacy of new treatments or interventions. The phase 3 trial programs of the National Cancer Institute’s (NCI’s) National Clinical Trials Network and the NCI’s Community Oncology Research Program (NCORP) are vital national resources and represent a substantial investment on the part of federal agencies. Given the size of the investment, negative trials—that is, those that fail to show that a new treatment is superior to standard treatment—may be incorrectly regarded as poor investments. However, negative trials are also important if they show that new treatments, which might otherwise be adopted into clinical practice, in fact do not work.

The rate at which trials are positive (showing that a new treatment is superior to standard treatment) has previously been examined, as has the relationship between trial results and publication rates in the context of publication bias [1-4]. But the comparative scientific impact of positive vs negative trials using citation data has not been investigated. In this article, we use the phase 3 trial database of SWOG, a major national cooperative group, in combination with its trial publication database and citation data from Google Scholar, to compare the scientific impact of positive and negative cancer clinical trials.

 

Netflix: Using Audience Data to Achieve World Domination

TV[R]EVIVAL


from March 09, 2016

In January, Netflix announced an aggressive 130-country expansion of its streaming service, solidifying its position as a key player in the TV revolution. In addition to its global expansion, Netflix is emerging as a content creation company with an ever-growing list of originally produced series and documentaries. Having roots in the US, how does Netflix repeat its local success on a global scale while remaining relevant to its subscribers?

 

Would You Give All Your Personal Data to Science?

New York Magazine, Science of Us blog


from March 10, 2016

… In 2017, researchers at New York University will begin to assemble a similarly massive database with the Kavli HUMAN Project, an audaciously ambitious study that will track the biology and behavior of 10,000 New Yorkers for the next 20 years. Their humble goal: to gather enough data over time to learn “everything there is to know about a group of people,” said Paul Glimcher, a neuroscientist and economist, who is the director of the project. The project will begin by recruiting 2,500 New York volunteers, from across all five boroughs, whose entire households will also need to agree to the terms of the research project. For the next two decades, practically everything that happens in their lives will turn to data, which will be made accessible to researchers in a wide variety of fields, such as medicine, psychology, sociology, economics, and public policy.

More Data Collection & Security:

  • Mobile data sharing in emergencies – consent, care and control (Linnet Taylor, March 15)
  • NYU professor questions big data collection (The Brown Daily Herald, March 16)
  • Health Apps Routinely Sell Users’ Data With Little Notice (Bloomberg BNA, March 9)
  • Court rulings threaten to upset defences against data breach claims (Financial Times, March 16)
  • Ebola: A Big Data Disaster (The Centre for Internet and Society, Sean McDonald, March 1)

    Rapid assessment of disaster damage using social media activity

    Science Advances; Yury Kryvasheyeu, Haohui Chen, Nick Obradovich, Esteban Moro, Pascal Van Hentenryck, James Fowler and Manuel Cebrian


    from March 11, 2016

    Could social media data aid in disaster response and damage assessment? Countries face both an increasing frequency and an increasing intensity of natural disasters resulting from climate change. During such events, citizens turn to social media platforms for disaster-related communication and information. Social media improves situational awareness, facilitates dissemination of emergency information, enables early warning systems, and helps coordinate relief efforts. In addition, the spatiotemporal distribution of disaster-related messages helps with the real-time monitoring and assessment of the disaster itself. We present a multiscale analysis of Twitter activity before, during, and after Hurricane Sandy. We examine the online response of 50 metropolitan areas of the United States and find a strong relationship between proximity to Sandy’s path and hurricane-related social media activity.

     

    Genealogical Trees of Scientific Papers

    PLOS ONE; Michaël Charles Waumans and Hugues Bersini


    from March 08, 2016

    Many results have been obtained by studying citation databases of scientific papers from a network perspective. Articles can be ranked according to their current in-degree, and their future popularity or citation counts can even be predicted. The study of the dynamical properties of such networks and of the time evolution of their nodes started more recently. This work adopts an evolutionary perspective and proposes an original algorithm for the construction of genealogical trees of scientific papers on the basis of the evolution of their citation counts over time. The fitness of a paper now amounts to the growth trend of its in-degree, and a “dying” paper will suddenly see this trend decline over time. It will give birth to, and be taken over by, some of its most prevalent citing “offspring”. Practically, this might be used to trace the successive published milestones of a research field.
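
For readers unfamiliar with the term, a paper's in-degree in a citation network is simply the number of other papers that cite it. The sketch below shows only that ranking step, not the genealogical-tree algorithm proposed in the paper. The function name `rank_by_in_degree` and the toy citation pairs are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def rank_by_in_degree(citations):
    """Rank papers by in-degree (number of incoming citations).

    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    Returns a list of (paper, in_degree) tuples sorted from most to
    least cited, with ties broken alphabetically by paper id.
    Papers that are never cited do not appear in the result.
    """
    in_degree = Counter(cited for _citing, cited in citations)
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy citation graph: each pair reads "first cites second".
toy_citations = [
    ("B", "A"),
    ("C", "A"),
    ("C", "B"),
    ("D", "A"),
    ("D", "C"),
]

ranking = rank_by_in_degree(toy_citations)
print(ranking)
```

In this toy graph, paper A is cited three times and ranks first, while D, which nobody cites, is left out. Tracking how each paper's count changes over time, as the abstract describes, would require dated citations, which this sketch does not model.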

     

    “Office Hours” with Social Analytics Superstar, Professor Sinan Aral

    SoundCloud, Business of Being Awesome podcast


    from March 08, 2016

    Technology has big implications for humanity–and few people know the numbers behind it like Professor Sinan Aral. In the #bizoba Season 2 Premiere, Sinan shares with us some of his projects where “Big Data” is more than just a b-school buzzword–it’s a means to positively influence society. His work with companies from Nike to Facebook to social enterprises in Africa is on the “bleeding edge” of using data on human behavior to support healthier, happier living around the world. In #bizoba fashion of embracing the journey towards doing what you love, Sinan also shares some of the twists and turns of his career path, from taking the LSAT to taking a seat in Kanye West’s apartment to talk “breaking the internet.” [audio, 28:45]

     

    How We Learned (Almost) Everything That’s Wrong with U.S. Census Data

    Harvard Business Review, David S. Evans and Richard Schmalensee


    from March 11, 2016

    … We have always had great respect for the professionalism of the federal statistical agencies, and we recognize their chronic budget problems, but the government does suck up massive amounts of time from businesses and consumers collecting data. Relative to that effort, the costs of turning all that data into information useful to businesses and researchers and providing it in a timely fashion must be tiny. The agencies clearly do exactly this in some cases, but we have learned that they don’t do it in all cases. And the public has no idea which published, official data sets are as meaningless and potentially misleading as the $88 million figure that first caught our eyes.

     

    Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval

    ACM Digital Library


    from March 14, 2016

    We are delighted to welcome you to the first ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR, pronounced “cheer”), held in Chapel Hill, North Carolina, USA from March 13 to March 17, 2016. CHIIR provides a forum for the dissemination and discussion of research on the user-centered aspects of information interaction and information retrieval. CHIIR focuses on elements such as human involvement in search activities, and information seeking and use in context. The conference represents a merger of two successful past events: the Information Interaction in Context conference (IIiX) and the Human Computer Information Retrieval symposium (HCIR), which have run since 2006 and 2007 respectively. We are pleased to present here the proceedings of their successor, CHIIR 2016.

     

    Hogg’s Research: the literature as revealed by arXiv, nucleosynthesis

    David Hogg, Hogg's Research blog


    from March 11, 2016

    My day started with coffee with Paul Ginsparg (Cornell), who is the originator of the arXiv. He is also a faculty member in both Information Science and in Physics. We discussed a wide range of things, but we ended up at experiments we could do inside the arXiv, which is not just a project that transformed all of scientific publishing, but which is a huge repository of information about how literature is written and ideas are propagated. We discussed the things that NASA ADS and INSPIRE have that arXiv doesn’t, like, for instance, a citation graph and a concordance of different versions of papers. Completely randomly, we ran into Josh Greenberg (Sloan Foundation) at the Ithaca-to-NYC bus, and he agreed that the arXiv is an amazing source of empirical data about how publishing and science works (perhaps not surprisingly!). We tentatively agreed to explore ideas by email and see if anything catches.

     
    Tools & Resources



    R for Data Science: Chapter 16, Functions

    Hadley Wickham


    from March 11, 2016

    Some advice from Hadley on writing functions in R.

     

    Breadboard

    Yale Institute for Network Science


    from February 29, 2016

    breadboard, developed by Nicholas Christakis at Yale, is a software platform for developing and conducting human interaction experiments on networks.

     

    State of the Art JavaScript in 2016

    Medium, JavaScript and Opinions, Francois Ward


    from February 28, 2016

    … the good news is the ecosystem is starting to slow down. Projects are merging. Best practices are starting to become clear. People are building on top of existing stuff instead of building new frameworks.

    As a starting point, here are my personal picks for most pieces of a modern web application. Some choices are likely controversial and I will only give basic reasoning behind each choice. Keep in mind they’re mostly my opinion based on what I’m seeing in the community and personal experiences. Your mileage may vary.

     

    What’s New — pandas 0.18.0 documentation

    pandas


    from March 13, 2016

    This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

     
