NYU Data Science newsletter – July 14, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 14, 2016

 
Data Science News



Deep Learning: The Future of Healthcare Data

Samsung Insights, Megan Williams


from July 12, 2016

Big data in healthcare can now be measured in exabytes, and every day more data is being thrown into the mix in the form of patient-generated information, wearables and EHR systems. Traditional methods of analysis are no longer enough to handle, let alone take proper advantage of, the potential that healthcare data holds. This is where deep machine learning (or simply, “deep learning”) comes in. However, its greatest power lies in its ability to extract value from data in ways that humans and traditional machine learning methods cannot.

Deep machine learning has applications in a number of healthcare areas.

 

Interview: Brian Granger, Leader of the Project Jupyter Notebook

Partially Derivative blog


from July 11, 2016

For more than a decade Brian Granger has been developing some of the most widely-used technologies in data science. He was an early core contributor of the IPython Notebook and now leads the Project Jupyter Notebook team. We talked about the massive adoption of Project Jupyter across academia and in industry, exciting new features on the horizon, and the future of data visualization. [mp3 audio download, 31:40]

 

The ReScience Journal

Tiziano Zito (editor), Benoît Girard (reviewer) and Mehdi Khamassi (reviewer)


from July 14, 2016

ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible. … ReScience lives on GitHub where each new implementation of a computational study is made available together with comments, explanations and tests.

 

Foursquare President: ‘Huge’ Industry Developing Around Location Intelligence

Street Fight magazine


from July 12, 2016

Over the past five years, Foursquare has had quite an evolution. Once a buzzy darling of the location-media industry known for its consumer check-ins and gamification, the company has evolved into a location data powerhouse, connecting consumers’ digital actions to where they are and what they do in the real world, and giving marketers the ability to reach out to targeted consumers based on their location history. CEO Jeff Glueck has said that the company’s rich location data gives it the opportunity to become “the Nielsen of the real world,” poising it to hit profitability in the next few years.

 

[1607.03320] What Happens After You Both Swipe Right: A Statistical Description of Mobile Dating Communications

arXiv, Computer Science > Social and Information Networks; Jennie Zhang, Taha Yasseri


from July 12, 2016

This paper looks at one of these sets of data: metadata of approximately two million conversations, containing 19 million messages, exchanged between 400,000 heterosexual users on an MDA. Through computational analysis methods, this study offers the very first large scale quantitative depiction of mobile dating as a whole. We report on differences in how heterosexual male and female users communicate with each other on MDAs, differences in behaviors of dyads of varying degrees of social separation, and factors leading to “success”, operationalized by the exchange of phone numbers between a match.

 

[1607.03483] Block Models and Personalized PageRank

arXiv, Computer Science > Social and Information Networks; Isabel Kloumann, Johan Ugander, Jon Kleinberg


from July 12, 2016

Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the seed set expansion problem: given a subset S of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of landing probabilities of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model, and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter α that depends on the block model parameters. This connection provides a novel formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance despite being simple linear classification rules, and are even competitive with belief propagation.
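
The abstract describes personalized PageRank as a weighted sum of landing probabilities of walks of different lengths. A minimal Python sketch of that idea follows. It is an illustration, not the authors' code: the toy graph, the function names, the truncation at 50 steps, and the convention that alpha is the probability of continuing the walk (some references define alpha as the restart probability instead) are all assumptions.

```python
def landing_probabilities(adj, seeds, max_len):
    """Return [r_0, ..., r_max_len], where r_k[v] is the probability that a
    k-step random walk started uniformly on the seed set lands on node v."""
    n = len(adj)
    r = [0.0] * n
    for s in seeds:
        r[s] = 1.0 / len(seeds)
    history = [r]
    for _ in range(max_len):
        nxt = [0.0] * n
        for u, neighbors in enumerate(adj):
            share = r[u] / len(neighbors)  # walk moves to a uniform neighbor
            for v in neighbors:
                nxt[v] += share
        r = nxt
        history.append(r)
    return history


def personalized_pagerank(adj, seeds, alpha=0.5, max_len=50):
    """Truncated personalized PageRank: sum over k of (1 - alpha) * alpha**k * r_k.
    Here alpha is the continuation probability (an assumed convention)."""
    scores = [0.0] * len(adj)
    for k, r in enumerate(landing_probabilities(adj, seeds, max_len)):
        weight = (1.0 - alpha) * alpha ** k
        for v, p in enumerate(r):
            scores[v] += weight * p
    return scores


# Hypothetical toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
adjacency = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
scores = personalized_pagerank(adjacency, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda v: -scores[v])
print(ranking)
```

With a single seed in the first triangle, the remaining members of that triangle score above every node in the second triangle, which is the seed set expansion behavior the abstract refers to. A heat kernel ranking would reuse the same landing probabilities with different weights.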

 

DARPA Challenge Tests AI as Cybersecurity Defenders

IEEE Spectrum


from July 13, 2016

Today’s malicious hackers have an average of 312 days to exploit “zero-day” computer software flaws before human cybersecurity experts can find and fix those flaws. The U.S. military’s main research agency focused on disruptive technologies aims to see whether artificial intelligence can do a better job of finding and fixing such exploits within a matter of seconds or minutes.

This summer, seven finalist teams in the Cyber Grand Challenge, run by the U.S. Defense Advanced Research Projects Agency (DARPA), will do battle with AI systems that can autonomously scan rivals’ network servers for exploits and protect their own servers by actively finding and fixing software flaws. The immediate reward comes in the form of a US $2 million prize for first place, $1 million for second place, and $750,000 for third place.

 

Artificial Intelligence Used to Predict Onset of Alzheimer’s

Medgadget


from July 13, 2016

To help identify these easy-to-miss characteristics [of Alzheimer’s disease], the team used a computer to analyze hundreds of existing patient scans to pinpoint characteristics that repeatedly come up in Alzheimer’s patients. Specifically, machine learning was applied to arterial spin labeling (ASL) scans that produce perfusion maps of the brain. The team used these to review patient scans without knowing the actual state of the patients in order to evaluate the capability of the system.

 

Join us in Supporting a Tax Credit to Develop Games in New York State

NYU Game Center


from July 13, 2016

Lending support to the ongoing campaign to provide tax credits to New York-based game development, the NYU Game Center, members of the NYC IGDA, and representatives from the New York City games industry joined Senator Martin Golden and Assemblymember Joseph Lentol at a City Hall press conference, urging Governor Cuomo to sign the Empire State Digital Gaming Media Production Credit (A.10083/S.7485) bill into law.

 

Pharma’s digital health ambitions: Where are the opportunities and what’s hindering progress?

MedCity News


from July 12, 2016

A panel discussion on the future of digital health in pharma at the MedCity CONVERGE conference in Philadelphia this week highlighted how Pfizer, Eli Lilly and Roche are using telemedicine, mobile health and connected devices as part of their drug development strategy. They also called attention to the impact of compliance on implementing these technologies.

 

Liberating Data: How libraries and librarians can help researchers with text and data mining.

London School of Economics, Impact of Social Sciences blog


from July 12, 2016

With advances in computational methods and the proliferation of data sources, text and data mining offers exciting new directions for research. Neil Stewart, Jane Secker, Chris Morrison and Laurence Horton look at the role of libraries in providing support to researchers for these projects, particularly to help with rights issues and to digitise material for scholarly re-use. Librarians should be bold about the advice they give to researchers and encourage them to use the new copyright exceptions.

 

How technology disrupted the truth

The Guardian, Media


from July 12, 2016

Social media has swallowed the news – threatening the funding of public-interest reporting and ushering in an era when everyone has their own facts. But the consequences go far beyond journalism.

Also in data journalism:

  • 77 | Polygraph and The Journalist Engineer Matt Daniels (July 01, Data Stories; Enrico Bertini, Moritz Stefaner and guest, Matt Daniels)
  • R in the data journalism workflow at FiveThirtyEight (July 12, Nathan Yau, Flowing Data blog, and Andrew Flowers, Five Thirty Eight)
  • Robot Sports Journalism: Is This The End Or A Fresh Start? (July 14, Vocativ, Joe Lemire)
  • [1607.03057] Learning from the News: Predicting Entity Popularity on Twitter (July 11, arXiv, Computer Science > Social and Information Networks; Pedro Saleiro, Carlos Soares)
  • What will the Internet of Things do to journalism? (July 14, Columbia University, Tow Center for Digital Journalism, Francesco Marconi)

    Gun Deaths In America

    FiveThirtyEight


    from July 13, 2016

    This interactive graphic is part of our project exploring the more than 33,000 annual gun deaths in America and what it would take to bring that number down.

     
    Events



    Artificial Intelligence And The Law



    Join the Wikimedia Foundation for a discussion on the intersection of the law and emerging technologies, such as driverless cars, web crawlers, and lethal autonomous weapons. Panelists will explore the legal challenges presented by these technologies in the areas of international law, employment, intellectual property, and tort liability.

    San Francisco, CA Tuesday, July 19 at Wikimedia Foundation (149 New Montgomery Street)

     

    2016 SAE Battelle CyberAuto Challenge



    The SAE Battelle CyberAuto challenge brings together students and engineers from different backgrounds, industries, and organizations to collaboratively seek new information on automotive cybersecurity.

    Warren, MI Monday-Friday, July 25-29.

     

    Science of Music Hackathon



    Presented in collaboration with HAMR: Hacking Audio and Music and the 17th International Society for Music Information Retrieval Conference: Science of Music Hackathon!

    New York, NY Friday-Saturday, August 5-6 at 45 W 18th St, 3rd Floor

     
    Deadlines



    Call For Papers – SocInfo’16

    Bellevue, WA The International Conference on Social Informatics (SocInfo16) is an interdisciplinary venue that brings together researchers from the computational and social sciences to help fill the gap between the two communities.

    The deadline for full paper submissions is Wednesday, July 20.

     

    Call for Papers – The Conference on Digital Experimentation @ MIT

    Cambridge, MA The purpose of the Conference on Digital Experimentation at MIT (CODE Conference) is to bring together leading researchers conducting and analyzing large scale randomized experiments in digitally mediated social and economic environments, in various scientific disciplines including economics, computer science and sociology, in order to lay the foundation for ongoing relationships and to build a lasting multidisciplinary research community.

    The deadline for paper submissions is Friday, August 12.

     
    Tools & Resources



    A Future for R: A Comprehensive Overview

    Henrik Bengtsson


    from June 25, 2016

    The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user.

    In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly how and when futures are resolved depends on what strategy is used to evaluate them.
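
The unresolved/resolved life cycle described above can be sketched in a few lines. The example below uses Python's standard `concurrent.futures` module as an analogy only; it is not the API of the R future package, and the `slow_square` task and the `gate` event are hypothetical names introduced to make the timing deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical gate that keeps the task unfinished until we release it,
# so the "unresolved" state can be observed reliably.
gate = threading.Event()


def slow_square(x):
    """Stand-in for an expensive computation."""
    gate.wait()  # pretend to be busy until the gate opens
    return x * x


with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_square, 7)  # returns a future immediately
    resolved_before = fut.done()       # non-blocking check: still unresolved
    gate.set()                         # let the computation finish
    value = fut.result()               # blocks until the future is resolved
    resolved_after = fut.done()        # resolved: the value is now available

print(resolved_before, value, resolved_after)
```

Here `done()` plays the role of the non-blocking check and `result()` plays the role of querying the value, which blocks the caller until the future is resolved. The thread pool stands in for one possible evaluation strategy.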

     

    New Connected Vehicle Data Environments from the Following Projects of the Dynamic Mobility Application (DMA) Program are Now Available in the Research Data Exchange

    Computing Community Consortium, CCC Blog


    from July 13, 2016

    The Research Data Exchange (RDE) is a web-based data resource provided by the USDOT Intelligent Transportation Systems (ITS) Program. It collects, manages, and provides access to archived and real-time multi-source and multi-modal data to support the development and testing of ITS applications.

     
    Careers



    Lead Product Designer at Fieldbook
     

    Fieldbook
     
