NYU Data Science newsletter – August 22, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 22, 2016


 
Data Science News



Quantifying decision making for data science: from data acquisition to modeling

EPJ Data Science; Saurabh Nagrecha and Nitesh V Chawla


from August 20, 2016

Organizations, irrespective of their size and type, are increasingly becoming data-driven or aspire to become data-driven. There is a rush to quantify the value of their own internal data or the value of integrating their internal data with external data, and performing modeling on such data. A question that analytics teams often grapple with is whether to acquire more data or expend additional effort on more complex modeling, or both. If these decisions can be quantified a priori, they can be used to guide budget and investment decisions. To that end, we quantify the Net Present Value (NPV) of the tasks of additional data acquisition or more complex modeling, which are critical to the data science process. We develop a framework, NPVModel, for a comparative analysis of various external data acquisition and in-house model development scenarios using NPVs of costs and returns as a measure of feasibility. We then demonstrate the effectiveness of NPVModel in prescribing strategies for various scenarios. Our framework not only acts as a suggestion engine, but it also provides valuable insights into budgeting and roadmap planning for Big Data ventures. [full text]
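
For readers unfamiliar with NPV, the comparison the abstract describes can be sketched with the textbook discounting formula. This is only an illustration, not the paper's NPVModel: the function name, the 10% discount rate, and the cash-flow figures for the two scenarios are all hypothetical.

```python
def npv(rate, cash_flows):
    """Textbook net present value.

    cash_flows[t] is the net cash flow at the end of period t,
    with t = 0 meaning "now" (usually the upfront cost, negative).
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical scenarios (made-up numbers, not from the paper):
# an upfront cost followed by two periods of returns.
scenarios = {
    "acquire_external_data": [-100.0, 60.0, 60.0],
    "more_complex_model": [-40.0, 30.0, 30.0],
}

rate = 0.10  # assumed discount rate per period

# Pick the scenario with the highest NPV at the assumed rate.
best = max(scenarios, key=lambda name: npv(rate, scenarios[name]))

for name, flows in scenarios.items():
    print(f"{name}: NPV = {npv(rate, flows):.2f}")
print("preferred:", best)
```

A positive NPV means the discounted returns exceed the discounted costs, so under these assumed figures both options are feasible and the one with the larger NPV would be preferred.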

 

CrimeRadar app uses advanced machine learning to predict crime rates

Wired UK


from August 18, 2016

It may sound like something from the Minority Report, but this app can predict where crimes will take place. The software, called CrimeRadar, has just launched its prototype in the Olympic host city of Rio de Janeiro.

The app uses advanced machine learning to predict crime rates in the city’s neighbourhoods at different times of the day and night.

 

Automating genetic analysis helps keep up with rapid discovery of new diseases

Stanford Medicine, News Center


from August 15, 2016

Stanford researchers are devising ways to have computers help perform some of the intensive genetic analysis now performed manually when scientists study a patient’s genome to diagnose a disease.

 

Telling Data Stories: An Interview with CIA’s Dr. Calvin Andrus

Central Intelligence Agency


from August 16, 2016

Dr. Calvin Andrus—a 30+ year CIA veteran—has made a career of discovering patterns in data. From covering European politics around the fall of the Berlin Wall to pioneering cloud computing at the Agency, Calvin has used his subject matter and technical expertise to help answer pressing national security questions. Calvin was one of the first winners of the Director of National Intelligence’s Galileo Awards—an annual intelligence community competition on workplace innovation—and is the first CIA employee who has been invited to TED.

We sat down with Calvin to learn more about his work and the role of data science at the Agency. Here is an edited and condensed version of that conversation.

 

A new research trend? DARPA wants to fund XAI (Explainable Artificial Intelligence)

McGill University, Newsroom


from August 18, 2016

It’s nice to see that more transparency, less black-boxing is in store: DARPA’s technology could be “making life or death decisions”. New funding is available for “research in Explainable AI, that is, AI that will output an answer and the criteria that led the system to its decision.”

 

98 personal data points that Facebook uses to target ads to you

The Washington Post, The Intersect blog


from August 19, 2016

Facebook, in its omniscience, knows that you’re wondering — and it would like to reassure you. The social network just revamped its ad preference settings to make them significantly easier for users to understand. They’ve also launched a new ad education portal, which explains, in general terms, how Facebook targets ads.

“We want the ads people see on Facebook to be interesting, useful and relevant,” a Facebook spokesperson said.

But it remains to be seen whether users are pleased or frightened by the new information they suddenly have.

 

Strengthening the Ocean Health Index by Mapping Sources of Uncertainty

UC Santa Barbara, National Center for Ecological Analysis and Synthesis


from August 17, 2016

Researchers at the Ocean Health Index distill massive amounts of data into scores evaluating marine health for 220 countries and territorial jurisdictions — and ultimately, for the globe as a whole. These scores, also called environmental indicators, help scientists clearly and simply communicate information about issues like overfishing and water quality to broad audiences. This simplicity, however, comes at a cost: indicators do not reflect uncertainty. In a recent PLOS ONE paper, NCEAS’ Ocean Health Index team evaluated potential influence of missing data on the uncertainty around the scores from the latest global ocean health assessment.

 

Chip Technology Keeps Checked Luggage in Check

The New York Times


from August 21, 2016

On a recent Southwest Airlines flight from Baltimore, Shvilla Rasheem arrived in Indianapolis, but her luggage did not.

Ms. Rasheem, a 34-year-old consultant, said she always checks her bag when flying on Southwest because there is no fee. “I never thought of the possibility that I would not get my luggage,” she said.

She had good reason not to worry. Statistics compiled by Sita, an aviation technology company, show a steadily decreasing likelihood of bags going astray.

 

How adware invades your computer and steals your personal data

The Week, Vocativ


from August 20, 2016

It’s happened to everyone: You go to download some kind of new software, and before you do, a long list of terms and conditions in tiny font pops up. You don’t read a word and hit “agree.” What you might not realize is that along with the program you think you’re getting, you’re downloading a bit of unwanted adware, too.

So what does that unwelcome guest do? Researchers from New York University and Google teamed up to figure out just how adware operates on a user’s computer.


    NYU Study Looks at How Racial Inequality is Produced Online

    NYU News


    from August 15, 2016

    Internet users tend to navigate between websites in a racially segregated way, despite pathways that provide equitable access to different sites, finds a new study by NYU’s Steinhardt School of Culture, Education, and Human Development. The findings are published online in the journal Information, Communication, and Society.

     

    Stanford scientists combine satellite data, machine learning to map poverty

    Stanford News


    from August 18, 2016

    Accurate and reliable information on the location of impoverished zones is surprisingly lacking for much of the world. Applying machine learning to satellite images could identify impoverished regions in Africa.

     

    23andMe Crowdsources Customers’ Genetic Data To Find Depression

    KQED Future of You


    from August 18, 2016

    A study from the consumer genetic-testing company 23andMe and pharmaceutical giant Pfizer has found 15 new DNA regions associated with mutations that could predispose individuals to major depression.

    The study was huge: Researchers took the DNA of over 75,000 people who reported being clinically diagnosed with depression and compared it with more than 230,000 customers who reported no such diagnosis. Pfizer and other companies have been lining up to gain access to 23andMe’s trove of genetic data.

     
    Events



    Kaizen Data Conference



    San Francisco, CA “Kaizen Data is an applied data science conference focused on data analytics, processing, management, visualization and machine learning.” — Friday-Saturday, September 16-17, at the Galvanize SOMA campus. [$$$]

     

    NIPS Annual Meeting 2016



    Barcelona, Spain NIPS 2016 Program workshops listed for Friday-Saturday, December 9-10.
     
    Tools & Resources



    Rehearsal: A Configuration Verification Tool for Puppet

    ACM SIGPLAN; Rian Shambaugh, Aaron Weiss, and Arjun Guha


    from May 17, 2016

    Large-scale data centers and cloud computing have turned system configuration into a challenging problem. Several widely-publicized outages have been blamed not on software bugs, but on configuration bugs. To cope, thousands of organizations use system configuration languages to manage their computing infrastructure. Of these, Puppet is the most widely used with thousands of paying customers and many more open-source users. The heart of Puppet is a domain-specific language that describes the state of a system. Puppet already performs some basic static checks, but they only prevent a narrow range of errors. Furthermore, testing is ineffective because many errors are only triggered under specific machine states that are difficult to predict and reproduce. With several examples, we show that a key problem with Puppet is that configurations can be non-deterministic.

    This paper presents Rehearsal, a verification tool for Puppet configurations.
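
To make the non-determinism problem concrete, here is a toy sketch in Python. It is not Rehearsal's method and not Puppet code: it is a brute-force stand-in that tries every ordering of a few resources consistent with their declared dependencies and checks whether all orderings end in the same state. The resource functions, paths, and the dictionary model of a filesystem are hypothetical.

```python
from itertools import permutations


def install_package(fs):
    # Hypothetical package resource: creates its config directory.
    fs = dict(fs)
    fs["/etc/app"] = "dir"
    return fs


def write_config(fs):
    # Hypothetical file resource: fails unless the parent directory exists.
    if fs.get("/etc/app") != "dir":
        raise RuntimeError("parent directory missing")
    fs = dict(fs)
    fs["/etc/app/app.conf"] = "port=8080"
    return fs


def valid_orders(resources, deps):
    """Yield every ordering in which each resource follows its dependencies."""
    for order in permutations(resources):
        pos = {name: i for i, name in enumerate(order)}
        if all(pos[before] < pos[after] for before, after in deps):
            yield order


def outcomes(resources, deps, initial):
    """Collect the distinct results (final state or error) over all valid orderings."""
    results = set()
    for order in valid_orders(resources, deps):
        fs = initial
        try:
            for name in order:
                fs = resources[name](fs)
            results.add(frozenset(fs.items()))
        except RuntimeError as err:
            results.add(("error", str(err)))
    return results


def is_deterministic(resources, deps, initial=None):
    """True if every valid ordering produces the same result."""
    return len(outcomes(resources, deps, initial or {})) == 1


resources = {"package": install_package, "config": write_config}

# No dependency declared: one ordering succeeds, the other fails.
print(is_deterministic(resources, deps=[]))

# Declaring that the package comes before the config removes the ambiguity.
print(is_deterministic(resources, deps=[("package", "config")]))
```

Enumerating orderings grows factorially with the number of resources, so this only works for tiny examples; it is meant to show what "non-deterministic configuration" means, not how to verify real manifests.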

     

    Overview — opveclib

    Hewlett Packard


    from August 18, 2016

    “The Operator Vectorization Library, or OVL, is a python library for defining high performance custom operators for the TensorFlow platform. OVL enables TensorFlow users to easily write, test, and use custom operators in pure python without sacrificing performance. This circumvents the productivity bottleneck of implementing, building, and linking custom C++ and CUDA operators or propagating them through the Eigen code base.”

     

    Introducing OpenStreetView

    OpenStreetMap


    from August 15, 2016

    After almost a year of thinking, development and testing, the OSM team at Telenav is ready to present OpenStreetView to all OSM mappers! OpenStreetView (OSV) is the free and open street level imagery platform designed 100% with OSM and mappers in mind.

     
    Careers


    Full-time positions outside academia

    Sports Intelligence Programme Manager
     

    UK’s High Performance Sport System; London, England
     
