NYU Data Science newsletter – August 17, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 17, 2016

GROUP CURATION: N/A

 
Data Science News



Chatbot lawyer that overturned 170,000 parking tickets now helps fight homelessness

Ars Technica UK, Tom Mendelsohn


from August 16, 2016

“The young British coder whose simple ‘robot lawyer’ chatbot has overturned more than 170,000 unlawful parking tickets in the UK and America has set his sights on solving homelessness by providing people with basic legal advice.”

 

The Perks and Perils of Interdisciplinary Research

The Social Science Research Council


from August 16, 2016

“In this essay, Erin Leahey discusses how interdisciplinary research affects academic careers, the visibility of research, and scholarly productivity. She also reports on an ongoing project that explores the ways in which universities support interdisciplinary work.”

More on practicing interdisciplinary research:

  • [1608.03251] Choosing Collaboration Partners. How Scientific Success in Physics Depends on Network Positions (August 10, arXiv, Physics > Physics and Society; Raphael H. Heiberger, Oliver J. Wieczorek)

    Industry insight: Fighting Cyber Fraud with Analytics

    AnalyticsVidhya, Dave Palmer


    from August 15, 2016

    Dave Palmer, CTO of Darktrace, a global leader in cyber threat defence, believes that technological progress has propelled society into a “golden age of criminality”. This is not hard to believe, considering that news headlines about ransomware, stolen credit cards, identity theft and DDoS attacks have become a daily occurrence. Because public knowledge of being compromised by a cybercrime can severely damage an organization’s brand, many breaches are swept under the rug, or an agreement is reached with the perpetrators to avoid negative publicity. Although most media attention is devoted to attacks on the financial services sector, the truth is that the energy, defense, technology, telecom and even education sectors are also vulnerable.

     

    Facebook V: Predicting Check Ins, Winner’s Interview: 1st Place, Tom Van de Wiele

    Kaggle, no free hunch blog


    from August 16, 2016

    From May to July 2016, over one thousand Kagglers competed in Facebook’s fifth recruitment competition: Predicting Check-Ins. In this challenge, Kagglers were required to predict the most probable check-in locations occurring in an artificial time and space. As first-place winner Tom Van de Wiele notes in this interview, the uniquely designed test dataset contained about one trillion place-observation combinations, which posed a huge difficulty for competitors. Tom describes how he quickly rocketed from his first Getting Started competition on Kaggle to first place in Facebook V, drawing on remarkable insight into data consisting only of x, y coordinates, time, and accuracy, and using k-nearest neighbors and XGBoost.
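For readers new to the approach, here is a minimal k-nearest-neighbors sketch that ranks candidate places for a query check-in by voting among its closest training check-ins in (x, y) space. It is not Tom Van de Wiele's solution, which also used time, accuracy, and XGBoost. The function name `rank_places`, the `(x, y, place_id)` record format, the toy data, and the inverse-distance weighting are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rank_places(train, query, k=5, top_n=3):
    """Rank place_ids for a query (x, y) point by inverse-distance-weighted kNN votes.

    train: list of (x, y, place_id) tuples (hypothetical format).
    """
    qx, qy = query
    # Sort training check-ins by Euclidean distance to the query; keep the k nearest.
    neighbors = sorted(train, key=lambda row: math.hypot(row[0] - qx, row[1] - qy))[:k]
    scores = defaultdict(float)
    for x, y, place in neighbors:
        # Closer neighbors cast larger votes; the epsilon avoids division by zero.
        scores[place] += 1.0 / (math.hypot(x - qx, y - qy) + 1e-9)
    # Highest score first; ties broken by place_id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [place for place, _ in ranked[:top_n]]


train = [
    (1.0, 1.0, "cafe"), (1.1, 0.9, "cafe"), (0.9, 1.2, "cafe"),
    (5.0, 5.0, "gym"), (5.2, 4.9, "gym"),
    (1.5, 1.5, "library"),
]
print(rank_places(train, (1.0, 1.05)))  # ['cafe', 'library', 'gym']
```

Weighting votes by inverse distance is one common choice. A plain majority vote would also work, but it treats a neighbor at the edge of the k-neighborhood the same as one sitting on top of the query point.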

     

    How Uber Manages Drivers Without Technically Managing Drivers

    Fast Company


    from August 09, 2016

    If Uber exerts too much control over its drivers, it risks providing evidence to the many lawyers who accuse the company of treating its independent contractors as employees. But like any company, Uber also wants to provide great service to its customers, which usually involves telling workers what to do. A new case study details how the company has figured out how to walk a fine line by using its app and notifications as management tools.

    The study, which was recently published in the International Journal of Communication, was coauthored by Alex Rosenblat of the Data & Society Research Institute and Luke Stark of New York University. It looked at archival and real-time posts in five online forums for Uber drivers to investigate how ride-sharing apps and other “gig economy” companies nudge workers toward specific behaviors. This is known as “algorithmic management.”

     

    [1608.03251] Choosing Collaboration Partners. How Scientific Success in Physics Depends on Network Positions

    arXiv, Physics > Physics and Society; Raphael H. Heiberger, Oliver J. Wieczorek


    from August 10, 2016

    Physics is one of the most successful endeavors in science. As a prototypical big science, it also reflects the growing tendency toward scientific collaboration. Using 250,000 papers from arXiv.org, a preprint platform prevalent in physics, we construct large coauthorship networks to investigate how individual network positions influence scientific success. In this context, success means getting a paper published in high-impact journals of physical subdisciplines, as opposed to not getting it published at all or publishing it only in peripheral journals. To control for the nested levels of authors and papers, and to account for the time elapsing between working paper and prominent journal publication, we employ multilevel event-history models with various network measures as covariates. Our results show that maintaining even a moderate number of persistent ties is crucial for scientific success. Also, even with low volumes of social capital, physicists who occupy brokerage positions significantly enhance their chances of publishing articles in high-impact journals. Surprisingly, inter(sub)disciplinary collaborations decrease the probability of getting a paper published in specialized journals for almost all positions.
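The abstract builds coauthorship networks from paper author lists and highlights "persistent ties." As a rough illustration, the sketch below turns per-paper author lists into undirected ties and flags a tie as persistent when the pair coauthored in at least two distinct years. That definition, the `(year, authors)` record format, and the helper names are assumptions for illustration only. They are not the authors' operationalization, and the paper's brokerage measures are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_ties(papers):
    """Map each unordered author pair to the set of years in which they coauthored."""
    ties = defaultdict(set)
    for year, authors in papers:
        # Every pair of distinct authors on a paper gets an undirected tie.
        for a, b in combinations(sorted(set(authors)), 2):
            ties[(a, b)].add(year)
    return ties


def persistent_ties(ties, min_years=2):
    """Hypothetical definition: ties observed in at least `min_years` distinct years."""
    return {pair for pair, years in ties.items() if len(years) >= min_years}


def degree(ties):
    """Number of distinct collaborators per author."""
    deg = defaultdict(int)
    for a, b in ties:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


papers = [
    (2010, ["Ada", "Bohr", "Curie"]),
    (2011, ["Ada", "Bohr"]),
    (2012, ["Curie", "Dirac"]),
]
ties = coauthor_ties(papers)
print(persistent_ties(ties))  # {('Ada', 'Bohr')}
print(degree(ties))  # {'Ada': 2, 'Bohr': 2, 'Curie': 3, 'Dirac': 1}
```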

     

    Yoshua Bengio’s answer to What are the pros and cons of Generative Adversarial Networks vs Variational Autoencoders?

    Quora, Yoshua Bengio


    from August 16, 2016

    “An advantage for VAEs (Variational AutoEncoders) is that there is a clear and recognized way to evaluate the quality of the model (log-likelihood, either estimated by importance sampling or lower-bounded). Right now it’s not clear how to compare two GANs (Generative Adversarial Networks) or compare a GAN and other generative models except by visualizing samples.”

    “A disadvantage of VAEs is that, because of the injected noise and imperfect reconstruction, and with the standard decoder (with factorized output distribution), the generated samples are much more blurred than those coming from GANs.”
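Bengio notes that a VAE's log-likelihood can be "estimated by importance sampling or lower-bounded." The sketch below shows both estimators on a toy linear-Gaussian latent model, where the exact answer is known: z ~ N(0, 1) and x | z ~ N(z, noise_var), so p(x) = N(0, 1 + noise_var). In a real VAE the proposal q(z | x) would come from the encoder. Here the proposal is a hand-picked Gaussian, and the model, the proposal, and the function names are assumptions for illustration.

```python
import math
import random


def log_normal_pdf(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def estimate_log_likelihood(x, noise_var=0.25, q_var=0.3, num_samples=20000, seed=0):
    """Importance-sampling estimate and ELBO for z ~ N(0, 1), x | z ~ N(z, noise_var)."""
    rng = random.Random(seed)
    # Center the proposal at the true posterior mean; q_var is deliberately wider
    # than the true posterior variance noise_var / (1 + noise_var).
    q_mean = x / (1 + noise_var)
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        # log w = log p(x | z) + log p(z) - log q(z)
        log_weights.append(
            log_normal_pdf(x, z, noise_var)
            + log_normal_pdf(z, 0.0, 1.0)
            - log_normal_pdf(z, q_mean, q_var)
        )
    # Log of the average weight: consistent (slightly downward-biased) estimate of log p(x).
    is_estimate = log_mean_exp(log_weights)
    # Average of the log weights: Monte Carlo estimate of the ELBO, a lower bound.
    elbo = sum(log_weights) / num_samples
    return is_estimate, elbo


x = 1.3
exact = log_normal_pdf(x, 0.0, 1.0 + 0.25)
is_est, elbo = estimate_log_likelihood(x)
print(round(exact, 3), round(is_est, 3), round(elbo, 3))
```

The gap between the ELBO and the exact value is the KL divergence between the proposal and the true posterior. This is why a better encoder tightens the bound.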

    More experts making sense of things:

  • Answer to How does Keras compare to other Deep Learning frameworks like TensorFlow, Theano, or Torch? (August 15, Quora, Francois Chollet)
  • Introducing Variational Autoencoders (in Prose and Code) (August 12, Fast Forward Labs Blog, Miriam Shiffman)
  • Ian Goodfellow on Quora (August 12, Quora, Ian Goodfellow)
  • The Hardest Part About Microservices: Your Data (July 14, Christian Posta, Software Blog)

    Ranking Relevance in Yahoo Search

    KDD 2016, KDD Topics


    from August 15, 2016

    “In this paper, we give an overview of the solutions for relevance in the Yahoo search engine. We introduce three key techniques for base relevance – ranking functions, semantic matching features and query rewriting.” [Yahoo search engineers are giving away the merchandise before their store closes; download the PDF for the full text.]

     

    Reproducibility case studies

    GitHub – BIDS


    from August 19, 2016

    Justin Kitzes (jkitzes@berkeley.edu) is seeking case studies that provide concrete workflow examples of reproducible research. The Berkeley Institute for Data Science (BIDS) plans to publish the case studies and credit authors accordingly, while also collecting the studies in a GitHub repository. The procedural and formatting instructions for the case studies are rather specific, and project reviewers will appreciate your attention to detail.

    More upcoming reproducibility workshops:

  • Fall 2016 Center for Open Science Workshop (October 06 at University of Wisconsin-Madison)
  • 1-day Reproducibility Conference Coming to Columbia University December 2016! (December 09 at Columbia University)

    New data science research shows how we manage our long-term happiness

    Imperial College London


    from August 16, 2016

    Most theories of motivation have championed the pleasure principle, where our choices of daily activities aim to maximize our short-term happiness. However, it was not clear to researchers how to reconcile this idea with the fact that we all have to engage routinely in unpleasant, yet necessary activities.

    To address this question, a team of researchers, including an Imperial academic, developed a smartphone application to monitor in real time the activities and moods of approximately 30,000 people.

     

    Use Big Data to Create Value for Customers, Not Just Target Them

    Harvard Business Review, Niraj Dawar


    from August 16, 2016

    Big data holds out big promises for marketing. Notably, it pledges to answer two of the most vexing questions that have stymied marketers since they started selling: 1) who buys what when and at what price? and 2) can we link what consumers hear, read, and view to what they buy and consume?

    Answering these makes marketing more efficient by improving targeting and by identifying and eliminating the famed half of the marketing budget that is wasted. To address these questions, marketers have trained their big-data telescopes at a single point: predicting each customer’s next transaction. In pursuit of this prize marketers strive to paint an ever more detailed portrait of each consumer, memorizing her media preferences, scrutinizing her shopping habits, and cataloging her interests, aspirations and desires. The result is a detailed, high-resolution close-up of each customer that reveals her next move.

    But in the rush to uncover and target the next transaction, many industries are quickly coming up against a disquieting reality: Winning the next transaction eventually yields only short-term tactical advantage, and it overlooks one big and inevitable outcome.

     
    Events



    Babson College Hockey Analytics Conference “Analytics on Ice: The Long Change”



    Babson Park, MA: Saturday, October 1, starting at 9 a.m. [$$]
     

    Fall 2016 Center for Open Science Workshop



    Madison, WI: Please join us for a workshop, hosted by the Center for Open Science, to learn easy, practical steps researchers can take to increase the reproducibility of their work. The workshop will be hands-on. Using example studies, attendees will actively participate in creating a reproducible project from start to finish. Thursday, October 6
     
    CDS News



    NSF Award Search: Award#1637108 – RIDIR: Collaborative Research: Computational and Historical Resources on Nations and Organizations for the Social Sciences (CHRONOS)

    National Science Foundation


    from August 11, 2016

    “The project will demonstrate how computational techniques can aid both qualitative and quantitative social science research on a range of areas of major public interest, expanding knowledge about terrorism, intelligence, international trade and aid.”

     
    Tools & Resources



    Predictive Algorithms to Track Your Health Data In Real Time

    Algorithmia


    from August 15, 2016

    We’ve shown how to use predictive algorithms to track economic development. In this tutorial, we’re going to build a real-time health dashboard for tracking a person’s blood pressure readings, do time series analysis, and then graph the trends over time using predictive algorithms. This tutorial is the starting point for creating your own personal health dashboard using time series algorithms and predictive APIs.

    We’ll be creating this dashboard in Python, using the Withings API for our data, the Forecast and Simple Moving Average microservices from Algorithmia, and Plotly to graph the data.
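The tutorial smooths readings with Algorithmia's Simple Moving Average microservice. Its call signature is not given here, so the following is a local pure-Python stand-in that computes the same kind of trailing average, not Algorithmia's API. The function name and the blood pressure readings are hypothetical.

```python
from collections import deque


def simple_moving_average(values, window):
    """Return the mean of each full run of `window` consecutive readings."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    buf = deque()
    running_sum = 0.0
    for v in values:
        buf.append(v)
        running_sum += v
        # Drop the oldest reading once the window is over-full.
        if len(buf) > window:
            running_sum -= buf.popleft()
        # Only emit an average once a full window has been seen.
        if len(buf) == window:
            averages.append(running_sum / window)
    return averages


systolic = [128, 132, 125, 140, 135, 130, 127]  # hypothetical readings (mmHg)
print(simple_moving_average(systolic, 3))
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window. For a dashboard that appends one reading at a time, that difference is what makes incremental updates cheap.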

     

    Gensim LDA: Tips and Tricks

    Mining the Details blog, Chris Tufts


    from August 12, 2016

    Gensim is an easy-to-implement, fast, and efficient tool for topic modeling. The purpose of this post is to share a few of the things I’ve learned while trying to implement Latent Dirichlet Allocation (LDA) on different corpora of varying sizes. This post is not meant to be a full tutorial on LDA in Gensim, but rather a supplement to help navigate around any issues you may run into. If you are getting started with Gensim, or just need a refresher, I would suggest taking a look at their excellent documentation and tutorials.
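Gensim's LDA consumes each document as a sparse bag of words: a sorted list of (token_id, count) pairs, which Gensim's `Dictionary.doc2bow` method produces. The pure-Python sketch below only illustrates that format. The first-appearance id scheme and the helper names are assumptions, and the ids need not match the ones Gensim assigns.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign an integer id to each token in order of first appearance (hypothetical scheme)."""
    token2id = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def doc_to_bow(doc, token2id):
    """Sparse bag of words: sorted (token_id, count) pairs; unknown tokens are dropped."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


docs = [
    ["topic", "model", "topic"],
    ["model", "corpus", "lda"],
]
vocab = build_vocabulary(docs)
corpus = [doc_to_bow(d, vocab) for d in docs]
print(corpus)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1), (3, 1)]]
```

With Gensim itself, a corpus in this shape is what you would pass to `gensim.models.LdaModel` along with the dictionary as `id2word`. Pruning very rare and very common tokens before this step (for example with `Dictionary.filter_extremes`) usually matters more for topic quality than model settings.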

     

    Avoiding Over-Engineering

    Stitch Fix Technology – Multithreaded blog


    from August 15, 2016

    What does “over-engineering” even mean? A basic definition might be “doing more work than is necessary to solve the problem at hand.” … There are two key points here: necessary and problem at hand.

     
    Careers


    Career Advice

    The 1-hour workday
     

    Science, Working Life
     

    I Hire Engineers At Google–Here’s What I Look For (And Why)
     

    Fast Company
     
