NYU Data Science newsletter – May 3, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 3, 2016

 
Data Science News



How New York Startups Like Bitly Are Taking On Bigger Platforms

Bloomberg


from April 25, 2016

For Madison Avenue advertising agencies, giant platforms like Facebook and Google are natural partners. But smaller tech companies that gather user data are rising in value and could disrupt the giants.

“The $1 billion opportunity is to take ownership of customer experience away from the [competing] platform,” says Bitly CEO Mark Josephson. “Marketers have all the money, that’s why it’ll happen in New York and not [Silicon] Valley.” Josephson was being interviewed by Bloomberg Television’s Vonnie Quinn at the Cornell Tech @ Bloomberg series in New York.

 

Wearables data becoming a marketer’s goldmine?

ReadWrite


from May 01, 2016

Wearables are still a niche market for marketers, but in the future the data procured from smartwatches might be much more valuable than what is currently available from laptop and mobile users.

In a survey conducted by Ovum for Criteo, marketers gave their thoughts on what data would be the most valuable in the future.

Daily routine and precision location data were top of the survey at 38 and 37 percent, respectively.

 

Can an Online Game Help Create a Better Test For TB?

KQED Science, NPR Shots


from May 02, 2016

Though it’s the world’s top infectious killer, tuberculosis is surprisingly tricky to diagnose. Scientists think that video gamers can help them create a better diagnostic test.

Players of a Web-based game called EteRNA will try to design a sensor molecule that could potentially make diagnosing TB as easy as taking a home pregnancy test. The TB puzzle marks the launch of “EteRNA Medicine.”

 

Yuliy Sannikov Wins John Bates Clark Young Economist Award

Bloomberg


from April 29, 2016

Yuliy Sannikov, a professor at Princeton University, won the John Bates Clark young economist award.

Sannikov was honored for theoretical work that has expanded the ability of economists to analyze a whole range of issues from the design of securities to collusion in markets, the Nashville, Tennessee-based American Economic Association said Friday in a statement on its website.

 

The Big Spenders on R&D

Bloomberg View, Justin Fox


from April 29, 2016

Amazon.com and Facebook had great earnings reports this week. They also reported spending a lot of money on the future. Capital spending was up 35 percent at Amazon and 125 percent at Facebook in the first quarter compared with the same quarter last year. Research and development spending was up 28 percent at Amazon and 26 percent at Facebook.

In an age when much of corporate America seems focused on buying back shares, cutting costs and meeting quarterly earnings targets, Jeff Bezos and Mark Zuckerberg are thinking big, and long-term — and it seems to be working out for them.

 

[1604.02096] Dynamics of beneficial epidemics

arXiv, Physics > Physics and Society; Andrew Berdahl et al.


from April 07, 2016

Pathogens can spread epidemically through populations. Beneficial contagions, such as viruses that enhance host survival or technological innovations that improve quality of life, also have the potential to spread epidemically. How do the dynamics of beneficial biological and social epidemics differ from those of detrimental epidemics? We investigate this question using three theoretical approaches as well as an empirical analysis of concept propagation. First, in evolutionary models, we show that a beneficial horizontally-transmissible element, such as viral DNA, spreads super-exponentially through a population, substantially more quickly than a beneficial mutation. Second, in an epidemiological social network approach, we show that infections that cause increased connectivity lead to faster-than-exponential fixation in the population. Third, in a sociological model with strategic rewiring, we find that preferences for increased global infection accelerate spread and produce super-exponential fixation rates, while preferences for local assortativity halt epidemics by disconnecting the infected from the susceptible. Finally, in an investigation of the Google Ngram corpus, we find that new words and phrases spread super-exponentially, as anticipated by our models. We conclude that the dynamics of beneficial biological and social epidemics are characterized by the remarkably rapid spread of beneficial elements, which can be facilitated in biological systems by horizontal transmission and in social systems by active spreading strategies of infected individuals.

 

Seeing double: Duplicated images plague research papers

STAT


from April 29, 2016

Four in 100 doesn’t seem like a big number, and in many instances it isn’t very impressive. After all, if Steph Curry sank only 4 percent of his three-pointers, the Golden State Warriors wouldn’t have set a record this year for most wins in an NBA season.

But 4 percent might trigger alarm bells in science publishing. In this case, the number is the share of published papers in biomedical fields containing at least one inappropriately duplicated image — which means that scientific results are being misrepresented to appear better than they truly are.

The finding appears in a new analysis by Elisabeth Bik, a microbiologist at Stanford University, and coauthors Ferric Fang and Arturo Casadevall. The last two have been leading figures in research integrity for some time, having contributed to a paper in 2012 that first identified misconduct as the reason for two-thirds of retractions of scientific articles.

Bik and her colleagues’ undertaking was impressive and massive. They analyzed, by hand, more than 20,600 articles that had appeared in 40 science journals between 1995 and 2014. To focus their search, they looked for three types of issues: “simple” duplications, in which researchers used the same images to represent different experiments; “duplication with repositioning,” in which authors rotated or otherwise moved an image to make it appear new in a subsequent publication; and “duplication with alteration,” in which scientists tinkered with elements of duplicated images to generate new figures.

 

Artistic Style Transfer For Videos Graphics…

prosthetic knowledge


from May 02, 2016


Graphics researchers from the University of Freiburg Computer Vision group have adapted the artistic Style Transfer method for video and moving images to great effect.

 
Events



A Week of R at Iowa State



For a whole week we will be teaching different aspects of the statistical software R. Monday is for everybody who always wanted to get started with R but couldn’t find the time. Tuesday is a day of graphics. On Wednesday we will be doing some data wrangling. On Thursday morning, we have a DIY web application session with shiny. In the afternoon, we will finish our series with a cool new module on web scraping of data!

 

Attend In-Memory Computing Summit, May 23-24, San Francisco



The second annual In-Memory Computing Summit is May 23-24, 2016 at the Grand Hyatt San Francisco. This conference is the only industry-wide event of its kind, tailored to in-memory computing related technologies and solutions. The IMC Summit is a multi-track conference that brings together in-memory computing visionaries, decision makers, experts and developers for the purpose of education, discussion and networking. We are pleased to announce that the full conference schedule is now available. For more information or to register, visit the conference website.

Monday-Tuesday, May 23-24 in San Francisco

 

JSM Diversity Workshop & Mentoring Program



The 2016 JSM Diversity Workshop and Mentoring Program brings historically-underrepresented minority (African/African-American, Hispanic/Latino, and Native American) graduate and undergraduate students, post-doctoral scholars, and junior professionals together with senior-level statisticians and faculty in academia, government, and the private sector in a structured program at the annual Joint Statistical Meetings. This year in Chicago, we host both a Workshop and a Mentoring Program as separate but related activities.

Sunday, July 31, at JSM in Chicago, IL (McCormick Place)

 
Tools & Resources



When Documents Become Databases – Tabulizer R Wrapper for Tabula PDF Table Extractor

Tony Hirst, OUseful.Info, the blog…


from May 02, 2016

Although not necessarily the best way of publishing data, data tables in PDF documents can often be extracted quite easily, particularly if the tables are regular and the cell contents reasonably spaced.

 

Force-directed graph layout using velocity Verlet integration.

GitHub – d3


from April 30, 2016

This module implements a velocity Verlet numerical integrator for simulating physical forces on particles. The simulation is simplified: it assumes a constant unit time step t = 1 for each step, and a constant unit mass m = 1 for all particles. As a result, a force F acting on a particle is equivalent to a constant acceleration a over the time interval t, and can be simulated simply by adding to the particle’s velocity, which is then added to the particle’s position.

In the domain of information visualization, physical simulations are useful for studying networks and hierarchies!
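
The simplified update described above can be sketched in a few lines of Python. This is only an illustration of the unit-time-step, unit-mass idea, not the d3 module's actual code: the `Particle` class, the `step` function, and the `pull_to_origin` force are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Particle:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


# A force maps a particle to an (fx, fy) pair; hypothetical type alias.
Force = Callable[[Particle], Tuple[float, float]]


def step(particles: List[Particle], force: Force) -> None:
    """Advance every particle by one unit time step (t = 1, m = 1)."""
    # Evaluate all forces first so each one sees the same positions.
    accelerations = [force(p) for p in particles]
    for p, (ax, ay) in zip(particles, accelerations):
        # With m = 1 and t = 1, the force equals the change in velocity.
        p.vx += ax
        p.vy += ay
        # The updated velocity is then added to the position.
        p.x += p.vx
        p.y += p.vy


def pull_to_origin(p: Particle, strength: float = 0.1) -> Tuple[float, float]:
    """Hypothetical force: a weak spring pulling the particle toward (0, 0)."""
    return (-strength * p.x, -strength * p.y)


if __name__ == "__main__":
    nodes = [Particle(10.0, 0.0), Particle(0.0, -5.0)]
    for _ in range(3):
        step(nodes, pull_to_origin)
    print(nodes)
```

Because mass and time step are both fixed at 1, no division or multiplication by either is needed, which is what keeps each step to two additions per coordinate.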

Also:

  • D3v4 Constraint-Based Layout (April 29, bl.ocks.org, Elijah Meeks)
  • Parallel Graph Analytics (May 1, Communications of the ACM; Andrew Lenharth, Donald Nguyen, Keshav Pingali )

    Sentosa – An Open Source High Frequency Automatic Algorithmic Trading System and Research Platform

    Wu Fuheng Henry


    from March 15, 2016

    In recent years, online trading platforms have become a hot spot of financial engineering innovation. Many financial technology companies have raised considerable funds from Wall Street. … Some of these platforms are beautifully designed and very user friendly. But when you backtest your strategies, they actually run on the platform’s servers and are therefore fully visible to the company. To avoid the risk of exposing your strategies, it is safer to do research on a local machine and trade through reliable brokers or DMA. In addition, online platforms transfer data over the Internet via HTTP, which may be fine for low-frequency trading but is not efficient or feasible for high-frequency trading.

    Sentosa is named after the most popular island resort in Singapore. The languages I used to write Sentosa include C++, Python, R, Go and JavaScript. The project is hosted at www.Quant365.com, where you can download source code and follow all the updates.

     

    Visualizing Machine Learning with Plotly and Domino

    Domino, Chelsea Douglas


    from April 26, 2016

    I recently had the chance to team up with Domino Data Lab to produce a webinar that demonstrated how to use Plotly to create data visualizations inside of Domino notebooks. In this post, I’ll share a few of the benefits that I discovered while using Plotly and Domino together.

    Plotly is a web-based data visualization platform for data scientists and engineers. The engine behind our platform is plotly.js, an open source charting library built on D3.js and stack.gl. We have APIs for R, Python, and MATLAB to make it easy for data scientists to work in the programming language of their choice, and accessible for teams that work in multiple languages. This works really well with Domino’s platform, where you can create notebooks in a variety of languages.

     
    Careers



    Graphiq – Data Scientist (Data team)
     

    Graphiq
     

    Catenus: Data Science Apprenticeship Program
     

    KDnuggets
     
