NYU Data Science newsletter – September 26, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 26, 2016

GROUP CURATION: N/A

 
 
Data Science News



Headline:


The importance of data strategy for startups

Stelios Kampakis, The Data Scientist blog


from September 24, 2016

Data strategy is the one thing many companies, especially startups, need without realizing it.

So, what is a data strategy? How can we define it?

A data strategy is defined as the strategy around the collection, storage, and usage of data, in a way that the data serves not only the purpose behind a startup's selling point, but also opens up additional potential monetisation avenues in the future.


Headline:


How Experian is turning big data into big dollars

The San Diego Union-Tribune


from September 25, 2016

At Experian DataLabs in Carmel Valley, a team of scientists is thwarting bad guys with math.

A top-five U.S. credit card issuer recently dumped about 6 billion transaction records on Experian DataLabs to see if its fancy machine learning mathematical formulas could do a better job of rooting out credit card fraud than the bank’s existing system.

Experian scientists used neuro-embedding/natural language processing techniques to understand the “syntax” of the credit card data, said Honghao Shan, a Ph.D. computer scientist.


Headline:


[1609.06982] The fundamental diagram of urbanization

arXiv, Physics > Physics and Society; Giulia Carra, Marc Barthelemy


from September 22, 2016

The recent availability of geolocalized historical data allows to address quantitatively spatial features of the time evolution of urban areas. Here, we discuss how the number of buildings evolves with population and we show on different datasets (Chicago, 1930–2010; London, 1900–2015; New York City, 1790–2013; Paris, 1861–2011) that this curve evolves in a ‘universal’ way with three distinct phases. After an initial pre-urbanization phase, the first phase is a rapid growth of the number of buildings versus population. In a second regime, where residences are converted into offices and stores, the population decreases while the number of buildings stays approximatively constant. In another subsequent — modern — phase, the number of buildings and the population grow again and correspond to a re-densification of cities. We propose a simple model based on these simple mechanisms to explain the first two regimes and show that it is in excellent agreement with empirical observations. These results bring evidences for the possibility of constructing a simple model that could serve as a tool for understanding quantitatively urbanization and the future evolution of cities.


Headline:


Here’s what 82 fintech pitches say about the future of Wall Street

Business Insider, Tina Wadhwa


from September 22, 2016

At Finovate Fall 2016 earlier this month, 1,600 people gathered over two days in midtown Manhattan to discuss the future of finance. It was the 10th anniversary of the conference, and this was its biggest event to date, eight times as large as the first Finovate conference, in 2007.

Seventy-two startups demoed their products at Finovate, in real time, to an audience of investors, bankers, payment companies, and potential customers. After listening to almost nine hours of bright-eyed startup founders talk about the future of finance, it struck me that they all had one thing in common.

These startups weren’t trying to take the place of established players; they were looking to sell to them.


Headline:


Human factors limit smart cities more than technology

ReadWrite, Donal Power


from September 25, 2016

While the latest smart gizmo tends to grab headlines, industry experts are urging urban leaders to focus more on their citizens' smart-city challenges than on the technology. That’s according to attendees at the recent VERGE 16 conference in Santa Clara, Calif., where leaders in the smart cities space gathered.

A key sentiment that emerged from the conference was that leaders in government and industry need to stay focused on the larger smart city picture and not get caught up in the latest gee-whiz technology.

Specifically, there needs to be greater focus on meshing emerging tech with the current political and economic systems that affect citizens.


Checking in on major Data Science initiatives

  • The AI Now Report – The Social and Economic Implications of AI Technologies in the Near-Term (The White House and NYU’s Information Law Institute)
  • The White House published a factsheet of 8 years of progress on Opening Government Data (Office of the Press Secretary, The White House)
  • Mid-term Evaluation of the Data-Driven Discovery Initiative: Part II (Medium, Moore Data, Chris Mentzel) plus a new announcement of $2.1 m to fund a Data Task Force at NCEAS
  • White House puts real-time climate data online for city planners (Reuters, Graphiq)

  • Headline:


    Measuring exposure to air pollution using mobile phone data

    Rafael Pereira, Urban Demographics blog


    from September 13, 2016

    Tijs Neutens and colleagues have a new paper where they use mobile phone data to assess people’s exposure to air pollution in Belgium in high spatio-temporal resolution.


    White House puts real-time climate data online for city planners

    Reuters, Graphiq

A new website, powered by the White House and partners including Google and Microsoft, aims to pull real-time climate data from NASA satellites and other sources for planners preparing for the effects of climate change, its founders said on Thursday.


    Headline:


    Tweet densities of #rstats in the US

    RPubs, Winston Saunders


    from September 22, 2016

    The question “what’s the best city for data science?” was asked on the September episode of the “Not So Standard Deviations” podcast. To inject some analysis into the discussion, I used the twitteR package to measure interest in R by computing the “flux” of tweets with the #rstats hashtag.

    The top metro areas are New York, Boston, and the SF Bay area, with a tweet flux of about 50 #rstats tweets per million residents per day (“twipermipeds”). Other leading cities include Long Beach, Washington DC, Seattle, Raleigh NC, and Henderson NV.
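    The “twipermipeds” metric above reduces to simple arithmetic: raw tweet counts normalized by metro population (in millions) and by the length of the collection window. A minimal sketch, in Python rather than the post's R/twitteR workflow, with illustrative placeholder numbers rather than the post's actual data:

    ```python
    def tweet_flux(tweet_count, population, days):
        """Tweets per million residents per day ("twipermipeds")."""
        return tweet_count / (population / 1_000_000) / days

    # Hypothetical example: 3,500 #rstats tweets collected over 7 days
    # in a metro area of 10 million residents.
    print(tweet_flux(3500, 10_000_000, 7))  # 50.0
    ```

    Normalizing by population is what lets smaller metros like Raleigh or Henderson rank alongside New York and the Bay Area.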


    Headline:


    Poll Aggregation Fight

    PredictWise, David M Rothschild


    from September 25, 2016

    Last week I got into a little Twitter fight with Nate Silver. I tweeted that I am concerned about the FiveThirtyEight forecast, both how much volatility it has and how predictable the forecast is. Here is the picture I tweeted:


    He responded with this tweet: “Never seen otherwise-smart people in so much denial about something as they are about Trump’s chances. Same mistake as primaries, Brexit.” Then he went on a little tweet storm. I then responded. First, the short answer is that FiveThirtyEight is claiming a toss-up, while I am claiming that Clinton has a small but meaningful lead: 70-75% for the last 2 weeks.


    NYC + LA open data news; plus graphs of building density in Chicago, New York, Paris & London

  • Praise and Recommendations for New York City’s Open Data Progress (Gotham Gazette, Samar Khurshid)
  • Using open data to learn about land use in Los Angeles (September 26, Stamen Design, Eric Rodenbeck and Alan McConchie)
  • [1609.06982] The fundamental diagram of urbanization (arXiv, Physics > Physics and Society; Giulia Carra, Marc Barthelemy)
  • Mapbox’s New Mentorship Program to Help Smart Cities (CityLab, Linda Poon)

  • Headline:


    Rajat Monga – Session on Sep 28, 2016

    Quora, Rajat Monga


    from September 28, 2016

    What is your question for Rajat Monga?

     
    Events



    Dear Data | Giorgia Lupi & Debbie Millman | Design and Style Series



    New York, NY Saturday, 1 October 2016, 2:30 – 4:30, Mid-Manhattan Library, The Corner Room [free]

    Smarter Crowdsourcing: Zika



    Online In a series of weekly, 2-hour web-based conferences, experts from around the world will discuss innovative and practical ways to tackle the root causes of Zika and mosquito-borne diseases. [free]

    Data Natives 2016: Building A Professional Network



    Berlin, Germany Wednesday-Friday, 26-28 October 2016.
     
    Deadlines



    World Bank Big Data Innovation Challenge

    deadline: Contest/Award

    Join us as we launch a global call to find big data solutions that address issues pertaining to two critical challenge areas: 1) Food: food security and nutrition 2) Landscapes: forestry and watersheds. Deadline for applications is Thursday, November 17.


    Real-Time Crime Forecasting Challenge

    deadline: Contest/Award

    This Challenge will offer a comprehensive comparative analysis between current “off-the-shelf” crime forecasting products used by many police departments and more innovative forecasting methods used by other scientific disciplines. Deadline for submissions is Tuesday, February 28.

     
    NYU Center for Data Science News



    Headline:


    The Global Impact of Open Data [book]

    O'Reilly Media


    from September 23, 2016

    Authors Andrew Young and Stefaan Verhulst, both with The GovLab at New York University, explain how these projects have made governments more accountable and efficient, helped policymakers find solutions to previously intractable public problems, created new economic opportunities, and empowered citizens through new forms of social mobilization.

     
    Tools & Resources



    Which tool should I use?

    Brandon Rohrer, Data Science and Robots Blog


    from August 24, 2016

    As with machine learning algorithms, no tool is inherently better than any other. Each is suited to different circumstances. One may be faster, but the other has a simpler interface. One may be more flexible, but the other is easier to learn. Here’s where the work you did choosing a specific goal really pays off. It will tell you what is most important. The trade-off between performance and ease of use is a common one. Fast and opaque or slow and intuitive? If your goal is to perform a set of calculations for a monthly report, speed may be less important than explainability of the approach and maintainability of the code. But if your goal is to do real-time algorithmic trading on the New York Stock Exchange, then extra speed will be well worth any amount of inscrutable code.

    If you don’t yet know what the strengths and weaknesses of your options are, that’s a fine place to invest some research time. Search the Internet. Read the ranting emails from your co-worker about the pain being inflicted on them by the latest version of the library you’re considering. Chat with your hacker friends. Don’t implicitly trust any one source or website. Collect a small data set and look for themes.


    Software That Writes And Evolves Software

    Medium, The Composition, Rod Johnson


    from September 22, 2016

    At Atomist, we thought long and hard about what works in code generation and what doesn’t. And we believe we’ve found a better way, which can be applied to many developer technologies. … Traditional code generators offer a one way trip. They create a project, allowing basic parameterization (such as project name, base package name or dependencies) and then their contribution ends. The generation mechanism is often opaque: For example, a mix of static and template content and code in the generator itself. Our first attempt at project generation worked like this.

     
    Careers


    Tenured and tenure track faculty positions

    Assistant or Associate Member – Computational Biology



    Fred Hutchinson Cancer Research Center; Seattle, WA

    Assistant/Associate Professor; STS Data Studies



    University of California-Davis; Davis, CA
