NYU Data Science newsletter – April 27, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 27, 2015


 
Data Science News



Homeland Security is laying roots in Silicon Valley, and you might not like its reasons

The Washington Post


from April 22, 2015

The Department of Homeland Security plans to open an office in California’s Silicon Valley to recruit talent from the technology sector and build relationships with the industry.

 

Data Science From Scratch: First Principles with Python

Joel Grus


from April 26, 2015

I am super-excited to announce that the book I’ve been working on for more than the last year, Data Science from Scratch: First Principles with Python is finally available! (buy from O’Reilly, use discount code AUTHD to save some money) (buy from Amazon).

My experience learning and teaching data science was that there were two primary paths:

  • The Math Path: “So you want to be a data scientist? Sure, the first thing you need to know is matrix decompositions. How well do you remember your measure theory?”
  • The Tools Path: “So you want to be a data scientist? Great, here’s the most important libraries to know. How well do you know R?”

    How Airbnb uses machine learning to detect host preferences

    Airbnb Engineering blog


    from April 14, 2015

    … What started as a small research project resulted in the development of a machine learning model that learns our hosts’ preferences for accommodation requests based on their past behavior. For each search query that a guest enters on Airbnb’s search engine, our model computes the likelihood that relevant hosts will want to accommodate the guest’s request. Then, we surface likely matches more prominently in the search results. In our A/B testing the model showed about a 3.75% increase in booking conversion, resulting in many more matches on Airbnb. In this blog post I outline the process that brought us to this model.

     

    How a Computer Can Help Your Doctor Better Diagnose Cancer

    Communications of the ACM


    from April 24, 2015

    … what if [your doctor] could instead plug your test results and medical history into a computer program that has crunched millions of pieces of similar data?

    That sort of future is looking increasingly possible thanks to researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

     

    Bar graphs criticized for misrepresenting data

    Nature News & Comment


    from April 24, 2015

    Dodgy data analysis has come under fire in recent months, and now an article in PLoS Biology takes aim at another staple of statistics. It argues that bar graphs used to describe a continuum of data are often uninformative and misleading, and should be purged from much of the scientific literature. The stance gained wide support from commenters on social media.

     

    Bot makes $2.4 million reading the Web: Meet the guy it cost a fortune.

    Slate


    from April 20, 2015

    On the afternoon of Friday, March 27, as several news outlets reported at the time, somebody apparently made $2.4 million from a tweet. That tweet was a bit of breaking news from Wall Street Journal writer Dana Mattioli.

     

    The Bot Bubble: Click Farms Have Inflated Social Media Currency

    The New Republic


    from April 20, 2015

    Every morning, Kim Casipong strolls past barbed wire, six dogs, and a watchman in order to get to her job in a pink apartment building decorated with ornate stonework in Lapu-Lapu City. The building towers above the slums surrounding it—houses made of scrap wood with muddy goat pens in place of yards. She is a pretty, milk-skinned, 17-year-old girl who loves the movie Frozen and whose favorite pastime is singing karaoke. She is on her way to do her part in bringing down Facebook.

     

    The Sensor-Rich, Data-Scooping Future

    The New York Times, Bits blog


    from April 26, 2015

    The question is, will this be good for the many, or the mighty few?

    Earlier this month, General Electric announced it was selling GE Capital, its financial arm. With less fanfare, G.E. also unveiled plans for computer-connected L.E.D. streetlights, so cities can collect and analyze performance data, for lower costs and better safety.

     

    Microsoft and NOAA Collaborate to Enable Data Access and Innovation Across Government and Industry

    Microsoft, Government blog


    from April 21, 2015

    … Microsoft will host weather, water, ocean, and climate data provided by NOAA scientists on the Microsoft Azure Government cloud platform. We will host such well-known NOAA datasets as the Global Forecast System, Global Ensemble Forecast System, and Climate Forecast System. We also look forward to working with NOAA and our partners in government and industry to identify and expose other datasets of interest.

    The NOAA CRADA takes a significant step toward realizing the promise of open data. Combined with the massive amount of data available from NOAA, it will allow many segments of government and private industry to take advantage of Azure Government to improve applications and develop new solutions for citizens and customers.

     

    Coming full circle with Bigtable and HBase

    O'Reilly Radar


    from April 23, 2015

    At least once a year, I sit down with Michael Stack, engineer at Cloudera, to get an update on Apache HBase and the annual user conference, HBasecon. Stack has a great perspective, as he has been part of HBase since its inception. As former project leader, he remains a key contributor and evangelist, and one of the organizers of HBasecon.

    In the beginning: Search and Bigtable

    During the latest episode of the O’Reilly Data Show Podcast, I decided to broaden our conversation to include the beginnings of the very popular Apache HBase project. Stack reminded me that in the early days much of the big data community in the SF Bay Area was centered around search technologies, such as HBase. [audio, 31:05]

     
