NYU Data Science newsletter – July 1, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 1, 2015

 
Data Science News



Graphs in the world: Modeling systems as networks

O'Reilly Radar, Russell Jurney


from June 30, 2015

Networks of all kinds drive the modern world. You can build a network from nearly any kind of data set, which is probably why network structures characterize some aspects of most phenomena. And yet, many people can’t see the networks underlying different systems. In this post, we’re going to survey a series of networks that model different systems to see the different ways networks help us understand the world around us.

We’ll explore how to see, extract, and create value with networks. We’ll look at four examples where I used networks to model different phenomena, starting with startup ecosystems and ending in network-driven marketing.
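
To make the idea of building a network from a data set concrete, here is a minimal Python sketch. It is not taken from the post: the investor and startup names, the `build_network` helper, and the choice of an undirected adjacency map are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (a, b) relationship records into an undirected adjacency map."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical records: each row says "this fund invested in this startup".
investments = [
    ("Fund A", "Startup X"),
    ("Fund A", "Startup Y"),
    ("Fund B", "Startup X"),
]

network = build_network(investments)

# Degree (number of neighbors) is one of the simplest network measures.
degree = {node: len(neighbors) for node, neighbors in network.items()}
print(degree)
```

Any table with two related columns can be read as an edge list in this way, which is one reading of the claim that nearly any data set yields a network.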

 

Exploring SparkR

Blag's bag of rants


from June 29, 2015

A colleague from work asked me to investigate Spark and R. So the most obvious thing to do was to look into SparkR -;)

I installed Scala, Hadoop, Spark and SparkR…not sure Hadoop is needed for this…but I wanted to have the full picture -:)

Anyway…I came across a piece of code that reads lines from a file and counts how many lines contain an “a” and how many lines contain a “b”…
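
The excerpt stops before showing the code, so the following is only a rough Python sketch of the counting logic it describes, not the SparkR code from the post. The sample lines and the `count_lines_containing` helper are hypothetical, and an in-memory list stands in for the file.

```python
from typing import Iterable


def count_lines_containing(lines: Iterable[str], needle: str) -> int:
    """Return how many lines contain `needle` at least once."""
    return sum(1 for line in lines if needle in line)


# Hypothetical sample text standing in for the file the post reads.
sample_lines = [
    "alpha",
    "bravo",
    "echo",
    "cab",
]

lines_with_a = count_lines_containing(sample_lines, "a")
lines_with_b = count_lines_containing(sample_lines, "b")
print(f"Lines with a: {lines_with_a}, lines with b: {lines_with_b}")
```

In Spark the same idea would typically be expressed as a filter followed by a count over a distributed collection of lines; this sketch only shows the single-machine logic.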

 

d3/d3-scale · GitHub

GitHub, d3


from June 30, 2015

Scales are a convenient abstraction for a fundamental task in visualization: mapping a dimension of abstract data to a visual representation. Although most often used for position-encoding quantitative data, such as mapping the height in meters of a sample population to the height in pixels of bars in a bar chart, scales can represent virtually any visual encoding, such as diverging colors, stroke widths, or symbol size. Scales can also be used with virtually any type of data, such as named categorical data or discrete data that needs sensible breaks.

For continuous quantitative data, you often want a linear scale. (For time series data, a time scale.)
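
As an illustration of what a linear scale computes, here is a small Python sketch of the mapping from a data domain to a visual range. This is not the d3-scale API: the `linear_scale` function, its parameters, and the sample heights are assumptions chosen to mirror the bar-chart example above.

```python
from typing import Callable, Tuple


def linear_scale(
    domain: Tuple[float, float],
    output_range: Tuple[float, float],
) -> Callable[[float], float]:
    """Return a function mapping values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value: float) -> float:
        # Normalize to a 0..1 position within the domain, then stretch to the range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical example: heights of 1.0 to 2.0 meters drawn as bars 0 to 300 pixels tall.
height_to_pixels = linear_scale((1.0, 2.0), (0.0, 300.0))
print(height_to_pixels(1.5))
```

The same interpolation works for any numeric visual channel, such as stroke width or symbol size, by changing the output range.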

 

What Big Data Analytics Professionals Want From IT

Forbes, Meta S. Brown


from June 27, 2015

Big Data is where the paying jobs are. An update posted this week by Forbes Contributor Louis Columbus indicates that “The [median] advertised salary for technical professionals with Big Data expertise is $104,850 net of bonuses and additional compensation.” And the demand is significant. He points out that IBM alone advertised more than 2,300 such jobs in the past 12 months.

Many of these positions are in IT. That’s great for IT people, right? Not necessarily.

 

Thoughts on Time-series Databases

jmoiron.net


from June 30, 2015

Preetam “recently” blogged about catena, a time-series metric store. There was another blog post about benchmarking boltdb by a Fog Creek engineer, also looking to write a time series database. This is something of a pattern in the Go community, which already boasts seriesly, InfluxDB, and prometheus; there are almost certainly others.

Time series data has been de rigueur at least since Etsy’s seminal blog post on StatsD, though in reality that was just an inflection point. Time series modeling and graphing predate computer systems, but they have been a popular way of tracking and visualizing systems and networking data since at least the early 90s with MRTG. A few factors are converging now to make these kinds of systems more important: “Big Data” is getting much, much bigger; virtualization and containerization have increased the number of independent “nodes” for a typical distributed application; and the economies of the cloud have put the brakes on the types of performance increases typically attributed to “Moore’s Law.”

This topic is relevant to my work at Datadog, and I’ve been thinking about it for a long time now.

 

Bigger, Cheaper, Faster Data in the Cloud

The New York Times, Bits blog


from June 30, 2015

The kind of data science that worked for President Obama is coming to cloud computing, and working at a speed faster than the president ever saw.

A company called Civis Analytics, financed by a group of data wranglers who worked on Mr. Obama’s 2012 campaign, is announcing on Wednesday a comprehensive set of big data tools available through Amazon Web Services.

 

Big Data and Recruitment

SmartData Collective


from June 30, 2015

… Taking on a new employee represents a huge investment for most companies, particularly in a managerial or professional role. A large proportion (40% to 60% by most estimates) of a company’s revenue goes on staff salaries. So in an age where everything can be measured, quantified and analyzed, it makes sense to put a bit more planning and strategic thought into the recruitment process.

Those hoping to rely on the “old school tie network” may be out of luck.

 

As Data Streams Grow, Lawyers Look for New Ways to Find Facts | Big Law Business

Bloomberg BNA


from June 30, 2015

Earlier this month, a thunderstorm rolled over the mountains near Park City, Utah, forcing Bennett B. Borden, a Drinker Biddle & Reath partner who had been hiking in the area, to seek shelter for several hours in a nearby resort’s lodge.

Borden co-chairs Drinker Biddle’s eDiscovery practice group, a position that often requires supervising dozens of contract attorneys who comb through emails and other documents in search of relevant evidence.

But increasingly, U.S. courts are allowing lawyers to use new technology and software platforms during eDiscovery that can drastically reduce the number of documents that need review by humans.

 

Harvard Medical School makes biomedical informatics a full-scale department

Harvard Magazine


from April 15, 2015

Anticipating “radical transformations” in medicine in coming decades, the dean of Harvard Medical School (HMS) has authorized a full-scale department of biomedical informatics, effective July 1. Jeffrey Flier’s move recognizes the growing importance of data in the healthcare professions, and, he said, builds on the school’s “outstanding record of achievement” in the field. Henderson professor of pediatrics and health sciences and technology Isaac “Zak” Kohane will chair the new department. Since 2005, he has co-directed HMS’s Center for Biomedical Informatics (CBMI); five of its associates will become the department’s first core faculty members, and Kohane has committed to recruit 10 more colleagues during the next five to seven years.

 

Zuck Details How Artificial Intelligence Will Show You The Perfect Facebook Posts

TechCrunch


from June 30, 2015

What’s Facebook working on in its three AI labs? A way to understand exactly what you share so it can serve that content to people with matching interests. Today in a public Q&A on his Facebook Page, CEO Mark Zuckerberg gave a closer look at why Facebook is investing in AI, as well as detailing his philosophy on happiness, exercise and the future of the company.

 

Interview Questions for Data Scientist Positions (Part II) — Medium

Medium, Ahmed El Deeb


from June 30, 2015

In a previous post, I argued for the need for a different kind of interview question for data science and machine learning engineers. I then listed some questions I thought good for gauging data science knowledge and cleverness. I extend the list here with some more questions.

 

Announcing the R Consortium

Revolution Analytics, Revolutions blog, Qin Wenfeng


from June 30, 2015

The R community has grown explosively over the past few years, both in terms of the number of R users and the number of companies who rely on R as their data science platform. To serve the needs of this rapidly growing community, and to continue the success of the R Project as a whole, representatives from the R Foundation and from industry have joined forces to create the R Consortium, a new collaborative project of the Linux Foundation.

The R Consortium is a 501(c)(6) non-profit organization dedicated to the support and growth of the R user community. The R Consortium will work with and provide support to the R Foundation and other organizations developing, maintaining and distributing R software, and provide a unifying framework for the R user community.

 
