NYU Data Science newsletter – August 5, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 5, 2015

 
Data Science News



Top 10 Machine Learning APIs: AT&T Speech, IBM Watson, Google Prediction

ProgrammableWeb


from August 03, 2015

Machine learning is everywhere these days. It’s on your smartphone automatically classifying and organizing your photos. It’s in your email account filtering out spam and other emails you don’t want to read. It’s on Amazon.com recommending products and personalizing your online shopping experience. It’s in your connected car helping the voice-controlled interface understand you.

Right now, Amazon, Google, IBM, and Microsoft are the biggest players battling to dominate the very fast-growing machine learning cloud services market. IBM further strengthened its position in the market with the recent acquisition of AlchemyAPI, a leading deep learning-based machine learning services platform. Only time will tell which of these companies will succeed in capturing the lion’s share of the machine learning cloud services market.

The APIs that made it to our top 10 machine learning APIs list offer a wide range of capabilities including image tagging, face recognition, document classification, speech recognition, predictive modeling, sentiment analysis, and pattern recognition.

 

UW’s “Advanced Data Science” Ph.D. option launches!

UW CSE News


from August 04, 2015

The University of Washington’s Graduate School has approved the creation of a Ph.D. option in “Advanced Data Science” – an initiative of UW’s $2.8 million National Science Foundation IGERT (Integrative Graduate Education and Research) award in data science, led by UW CSE’s Magda Balazinska.

The goal of the option is not to educate all students in the foundations of data science, but rather to provide advanced education to the students who will push the state-of-the-art in data science methods in their domain – to educate the next generation of thought leaders who will both build and apply new methods of data science.

 

Data Management Threshold Concepts

Data Ab Initio


from August 04, 2015

We’ve been going through the new ACRL “Framework for Information Literacy for Higher Education” recently at work. This document discusses ways to teach students how to search and understand information resources, framing critical skills as “threshold concepts”. While the Framework itself is interesting, I’m really intrigued by the idea of a threshold concept and wonder if there are any threshold concepts for data management.

For those unfamiliar with the term, a “threshold concept” is an idea that, once understood, completely reframes the way you view a topic. It’s like seeing a hidden image in that it’s very difficult to un-see the image afterward. Threshold concepts are so fundamental to understanding that it’s actually necessary to understand the concept in order to progress in the field.

 

How baseball’s tech team built the future of television

The Verge


from August 04, 2015

It was the first week of April, 2015, and New York’s Chelsea Market, typically packed with hordes of noisy tourists, was quiet. It was close to midnight, but five stories above, things were tense. The building is a former cookie factory, and the outlines of ovens still scar the brick walls. In their place, a vast array of screens are now mounted, each tuned to a live video feed. Joe Inzerillo and his team had their eyes glued to the glass, hard at work trying to wrangle the internet into doing something it was not built for.

The launch of HBO Now was just a few days away, but the product wasn’t finished. With both HBO and Apple promoting it heavily, the team had no idea how many people might tune in that first day. “The stakes were high,” says Inzerillo. “It was not lost on anybody here how important this was for our company, but also for this industry. Because if HBO Now had come out and face planted, there would be a lot of naysayers who turned around and said, ‘See, this is why the adults need to handle television.’”

 

My summer at hackNY

Medium, Yoni Nachmany


from August 03, 2015

10 weeks ago, when Chris Wiggins and Evan Korth bestowed the oral tradition of hackNY on 34 eager fellows, including me, I latched on to two comments as I scribbled the random walks of my mind in a Muji notebook.

Chris and Evan established hackNY in 2010 to get quantitative college students to more actively question potential career ambitions and paths, expanding beyond the status quo of Wall Street to consider New York’s emerging startup scene.

 

Machine Learning And Human Bias: An Uneasy Pair

TechCrunch, Jason Baldridge


from August 02, 2015

“We’re watching you.” This was the warning that the Chicago Police Department gave to more than 400 people on its “Heat List.” The list, an attempt to identify the people most likely to commit violent crime in the city, was created with a predictive algorithm that focused on factors including, per the Chicago Tribune, “his or her acquaintances and their arrest histories – and whether any of those associates have been shot in the past.”

Algorithms like this obviously raise some uncomfortable questions. Who is on this list and why? Does it take race, gender, education and other personal factors into account? When the prison population of America is overwhelmingly Black and Latino males, would an algorithm based on relationships disproportionately target young men of color?

 

Making data count

Scientific Data


from August 04, 2015

Science is built on a foundation of data. The production and publication of that data should be recognized as valuable scholarship, but data lacks an essential prerequisite for modern-day scholarly recognition: accepted metrics of significance. The scientific community has traditionally estimated the impact of a journal article by counting the number of subsequent references to it; more recently, a suite of web-based alternative metrics (‘altmetrics’) offer faster assessment and the chance to capture other kinds of impact [1]. Data can be fit into these article-centered assessment systems by proxy, via data descriptor articles in journals like Earth System Science Data or Scientific Data [2,3]. Another approach is to apply familiar metrics directly to datasets published in online databases or repositories. Complicating matters, the same metric may mean different things with respect to articles versus datasets. A researcher can read an article online closely without downloading the PDF version, but if they view a dataset landing page without downloading the data, their level of engagement is almost certainly low. A better understanding of how to measure data impact is critical if we are to reward data creators and incentivize data publication.

 

This is what Big Data really looks like: CERN, the universe and everything

WHICH 50


from August 03, 2015

So you think you know all about analytics? Patting yourself on the back about your latest cross-channel attribution modeling and the terabytes of data you’ve successfully corralled into a database? Time for a little perspective — because there’s big data and then there’s BIG data.

While brand managers the world over complain about the deluge of data they need to make sense of these days, data scientists at CERN are trying to solve the mysteries of the universe using facilities like the Large Hadron Collider (LHC), the world’s largest particle accelerator. Sifting through billions of data points from a fire hose measurable in terabytes per second, the data challenges faced by CERN’s physicists dwarf those of most commercial entities.

Which-50 interviewed Bob Jones, Project Leader at CERN, who is a driving force behind CERN’s information management expertise and was the Head of CERN openlab between January 2012 and December 2014.

 

Mike Franklin Interviewed on Spark, NoSQL and the Future of Data Management

UC Berkeley, AMPLab


from July 30, 2015

Derrick Harris recently did a wide-ranging interview with AMPLab Co-Director Mike Franklin, covering topics such as the roots of and keys to the success of the AMPLab, the role of open source in data management technology, what we learned from the NoSQL movement, and where things are going in the world of data management. He also provides some insights into the minds of the typical “Grumpy Old Database Person”.

 
