NYU Data Science newsletter – September 30, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 30, 2016

Data Science News

Headline:

Tensorflow Ruby API

Medium, Arafat Khan

from September 27, 2016

“I developed a simple gem tensorflow.rb and started with making use of SWIG.”

Headline:

Google Translate Star Leaves Venter’s Human Longevity For Illumina-Backed Grail

Forbes, Luke Timmerman

from September 27, 2016

Franz Och, a star computer scientist known as the chief architect of Google Translate, has left his job as chief data scientist at J. Craig Venter’s Human Longevity Inc. to join Grail, the ambitious pan-cancer screening company co-founded by Illumina.

Headline:

Watch: Highlights of the Microsoft Data Science Summit

Microsoft, Revolutions blog, David Smith

from September 29, 2016

This was the first year for this new conference, and it was a blast: the energy from the 1,000 attendees was palpable. I covered Joseph Sirosh’s keynote presentation yesterday, but today I wanted to highlight a few other talks from the program now that the recordings are available to stream.

Headline:

Robot eyes are catching up with our exploding cosmos

New Scientist, Colin Stuart

from September 28, 2016

The night sky is anything but tranquil, and a new generation of robot telescopes is getting a handle on the cornucopia of fleeting flashes and flare-ups.

Headline:

Mapbox’s New Mentorship Program to Help Smart Cities

CityLab, Linda Poon

from September 29, 2016

Harnessing the power of open data is key to developing the smart cities of the future. But not all governments have the capacity—be that funding or human capital—to collect all the necessary information and turn it into a tool. That’s where Mapbox comes in.

Mapbox offers open-source mapping platforms, and is no stranger to turning complex data into visualizations cities can use, whether it’s mapping traffic fatalities in the U.S. or the conditions of streets in Washington, D.C., during last year’s East Coast blizzard.

Headline:

Top Data Scientist Claudia Perlich on Biggest Issues in Data Science

KDnuggets, Claudia Perlich

from September 29, 2016

“The by far biggest issue I see is data science solving irrelevant problems. This is a huge waste of time and energy. The reason is typically that whoever has the problem is lacking data science understanding to even express the issue and data scientists end up solving whatever they understood might be the problem, ultimately creating a solution that is not really helpful (and often far too complicated).”

Headline:

FACT SHEET: Data by the People, for the People — Eight Years of Progress Opening Government Data to Spur Innovation, Opportunity, & Economic Growth

The White House, Office of the Press Secretary

from September 28, 2016

The Administration is releasing a fact sheet of key accomplishments over the past eight years in opening government data to spur innovation, opportunity, and economic growth. The release of this list comes in conjunction with today’s White House Open Data Innovation Summit.

Headline:

Data wranglers wanted: new Data Task Force to improve synthesis outcomes

Gordon and Betty Moore Foundation, National Center for Ecological Analysis and Synthesis

from September 28, 2016

The National Center for Ecological Analysis and Synthesis (NCEAS) is launching an innovative “Data Task Force” to support synthesis research with a new $2.1 million grant from the Gordon and Betty Moore Foundation. In contrast to mining Big Data, synthesis research recognizes the value of small, information rich datasets generated by scientists all over the world. However, collecting, combining, and “wrangling” these many discrete data sets for synthesis presents a daunting challenge.

Headline:

The next milestone in Microsoft’s AI journey

Official Microsoft Blog, Harry Shum

from September 29, 2016

Microsoft announced a new group that will help accelerate this evolution. It will include Microsoft Research along with the company’s Information Platform Group, Bing and Cortana product groups, and our Ambient Computing and Robotics teams. The combined group will include more than 5,000 computer scientists and engineers.

Headline:

Open Data as a Moving Target: What Does it Take to Allow Reuse?

CODATA Blog, Irene Pasquetto

from September 22, 2016

As we all know too well, making all scientific data technically and legally accessible to all researchers is an ambitious task complicated by constantly evolving social and technical barriers. It is fair to say that we are making progress in this direction. At Scidatacon 2016, we examined several concrete solutions that can facilitate openness of scientific data or, if you prefer, make sure data are FAIR (findable, accessible, interoperable and reusable).

However, it seems that the more we learn about how to make data open, the less we know about how exactly data will be reused by the scientific community, which means by the researchers who generated the data and should have a primary interest in accessing it.

Events

MARTA Hackathon Tickets

Atlanta, GA Friday-Saturday, October 28-29, at The Garage under Tech Square [free]

Accepted Papers – SocInfo’16

Bellevue, WA Monday-Thursday, November 14-17 [$$$]

Deadlines

Applying to the Combine

deadline: Contest/Award

Application Deadline: November 10. Current faculty and students at any accredited university located in New York City are eligible to apply to this business plan competition for media technologies in NYC.

Tools & Resources

My quest to responsive visualizations with Julia

juliabloggers, Simon Danisch

from September 28, 2016

GLVisualize, a library for fast and interactive graphics written in Julia.

Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research

Google Research Blog, Sudheendra Vijayanarasimhan and Paul Natsev

from September 28, 2016

We are excited to announce the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities. This represents a significant increase in scale and diversity compared to existing video datasets. For example, Sports-1M, the largest existing labeled video dataset we are aware of, has around 1 million YouTube videos and 500 sports-specific classes–YouTube-8M represents nearly an order of magnitude increase in both number of videos and classes.

How to Avoid the Anti-Pattern in Analytics: Three Keys for Machine Learning

Research Triangle Institute, RTInsights, Kai Waehner

from September 28, 2016

When a different analytic model is used in training versus deployment, results can be disastrous. Here’s how to avoid the anti-pattern.

Building & Maintaining a Master Data Dictionary: A Two Part Series

RJMetrics, The Data Point blog, Akash Agrawal

from September 28, 2016

A central repository of true and agreed upon definitions of key metrics means your goals will have the same definition across teams, your conversations with investors will be consistent, and your analytical cycle time will be reduced.

Apache Kafka on Heroku

Heroku, Rand Fitzpatrick

from September 28, 2016

“Kafka is the industry-leading open source solution for delivering data-intensive applications and managing event streams, and is backed by an active open source community and ecosystem. Now combined with Heroku’s developer experience and operational expertise.”

Careers

Tenured and tenure track faculty positions

Tenure-Track Assistant Professors and Hess Faculty Fellow in Computer Science

Wellesley College; Wellesley, MA
