NYU Data Science Newsletter – May 7, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 7, 2015

 
Data Science News



Research Blog: Paper to Digital in 200+ languages

Google Research Blog


from May 06, 2015

Many of the world’s important sources of information – books, newspapers, magazines, pamphlets, and historical documents – are not digital. Unlike digital documents, these paper-based sources of information are difficult to search through or edit, or worse, completely inaccessible to some people. Part of the solution is scanning, getting a digital image of the page, but raw image pixels aren’t yet recognized as textual content from the computer’s point of view.

Optical Character Recognition (OCR) technology aims to turn pictures of text into computer text that can be indexed, searched, and edited. For some time, Google Drive has provided OCR capabilities. Recently, we expanded this state-of-the-art technology to support all of the world’s major languages – that’s over 200 languages in more than 25 writing systems.

 

A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.

jwills/exhibit · GitHub


from May 06, 2015

Exhibit is an evolving collection of various projects for executing SQL against things that look like tiny database tables, including: 1. Hive arrays of structs; 2. Collections of Avro and Thrift records; 3. Arrays of BSON objects from MongoDB

 

Can’t Find the Good Stuff on Periscope? Maybe AI Can Help | WIRED

WIRED, Business


from May 05, 2015

Periscope turns anyone with an iPhone into a video broadcaster. And people are paying attention. After Twitter acquired the company earlier this year, the app bolted ahead of competitors in the Apple App Store, bringing the livestream into the mainstream. Many people used Periscope feeds to watch the big Pacquiao-Mayweather fight this past Saturday night.

The problem lies in finding the feeds you most want to watch. Most Periscope feeds are just people speaking directly into the camera about whatever happens to be on their mind. If you want the big fight or live streams from the riots in Baltimore or backstage footage from the sets of popular television shows, you have to hunt pretty hard.

But a New York-based startup called Dextro just launched a service designed to help you find the good stuff. It’s called Stream, and it uses machine learning algorithms to automatically categorize videos

 

Computing Needs a Reboot

EE Times


from May 05, 2015

An estimated $200 million in economic activity was lost in New York City during the blizzard of 2015. One reason that snowstorm was not predicted correctly that got lost in the debates and finger pointing: computers are not getting faster.

 

Collaborating to improve efficiency of the cloud in the increasingly connected world of big data

TechNet Blogs, Inside Microsoft Research


from May 06, 2015

… Microsoft researchers joined collaborators from multiple universities this week at the annual USENIX Symposium on Networked Systems Design and Implementation. Their goal: To recommend solutions that push the architectural boundaries of network services.

“The efficient management and operation of networks and data centers is Microsoft’s core strength and priority,” said Victor Bahl, a Microsoft distinguished scientist. “These papers represent the best in systems research, a product of close collaboration between Microsoft researchers, engineers and our colleagues in academia, anticipating and taking care of important issues well before they become problems.”

 

Chief data officer: Insight into a crucial role for the exabyte age

TechRepublic


from May 06, 2015

We’ve published interviews with the UK’s first CDO, Chicago’s first CDO, and the Commerce Dept.’s first CDO. This feature combines their insights with other CDOs to provide a roadmap for the position.

 

Neural network chip built using memristors

Ars Technica


from May 06, 2015

… a team of researchers from Stony Brook University and the University of California Santa Barbara appear to have solved this problem. And they’ve done it in a fairly simple fashion: a systematic trial-and-error search.

Memristors can be made of metal oxides (titanium dioxide is a favorite material). They work because the current influences the number of places in the material where oxygen atoms are missing, which in turn influences resistance.

In this case, the authors used a combination of aluminum oxide and titanium dioxide (Al2O3 and TiO2) to form a memristor. They started with “an exhaustive experimental search over a range of titanium dioxide compositions and layer thicknesses (from 5 nm to 100 nm)” and then paired that with similar adjustments to the thickness of aluminum oxide. The titanium dioxide layer influenced how readily a memristor could be generated at the desired locations, while the aluminum oxide layer influenced the consistency and strength of its operation.

 

The Upshot uses geolocation to push readers deeper into data

Nieman Journalism Lab


from May 04, 2015

The New York Times story changes its text depending on where you’re reading it: “It’s a fine line between a smarter default and being creepy.”

 
