NYU Data Science newsletter – August 11, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 11, 2015


 
Data Science News



Please join us in welcoming Jose Manuel Magallanes as a Senior Data Science Fellow to the eScience Institute!

Facebook, UW eScience Institute


from August 10, 2015

Jose Manuel Magallanes, Professor of Political Science and Public Policy in the Department of Social Sciences at Pontificia Universidad Catolica del Peru, will come to the University of Washington (UW) this fall for a two-year appointment, sponsored by the eScience Institute with support from the Washington Research Foundation, the Alfred P. Sloan Foundation, and the Gordon and Betty Moore Foundation. He will be a Senior Data Science Fellow at eScience and a Visiting Professor at UW's Evans School of Public Policy and Governance.

 

Q&A with Jonathan Pillow on dissecting the brain using math and neuroscience

Princeton University, News at Princeton


from August 10, 2015

The brain is the ultimate big-data problem. Its billions of neurons give rise to numerous abilities, such as making decisions, interpreting color and even recognizing your best friend. Jonathan Pillow, a Princeton University assistant professor of psychology and the Princeton Neuroscience Institute, aims to understand the brain by using math and statistics to make sense of the reams of information collected by brain-imaging studies. Recently arrived from the University of Texas-Austin, Pillow, who also is affiliated with the University’s Center for Statistics and Machine Learning, sat down to talk about how he got into neuroscience, his approach to teaching, and his latest research published earlier this month in the journal Science.

 

Combating the Life Science Data Avalanche

R&D Magazine


from August 10, 2015

Big data has become a pressing issue in science: data sets are now so large and complex that traditional data-processing applications are inadequate. This is especially true in the life sciences, where the growth of data has not been matched by tools for analyzing and interpreting it, leading to what many call a “data avalanche.”

Life science researchers are generating ever more next-generation sequencing data, with more samples and deeper sequencing, and the data is growing more complex as they move from targeted panels to whole-exome and whole-genome sequencing. However, the tools appearing on the market often fail to incorporate system-level interpretation, compounding the problem.

Enter bioinformatics, an interdisciplinary field spanning computer science, statistics, mathematics and engineering that develops methods and software tools for understanding biological data.

 

How Google Invented an Amazing Datacenter Network Only They Could Create

High Scalability


from August 07, 2015

Google, with justly earned pride, recently announced:

Today at the 2015 Open Network Summit, we are revealing for the first time the details of five generations of our in-house network technology. From Firehose, our first in-house datacenter network, ten years ago to our latest-generation Jupiter network, we’ve increased the capacity of a single datacenter network more than 100x. Our current generation — Jupiter fabrics — can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.

Google’s datacenter network is the magic behind much of what makes Google really work. But what is “bisection bandwidth” and why does it matter? We talked about bisection bandwidth a while back in Changing Architectures: New Datacenter Networks Will Set Your Code And Data Free. In short, bisection bandwidth is the total bandwidth available across a cut that divides the network into two halves: a measure of how freely Google's servers can talk to each other.
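The 1 Petabit/s figure in the announcement is straightforward arithmetic; a quick sanity check (decimal prefixes and bit-rates assumed):

```python
# Sanity check on the quoted capacity: 100,000 servers at 10 Gb/s each.
servers = 100_000
per_server_bps = 10e9                      # 10 Gb/s per server
total_bps = servers * per_server_bps
petabits_per_sec = total_bps / 1e15        # 1 Pb = 1e15 bits
print(petabits_per_sec)                    # 1.0, i.e. 1 Petabit/s
```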

 

How Many Scientists Does It Take to Write a Paper? Apparently, Thousands

Wall Street Journal


from August 10, 2015

A Frenchman named Georges Aad may have the most prominent name in particle physics.

In less than a decade, Dr. Aad, who lives in Marseilles, France, has appeared as the lead author on 458 scientific papers. Nobody knows just how many scientists it may take to screw in a light bulb, but it took 5,154 researchers to write one physics paper earlier this year—likely a record—and Dr. Aad led the list.

His scientific renown is a tribute to alphabetical order.

Almost every paper by “G. Aad et al.” involves so many researchers that they decided to always list themselves in alphabetical order. Their recent paper, published in the journal Physical Review Letters, features 24 pages of alphabetized co-authors led by Dr. Aad. There is no way to tell how important each contributor might be.

 

Baidu’s ‘Medical Robot’: Chinese Search Engine Reveals Its AI for Health

Re/code


from August 09, 2015

… [Wei Fan’s] project, called AskADoctor in English, is one of the earliest to emerge from Baidu’s deep-learning division since it hired Andrew Ng, a renowned data scientist and former marquee researcher at Google. And it’s an example of the unique tech interface the company can produce given its privileged access to the world’s biggest nation, which has kept Silicon Valley giants at arm’s length.

The initiative is also another sign of the broader industry trend of tech firms storming into medical sciences with their artificial intelligence guns drawn. Earlier this week, IBM announced plans to acquire medical imaging company Merge Healthcare, turning its data over to IBM’s supercomputers. Google, while not fully public about its medical programs, has similar ambitions. Apple has its wearable health strategy.

 

Google Research Boosts Deep Learning Detection with GPUs

The Platform


from August 10, 2015

Although GPUs have found a wider market in high performance computing in recent years, an emerging market is exploding in deep learning and computer vision with companies like Baidu and others continuing to push the processing speed and complexity envelope. While they are not always at the fore of general processing for Google, Microsoft, and other search giants, GPUs are fertile ground for research into new algorithms and approaches to machine learning and deep neural networks.

At the Nvidia GPU Technology Conference earlier in the year there were several examples of how GPUs were being used in neural network training and for near-real-time execution of complex machine learning algorithms in natural language processing, image recognition, and rapid video analysis. The compelling angle was that all of these applications were pushing into the ever-increasing need for real-time recognition and output.

Outside of providing web-based services (for instance, automatically tagging images or extracting semantic understanding from video), the real use cases for how GPUs will power real-time services off the web are still developing. Pedestrian detection is one area that, when powered by truly accurate, real-time capabilities, could enable an entirely new wave of services around surveillance, traffic systems, driverless cars, and beyond.

 

Turning your R (or Python) models into APIs

Jo-fai Chow, Blend it like a Bayesian! blog


from August 09, 2015

More and more real-world systems are relying on data science and analytical models to deliver sophisticated functionality or improved user experiences. For example, Microsoft combined the power of advanced predictive models and web services to develop the real-time voice translation feature in Skype. Facebook and Google continuously improve their deep learning models for better face recognition in their photo services.

Some have characterised this trend as a shift from Software-as-a-Service (SaaS) to an era of Models-as-a-Service (MaaS). These models are often written in statistical programming languages (e.g., R, Python), which are especially well suited to analytical tasks.

With analytical models playing an increasingly important role in real-world systems, and with more models being developed in R and Python, we need powerful ways of turning these models into APIs for others to consume.
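To make the idea concrete, here is a minimal sketch (an illustration, not from the post) of serving a model over HTTP in plain Python using only the standard library. The "model" here is a hypothetical linear scorer standing in for a fitted R or Python model; a real deployment would more likely use a web framework such as Flask:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

def predict(x):
    # Hypothetical stand-in for a trained model (e.g. a fitted regression).
    return 2.0 * x + 1.0

class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests like GET /predict?x=3.5
        query = parse_qs(urlparse(self.path).query)
        x = float(query["x"][0])
        body = json.dumps({"prediction": predict(x)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

# Serve on an OS-assigned port in a background thread, then call the API.
server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/predict?x=3.5" % server.server_port
result = json.loads(urlopen(url).read())
print(result)  # {'prediction': 8.0}
server.shutdown()
```

The same pattern applies whatever the model is: the API boundary hides the statistical language from consumers, who only see JSON in and JSON out.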

 

How one journalist found hidden code in a Google report and turned it into a story

Online Journalism Blog


from August 10, 2015

Sylvia Tippmann wasn’t looking for a story. In fact, she was working on a way for Google to improve how it handles ‘right to be forgotten’ requests when she stumbled across some information that she suspected the search giant hadn’t intended to make public.

Two weeks ago The Guardian in the UK and Correct!v in Germany published the story of the leaked data, which was then widely picked up by the business and technology press: Google had accidentally revealed details on hundreds of thousands of ‘right to be forgotten’ requests, providing a rare insight into the controversial law and raising concerns over the corporation’s role in judging requests.

But it was the way that Tippmann stumbled across the story that fascinated me: a combination of tech savvy, a desire to speed up work processes, and a strong nose for news that often characterise data journalists’ reporting. So I wanted to tell it here.

 

Introduction to Machine Learning with Python, by Sarah Guido and Andreas Mueller (ISBN 9781449369415)

Amazon.com Books, O'Reilly Publishing


from August 10, 2015

Many Python developers are curious about what machine learning is and how it can be applied concretely to business problems involving medium to large amounts of data. Introduction to Machine Learning with Python teaches the basics of machine learning and provides a thorough, hands-on understanding of the subject.

You’ll learn important machine learning concepts and algorithms, when to use them, and how to use them. The book covers a complete machine learning workflow: preprocessing and working with data, training algorithms, evaluating results, and integrating those algorithms into a production-level system.

Available now for pre-sale. Publication: January 2016.
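The workflow the book describes (prepare data, train, evaluate) can be sketched in miniature with plain Python. This toy example uses a hand-rolled 1-nearest-neighbour classifier on made-up data; in practice one would use a library such as scikit-learn, which the book itself covers:

```python
import random

random.seed(0)

# 1. Data: toy 2-D points labelled by which side of x = 0.5 they fall on.
points = [(random.random(), random.random()) for _ in range(200)]
labels = [1 if x > 0.5 else 0 for x, _ in points]

# 2. Preprocessing: a simple train/test split.
train_X, test_X = points[:150], points[150:]
train_y, test_y = labels[:150], labels[150:]

# 3. "Training" a 1-nearest-neighbour classifier (it just stores the data).
def predict(p):
    nearest = min(range(len(train_X)),
                  key=lambda i: (train_X[i][0] - p[0]) ** 2 +
                                (train_X[i][1] - p[1]) ** 2)
    return train_y[nearest]

# 4. Evaluation: accuracy on the held-out test set.
accuracy = sum(predict(p) == y for p, y in zip(test_X, test_y)) / len(test_y)
print("test accuracy: %.2f" % accuracy)
```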

 
