NYU Data Science newsletter – August 10, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 10, 2015

 
Data Science News



Predictive Analytics and Data Science: Same or Different?

Information Management Blogs


from August 06, 2015

My company, Inquidia Consulting, is currently engaged in or completing several predictive analytics and data science projects. While we distinguish PA from DS, there’s often not a hard dividing line between the two with our customers. Indeed, though we demur, some now consider data science to be any application of statistical methods to business problems.

 

World Economic Forum Tech Pioneers & Analytics Winners

KDnuggets


from August 08, 2015

You do not need to be told of the growing importance of machine intelligence for solving complex analytic problems; you see it every day. Still, it helps to take stock of just how far that progression has come in a few short years.

One of those opportunities occurred on Wednesday when the World Economic Forum announced their class of Technology Pioneers for 2015. This year’s class, like previous Tech Pioneer classes, was divided into IT, Life Sciences & Health, and Energy/Environment/Infrastructure.

What was notable, however, was the pervasive theme of advanced analytics for extracting knowledge from data across this class. About 20% of the companies selected as Technology Pioneers talk about themselves in terms of analytics, machine learning, or artificial intelligence, and virtually every company uses advanced analytics as a key ingredient of its success.

 

Data Science for Social Good: Improving Service to Homeless Families

UW eScience Institute


from August 05, 2015

We are living in an age where data plays a role in almost everything we do. While the power of big data is already being harnessed in science and technology, the question remains how to bring the same disruptive impact to public policy and social good. Companies like Microsoft, Google and Facebook are using massive amounts of user data to create products that engage and entertain users (and to find the most effective advertisement to display to them). In a variety of scientific fields, new measurement devices are producing larger and larger quantities of data about everything from remote galaxies to our own DNA, accelerating our progress towards a better understanding of the universe. Some have even gone so far as to say that data is “unreasonably effective.” But how does one use data to promote social good? How does one harness the lessons learned in analyzing data from the internet, or data from scientific measurements, to address a social problem as challenging and complex as family homelessness?

This summer, the University of Washington’s eScience Institute is hosting the first installment of a Data Science For Social Good program to address this question.

 

Composing Music With Recurrent Neural Networks

Daniel Johnson, hexahedria blog


from August 03, 2015

It’s hard not to be blown away by the surprising power of neural networks these days. With enough training, so-called “deep neural networks,” with many nodes and hidden layers, can do impressively well on modeling and predicting all kinds of data. (If you don’t know what I’m talking about, I recommend reading about recurrent character-level language models, Google Deep Dream, and neural Turing machines. Very cool stuff!) Now seems like as good a time as ever to experiment with what a neural network can do.

For a while now, I’ve been floating around vague ideas about writing a program to compose music. My original idea was based on a fractal decomposition of time and some sort of repetition mechanism, but after reading more about neural networks, I decided that they would be a better fit. So a few weeks ago, I got to work designing my network. And after training for a while, I am happy to report remarkable success!

 

Generating Genre Fiction with Deep Learning

Yves Peirsman


from August 09, 2015

These days Deep Learning is everywhere. Neural networks are used for just about every task in Natural Language Processing — from named entity recognition to sentiment analysis and machine translation. A few months ago, Andrej Karpathy, PhD student at Stanford University, released a small software package for automatically generating texts with a recurrent neural network. I wanted to find out how it performs when it is asked to generate genre fiction, such as fantasy or chick lit.

 

Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot

PLOS ONE


from December 26, 2014

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are explored with this regard in the context of the Hiberlink project.

 

Statistical technique helps cancer researchers understand tumor makeup, personalize care

EurekAlert! Science News, American Statistical Association


from August 09, 2015

A new statistical method for analyzing next-generation sequencing (NGS) data that helps researchers study the genome of various organisms such as human tumors and could help bring about personalized cancer treatments was presented today at a session of the 2015 Joint Statistical Meetings (JSM 2015) in Seattle.

Yuan Ji, director of the Program for Computational Genomics and Medicine Research Institute at NorthShore University HealthSystem and associate professor of biostatistics at The University of Chicago, described the new technique, called Bayesian feature allocation models, during a presentation titled “Bayesian Models for Heterogeneity in Human Cancers.”

 

Artificial Intelligence Is Already Weirdly Inhuman

Nautilus


from August 06, 2015

Nineteen stories up in a Brooklyn office tower, the view from Manuela Veloso’s office—azure skies, New York Harbor, the Statue of Liberty—is exhilarating. But right now we only have eyes for the nondescript windows below us in the tower across the street.

In their panes, we can see chairs, desks, lamps, and papers. They don’t look quite right, though, because they aren’t really there. The genuine objects are in a building on our side of the street—likely the one where we’re standing. A bright afternoon sun has lit them up, briefly turning the facing windows into mirrors. We see office bric-a-brac that looks ghostly and luminous, floating free of gravity.

Veloso, a professor of computer science and robotics at Carnegie Mellon University, and I have been talking about what machines perceive and how they “think”—a subject not nearly as straightforward as I had expected. “How would a robot figure that out?” she says about the illusion in the windows. “That is the kind of thing that is hard for them.”

 

Privacy and accuracy: How Cynthia Dwork is making data analysis better

Microsoft Research, Next at Microsoft blog


from August 07, 2015

Cynthia Dwork has spent much of her career working on ways to ensure that your personal data stays private even when it is being used for scientific research.

Now, she’s also applying those mathematical methods to making certain that the conclusions researchers draw from analyzing big data sets are as accurate as possible.

Dwork, a cryptographer and distinguished scientist at Microsoft Research, and several colleagues recently published a paper in Science magazine showing how their groundbreaking work on differential privacy also can help researchers guarantee the accuracy of their results.

 

Microsoft Cambridge chief to head Turing Institute

Business Weekly, UK


from August 06, 2015

Andrew Blake, who heads up Microsoft Research in Cambridge, has agreed to become the inaugural director of the exciting new Alan Turing Institute. Professor Chris Bishop will succeed him at Microsoft, Prof. Blake tells Business Weekly.

 
