NYU Data Science newsletter – July 29, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 29, 2015


 
Data Science News



Database expert on why NoSQL mattered — and SQL still matters | S C A L E — Medium

Medium, S C A L E, Derrick Harris


from July 01, 2015

The first part of this interview with University of California, Berkeley professor and AMPLab co-director Michael Franklin focused on the creation of AMPLab and some of the important big data projects that have emerged from it.

Part 2 focuses on database technology, where Franklin spent much of his career. Aside from studying it and teaching it, Franklin also started a database company called Truviso, which Cisco acquired in 2012.

 

GitXiv.com – Collaborative Open Computer Science.

reddit.com/r/MachineLearning


from July 28, 2015

Reddit weighs in on GitXiv, a new platform for collaborating on open-source computer science.

 

Analyzing time series data with stream processing and machine learning | The Big Data Hub

IBM Big Data Hub


from July 21, 2015

Real-time detection and classification of signals or events present in time series data is a fairly common need. Stereotypical examples include identifying high-risk conditions in ICU data streams or classifying signals present in acoustic data from diagnostic or monitoring sensors. Using a combination of stream processing and machine learning is an agile and highly capable approach. It can effectively scale to large, fast data streams and adapt to evolving problem spaces.
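The pattern the excerpt describes (scoring each new sample in a stream against a model of recent behavior) can be sketched minimally in plain Python. The rolling z-score rule below is a hypothetical stand-in for a trained classifier; in a real pipeline a model would replace it.

```python
from collections import deque
from statistics import mean, stdev

def stream_anomalies(samples, window=30, z_thresh=3.0):
    """Flag samples that deviate strongly from a rolling baseline.

    Maintains a bounded window of recent values and scores each new
    sample against it, so memory stays constant as the stream grows.
    """
    recent = deque(maxlen=window)
    flagged = []
    for t, x in enumerate(samples):
        if len(recent) >= 2:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > z_thresh:
                flagged.append(t)
        recent.append(x)
    return flagged

# Synthetic stream: a flat signal with slight variation and one spike.
signal = [10.0] * 100
signal[50] = 25.0
for i in range(0, 100, 2):
    signal[i] += 0.1
print(stream_anomalies(signal))  # only the injected spike at index 50 is flagged
```

Because the window is bounded, the baseline adapts as the stream drifts, which is the "evolving problem spaces" property the excerpt mentions.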

 

Can Police Use Data Science to Prevent Deadly Encounters? – Scientific American

Scientific American


from July 22, 2015


 

WPI to offer doctorate in data science

WBJournal


from July 24, 2015

Worcester Polytechnic Institute will offer an interdisciplinary doctoral program in data science starting in the fall, a year after it launched a master’s program in the growing field of study.

A statement from WPI said the new program will be the first in the United States. It draws on the expertise of WPI faculty in business, computer science and mathematics, and is designed to meet a “growing demand for highly trained scientists with trans-disciplinary technical and scientific expertise who can solve problems at the cutting edge of big-data analytics,” the statement read.

 

Learning Seattle’s Work Habits from Bicycle Counts (Updated!)

Jake VanderPlas, Pythonic Perambulations


from July 23, 2015

Last year I wrote a blog post examining trends in Seattle bicycling and how they relate to weather, daylight, day of the week, and other factors.

Here I want to revisit the same data from a different perspective: rather than making assumptions in order to build models that might describe the data, I’ll instead wipe the slate clean and ask what information we can extract from the data themselves, without reliance on any model assumptions. In other words, where the previous post examined the data using a supervised machine learning approach for data modeling, this post will examine the data using an unsupervised learning approach for data exploration.
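In the spirit of that post (though not its actual code or data), an unsupervised projection of a day-by-hour count matrix can surface weekday/weekend structure without any labels. The profiles below are synthetic stand-ins for bicycle counts:

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)
# Hypothetical daily profiles: weekdays show a double commute peak,
# weekends a single midday peak.
weekday = (np.exp(-0.5 * ((hours - 8) / 1.5) ** 2)
           + np.exp(-0.5 * ((hours - 17) / 1.5) ** 2))
weekend = np.exp(-0.5 * ((hours - 13) / 3.0) ** 2)

# 12 weeks of days (rows) by 24 hours (columns), plus noise.
days = np.array([weekday if d % 7 < 5 else weekend for d in range(84)])
days = days + rng.normal(scale=0.05, size=days.shape)

# PCA via SVD on the centered matrix; no labels are used.
centered = days - days.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ Vt[0]  # projection onto the first principal component

# The sign of pc1 alone separates weekday-like from weekend-like days.
labels = pc1 > 0
weekday_mask = np.array([d % 7 < 5 for d in range(84)])
agreement = max((labels == weekday_mask).mean(),
                (labels != weekday_mask).mean())
print(agreement)  # close to 1.0: the weekly pattern falls out unsupervised
```

The point mirrors the post's framing: no model of weather or commuting is assumed; the structure is read directly off the data.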

 

How machine learning will revolutionize the mobile experience

Android Authority


from July 23, 2015

The rise of the smartphone is a serious boost for machine learning because it’s producing an enormous amount of useful data that can be mined, analyzed, and used to make predictions.

 

Determinism in an Uncertain World — Takeaways from the Data Science Handbook — Medium

Medium, Strikingly


from July 28, 2015

The Data Science Handbook (DSH) is a compilation of in-depth interviews with 25 leading data scientists who share their remarkable insights, stories, and advice. Some of them, like DJ Patil and Hilary Mason, were part of the trailblazing wave of data scientists who catapulted the field into national attention. What mindsets and techniques make a great data scientist? What does it take to build an effective team; what lies in the future of data science?

We had the pleasure of speaking with authors Carl Shan, Henry Wang, William Chen and Max Song, and then handing the pen over to them. Today they share with us their thoughts and reflections on DSH, as well as their outlook and bets on the future of data science, hailed several years ago by the Harvard Business Review as the “sexiest job of the 21st century”.

 

Internet of Things – a game changer in Insurance industry! | The Big Data Institute

The Big Data Institute


from July 29, 2015

The “Internet of Things” (IoT) is an increasingly prominent topic in industry forums and boardrooms. It is a concept with the potential to change not only how businesses interact with consumers, but also how consumers interact with businesses and how businesses interact with one another. Gartner research identifies IoT as one of the most active areas of innovation across many industries. IoT is a game changer for insurers that embrace it and position themselves to leverage it. These leaders stand to realize concrete business benefits, including increased operating efficiency, improved customer experience, better risk management, more profitable underwriting and pricing, faster and more accurate claims payouts, and accelerated innovation in traditional insurance transactions. There are challenges ahead, but the payoff is expected to be substantial for both insurers and their customers.

 

The Geomblog: The 2nd Workshop on Fairness, Accuracy and Transparency in Machine Learning: A review

Suresh Venkatasubramanian, The Geomblog


from July 28, 2015

I was one of the organizers of the 2nd workshop on Fairness, Accuracy and Transparency in Machine Learning (FATML) at ICML 2015, and in my alternate career as moderator of data mining panels, I moderated the closing panel. The panelists were Fernando Diaz from MSR New York, Sorelle Friedler from Haverford College, Mykola Pechenizkiy from Eindhoven University of Technology and Hanna Wallach from UMass-Amherst and MSR.

While my original intent was to do a review of the panel, it became clear that the panel discussion touched on themes that were bubbling up throughout the day. So what follows is organized by panel questions, but weaves in discussion from outside the panel as well.

 

The Guy Who Taught AI to ‘Remember’ Is Launching a Startup

WIRED, Business


from July 28, 2015

… Recurrent neural nets, or RNNs, can not only recognize complex moving images, but automatically generate detailed captions for online photos and videos, improve online services that translate from one language to another, and more. They’re pushing into companies like Facebook and Baidu as well as Google, and in recent weeks, this burgeoning technology received another shot in the arm with the arrival of a new startup called Nnaisense.

According to the company’s website, Nnaisense was founded by Jürgen Schmidhuber, a key figure in the development of modern RNNs, and four researchers who work alongside him at the Swiss AI lab called IDSIA (Istituto Dalle Molle di Studi sull’Intelligenza Artificiale). The German-born Schmidhuber helped create a breed of recurrent neural net called LSTM, or Long Short-Term Memory, and his work has influenced the latest AI research at the likes of Google, Microsoft, IBM, and others.

 

As Tech Booms, Workers Turn to Coding for Career Change

The New York Times


from July 28, 2015

After Paul Minton graduated from college, he worked as a waiter, but always felt he should do more.

So Mr. Minton, a 26-year-old math major, took a three-month course in computer programming and data analysis. As a waiter, he made $20,000 a year. His starting salary last year as a data scientist at a web start-up here was more than $100,000.

 

What is actually happening out there in terms of institutional data repositories? | hangingtogether.org

OCLC Research


from July 27, 2015

There is an awful lot of talk about academic libraries providing data curation services for their researchers. It turns out that in most cases that service amounts to training and advice, but not actual data management services. However, institutions without data repositories are likely thinking about implementing one. We thought it would be helpful to hear from those few who have implemented data repositories. [If you are one of those pioneers and did not get a chance to fill out the survey, feel free to describe your repository program as a comment to this post.]

OCLC Research conducted an unscientific survey about data repositories from 5/19/2015 to 7/16/2015. Initially the survey was sent to 12 institutions that were believed to have a data repository. They were asked to identify other institutions with data repositories. In total, 31 institutions were invited to take the survey; 22 filled it out, and two of those indicated that they do not have a data repository. The following summarizes the 20 responses from institutions with data repositories.

 

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Nature Biotechnology


from July 27, 2015

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with ‘deep learning’ techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a ‘mutation map’ that indicates how variations affect binding within a specific sequence.
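A rough sketch of the representations the abstract mentions, assuming nothing from the paper beyond what is stated there: sequences are one-hot encoded, scored by a position-weighted filter (here a made-up 4-position motif, not a DeepBind filter), and scoring every single-base substitution yields a mutation map.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix, the standard
    input representation for sequence models."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

# A hypothetical motif scoring matrix (rows: positions; cols: A, C, G, T).
motif = np.array([
    [0.1, 0.1, 0.7, 0.1],   # prefers G
    [0.7, 0.1, 0.1, 0.1],   # prefers A
    [0.1, 0.1, 0.1, 0.7],   # prefers T
    [0.7, 0.1, 0.1, 0.1],   # prefers A
])

def scan(seq, motif):
    """Max motif score over all windows: the 'convolution + max pool'
    step of a one-filter sequence model."""
    x = one_hot(seq)
    k = motif.shape[0]
    return max(float((x[i:i + k] * motif).sum())
               for i in range(len(seq) - k + 1))

def mutation_map(seq, motif):
    """Score change for every single-base substitution, analogous to the
    mutation map visualization described in the abstract."""
    base_score = scan(seq, motif)
    mm = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, b in enumerate(BASES):
            mm[i, j] = scan(seq[:i] + b + seq[i + 1:], motif) - base_score
    return mm

print(scan("CCGATACC", motif))  # the GATA window scores highest
```

DeepBind itself learns many such filters from experimental data; this sketch only illustrates why the learned specificities can be rendered as position weight matrices and mutation maps.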

 
CDS News



Movement Tracking Technology Sheds Light on Different Speech Disorders in Children

NYU News


from July 28, 2015

Facial motion capture – the same technology used to develop realistic computer graphics in video games and movies – has been used to identify differences between children with childhood apraxia of speech and those with other types of speech disorders, finds a new study by NYU’s Steinhardt School of Culture, Education, and Human Development.

“In our study, we see evidence of a movement deficit in children with apraxia of speech, but more importantly, aspects of their speech movements look different from children with other speech disorders,” said study author Maria Grigos, associate professor in the Department of Communicative Sciences and Disorders at NYU Steinhardt. The study, coauthored by Aviva Moss and Ying Lu of NYU Steinhardt, is published in the August issue of the Journal of Speech, Language, and Hearing Research.

 
