NYU Data Science newsletter – June 10, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 10, 2016

Data Science News

[1606.01167] How Deep is the Feature Analysis underlying Rapid Visual Categorization?

arXiv, Computer Science > Computer Vision and Pattern Recognition; Sven Eberhardt, Jonah Cader, Thomas Serre

from June 03, 2016

Rapid categorization paradigms have a long history in experimental psychology: Characterized by short presentation times and speedy behavioral responses, these tasks highlight the efficiency with which our visual system processes natural object categories. Previous studies have shown that feed-forward hierarchical models of the visual cortex provide a good fit to human visual decisions. At the same time, recent work in computer vision has demonstrated significant gains in object recognition accuracy with increasingly deep hierarchical architectures. But it is unclear how well these models account for human visual decisions and what they may reveal about the underlying brain processes.

We have conducted a large-scale psychophysics study to assess the correlation between computational models and human participants on a rapid animal vs. non-animal categorization task. We considered visual representations of varying complexity by analyzing the output of different stages of processing in three state-of-the-art deep networks. We found that recognition accuracy increases with higher stages of visual processing (higher level stages indeed outperforming human participants on the same task) but that human decisions agree best with predictions from intermediate stages. Overall, these results suggest that human participants may rely on visual features of intermediate complexity and that the complexity of visual representations afforded by modern deep network models may exceed those used by human participants during rapid categorization.

Computer Vision Research: The deep “depression”

LinkedIn, Nikos Paragios

from June 05, 2016

… almost all the community now seems to target the development of more complex pipelines (that most likely cannot be reproduced based on the elements presented in the paper) which in most of the cases have almost no theoretical reasoning behind that can add 0.1% of performance on a given benchmark. Is this the objective of academic research? Putting in place highly complex engineering models that simply explore computing power and massive annotated data?

Your Search Engine History Could Actually Lead to a Real Medical Diagnosis

WNYC, The Takeaway

from June 09, 2016

If you like Washington politics, you’re likely to get targeted political stories in your newsfeed. Turning 50 this year? You’ll probably start to notice AARP ads popping up. But imagine if your search history could help you diagnose cancer or depression, and send you information on how to get treatment? [audio, 7:00]

Reflection on the Data Science Profession

YouTube, Work-Bench

from June 03, 2016

Delivered by Drew Conway (CEO, Alluvium) at the 2016 New York R Conference on April 8th and 9th at Work-Bench. Conway reflects on how New York City became a hub for data science and innovation.

CrowdFlower raises $10 million from Microsoft and others to bring A.I. to data science

VentureBeat, Paul Sawers

from June 07, 2016

CrowdFlower, a crowdsourced data-cleaning and tagging platform, has closed a fresh $10 million in funding in a round led by Microsoft Ventures, Canvas Ventures, and Trinity Ventures.

Founded out of San Francisco in 2009, CrowdFlower had previously raised $28 million, including a $12.5 million round back in 2014, but the company says it will use its cash influx to expedite the adoption of a new machine-learning product called CrowdFlower AI.

[1606.02319] Community detection in networks: Modularity optimization and maximum likelihood are equivalent

arXiv, Computer Science > Social and Information Networks; M.E.J. Newman

from June 07, 2016

We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.
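For readers who have not met the generalized modularity mentioned in the abstract, here is a minimal sketch, not taken from the paper's text. It assumes the commonly used form Q(gamma) = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / 2m] * delta(g_i, g_j) for an undirected graph without self-loops, where gamma is the resolution parameter. The function name, the edge-list input, and the toy graph are illustrative choices only.

```python
from collections import defaultdict


def generalized_modularity(edges, membership, gamma=1.0):
    """Generalized modularity of a partition of an undirected graph.

    edges      -- list of (u, v) pairs, each undirected edge listed once
    membership -- dict mapping each node to its community label
    gamma      -- resolution parameter (gamma = 1 gives standard modularity)

    Uses the per-community form, which is algebraically the same as the
    pairwise sum: Q = sum_c [ L_c / m - gamma * (d_c / (2m))**2 ],
    where L_c is the number of edges inside community c and d_c is the
    total degree of its nodes.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal_edges = defaultdict(int)  # L_c
    degree_sum = defaultdict(int)      # d_c
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal_edges[membership[u]] += 1

    return sum(
        internal_edges[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = generalized_modularity(edges, split, gamma=1.0)
q_merged = generalized_modularity(edges, merged, gamma=1.0)
```

In this toy graph the two-triangle split scores higher than the single merged community at gamma = 1. Larger values of gamma penalize big communities more heavily, which is the sense in which the resolution parameter controls community size.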

NVIDIA Supercharges Deep Learning Innovation With Program to Support AI Startups

Marketwired, NVIDIA press release

from June 08, 2016

NVIDIA today unveiled a comprehensive global program to support the innovation and growth of startups that are driving new breakthroughs in artificial intelligence and data science.

The NVIDIA Inception Program provides unique tools, resources and opportunities to the waves of entrepreneurs starting new companies, so they can develop products and services with a first-mover advantage.

Screening for Pancreatic Adenocarcinoma Using Signals From Web Search Logs: Feasibility Study and Results

Journal of Oncology Practice

from June 07, 2016

QUESTION ASKED:
Can signals mined from large-scale anonymized Web search logs about symptom queries over time be harnessed to build a valuable screening methodology for pancreatic adenocarcinoma?

SUMMARY ANSWER:
Search logs can provide valuable signals to predict the later appearance of first-person queries on disease management that are strongly suggestive of a professional diagnosis of pancreatic carcinoma. Performance of the risk stratification holds many weeks in advance and improves when conditioned on the presence of specific symptoms or risk factors found in people’s search histories.

DH at Berkeley Awards More than $200K in Grants

University of California-Berkeley, Digital Humanities

from June 07, 2016

The grants will promote collaborative research and the development of new DH courses. 14 research teams will be funded for projects that range from database development to algorithmic analytical tools.

OK Computer, Write Me a Song

MIT Technology Review

from June 08, 2016

Google says its AI software could make creative suggestions to help musicians, architects, and visual artists.

Let’s Rummage Through Yahoo’s Patent Sale

The Ringer

from June 08, 2016

Yahoo, the multinational technology company that ruthlessly killed Katie Couric, is not having a good year. Its approach to being a corporation is something like “throw everything at the wall and see what sticks,” except it throws only stink bombs at the wall, and the wall is made of farts. CEO Marissa Mayer is likely on her way out, and now it looks like the company is preparing to sell nearly 3,000 patents in a quickie firesale. Yahoo recently transferred many of these patents to a subsidiary called Excalibur IP LLC.

Events

The Rise of Machine Learning

Welcome to the Silicon Valley Robotics Influencer series, hosted at HAX Accelerator and curated by Cory Kidd of Catalia Health, Tim Smith of Element Public Relations and Andra Keay of Silicon Valley Robotics. This topic looks at the state of machine learning with special guests: Joshua Bloom (Wise.io), Sarah Osentoski (Mayfield Robotics), Carol Reiley (Drive.ai), Quentin Hardy (New York Times – Moderator).

San Francisco, CA Tuesday, June 14, starting at 6 p.m., HAX Accelerator (479 Jessie Street) [$$]

Tools & Resources

10 Useful Python Data Visualization Libraries for Any Discipline

Mode Blog, Melissa Bierly

from June 08, 2016

Today, we’re giving an overview of 10 interdisciplinary Python data visualization libraries, from the well-known to the obscure. We’ve noted the ones you can take for a spin without the hassle of running Python locally, using Mode Python Notebooks.

A Predictive Model to Foresee Flight Delays

Spark.tc

from June 09, 2016

Certain causes of delay like emergency maintenance and crew illness are unpredictable, but for weather-related delays, public collections of flight and weather data offer the promise of insight. The question is how to build an application that could take advantage of that data.

Open Sourcing Photon ML

LinkedIn Engineering, Paul Ogilvie

from June 07, 2016

Machine learning is a key component of LinkedIn’s relevance-driven products. We use machine learning to train the ranking algorithms for our feed, advertising, recommender systems (such as People You May Know), email optimization, search engines, and more. For an in-depth example, check out these posts (part one and two) on how LinkedIn applies machine learning for ranking the feed.

These algorithms play an important role in determining user experience for content-rich websites, so it’s critical that we provide our engineers with easy-to-use machine learning tools that create high-quality models that are fast and scale to large datasets. To meet these needs, we have developed Photon ML, a machine learning library for Apache Spark.

Ten Simple Rules for Effective Statistical Practice

PLOS Computational Biology

from June 09, 2016

Several months ago, Phil Bourne, the initiator and frequent author of the wildly successful and incredibly useful “Ten Simple Rules” series, suggested that some statisticians put together a Ten Simple Rules article related to statistics. (One of the rules for writing a PLOS Ten Simple Rules article is to be Phil Bourne. In lieu of that, we hope effusive praise for Phil will suffice.)

Implicit in the guidelines for writing Ten Simple Rules [1] is “know your audience.” We developed our list of rules with researchers in mind: researchers having some knowledge of statistics, possibly with one or more statisticians available in their building, or possibly with a healthy do-it-yourself attitude and a handful of statistical packages on their laptops. We drew on our experience in both collaborative research and teaching, and, it must be said, from our frustration at being asked, more than once, to “take a quick look at my student’s thesis/my grant application/my referee’s report: it needs some input on the stats, but it should be pretty straightforward.”

Careers

The Art of Pivoting

Medium, Boris Adryan
