NYU Data Science newsletter – November 10, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 10, 2015

GROUP CURATION: N/A

 
Data Science News



Machine Learning Isn’t Data Science — Medium

Medium, Nwokedi C. Idika


from October 31, 2015

Too often, Machine Learning is used synonymously with Data Science. Before I knew what both of these terms were, I simply thought that Data Science was just some new faddish word for Machine Learning. Over time, though, I’ve come to appreciate the real differences between these terms. I’ve always wondered how misconceptions like these endure for so long; my current working hypothesis is that people are deathly afraid of looking stupid, too afraid to ask someone, “What is machine learning? What is data science? What is the difference?” So, for those too afraid to ask, I’m going to pretend that you asked. Now, what follows are my hypothetical answers to your hypothetical questions :-). Enjoy.

 

#SciData15: Research Data for Discovery: Prepare to Share : Naturejobs Blog

Naturejobs Blog


from November 04, 2015

“We must engage in the idea of sharing,” said conference chair Iain Hrynaszkiewicz as the 2015 Publishing Better Science through Better Data meeting kicked off at the headquarters of Nature Publishing Group (NPG) in London on 23rd October.

Hrynaszkiewicz, who develops new areas of open research publishing and data policy within NPG/Macmillan, noted that 30 funding bodies — including the Engineering and Physical Sciences Research Council and The Royal Society — have written policies that outline requirements for data-sharing. Examples include detailed methods and protocols, microscopy images and mathematical workings, as well as meta-datasets of, for example, genotypes and microarrays.

The meeting’s aims were to increase awareness of ways to effectively share data and to discuss how to improve the efficiency, implementation and overall impact of sharing among the scientific community.

 

I’m going to make Facebook’s AI predict what happens in videos | New Scientist

New Scientist


from November 06, 2015

This week, Facebook unveiled several artificial intelligence projects. Yann LeCun, the company’s director of AI research, reveals what this technology can do.

 

H2O.ai Raises $20M For Its Open Source Machine Learning Platform | TechCrunch

TechCrunch


from November 09, 2015

H2O is an open source platform for data scientists and developers who need a fast machine learning engine for their applications. H2O.ai, the company behind the service, today announced that it has raised a $20 million Series B funding round led by Paxion Capital Partners (the new firm of GoPro board member Michael Marks) and existing investors Nexus Venture Partners and Transamerica. New investor Capital One Growth Ventures also joined this round. In total, the company has now raised $34 million.

The H2O platform is the company’s main product, but it also offers a number of tools around that platform. These include Sparkling Water, which combines the Apache Spark data processing engine with the H2O platform (see where that name comes from?), as well as Flow, an open source notebook-style user interface for H2O (similar to IPython notebooks).
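
For readers who have not used it, here is a minimal sketch of what driving the H2O engine from its Python client looks like. The file name, the binary "label" column, and the model settings are illustrative placeholders rather than details from the article.

```python
# Minimal, hypothetical sketch of the H2O Python client.
# "train.csv" and the "label" column are placeholders, not from the article.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()  # start (or connect to) a local H2O cluster

# Load a CSV into the cluster as an H2OFrame and mark the response as categorical
frame = h2o.import_file("train.csv")
frame["label"] = frame["label"].asfactor()
train, valid = frame.split_frame(ratios=[0.8], seed=42)

# Fit a gradient boosting model; all columns except "label" are used as predictors
model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
model.train(y="label", training_frame=train, validation_frame=valid)

print(model.auc(valid=True))  # validation AUC for the binary response
```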

 

TensorFlow – Google’s latest machine learning system, open sourced for everyone

Google Research Blog


from November 09, 2015

Deep Learning has had a huge impact on computer science, making it possible to explore new frontiers of research and to develop amazingly useful products that millions of people use every day. Our internal deep learning infrastructure DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in our datacenters. We’ve used it to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won ImageNet’s Large Scale Visual Recognition Challenge in 2014, and drove our experiments in automated image captioning as well as DeepDream.

While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure — making it nearly impossible to share research code externally.

Today we’re proud to announce the open source release of TensorFlow — our second-generation machine learning system, specifically designed to correct these shortcomings.
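
For readers who have not seen the library, the sketch below shows the graph-then-session programming style of the initial open source release on a toy linear model with invented data. The calls follow the 2015-era 0.x API (for example, tf.initialize_all_variables), which was renamed in later versions.

```python
# Hypothetical sketch of TensorFlow's 2015-era graph-and-session style.
# The model and data are toy examples, not from the announcement.
import tensorflow as tf

# Build a dataflow graph: a linear model y = Wx + b with squared-error loss
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
y_true = tf.placeholder(tf.float32, shape=[None, 1], name="y_true")
W = tf.Variable(tf.zeros([3, 1]), name="W")
b = tf.Variable(tf.zeros([1]), name="b")
y_pred = tf.matmul(x, W) + b
loss = tf.reduce_mean(tf.square(y_pred - y_true))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Execute the graph in a session, feeding data through placeholders
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    batch_x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    batch_y = [[1.0], [0.0]]
    for _ in range(100):
        _, current_loss = sess.run([train_step, loss],
                                   feed_dict={x: batch_x, y_true: batch_y})
    print(current_loss)
```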

 

Prof. David Bamman to Offer Interdisciplinary Critical Data Science Course Spring 2016 | Digital Humanities

UC Berkeley, Digital Humanities


from November 09, 2015

… Bamman invites students from a variety of disciplines to join his upcoming spring semester course, “Information 290: Deconstructing Data Science.” Students will explore “a range of methods in machine learning and data analysis that leverage information produced by people in order to draw inferences.” Bamman’s example applications include discerning the authorship of documents, examining the political sentiments of social media users, charting the reuse of language in legislative bills, tagging the genres of songs, and extracting social networks from literary texts. Machine learning has seen increasing adoption in Berkeley’s digital humanities community; DH Fellow Elizabeth Honig currently collaborates with researchers at Duke University who are using machine learning to discern common motifs in large collections of paintings (such as images adapted from templates in the Brueghel family workshop).

 

The FCC says it can’t force Google and Facebook to stop tracking their users

The Washington Post, The Switch blog; Brian Fung and Andrea Peterson


from November 06, 2015

The Federal Communications Commission said Friday that it will not seek to impose a requirement on Google, Facebook and other Internet companies that would make it harder for them to track consumers’ online activities.

The announcement is a blow to privacy advocates who had petitioned the agency for stronger Internet privacy rules. But it’s a win for many Silicon Valley companies whose business models rely on monetizing Internet users’ personal data.

 

What are Deep Learning specialists saying about Google’s TensorFlow?

LinkedIn, Marshall Kirkpatrick


from November 10, 2015

Google made a big splash today by open sourcing a version of its second-generation Deep Learning framework, called TensorFlow. After being built by the Google Brain team and deployed across more than 100 Google and Alphabet products, the framework will now be usable and extensible by anyone under the Apache 2.0 open source license. It is expected to evolve with community participation.

This could prove a milestone in the development of Artificial Intelligence.

The Deep Learning developer community is generally excited about it. Developers praise its visualization components and are looking forward to better support for distributed systems, but the early reception is already very enthusiastic.

 

How to Use Ecological Thresholds to Inform Resource Management | NCEAS

NCEAS, University of California Santa Barbara


from November 09, 2015

As the number, extent, and intensity of human impacts expand, strongly non-linear responses in ecosystems become even more likely. So-called “tipping points” can lead to unwanted shifts in ecosystem state, slow recovery of systems back to desired states, and altered ecosystem function and delivery of services. Understanding the factors that drive threshold dynamics, and knowing when and how rapidly thresholds will be crossed, is important for setting management targets.

The Ocean Tipping Points team, many of whom are resident at NCEAS, has recently published an illustrative review article in Frontiers in Marine Science that uses real case studies to show how resource managers can use various approaches available today—including ecosystem monitoring, statistical methods to identify thresholds and indicators, and threshold-based adaptive management—to help avoid reaching ecological thresholds or to restore systems that have crossed them.

 

How computers broke science – and what we can do to fix it

The Conversation, Ben Marwick


from November 09, 2015

Reproducibility is one of the cornerstones of science. Made popular by British scientist Robert Boyle in the 1660s, the idea is that a discovery should be reproducible before being accepted as scientific knowledge.

In essence, you should be able to produce the same results I did if you follow the method I describe when announcing my discovery in a scholarly publication. For example, if researchers can reproduce the effectiveness of a new drug at treating a disease, that’s a good sign it could work for all sufferers of the disease. If not, we’re left wondering what accident or mistake produced the original favorable result, and we would doubt the drug’s usefulness.

For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer – and the point-and-click software programs that have evolved to make it more user-friendly – reproducibility of much research has become questionable, if not impossible. Too much of the research process is now shrouded by the opaque use of computers that many researchers have come to depend on. This makes it almost impossible for an outsider to recreate their results.
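
One remedy the article’s argument points toward is replacing point-and-click steps with a script that records every decision, including the randomness. The toy analysis below is a hypothetical illustration of that idea, not code from the article: anyone rerunning it gets exactly the same numbers because the seed and each processing step are written down.

```python
# Hypothetical illustration of a fully scripted, reproducible analysis.
# The data are simulated; the point is that every step is recorded in code.
import numpy as np

rng = np.random.RandomState(20151109)  # fixed seed documents the randomness

control = rng.normal(loc=10.0, scale=2.0, size=50)  # simulated control group
treated = rng.normal(loc=11.0, scale=2.0, size=50)  # simulated treated group

effect = treated.mean() - control.mean()
print("Estimated treatment effect: %.3f" % effect)  # identical on every rerun
```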

 

UPMC data chief: Integrated health systems have advantages in reducing readmissions | MobiHealthNews

mobihealthnews


from November 09, 2015

The University of Pittsburgh Medical Center (UPMC) describes itself as the second largest integrated payer-provider network after Kaiser Permanente. At the HIMSS Big Data and Healthcare Analytics Forum on Thursday, UPMC Chief Analytics Officer Pamela Peele talked about how being both a payer and a provider enables UPMC to address readmissions in a unique way.

“Providers are trained to manage disease, and insurers are trained to manage financial risk,” she said. “[Asking providers to manage risk] is like asking me to put a stent in you. We’re asking providers who are not trained to manage financial risk to manage financial risk, which is something insurers do extraordinarily well, which is another reason putting a payer and a provider together is so powerful.”

 

Can an algorithm prove you won’t quit your next job?

Bloomberg Government


from November 09, 2015

… Over the past few years, the hiring-algorithm space has become crowded with companies promising to “hire better than a human.” In the recruitment-software world, better comes in many flavors, but most companies don’t look at post-hire metrics. An algorithm, a formula that takes in data and spits out a result, might fill the slot faster or find more diverse candidates. But how did those people do in their job? And are they still there one, two, three years down the road?

How about forever?

“I don’t think it’s very common to connect pre-hire and post-hire together,” said Kieran Snyder, co-founder and CEO of Textio, which optimizes job postings for companies including Starbucks and Barclays. Textio is developing technology to bridge the gap.

 

Choosing experiments to accelerate collective discovery

Proceedings of the National Academy of Sciences; Andrey Rzhetsky, Jacob G. Foster, Ian T. Foster and James A. Evans


from November 04, 2015

A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.
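
As a rough sketch of the two network quantities the abstract singles out, the toy example below computes degree centrality (the paper’s proxy for an entity’s importance) and the network distance between two entities (its proxy for a problem’s difficulty). The miniature knowledge graph is invented for illustration and is not the authors’ data or code.

```python
# Hypothetical toy knowledge network: nodes are biomedical entities,
# edges are published relationships. Invented for illustration only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("aspirin", "COX-1"), ("aspirin", "COX-2"),
    ("ibuprofen", "COX-1"), ("COX-2", "PGE2"), ("PGE2", "fever"),
])

# Importance of an entity ~ its degree centrality
importance = nx.degree_centrality(G)

# Difficulty of a problem ~ the network distance it would span
difficulty = nx.shortest_path_length(G, source="aspirin", target="fever")

print(importance["aspirin"], difficulty)
```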

 

How Satellite Data And Artificial Intelligence Could Help Us Understand Poverty Better

Fast Company


from November 06, 2015

Data analytics firm Orbital Insight is partnering with the World Bank to test technology that could help measure global poverty using satellite imagery and artificial intelligence.

Governments and development organizations currently measure poverty levels by conducting door-to-door surveys. The new partnership will test the use of AI to supplement these surveys and increase the accuracy of poverty data. Orbital said its AI software will analyze satellite images to see if characteristics such as building height and rooftop material can effectively indicate wealth.

The pilot study will be conducted in Sri Lanka. If successful, the World Bank hopes to scale it worldwide.

 

Let’s value the software that powers science: Introducing Depsy

Impactstory blog


from November 09, 2015

Today we’re proud to officially launch Depsy, an open-source webapp that tracks research software impact.

We made Depsy to solve a problem: in modern science, research software is often as important as traditional research papers – but it’s not treated that way when it comes to funding and tenure. There, the traditional publish-or-perish, show-me-the-Impact-Factor system still rules.

 

Your Future Depends on Your Ability to Understand Big Data. Here’s What You Need to Know. | Inc.com

Inc.com


from November 06, 2015

This is what today’s entrepreneurs need to understand: an estimated 90% of the world’s data has been created in the past two years, much of it via mobile devices. Whether you know it or not, this data is impacting your business. More importantly, your ability to understand the insights available from this data and to navigate this real-time information will determine your future success.

 
Events



Conversations: “How Data is Changing Media Companies”



As data is becoming increasingly available, how are media companies digesting and presenting data to best communicate and illustrate their stories and products? How do journalists know when data is reliable? How is data impacting the way news is presented and reported? How is data driving business decisions within an organization? For our November event, “How Data is Changing Media Companies,” our panelists will discuss and debate their thoughts on what lies ahead for them and for the industry as a whole.

Wednesday, November 18, at 6 p.m., Microsoft, 11 Times Square

 

Sound Visualization & Data Sonification Hackathon



The Monthly Music Hackathon NYC for December.

Saturday, December 12, starting at 12 noon, Spotify, 45 W. 18th St.

 
