Data Science newsletter – July 12, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 12, 2018


 
 
Data Science News



How to get science research covered in the press

Northeastern University, News @ Northeastern



One of the most persistent challenges in science today is how to get the mainstream press—and by extension, the general public—to pay attention to the most important scientific research of the day.

A massive new study by Northeastern researchers uses more than 91,000 scientific papers published in 2016 to demonstrate that machine learning can be used to predict press coverage for future research.

Among their findings: media coverage is often determined more by the subject matter of the research than by its scientific importance. For example, research involving personal health or climate change consistently gets more coverage than studies of cell biology or applied mathematics, according to Ansel MacLaughlin, a doctoral student in computer science and first author on the study.
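As a rough illustration of the kind of model such a study could employ (a minimal sketch, not the authors' system), a text classifier can be trained on paper abstracts labeled by whether they received press coverage. The toy abstracts and labels below are placeholders; the study itself drew on more than 91,000 papers.

```python
# A minimal sketch, not the authors' model: predict press coverage
# from abstract text with a TF-IDF + logistic regression pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

abstracts = [
    "We examine dietary risk factors for cardiovascular disease in adults.",
    "A spectral sequence computation for stable homotopy groups of spheres.",
    "Warming oceans accelerate ice-sheet loss and sea level rise.",
    "An improved bound for sparse matrix multiplication complexity.",
]
covered = [1, 0, 1, 0]  # 1 = received press coverage, 0 = did not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(abstracts, covered)

print(model.predict(["New evidence links air pollution to heart disease."]))
```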


An Attempt to Chart the History of NLP in 5 Papers: Part II

Open Data Science Conference, Kaylen Sanders



This is the second article in a two-part series about the history of NLP as told through five research papers. It picks up in the midst of the 1970s.


Analysis of Errors in Dictated Clinical Documents Assisted by Speech Recognition Software and Professional Transcriptionists

JAMA, The JAMA Network, Original Investigation; Li Zhou, Suzanne V. Blackley, Leigh Kowalski, et al.



Question: How accurate are dictated clinical documents created by speech recognition software, edited by professional medical transcriptionists, and reviewed and signed by physicians?

Findings: Among 217 clinical notes randomly selected from 2 health care organizations, the error rate was 7.4% in the version generated by speech recognition software, 0.4% after transcriptionist review, and 0.3% in the final version signed by physicians. Among the errors at each stage, 15.8%, 26.9%, and 25.9% involved clinical information, and 5.7%, 8.9%, and 6.4% were clinically significant, respectively.

Meaning: An observed error rate of more than 7% in speech recognition–generated clinical documents demonstrates the importance of manual editing and review.


California malls are sharing license plate tracking data with an ICE-linked database

TechCrunch, Taylor Hatmaker



A chain of California shopping centers appears to be sharing its license plate reader data with a well-known U.S. Immigration and Customs Enforcement (ICE) contractor, giving that agency the ability to track license plate numbers it captures in near real-time.

A report from the Electronic Frontier Foundation revealed that real estate group Irvine Company shares that data with Vigilant Solutions, a private surveillance tech company that sells automated license plate recognition (ALPR) equipment to law enforcement and government agencies. Irvine Company owns nearly 50 shopping centers across California with locations in Irvine, La Jolla, Newport Beach, Redwood City, San Jose, Santa Clara and Sunnyvale. ICE finalized its contract with Vigilant Solutions in January of this year.


Intel Editorial: How Governments Can Help Advance Artificial Intelligence

Business Wire, Intel



The following is an opinion editorial provided by Naveen Rao of Intel Corporation.

Most people agree that artificial intelligence (AI) will transform modern society in positive ways. From autonomous cars that will save thousands of lives, to data analytics programs that may finally discover a cure for cancer, to machines that give voice to those who can’t speak, AI will be known as one of the most revolutionary innovations of mankind.

But this fantastic future is a long way off, and the path to get us there is still under construction. Never before has society undertaken such a significant transformation so deliberately, and no blueprints exist to guide us. Yet one thing is clear: AI is bigger than any one company, industry or country can address on its own. It will take the whole of our technology ecosystem and the world’s governments to realize the full promise of AI.


UW’s $20 Million Microbial Grant Research Projects Underway This Summer

University of Wyoming, News



Microbes — those living, single-celled organisms too small for the naked eye to see — could end up doing some mighty big things for the University of Wyoming.

Over the next five years, UW researchers are taking an up-close look at those unseen organisms at an unprecedented scale, thanks to a $20 million, five-year grant from the National Science Foundation (NSF).

And using cutting-edge techniques including DNA sequencing and computational modeling, the scientists hope to learn the distribution and ecological consequences of microbes, producing insights that will help Wyomingites address a variety of challenges — from managing rangeland, forest and water resources, to reclaiming areas disturbed by mineral extraction, to improving crop productivity.


Ecology and AI

EurekAlert! Science News, Harvard University



It’s poised to transform fields from earthquake prediction to cancer detection to self-driving cars, and now scientists are unleashing the power of deep learning on a new field – ecology.

A team of researchers from Harvard, Auburn University, the University of Wyoming, the University of Oxford and the University of Minnesota demonstrated that the artificial intelligence technique can be used to identify animal images captured by motion-sensing cameras.
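For a sense of the general approach (a hedged sketch, not the paper's exact pipeline), one common recipe is to fine-tune a pretrained convolutional network on labeled camera-trap images. The species count and directory layout below are assumptions for illustration.

```python
# A minimal sketch of species classification on camera-trap images:
# replace the head of a pretrained ResNet and fine-tune on labeled data.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_SPECIES = 48  # hypothetical number of animal classes

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumes images arranged as camera_traps/<species_name>/<image>.jpg
train_set = datasets.ImageFolder("camera_traps", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new classifier head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```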


Improving disaster response through Twitter data

Penn State University, Penn State News



Twitter data could give disaster relief teams real-time information to provide aid and save lives, thanks to a new algorithm developed by an international team of researchers.

A team of researchers from Penn State, the Indian Institute of Technology Kharagpur, and the Qatar Computing Research Institute created an algorithm that analyzes Twitter data to identify smaller disaster-related events, known as sub-events, and generate highly accurate, real-time summaries that can be used to guide response activities.

The group presented their paper, "Identifying Sub-events and Summarizing Information from Microblogs during Disasters," on July 10 at the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval in Ann Arbor, Michigan.
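The paper's own algorithm isn't reproduced here, but a much-simplified stand-in conveys the idea: cluster disaster tweets into candidate sub-events, then summarize each cluster with its most representative tweet. The toy tweets and the fixed number of clusters are illustrative assumptions.

```python
# A simplified stand-in for sub-event detection and summarization:
# TF-IDF + k-means clustering, summarizing each cluster by the tweet
# nearest its centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

tweets = [
    "Bridge on Highway 9 collapsed, avoid the area",
    "Highway 9 bridge down, traffic being rerouted",
    "Shelter at Lincoln High School accepting families tonight",
    "Lincoln High shelter needs blankets and drinking water",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

k = 2  # number of sub-events; a real system would choose this adaptively
km = KMeans(n_clusters=k, random_state=0).fit(X)

for c in range(k):
    members = np.where(km.labels_ == c)[0]
    # summary = the member tweet closest to the cluster centroid
    d = euclidean_distances(X[members],
                            km.cluster_centers_[c].reshape(1, -1)).ravel()
    print(f"sub-event {c}: {tweets[members[np.argmin(d)]]}")
```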


Microsoft partnering with Duke in downtown Durham

Triangle Business Journal, Ben Graham



Microsoft is teaming up with Duke University on a new research partnership that will result in the tech giant creating an “innovation hub” at the recently overhauled Chesterfield building in downtown Durham.

The 5,000-square-foot space, slated to open by the end of summer, will put Duke researchers next to Microsoft employees, allowing for collaboration on a number of research areas related to health care and machine learning.

“The partnership is around health care science, the application of data science to health,” says Lawrence Carin, Duke’s vice provost for research.


DARPA wants AI with baked-in physics knowledge

GCN



Although artificial intelligence is making its way into private- and public-sector enterprise systems, it has not gained as much traction in the Defense Department. Between DOD’s security and performance requirements, the immaturity of the technology in dealing with unstructured and incomplete data, and the complex problems that come with modeling dynamic systems, integration of AI into defense applications has been slow.

To speed the adoption of AI, the Defense Advanced Research Projects Agency is issuing a Disruption Opportunity — a call for innovative basic research concepts exploring new architectures and approaches to improve AI’s ability to generalize beyond training data and work with sub-optimal data.

The Physics of AI (PAI) program hypothesizes that challenges associated with today’s machine learning and AI systems can be overcome, especially in many defense applications, by “baking in” physics (relevant scientific and mathematical knowledge) from the outset.
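As one hedged illustration of what “baking in” physics can mean in practice (my example, not DARPA's program), a network can be fit to sparse, noisy trajectory data while a second loss term penalizes violations of the known dynamics, here free fall with d2y/dt2 = -g.

```python
# A sketch of a physics-informed loss: data fit + penalty on y'' + g.
import torch
import torch.nn as nn

g = 9.81
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

# Sparse, noisy observations of height y at times t (synthetic for the demo)
t_obs = torch.linspace(0, 1, 8).unsqueeze(1)
y_obs = 10 - 0.5 * g * t_obs ** 2 + 0.05 * torch.randn_like(t_obs)

# Dense collocation points where the physics penalty is evaluated
t_phys = torch.linspace(0, 1, 100).unsqueeze(1).requires_grad_(True)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    data_loss = ((net(t_obs) - y_obs) ** 2).mean()
    y = net(t_phys)
    dy = torch.autograd.grad(y.sum(), t_phys, create_graph=True)[0]
    d2y = torch.autograd.grad(dy.sum(), t_phys, create_graph=True)[0]
    phys_loss = ((d2y + g) ** 2).mean()  # penalize violations of y'' = -g
    (data_loss + 0.1 * phys_loss).backward()
    opt.step()
```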

 
Tools & Resources



How to Accelerate Artificial Intelligence Initiatives with Agile Development

Oracle DataScience.com, Nathaniel Gates



The explosion of artificial intelligence (AI) initiatives at enterprise companies has brought a wave of new opportunity for data scientists, as well as a lot of frustration. Strategic data science projects are exploratory, high-visibility, and high-risk — and it’s unlikely that they will end with a perfect solution. Data science teams want to accelerate delivery and reduce the risk of these projects, but the traditional approach to AI development is hindering their success.


2018 Data Visualization Survey Results

Medium, Elijah Meeks



In May, I put out a call to fill out a survey directed at people who do professional data visualization. It received 628 responses (down from 1,000 last year), and the processed and cleaned results have been uploaded to a GitHub repo alongside the 2017 results. As with last year’s survey, it’s important to remember that this is not a scientific survey and is likely biased toward my social network, but it was shared widely among the BI and data science communities, as well as among freelancers and other data visualization practitioners. That’s reflected in a broad set of results from journalism, science, tech, academia and other fields and approaches.


Papers with Code

h/t @ModeAnalytics



A searchable site that links machine learning papers on arXiv with code on GitHub.


Data Science: Ribs

Russ Poldrack



This code plots the meat and air temperature, both measured using the Maverick BBQ thermometer. The vents all started at about 10% open, and the dashed vertical lines indicate changes in venting (closing in red, opening in green). The black dotted line indicates points when we removed the cover to check the meat and baste it with a bit of apple juice (and sometimes add more water to the pan).
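A rough sketch of what such a plot might look like, assuming a log file with time, meat_temp, and air_temp columns (the column names and event times below are my inventions, not Poldrack's actual data):

```python
# Plot meat and air temperature with vertical lines marking cook events.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("ribs_log.csv")  # hypothetical thermometer log

fig, ax = plt.subplots()
ax.plot(df["time"], df["meat_temp"], label="meat")
ax.plot(df["time"], df["air_temp"], label="air")

# Hypothetical event times (minutes)
for t in [45, 160]:
    ax.axvline(t, color="red", linestyle="--")    # vents closed
for t in [90]:
    ax.axvline(t, color="green", linestyle="--")  # vents opened
for t in [120, 200]:
    ax.axvline(t, color="black", linestyle=":")   # cover removed to baste

ax.set_xlabel("time (minutes)")
ax.set_ylabel("temperature (F)")
ax.legend()
plt.show()
```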


XGBoost: Scalable GPU Accelerated Learning

arXiv, Computer Science > Machine Learning; Rory Mitchell, Andrey Adinets, Thejaswi Rao, Eibe Frank



We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library. Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library. We employ data compression techniques to minimise the usage of scarce GPU memory while still allowing highly efficient implementation. Using our algorithm we show that it is possible to process 115 million training instances in under three minutes on a publicly available cloud computing instance. The algorithm is implemented using end-to-end GPU parallelism, with prediction, gradient calculation, feature quantisation, decision tree construction and evaluation phases all computed on device.
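For readers who want to try the GPU code path, here is a minimal sketch of GPU-accelerated training through XGBoost's Python API (synthetic data for illustration; assumes a CUDA-capable machine and an XGBoost build with GPU support):

```python
# Train a gradient-boosted classifier on the GPU via tree_method="gpu_hist".
import numpy as np
import xgboost as xgb

# Synthetic data for illustration only
X = np.random.rand(100_000, 20)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "tree_method": "gpu_hist",      # GPU histogram algorithm
    "objective": "binary:logistic",
    "max_depth": 6,
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```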

 
Careers


Full-time, non-tenured academic positions

Programs Manager, NYU Entrepreneurial Institute



New York University; New York, NY
