Data Science newsletter – September 16, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for September 16, 2019

GROUP CURATION: N/A

 
 
Data Science News



What Statistics Can and Can’t Tell Us About Ourselves

The New Yorker, Hannah Fry


In the era of Big Data, we’ve come to believe that, with enough information, human behavior is predictable. But number crunching can lead us perilously wrong.


‘The Alexa of chemistry’: National Science Foundation puts VCU and partners on fast track to build open network

Virginia Commonwealth University, VCU News


D. Tyler McQuade, Ph.D., a professor in the Virginia Commonwealth University College of Engineering, is principal investigator of a multi-university project seeking to use artificial intelligence to help scientists come up with the perfect molecule for everything from a better shampoo to coatings on advanced microchips.

The project is one of the first in the U.S. to be selected for $994,433 in funding as part of a new pilot project of the National Science Foundation called the Convergence Accelerator (C-Accel). McQuade and his collaborators will pitch their prototype in March 2020 in a bid for additional funding of up to $5 million over five years.


Why haven’t cyberinsurers exerted more pressure on companies to be better at security?

Boing Boing, Cory Doctorow


In a forthcoming paper in IEEE Security &amp; Privacy, two computer scientists (Oxford U, U of Tulsa) investigate this question, documenting the dismal state of insurers’ requirements for cyberinsurance and the ease of making claims, even for incidents that were utterly preventable.

One possibility that the authors don’t delve into: cyberinsurance is cheap because the penalties for breaches are laughably light. While it’s true that some incidents (e.g., ransomware) have a direct operational cost to the company, the vast majority of incidents involve data breaches that affect the company’s customers or stakeholders.


The lingering and extreme impacts of the Deepwater Horizon oil spill on the deep sea

Deep Sea News, Dr. M


Researchers continued to find severe impacts on deep-sea life. The numerical declines within the first few months were staggering: forams (↓80–93%), copepods (↓64%), meiofauna (↓38%), macrofauna (↓54%) and megafauna (↓40%). One year later, the impacts on diversity were still evident and correlated with increases in total petroleum hydrocarbons (TPH), polycyclic aromatic hydrocarbons (PAH), and barium in deep-sea sediments. In 2014, PAH concentrations were still 15.5 times, and TPH 11.4 times, higher in the impact zone than in the non-impact zone, and the impact zones still exhibited depressed diversity. Continued research on corals found that the majority of colonies had still not recovered by 2017. However, studies examining the impacts of the DWH oil spill on most deep-sea life ended in 2014.


Google collects face data now. Here’s what it means and how to opt out

CNET, Dale Smith


The new Face Match technology isn’t everywhere yet, but it’s always looking. Find out what’s happening with your face data and what you can do to stop it.


In Markets Gone Mad, Investors Find Rare Comfort in Data Science

Bloomberg Technology, Liz McCormick and Anchalee Worrachate


In a Bleecker Street loft in downtown Manhattan, Vasant Dhar, the founder of a $400 million hedge fund and a pioneer of AI investing, finds his computer-driven trading is just the thing for the Trump era. He uses a program that captures not just securities prices, economic data and news sentiment but also market fears — studying patterns of volatility. In Boston, Eaton Vance Corp. has a four-person data science team studying anonymous credit-card spending information, customer sentiment from social media and ETF flows on top of the fundamental work of its equity portfolio managers.

“At times, the market has been behaving like a naive schizophrenic that puts all faith in only the most recent trade tweet,” said Eddie Perkin, chief equity investment officer at Eaton Vance. “As humans, we know we have intellectual biases that can lead us to flawed decision-making such as reacting to false signals from the market. We have built our investment processes to address this and help keep us out of trouble.” The approach has helped Eaton Vance’s Small-Cap Fund gain 22% year-to-date, almost double the Russell 2000 Index’s 11.6% increase.


Artificial Intelligence That Reads Chest X-Rays Is Approved by FDA

University of California-San Francisco, Research


New artificial intelligence-powered technology may soon help reduce delays in diagnosing a collapsed lung, or pneumothorax, which is a serious condition that can be treated with quick intervention.

The U.S. Food and Drug Administration has approved a new artificial intelligence (AI) algorithm that works with portable X-rays to rapidly screen for collapsed lung. The technology was developed by GE Healthcare and UC San Francisco researchers.


Good stuff first: Google moves to prioritize original reporting in search

Nieman Journalism Lab, Laura Hazard Owen


The company has changed its global search algorithm to “highlight articles that we identify as significant original reporting,” and to keep such articles in top positions for longer.


Rock Health founder Halle Tecco launches new fertility startup Natalist

MobiHealthNews, Laura Lovett


Earlier this month, Rock Health founder Halle Tecco unveiled her next endeavor, a fertility company that sells scientifically proven products and offers online educational resources to women seeking to get pregnant. For the new company, dubbed Natalist, Tecco teamed up with former Harvard researcher Dr. Elizabeth Kane and Dr. Naz Homaifar, an OB-GYN clinical instructor at the University of California, San Francisco.

“There are a lot of things within fertility that add cost without improving outcomes, and so we wanted to make sure we felt good about the value we were adding to the healthcare system,” Tecco told MobiHealthNews. “So focusing on patient education and understanding your own fertility and your own fertile window and how to get pregnant is really at the core of the business.”


Enabling Web-scale data integration in biomedicine through Linked Open Data

npj Digital Medicine, Maulik R. Kamdar et al.


The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
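For readers who want a feel for what querying Linked Open Data looks like in practice, here is a minimal Python sketch using the SPARQLWrapper library. The endpoint URL and the predicate in the query are hypothetical placeholders for illustration, not resources named in the paper.

```python
# A minimal sketch of querying a Life Sciences Linked Open Data (LSLOD) endpoint
# with SPARQL from Python. The endpoint URL and vocabulary below are
# illustrative placeholders, not a specific resource named in the paper.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # hypothetical LSLOD SPARQL endpoint

QUERY = """
SELECT ?drug ?target
WHERE {
  ?drug <http://example.org/vocab/interactsWith> ?target .   # hypothetical predicate
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

# Each binding maps a query variable to a {"type": ..., "value": ...} dict.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["drug"]["value"], "->", row["target"]["value"])
```

A federated version of the same idea would issue such queries across several LSLOD endpoints and join the results locally.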


Talent4AI wants to address the demand for AI talent by connecting students and industry

BetaKit, Jessica Galang


A new Montreal-based platform wants to connect upcoming data scientists and engineers to real-world internship opportunities.

Talent4AI recently launched with the goal of helping both college and university students find paid internships in fields related to AI. It also aims to help industry recruiters and researchers manage the full internship process, including candidate search and selection, onboarding, performance evaluation during the internship, and post-internship evaluation.

The program hopes to tackle the challenge of developing enough artificial intelligence talent to meet demand. According to analysis from Element AI’s Global AI Talent Report for this year, the supply of AI talent does not meet the demand.


Q&A: New Tufts Data Intensive Studies Center director Dr. Abani Patra

The Tufts Daily student newspaper, Ananya Pavuluri


Dr. Abani Patra was appointed director of the up-and-coming Data Intensive Studies Center (DISC) at Tufts University this past spring and started in the position on Aug. 15. Patra has an extensive background in computational and data sciences. He previously served as the founding director of the Institute for Computational and Data Sciences at the University at Buffalo, as well as a professor in its Department of Mechanical and Aerospace Engineering. He joins Tufts as a professor of mathematics and computer science and was recently awarded the Stern Family Professorship.

The Daily sat down with Patra to discuss his vision for DISC, plans to teach his own courses at Tufts and what students and faculty can expect from DISC in the near future.


Trifacta, A Data Cleaning Startup, Raised $100 Million From Investors

Fortune, Jonathan Vanian


Trifacta, a startup that specializes in cleaning corporate data so it can be analyzed, has raised $100 million in funding, underscoring current investor appetite for data-crunching startups amid the artificial intelligence boom.

New Trifacta investors who were part of the funding round include Telstra Ventures, Energy Impact Partners, Japanese mobile operator NTT Docomo, BMW i Ventures, and Dutch bank ABN AMRO. Trifacta declined to comment on its valuation, but a source familiar with the deal said it’s under $1 billion, which means the company is no unicorn startup, a status that requires a private valuation of over $1 billion.

The startup’s technology—referred to in the industry as data wrangling—sorts through copious amounts of data in order to standardize the information, which can often be labeled differently or duplicated when it’s stored in multiple databases.
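As a rough, generic illustration of what such standardization and deduplication involves (this is ordinary pandas, not Trifacta’s product or API), consider the sketch below; the column names and records are invented for the example.

```python
# A minimal, generic data-wrangling sketch in pandas: harmonize column labels,
# normalize values, and remove records duplicated across two sources.
# Illustrative only; this is not Trifacta's technology or API.
import pandas as pd

crm = pd.DataFrame({"Customer Name": ["Acme Corp.", "Globex"],
                    "E-mail": ["sales@acme.com", "info@globex.com"]})
billing = pd.DataFrame({"customer_name": ["ACME CORP.", "Initech"],
                        "email": ["sales@acme.com", "hello@initech.com"]})

# Standardize the schema so both sources use the same column names.
crm = crm.rename(columns={"Customer Name": "customer_name", "E-mail": "email"})

combined = pd.concat([crm, billing], ignore_index=True)

# Normalize values that are labeled differently across databases.
combined["customer_name"] = combined["customer_name"].str.strip().str.lower()
combined["email"] = combined["email"].str.strip().str.lower()

# Drop records duplicated across the two sources (same email address).
deduplicated = combined.drop_duplicates(subset="email")
print(deduplicated)
```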


UC Davis Invests $4 Million to Launch New Research Centers

University of California-Davis, UC Davis News


“The University of California, Davis, is investing $4 million over three years to launch four new research centers that align campus strengths with unique opportunities for global impact. The UC Davis Office of Research initiative is designed to promote new, sustainable, inter- and multidisciplinary research activity that responds to society’s greatest challenges and needs.” Includes the UC Davis Data Science Center.


Duke Launches New Master of Engineering in Financial Technology

Duke University, Pratt School of Engineering


At the heart of every seamless financial transaction is financial technology, or “FinTech.” It’s the savvy code that lets you pay for your coffee with a wave of your smartwatch, and the machine-learning algorithm that lets you trade stocks from your phone. It’s the app that lets you pay your share of the rent while keeping your bank account number secret. From bitcoin to blockchain, FinTech is changing the way we think about financial services.

Now, Duke University’s Pratt School of Engineering and the Duke Financial Economics Center (DFE) have partnered to launch a new Master of Engineering in FinTech, one that will fuel innovation in the finance industry by teaching in-demand computing and programming skills along with industry-specific business fundamentals. Among the first programs of its kind based in an engineering school, Duke’s FinTech degree is designed to prepare graduates to meet the growing need for technical expertise in the traditional and emerging finance sectors.

 
Events



West Coast Optimization Meeting

Philip Loewen


Vancouver, BC, Canada, September 27-28, at the University of British Columbia. [registration required]


NYU Linguistics Colloquium Series published for 2019-2020

NYU Department of Linguistics


New York, NY. Talks start at 3:30 p.m. in 10 Washington Place, Room 104. September 20: Claire Halpert (University of Minnesota).

 
Tools & Resources



Should we take a few long holidays, or lots of short ones?

Tim Harford


… We would all like to feel that our work is essential and our personal contribution irreplaceable. But, as Alex Soojung-Kim Pang, author of Rest: Why You Get More Done When You Work Less, notes, we’re unlikely to be doing quite as essential a job as Eisenhower’s. If he benefited from some down time, so might we.

But what sort of break is best? Should we be thinking of long sabbaticals, or is it enough to keep evenings and weekends free? Perhaps the ideal compromise is Bridget Jones’s dream of a “full-blown mini-break holiday weekend”?


Introducing Feathers 4: A framework for real-time apps and REST APIs

Feathers, David Luecke


Feathers is a web framework for creating real-time applications and REST APIs using JavaScript or TypeScript with Node.js, React Native and the browser. It works with any backend, supports many databases out of the box, and pairs with any frontend technology such as React, VueJS, Angular, Android or iOS.


[1909.05372] Overton: A Data System for Monitoring and Improving Machine-Learned Products

arXiv, Computer Science > Machine Learning; Christopher Ré, Feng Niu, Pallavi Gudipati, Charles Srisuwananukorn


We describe a system called Overton, whose main design goal is to support engineers in building, monitoring, and improving production machine learning systems. Key challenges engineers face are monitoring fine-grained quality, diagnosing errors in sophisticated applications, and handling contradictory or incomplete supervision data. Overton automates the life cycle of model construction, deployment, and monitoring by providing a set of novel high-level, declarative abstractions. Overton’s vision is to shift developers to these higher-level tasks instead of lower-level machine learning tasks. In fact, using Overton, engineers can build deep-learning-based applications without writing any code in frameworks like TensorFlow. For over a year, Overton has been used in production to support multiple applications in both near-real-time applications and back-of-house processing. In that time, Overton-based applications have answered billions of queries in multiple languages and processed trillions of records reducing errors 1.7-2.9 times versus production systems.
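To make the idea of a declarative abstraction concrete, the sketch below shows a hypothetical task specification in plain Python. It illustrates the general approach the abstract describes (declare tasks, inputs, labels, and supervision sources rather than write framework-level training code); it is not Overton’s actual schema or API.

```python
# A hypothetical sketch of a declarative model specification. Field names,
# tasks, and labels are invented for illustration; this is NOT Overton's
# actual schema, file format, or API.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    name: str                     # e.g. an intent-classification task
    input_payloads: list          # names of input fields the task consumes
    output_labels: list           # label set the task predicts
    supervision_sources: list = field(default_factory=list)  # possibly noisy sources

spec = [
    TaskSpec(name="intent",
             input_payloads=["query_text"],
             output_labels=["weather", "music", "other"],
             supervision_sources=["crowd_labels", "heuristic_rules"]),
    TaskSpec(name="entity_tags",
             input_payloads=["query_tokens"],
             output_labels=["artist", "city", "none"],
             supervision_sources=["gazetteer_match"]),
]

def validate(tasks):
    """Toy 'compiler' step: check that every task declares inputs and labels."""
    for t in tasks:
        assert t.input_payloads and t.output_labels, f"incomplete spec: {t.name}"
    return True

print(validate(spec))
```

In a system of this kind, a compiler-like component would turn such a specification into model architectures, training jobs, and monitoring dashboards; the toy validate() step above only hints at that stage.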

 
Careers


Postdocs

Postdoc



Cornell University, Department of Neurobiology and Behavior; Ithaca, NY

Postdoc/research assistant



Università di Torino; Turin, Italy

Full-time positions outside academia

Senior Editor



The Markup; New York, NY

Image Processing and Analysis Engineer



Seattle Institute for Biomedical and Clinical Research (SIBCR); Seattle, WA
