Suresh Muknahallipatna, a professor in the Department of Electrical and Computer Engineering at the University of Wyoming, led the successful development of software that enables accurate weather forecasts up to 12 times faster than was traditionally possible.
The Weather Co., a subsidiary of IBM, announced the global rollout of IBM GRAF (Global High-Resolution Atmospheric Forecasting System) Nov. 14.
Russia is to set up a new online site for its national encyclopedia after President Vladimir Putin said Wikipedia was unreliable and should be replaced.
I am witnessing a profound evolution of statistics as a contributing discipline to data science, from research paradigms to curriculum designs, though the latter is evolving at a pace too slow to meet the demand. In general, the more deeply we appreciate the impetus, impacts, and imprints of data science, the more acutely we become aware of the inwardness, incompleteness, and inaccuracies that ensue when data science is framed merely as an academic field of knowledge. Adding to the evidence presented in our first issue, the 13 articles in this second issue further demonstrate the ecosystemic nature of data science.
The immersive collaboration among academia, government, and industry that is depicted in the interview about the Laboratory of Analytical Sciences at North Carolina State University provides a glimpse into the making and living of data science as a complex ecosystem. The word “immersive” captures the extent to which data science has evolved in multiple dimensions. The articles in this issue remind us that there are at least five such immersive “3D surroundings” in the data science ecosystem. They are listed here in approximate order, from organizational and structural surroundings to individual and epistemological ones.
Today, Amazon announced a new approach that it says will put machine learning technology in reach of more developers and line-of-business users. Amazon has been making a flurry of announcements ahead of its re:Invent customer conference next week in Las Vegas.
While the company offers plenty of tools for data scientists to build machine learning models and to process, store and visualize data, it wants to put that capability directly in the hands of developers with the help of the popular database query language, SQL.
By taking advantage of tools like Amazon QuickSight, Aurora and Athena in combination with SQL queries, developers can have much more direct access to machine learning models and underlying data without any additional coding, says Matt Wood, VP of artificial intelligence at AWS.
“This announcement is all about making it easier for developers to add machine learning predictions to their products and their processes by integrating those predictions directly with their databases,” Wood told TechCrunch.
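To make the SQL-first workflow concrete, here is a minimal sketch in Python. The endpoint name, table, and columns are all hypothetical, and the `USING EXTERNAL FUNCTION ... SAGEMAKER` clause follows the Athena ML syntax introduced around this time; treat it as illustrative, not a verbatim API reference.

```python
# Sketch only: build an Athena-style query that calls a SageMaker model
# directly from SQL. Endpoint "churn-model", table "customers", and the
# column names are hypothetical stand-ins.
def ml_query(endpoint: str, table: str) -> str:
    return f"""
USING EXTERNAL FUNCTION predict_churn(monthly_spend DOUBLE)
    RETURNS DOUBLE
    SAGEMAKER '{endpoint}'
SELECT customer_id,
       predict_churn(monthly_spend) AS churn_score
FROM {table}
""".strip()

query = ml_query("churn-model", "customers")
# The string would then be submitted through boto3's Athena client, e.g.
# athena.start_query_execution(QueryString=query, ...), with predictions
# coming back as an ordinary result column -- no extra model-serving code.
print(query)
```

The point of the design is that the model invocation looks like any other SQL function call, so the developer never touches the model-hosting layer.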
In a new paper, “AquaSat: a dataset to enable remote sensing of water quality for inland waters,” a team led by Colorado State University Assistant Professor Matt Ross matched large public datasets of water quality observations with satellite imagery to address the challenges of measuring water quality efficiently and cost-effectively.
Purdue University will build a new $40 million facility dedicated to data science, designed to help the university meet its goal to be a national and global leader in the field of data science research and education for all students.
The flagship four-story building will be 86,000 square feet, featuring classroom and teaching space for all of campus alongside collaborative workspace for College of Science graduate researchers and faculty who form the core of the university’s strong and growing data science program. Purdue’s Board of Trustees is expected to vote Friday (Dec. 6) to advance the project.
Hanging Together – the OCLC Research blog, Brian Lavoie
As more and more data on research activity is collected, the range of questions that can be answered with it grows commensurately. A new service area is emerging around research analytics, but what is the role of the academic library? The OCLC Research Library Partnership (RLP)’s Research Support Interest Group recently welcomed Brian Mathews, Associate Dean for Innovation, Preservation, and Access, and David Scherer, Scholarly Communications Librarian and Research Curation Consultant, to lead a virtual discussion (actually two discussions, to accommodate RLP participants across many global time zones) on research analytics at Carnegie Mellon University, where the University Libraries recently deployed a new research information management system (RIMS) and is developing a research analytics service. Brian and David shared their experiences at Carnegie Mellon, which led to rich conversations and sharing of insights among participants from a number of RLP Partner institutions.
UNC Health Care and UNC School of Medicine, Newsroom
Inflammation is a hallmark of many health conditions, but quantifying how the underlying biology of inflammation contributes to specific diseases has been difficult. For the first time, UNC School of Medicine researchers and colleagues report the development of a new technology to identify white blood cells called neutrophils that are primed to eject inflammatory DNA into the circulation via a process called NETosis.
The findings, published in Scientific Reports, mark the first time scientists have used machine learning tools for rapid quantitative and qualitative cell analysis in basic science.
Starbucks is partnering with Microsoft (NASDAQ:MSFT) to use data to heighten the customer experience. Starbucks chief technology officer Gerri Martin-Flickinger said: “As an engineering and technology organization, one of the areas we are incredibly excited to be pursuing is using data to continuously improve the experience for our customers and partners.”
It’s interesting that she calls her company, which produces coffee, an engineering and technology organization; it says a lot about the way management thinks. But Martin-Flickinger also points out that the goal of technological innovation is a personal one: When the baseline is taken care of, the employee has a greater opportunity to connect with the customer.
How does this all work? When customers use the mobile app, the company collects data about their preferences. Combining this with knowledge of Starbucks shops in the area, local popular drinks, the weather, and other factors, it can offer recommendations for both products and pairings.
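The blend of signals described above can be sketched as a simple weighted score. Everything here is hypothetical, the weights, the menu, and the signal values are toy numbers, not Starbucks' actual recommendation model:

```python
# Toy sketch: score menu items by a weighted blend of personal history,
# local popularity, and weather fit. All data and weights are made up.
def score(item, user_prefs, local_popularity, weather_fit, w=(0.5, 0.3, 0.2)):
    return (w[0] * user_prefs.get(item, 0)
            + w[1] * local_popularity.get(item, 0)
            + w[2] * weather_fit.get(item, 0))

user_prefs = {"latte": 0.9, "cold brew": 0.2}          # from the mobile app
local_popularity = {"latte": 0.4, "cold brew": 0.8, "mocha": 0.6}
weather_fit = {"latte": 0.7, "cold brew": 0.1, "mocha": 0.7}  # a cold day

menu = ["latte", "cold brew", "mocha"]
best = max(menu, key=lambda i: score(i, user_prefs, local_popularity,
                                     weather_fit))
# The latte wins: strong personal history plus a decent weather fit.
```

A production system would learn the weights rather than fix them by hand, but the structure, per-item scores combining per-customer and contextual signals, is the same.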
In deep learning, a subset of machine learning, a type of artificial intelligence, computer models essentially teach themselves to make predictions from large sets of data. The raw power of the technology has improved dramatically in recent years, and it’s now used in everything from medical diagnostics to online shopping to autonomous vehicles.
But deep learning tools also raise worrying questions because they solve problems in ways that humans can’t always follow. If the connection between the data you feed into the model and the output it delivers is inscrutable — hidden inside a so-called black box — how can it be trusted? Among researchers, there’s a growing call to clarify how deep learning tools make decisions — and a debate over what such interpretability might demand and when it’s truly needed. The stakes are particularly high in medicine, where lives will be on the line.
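The interpretability debate can feel abstract, so here is one of the simplest model-agnostic probes researchers use, permutation importance, sketched in plain Python. The "model" and data are synthetic stand-ins for a real black box:

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic probe of a black box: shuffle one feature at a
    time and measure how much the model's score drops on average."""
    rng = random.Random(seed)
    base = metric(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - metric(predict(X_perm), y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "black box" that in fact only looks at feature 0.
predict = lambda X: [row[0] for row in X]
accuracy = lambda preds, y: sum(p == t for p, t in zip(preds, y)) / len(y)

X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 10
y = [row[0] for row in X]
imps = permutation_importance(predict, X, y, accuracy)
# imps[0] is large and imps[1] is 0: without opening the box, the probe
# reveals which input the model actually uses.
```

Probes like this answer "which inputs mattered," not "why," which is one reason the debate over what interpretability should demand remains open.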
For the first time, a group of scientists — Zeke Hausfather of UC Berkeley, Henri Drake and Tristan Abbott of MIT, and Gavin Schmidt of the NASA Goddard Institute for Space Studies — has done a systematic review of climate models, dating back to the late 1970s. Published in Geophysical Research Letters, it tests model performance against a simple metric: how well they predicted global mean surface temperature (GMST) through 2017, the most recent year for which observational data are available.
A new paper from Hausfather and colleagues (incl. me) has just been published with the most comprehensive assessment of climate model projections since the 1970s. Bottom line? Once you correct for small errors in the projected forcings, they did remarkably well.
Climate models are a core part of our understanding of our future climate. They also have been frequently attacked by those dismissive of climate change, who argue that since climate models are inevitably approximations they have no predictive power, or indeed, that they aren’t even scientific.
In an upcoming paper in Geophysical Research Letters, Zeke Hausfather, Henri Drake, Tristan Abbott and I took a look at how well climate models have actually been able to accurately project warming in the years after they were published. This is an extension of the comparisons we have been making on RealClimate for many years, but with a broader scope and a deeper analysis. We gathered all the climate models published between 1970 and the mid-2000s that gave projections of both future warming and future concentrations of CO2 and other climate forcings – from Manabe (1970) and Mitchell (1970) through to CMIP3 in IPCC 2007.
We found that climate models – even those published back in the 1970s – did remarkably well, with 14 out of the 17 projections statistically indistinguishable from what actually occurred.
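The paper's core comparison, a projection's warming trend against the trend later observed, can be sketched with ordinary least-squares fits. The numbers below are synthetic stand-ins, not the real model output or observational record:

```python
# Compare a projected warming trend with an "observed" one (synthetic data).
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

years = list(range(1970, 2018))
projected = [0.020 * (t - 1970) for t in years]          # a 0.20 °C/decade projection
observed = [0.018 * (t - 1970) + 0.01 * ((t % 3) - 1)    # slightly slower warming,
            for t in years]                              # plus small wiggles

trend_proj = ols_slope(years, projected) * 10  # °C per decade
trend_obs = ols_slope(years, observed) * 10
# A skillful projection is one whose trend is statistically
# indistinguishable from the observed trend over the same period.
```

The actual study adds the step the tweet mentions, adjusting for errors in projected forcings (e.g., CO2 concentrations), before judging whether a model's temperature response was right.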
Harvard Business Review, Robert C. Pozen and Jonathan Ruane
ML is particularly adaptable to securities investing because the insights it garners can be acted on quickly and efficiently. By contrast, when ML generates new insights in other sectors, firms must overcome substantial constraints before putting those insights into action. For example, when Google develops a self-driving car powered by ML, it must gain approval from an array of stakeholders before that car can hit the road. These stakeholders include federal regulators, auto insurers, and local governments where these self-driving cars would operate. Portfolio managers do not need regulatory approval to translate ML insights into investment decisions.
In the context of investment management, ML augments the quantitative work already done by security analysts in three ways:
1. ML can identify potentially outperforming equities by finding new patterns in existing data sets.
Founded in 2014 by Ekaterina Damer and Phelim Bradley, then doctoral students at the universities of Sheffield and Oxford, respectively, Prolific offers an online tool to easily recruit and pay research participants and conduct what it calls “ethical and trustworthy” research. The idea was born out of Damer’s own frustration with existing options, including Amazon’s Mechanical Turk (MTurk), when carrying out research for her own PhD.
Atlanta, GA April 16-17, 2020. “If you are a data science enthusiast who is interested in learning from the experts and top researchers about the cutting-edge technologies and best practices in data science, don’t hesitate to register.” [$$$]
New York University, Institute for Public Knowledge
New York, NY December 11, starting at 6 p.m. “NYU’s Institute for Public Knowledge invites you to join for a book talk for The Code: Silicon Valley and the Remaking of America, featuring the author Margaret O’Mara in conversation with Meredith Broussard.” [rsvp required]
Stanford, CA February 11, starting at 8:30 a.m., Stanford University, Li Ka Shing Center and Paul Berg Hall. “Stanford ChEM-H and Wu Tsai Neurosciences Institute will host a joint scientific symposium on the occasion of the official dedication of the new ChEM-H and Neurosciences Buildings.” [registration required]
“The Howard Hughes Medical Institute (HHMI) is seeking to appoint approximately 20 new Investigators through a national open competition. This effort will expand HHMI’s community of nearly 300 Investigators, who perform basic biological research across the nation.” Deadline to apply is March 18, 2020.
“Simon Fraser University associate professor Roger Linington and a team of international collaborators have created the Natural Products Atlas, which holds information on nearly 25,000 natural compounds and serves as a knowledge base and repository for the global scientific community.”
Medium, Netflix Tech Blog; David Berg, Ravi Kiran Chirravuri, Romain Cledat, Savin Goyal, Ferras Hamad, Ville Tuulos
“Data scientists want to retain their freedom to use arbitrary, idiomatic Python code to express their business logic — like they would do in a Jupyter notebook. However, they don’t want to spend too much time thinking about object hierarchies, packaging issues, or dealing with obscure APIs unrelated to their work. The infrastructure should allow them to exercise their freedom as data scientists but it should provide enough guardrails and scaffolding, so they don’t have to worry about software architecture too much.”
“Researchers from the University of Washington and Michigan Technological University have created the first comprehensive database of all the wildfire fuels that have been measured across North America. Called the North American Wildland Fuel Database, the tool incorporates the best available measurements of vegetation in specific locations, and allows fire managers to see where information about fuels is missing altogether.”