Suresh Muknahallipatna, a professor in the Department of Electrical and Computer Engineering at the University of Wyoming, led the successful development of software that enables accurate weather forecasts up to 12 times faster than was traditionally possible.
The Weather Co., a subsidiary of IBM, announced the global rollout of IBM GRAF (Global High-Resolution Atmospheric Forecasting System) Nov. 14.
Russia is to set up a new online site for its national encyclopedia after President Vladimir Putin said Wikipedia was unreliable and should be replaced.
I am witnessing a profound evolution of statistics as a contributing discipline to data science, from research paradigms to curriculum designs, though the latter is evolving at a pace too slow to meet the demand. In general, the more deeply we appreciate the impetus, impacts, and imprints of data science, the more acutely we become aware of the inwardness, incompleteness, and inaccuracies that ensue when data science is framed merely as an academic field of knowledge. Adding to the evidence presented in our first issue, the 13 articles in this second issue further demonstrate the ecosystemic nature of data science.
The immersive collaboration among academia, government, and industry that is depicted in the interview about the Laboratory of Analytical Sciences at North Carolina State University provides a glimpse into the making and living of data science as a complex ecosystem. The word “immersive” captures the extent to which data science has evolved in multiple dimensions. The articles in this issue remind us that there are at least five such immersive “3D surroundings” in the data science ecosystem. They are listed here in approximate order, from organizational and structural surroundings to individual and epistemological ones.
Today, Amazon announced a new approach that it says will put machine learning technology in reach of more developers and line-of-business users. Amazon has been making a flurry of announcements ahead of its re:Invent customer conference next week in Las Vegas.
While the company offers plenty of tools for data scientists to build machine learning models and to process, store and visualize data, it wants to put that capability directly in the hands of developers with the help of the popular database query language, SQL.
By taking advantage of tools like Amazon QuickSight, Aurora and Athena in combination with SQL queries, developers can have much more direct access to machine learning models and underlying data without any additional coding, says Matt Wood, VP of artificial intelligence at AWS.
“This announcement is all about making it easier for developers to add machine learning predictions to their products and their processes by integrating those predictions directly with their databases,” Wood told TechCrunch.
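To make the SQL-first workflow concrete, here is a minimal sketch in Python. The endpoint name, table, and columns are all hypothetical, and the `USING EXTERNAL FUNCTION ... SAGEMAKER` clause follows the Athena ML syntax introduced around this time; treat it as illustrative, not a verbatim API reference.

```python
# Sketch only: build an Athena-style query that calls a SageMaker model
# directly from SQL. Endpoint "churn-model", table "customers", and the
# column names are hypothetical stand-ins.
def ml_query(endpoint: str, table: str) -> str:
    return f"""
USING EXTERNAL FUNCTION predict_churn(monthly_spend DOUBLE)
    RETURNS DOUBLE
    SAGEMAKER '{endpoint}'
SELECT customer_id,
       predict_churn(monthly_spend) AS churn_score
FROM {table}
""".strip()

query = ml_query("churn-model", "customers")
# The string would then be submitted through boto3's Athena client, e.g.
# athena.start_query_execution(QueryString=query, ...), with predictions
# coming back as an ordinary result column -- no extra model-serving code.
print(query)
```

The point of the design is that the model invocation looks like any other SQL function call, so the developer never touches the model-hosting layer.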
In a new paper, “AquaSat: a dataset to enable remote sensing of water quality for inland waters,” a team led by Colorado State University Assistant Professor Matt Ross matched large public datasets of water quality observations with satellite imagery to address the challenges of measuring water quality efficiently and cost-effectively.
Purdue University will build a new $40 million facility dedicated to data science, designed to help the university meet its goal to be a national and global leader in the field of data science research and education for all students.
The flagship four-story building will be 86,000 square feet, featuring classroom and teaching space for all of campus alongside collaborative workspace for College of Science graduate researchers and faculty who form the core of the university’s strong and growing data science program. Purdue’s Board of Trustees is expected to vote Friday (Dec. 6) to advance the project.
Hanging Together – the OCLC Research blog, Brian Lavoie
As more and more data on research activity is collected, the range of questions that can be answered with it grows commensurately. A new service area is emerging around research analytics, but what is the role of the academic library? The OCLC Research Library Partnership (RLP)’s Research Support Interest Group recently welcomed Brian Mathews, Associate Dean for Innovation, Preservation, and Access, and David Scherer, Scholarly Communications Librarian and Research Curation Consultant, to lead a virtual discussion (actually two discussions, to accommodate RLP participants across many global time zones) on research analytics at Carnegie Mellon University, where the University Libraries recently deployed a new research information management system (RIMS) and is developing a research analytics service. Brian and David shared their experiences at Carnegie Mellon, which led to rich conversations and sharing of insights among participants from a number of RLP Partner institutions.
UNC Health Care and UNC School of Medicine, Newsroom
Inflammation is a hallmark of many health conditions, but quantifying how the underlying biology of inflammation contributes to specific diseases has been difficult. For the first time, UNC School of Medicine researchers and colleagues report the development of a new technology to identify white blood cells called neutrophils that are primed to eject inflammatory DNA into the circulation via a process called NETosis.
The findings, published in Scientific Reports, mark the first time scientists have used machine learning tools for rapid quantitative and qualitative cell analysis in basic science.
Starbucks is partnering with Microsoft (NASDAQ:MSFT) to use data to heighten the customer experience. Starbucks chief technology officer Gerri Martin-Flickinger said: “As an engineering and technology organization, one of the areas we are incredibly excited to be pursuing is using data to continuously improve the experience for our customers and partners.”
It’s interesting that she calls her company, which produces coffee, an engineering and technology organization; it says a lot about the way management thinks. But Martin-Flickinger also points out that the goal of technological innovation is a personal one: When the baseline is taken care of, the employee has a greater opportunity to connect with the customer.
How does this all work? When customers use the mobile app, the company collects data about their preferences. Combining this with knowledge of Starbucks shops in the area, local popular drinks, the weather, and other factors, it can offer recommendations for both products and pairings.
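The blend of signals described above can be sketched as a simple weighted score. Everything here is hypothetical, the weights, the menu, and the signal values are toy numbers, not Starbucks' actual recommendation model:

```python
# Toy sketch: score menu items by a weighted blend of personal history,
# local popularity, and weather fit. All data and weights are made up.
def score(item, user_prefs, local_popularity, weather_fit, w=(0.5, 0.3, 0.2)):
    return (w[0] * user_prefs.get(item, 0)
            + w[1] * local_popularity.get(item, 0)
            + w[2] * weather_fit.get(item, 0))

user_prefs = {"latte": 0.9, "cold brew": 0.2}          # from the mobile app
local_popularity = {"latte": 0.4, "cold brew": 0.8, "mocha": 0.6}
weather_fit = {"latte": 0.7, "cold brew": 0.1, "mocha": 0.7}  # a cold day

menu = ["latte", "cold brew", "mocha"]
best = max(menu, key=lambda i: score(i, user_prefs, local_popularity,
                                     weather_fit))
# The latte wins: strong personal history plus a decent weather fit.
```

A production system would learn the weights rather than fix them by hand, but the structure, per-item scores combining per-customer and contextual signals, is the same.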
In deep learning, a subset of machine learning, a type of artificial intelligence, computer models essentially teach themselves to make predictions from large sets of data. The raw power of the technology has improved dramatically in recent years, and it’s now used in everything from medical diagnostics to online shopping to autonomous vehicles.
But deep learning tools also raise worrying questions because they solve problems in ways that humans can’t always follow. If the connection between the data you feed into the model and the output it delivers is inscrutable — hidden inside a so-called black box — how can it be trusted? Among researchers, there’s a growing call to clarify how deep learning tools make decisions — and a debate over what such interpretability might demand and when it’s truly needed. The stakes are particularly high in medicine, where lives will be on the line.
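The interpretability debate can feel abstract, so here is one of the simplest model-agnostic probes researchers use, permutation importance, sketched in plain Python. The "model" and data are synthetic stand-ins for a real black box:

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic probe of a black box: shuffle one feature at a
    time and measure how much the model's score drops on average."""
    rng = random.Random(seed)
    base = metric(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - metric(predict(X_perm), y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "black box" that in fact only looks at feature 0.
predict = lambda X: [row[0] for row in X]
accuracy = lambda preds, y: sum(p == t for p, t in zip(preds, y)) / len(y)

X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 10
y = [row[0] for row in X]
imps = permutation_importance(predict, X, y, accuracy)
# imps[0] is large and imps[1] is 0: without opening the box, the probe
# reveals which input the model actually uses.
```

Probes like this answer "which inputs mattered," not "why," which is one reason the debate over what interpretability should demand remains open.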
For the first time, a group of scientists — Zeke Hausfather of UC Berkeley, Henri Drake and Tristan Abbott of MIT, and Gavin Schmidt of the NASA Goddard Institute for Space Studies — has done a systematic review of climate models, dating back to the late 1970s. Published in Geophysical Research Letters, it tests model performance against a simple metric: how well they predicted global mean surface temperature (GMST) through 2017, the most recent year for which observational data are available.
A new paper from Hausfather and colleagues (incl. me) has just been published with the most comprehensive assessment of climate model projections since the 1970s. Bottom line? Once you correct for small errors in the projected forcings, they did remarkably well.
Climate models are a core part of our understanding of our future climate. They also have been frequently attacked by those dismissive of climate change, who argue that since climate models are inevitably approximations they have no predictive power, or indeed, that they aren’t even scientific.
In an upcoming paper in Geophysical Research Letters, Zeke Hausfather, Henri Drake, Tristan Abbott and I took a look at how well climate models have actually been able to accurately project warming in the years after they were published. This is an extension of the comparisons we have been making on RealClimate for many years, but with a broader scope and a deeper analysis. We gathered all the climate models published between 1970 and the mid-2000s that gave projections of both future warming and future concentrations of CO2 and other climate forcings – from Manabe (1970) and Mitchell (1970) through to CMIP3 in IPCC 2007.
We found that climate models – even those published back in the 1970s – did remarkably well, with 14 out of the 17 projections statistically indistinguishable from what actually occurred.
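The paper's core comparison, a projection's warming trend against the trend later observed, can be sketched with ordinary least-squares fits. The numbers below are synthetic stand-ins, not the real model output or observational record:

```python
# Compare a projected warming trend with an "observed" one (synthetic data).
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

years = list(range(1970, 2018))
projected = [0.020 * (t - 1970) for t in years]          # a 0.20 °C/decade projection
observed = [0.018 * (t - 1970) + 0.01 * ((t % 3) - 1)    # slightly slower warming,
            for t in years]                              # plus small wiggles

trend_proj = ols_slope(years, projected) * 10  # °C per decade
trend_obs = ols_slope(years, observed) * 10
# A skillful projection is one whose trend is statistically
# indistinguishable from the observed trend over the same period.
```

The actual study adds the step the tweet mentions, adjusting for errors in projected forcings (e.g., CO2 concentrations), before judging whether a model's temperature response was right.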
Harvard Business Review, Robert C. Pozen and Jonathan Ruane
ML is particularly adaptable to securities investing because the insights it garners can be acted on quickly and efficiently. By contrast, when ML generates new insights in other sectors, firms must overcome substantial constraints before putting those insights into action. For example, when Google develops a self-driving car powered by ML, it must gain approval from an array of stakeholders before that car can hit the road. These stakeholders include federal regulators, auto insurers, and local governments where these self-driving cars would operate. Portfolio managers do not need regulatory approval to translate ML insights into investment decisions.
In the context of investment management, ML augments the quantitative work already done by security analysts in three ways:
1. ML can identify potentially outperforming equities by finding new patterns in existing data sets.
Founded in 2014 by Ekaterina Damer and Phelim Bradley, then doctoral students at the universities of Sheffield and Oxford, respectively, Prolific offers an online tool to easily recruit and pay research participants and conduct what it calls “ethical and trustworthy” research. The idea was born out of Damer’s own frustration with existing options, including Amazon’s Mechanical Turk (MTurk), when carrying out research for her own PhD.
Atlanta, GA April 16-17, 2020. “If you are a data science enthusiast who is interested in learning from the experts and top researchers about the cutting-edge technologies and best practices in data science, don’t hesitate to register.” [$$$]
New York University, Institute for Public Knowledge
New York, NY December 11, starting at 6 p.m. “NYU’s Institute for Public Knowledge invites you to join for a book talk for The Code: Silicon Valley and the Remaking of America, featuring the author Margaret O’Mara in conversation with Meredith Broussard.” [rsvp required]
Stanford, CA February 11, starting at 8:30 a.m., Stanford University, Li Ka Shing Center and Paul Berg Hall. “Stanford ChEM-H and Wu Tsai Neurosciences Institute will host a joint scientific symposium on the occasion of the official dedication of the new ChEM-H and Neurosciences Buildings.” [registration required]
“The Howard Hughes Medical Institute (HHMI) is seeking to appoint approximately 20 new Investigators through a national open competition. This effort will expand HHMI’s community of nearly 300 Investigators, who perform basic biological research across the nation.” Deadline to apply is March 18, 2020.
“Simon Fraser University associate professor Roger Linington and a team of international collaborators have created the Natural Products Atlas, which holds information on nearly 25,000 natural compounds and serves as a knowledge base and repository for the global scientific community.”
Medium, Netflix Tech Blog; David Berg, Ravi Kiran Chirravuri, Romain Cledat, Savin Goyal, Ferras Hamad, Ville Tuulos
“Data scientists want to retain their freedom to use arbitrary, idiomatic Python code to express their business logic — like they would do in a Jupyter notebook. However, they don’t want to spend too much time thinking about object hierarchies, packaging issues, or dealing with obscure APIs unrelated to their work. The infrastructure should allow them to exercise their freedom as data scientists but it should provide enough guardrails and scaffolding, so they don’t have to worry about software architecture too much.”
“Researchers from the University of Washington and Michigan Technological University have created the first comprehensive database of all the wildfire fuels that have been measured across North America. Called the North American Wildland Fuel Database, the tool incorporates the best available measurements of vegetation in specific locations, and allows fire managers to see where information about fuels is missing altogether.”