Expanding its relationship with the University of Vermont’s (UVM) Complex Systems Center, Massachusetts Mutual Life Insurance Company (MassMutual) announced today that it is providing $5 million in funding to further advance study and research in the field of data science and analytics.
The funding, to be provided over five years beginning in 2019, will include the establishment of the MassMutual Center of Excellence for Complex Systems and Data Science, which will initiate research projects and programs aimed at better understanding human wellness through data analytics, as well as programming to cultivate a strong pipeline of data science talent.
Over at Argonne, Nils Heinonen writes that four new projects have been selected for the ALCF Data Science Program (ADSP). These projects will use machine learning, deep learning, and other artificial intelligence methods to enable data-driven discoveries across scientific disciplines.
Howard University, Office of University Communications
from
Citi Ventures today announced an expansion of its Citi® University Partnerships in Innovation & Discovery (CUPID) program with Howard University School of Business to engage and embed students in innovation efforts across Citi.
Students of Howard’s 21st Century Advantage Program and the School of Business Executive Leadership Honors Program (SBEL) will be paired with Citi teams to work on emerging technology projects involving machine learning, natural language processing and big data analytics.
TheHill, Opinion, Debashis Ghosh, Karen Bandeen-Roche and Jason Roy
from
With the current administration still questioning the figures used to measure storm deaths from Maria, we must better understand how and why the number of storm-attributed deaths increased so dramatically, and clarify how the reported death counts were calculated. The evidence-based approach to estimating deaths from natural disasters illustrates how mortality may occur: deaths result not only from the disaster itself, but also from breakdowns in infrastructure in the post-disaster period. These complete measurements of storm impacts are critically important for future disaster planning.
As chairs of North American biostatistics departments, we conducted a review of the two scientific reports on estimated deaths attributable to Hurricane Maria. We found the methodology used in both studies to be well-established and statistically rigorous. Both estimates far exceed the initial report of 64 deaths, even after considering the limits of precision and data quality. The available evidence supports the number of storm-attributable deaths estimated by the recent independent analyses.
The Department of Homeland Security (DHS) Science and Technology Directorate (S&T) has reorganized its research and development (R&D) structure to more rapidly transition technology capabilities into operations and respond to emerging threats.
William N. Bryan, the senior official performing the duties of the undersecretary for science and technology, DHS, explains that the revitalized configuration enhances the focus on the needs of the DHS operational components and homeland security operators across all levels of government.
“We no longer have the luxury of time to do traditional R&D, so we must change if we are to get ahead of threat cycles and keep pace with rapid innovation,” he says. “We are improving our R&D business practices to make it easier for industry, including the start-up community, to work with us.”
The blog of the Stigler Center at the University of Chicago Booth School of Business, Luigi Zingales
from
Two years ago, the Stigler Center hosted a talk by David Plouffe, then chief advisor for Uber and a board member. In his speech, Plouffe promoted not only the convenience of the Uber model (most of us can attest to it), but also its environmental benefits: ride-sharing will reduce car circulation, congestion, and pollution. For the first time since the invention of the car, claimed Plouffe, urban planning was actually reducing parking space in city downtowns, because ride-sharing was reducing total demand for cars.
While high in rhetoric, his speech was short on actual data. Had it been any other company, this would have been understandable — except the phenomenon was so new that comparable data were not yet available anywhere else, and Uber did have all the data. So why didn’t Plouffe use actual data to support his environmental claims?
I’ve spent the last six months starting Pictal Health, a company that helps patients tell their health stories visually. During this time I’ve been piloting a service with a few steps: talking with patients and listening to get their full story, preparing visuals based on what I heard (usually a timeline, and sometimes a symptom diagram on a body shape), and helping them feel prepared for their next doctor appointment.
On Sunday evening eastern time, the U.N. Intergovernmental Panel on Climate Change is scheduled to release its special report on the risks and benefits of limiting global warming to 1.5°C, or 2.7°F, above preindustrial levels.
Why it matters: The report is expected to contain sobering findings about how difficult it will be to meet the 1.5-degree target, which is an aspirational goal contained in the Paris Agreement on Climate Change. Every country in the world — except the U.S. — intends to honor the 2015 agreement, and the report will help inform negotiators in the next round of climate talks, set for December.
Most hikers on the Reflection Lakes trail have their cameras pointed at Mount Rainier; Karen Sy, however, had her back to the imposing mountain. Instead, she examined a patch of spindly, tufted plants that look like Dr. Seuss’ truffula trees. In any case, Rainier looked hazy; choking wildfire smoke had pushed air quality readings into the “unhealthy” range for days. Luckily, Sy had come prepared with an N-95 respirator mask, so she could concentrate on writing down her observations: Western anemones, in their fruiting stage — typical for mid-August.
She’d braved the smoky weather for MeadoWatch, a program that enlists volunteers to collect wildflower data on hikes at Mount Rainier National Park. Three volunteers at a different plot told me that they find the program rewarding, in part because it provides an opportunity to inject much-needed scientific data into political discussions about climate change. Instead of agonizing over struggling animal species and changing ecosystems, volunteers who record observations feel they are taking a proactive step that staves off climate dread.
Recent years have seen an explosion in the creation of machine learning models for everything from self-driving cars to social media feeds. Despite the success of these models at perception and simple prediction, they have yet to have a larger impact on traditional enterprise computing and data processing applications.
Applying machine learning inside the enterprise is the ambition behind a new initiative launched yesterday at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) with industry collaborators Google, Intel and Microsoft.
Dubbed the “Data Systems and Artificial Intelligence Lab” (DSAIL), the effort will focus on investigating how machine learning can be used to improve the performance of data-processing systems.
New York, NY Thursday, November 15, starting at 6:30 p.m., Two Sigma (100 Avenue of the Americas). Speaker: Yixin Wang, a PhD candidate in the Statistics Department of Columbia University. [rsvp required, open on 11/1]
Orlando, FL October 15-19. “a premier technical event in the global R&E community, convening the community’s technology visionaries—including chief technologists, scientists, engineers, architects, operators and students from around the U.S. and the globe.” [$$$$]
A team of researchers at The University of Texas at Austin has created CRN++, a new language for programming deterministic (mass-action) chemical kinetics to perform computations. In their paper, pre-published on arXiv, the researchers outline the new language and build a compiler that translates CRN++ programs into chemical reactions.
DXR is a toolkit for rapidly prototyping data visualizations in XR (augmented, mixed, and virtual reality). A visualization in DXR is a collection of Unity game objects whose properties, such as position, color, and size, are mapped to data attributes. This mapping can be specified interactively at runtime via a graphical user interface (GUI) or via a high-level programming interface, inspired by Polestar and Vega-Lite, respectively. DXR is extensible, allowing the use of most Unity game objects for custom marks and channels.
arXiv, Computer Science > Distributed, Parallel, and Cluster Computing; Thorsten Kurth et al.
from
We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.
Mixnode allows you to think of all the web pages, images, videos, PDF files, and other resources on the web as rows in a database table; a giant database table with trillions of rows that you can query using the standard Structured Query Language (SQL). So, rather than running web crawlers/scrapers you can write simple queries in a familiar language to retrieve all sorts of interesting information from this table of live data.
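The "web as one giant table" idea can be sketched with a toy example. Here we build a tiny in-memory table of (url, content_type) rows with Python's standard-library sqlite3 and query it with ordinary SQL; the table and column names are illustrative assumptions, not Mixnode's actual schema.

```python
# Toy sketch of treating web resources as rows in a SQL table.
# The "resources" table and its columns are hypothetical, chosen
# only to illustrate the query style; Mixnode's real schema may differ.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (url TEXT, content_type TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?)",
    [
        ("https://example.com/index.html", "text/html"),
        ("https://example.com/report.pdf", "application/pdf"),
        ("https://example.com/logo.png", "image/png"),
    ],
)

# "Find every PDF on the web" becomes a one-line query
# instead of a crawl-and-scrape pipeline.
pdfs = conn.execute(
    "SELECT url FROM resources WHERE content_type = 'application/pdf'"
).fetchall()
print(pdfs)  # [('https://example.com/report.pdf',)]
```

The appeal is that filtering, joining, and aggregating over crawled resources reduces to declarative SQL rather than bespoke crawler code.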
As it turns out, it is possible to perform meaningful analysis when you have an unbalanced data set. If the training and test sets have the same proportion of balls and strikes, the process behind the training stage and test stage does not change. What changes is the notion of how the “accuracy” and “success” of the results are defined. In particular, the conventional definition of “accuracy” (number of correct classifications divided by number of test examples) is now less useful. With a balanced data set, the baseline accuracy is 50 percent, because guessing randomly between ball and strike, or declaring all balls or all strikes for the test pitches, would yield an accuracy of 50 percent. Now, given that about 68 percent of the pitches in the data set are balls, I could declare all the test pitches to be balls and have myself a 68 percent accuracy. But this is a terrible idea!
Instead of relying upon basic accuracy, it is better to evaluate the system based on more advanced metrics such as F1 score and AUC (area under the curve). In my previous article, I calculated the F1 score, but it happened to be the case that it was almost equal to the basic accuracy. This time around, that won’t be true.
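The point can be made concrete in a few lines of plain Python. The 68/32 split below mirrors the article's ball/strike proportions, but the labels and the degenerate "always ball" classifier are illustrative, not the article's actual model or data.

```python
# Why accuracy misleads on an imbalanced set: a classifier that
# predicts "ball" for every pitch scores 68% accuracy on a 68/32
# split, yet its F1 for the minority "strike" class is 0.

def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 68 balls, 32 strikes -- roughly the article's class balance.
y_true = ["ball"] * 68 + ["strike"] * 32
y_all_balls = ["ball"] * 100  # degenerate "declare everything a ball" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_all_balls)) / len(y_true)
print(accuracy)                                 # 0.68
print(f1_score(y_true, y_all_balls, "strike"))  # 0.0
```

The 68 percent accuracy looks respectable, but the F1 of zero on the strike class exposes the classifier as useless, which is exactly why metrics like F1 and AUC are preferred here.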