Data Science newsletter – October 9, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for October 9, 2018

GROUP CURATION: N/A

 
 
Data Science News



MassMutual and UVM Expand Groundbreaking Data Science Partnership

University of Vermont, UVM Today



Expanding its relationship with the University of Vermont’s (UVM) Complex Systems Center, Massachusetts Mutual Life Insurance Company (MassMutual) announced today that it is providing $5 million in funding to further advance study and research in the field of data science and analytics.

The funding, to be provided over five years beginning in 2019, will include the establishment of the MassMutual Center of Excellence for Complex Systems and Data Science, which will initiate research projects and programs aimed at better understanding human wellness through data analytics, as well as programming to cultivate a strong pipeline of data science talent.


Data Science Program at Argonne Looks to Machine Learning for New Breakthroughs

insideHPC



Over at Argonne, Nils Heinonen writes that four new projects have been selected for the ALCF Data Science Program (ADSP). These projects will utilize machine learning, deep learning, and other artificial intelligence methods to enable data-driven discoveries across scientific disciplines.


Citi Ventures Partners with Howard University to Provide Students a Look Inside Innovation at Citi

Howard University, Office of University Communications



Citi Ventures today announced an expansion of its Citi® University Partnerships in Innovation & Discovery (CUPID) program with Howard University School of Business to engage and embed students in innovation efforts across Citi.

Students of Howard’s 21st Century Advantage Program and the School of Business Executive Leadership Honors Program (SBEL) will be paired with Citi teams to work on emerging technology projects involving machine learning, natural language processing and big data analytics.


The introductory data science course at @Cal is Data 8. The course is so popular that it’s in Zellerbach Hall. Fall semester 2018, Day 1.

Twitter, Mike Olson




Statistics can save lives

TheHill, Opinion, Debashis Ghosh, Karen Bandeen-Roche and Jason Roy



With the current administration still questioning the figures used to measure storm deaths from Maria, we must better understand how and why the number of storm-attributed deaths increased so dramatically, and clarify how the reported death counts were calculated. The evidence-based approach to estimating deaths from natural disasters illustrates how mortality may occur: deaths result not only from the disaster itself, but also from breakdowns in infrastructure in the post-disaster period. These complete measurements of storm impacts are critically important for future disaster planning.

As chairs of North American biostatistics departments, we conducted a review of the two scientific reports on estimated deaths attributable to Hurricane Maria. We found the methodology used in both studies to be well established and statistically rigorous. Both estimates well exceed the initial report of 64 deaths, even after considering the limits of precision and data quality. The available evidence supports the number of storm-attributable deaths estimated by the recent independent analyses.
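The methodology the authors are vouching for is, at its core, an excess-mortality comparison: observed deaths in the months after the storm versus a baseline expected from comparable pre-storm periods, with uncertainty attached. A minimal sketch of that idea in Python, using made-up numbers rather than either study's actual data:

    # Illustrative only: synthetic counts, not data from the Hurricane Maria studies.
    # Excess deaths = observed post-storm deaths minus the baseline expected from
    # comparable pre-storm months, with a crude uncertainty interval attached.
    import math

    baseline_months = [2350, 2410, 2380, 2290, 2440, 2370]   # hypothetical pre-storm monthly deaths
    post_storm_months = [2900, 3100, 2750]                    # hypothetical post-storm monthly deaths

    expected_per_month = sum(baseline_months) / len(baseline_months)
    total_excess = sum(obs - expected_per_month for obs in post_storm_months)

    # Rough Poisson-style standard error, only to show why such estimates carry
    # wide intervals when counts and data quality are limited.
    se = math.sqrt(sum(post_storm_months) + len(post_storm_months) * expected_per_month)
    print(f"Estimated excess deaths: {total_excess:.0f} (95% CI approx. +/- {1.96 * se:.0f})")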


Directorate Reorganizes for Efficiency

SIGNAL Magazine, Maryann Lawlor



The Department of Homeland Security (DHS) Science and Technology Directorate (S&T) has reorganized its research and development (R&D) structure to more rapidly transition technology capabilities into operations and respond to emerging threats.

William N. Bryan, the senior official performing the duties of the undersecretary for science and technology, DHS, explains the revitalized configuration enhances the focus on the needs of the DHS operational components and homeland security operators across all levels of government.

“We no longer have the luxury of time to do traditional R&D, so we must change if we are to get ahead of threat cycles and keep pace with rapid innovation,” he says. “We are improving our R&D business practices to make it easier for industry, including the start-up community, to work with us.”


MacArthur Fellows Program

MacArthur Foundation



The 2018 class of MacArthur Fellows includes Deborah Estrin from Cornell.


Does Uber Kill? The Real Cost of Ride-sharing

The blog of the Stigler Center at the University of Chicago Booth School of Business, Luigi Zingales



Two years ago, the Stigler Center hosted a talk by David Plouffe, then chief advisor for Uber and a board member. In his speech, Plouffe promoted not only the convenience of the Uber model (most of us can attest to it), but also its environmental benefits: ride-sharing will reduce car circulation, congestion, and pollution. For the first time since the invention of the car, claimed Plouffe, urban planning was actually reducing parking space in city downtowns, because ride-sharing was reducing total demand for cars.

While high in rhetoric, his speech was short of actual data. If it were another kind of company, that would have been understandable—the phenomenon was so new that the data were not available yet—but Uber did have all the data. So why didn’t Plouffe use actual data to support his environmental claims?


Election security is a mess, and the cleanup won’t arrive by the midterms

CNET, Alfred Ng



Many experts believe a proper fix for election security won’t come until at least 2021.


Being seen, feeling heard – Reflections from working with patients to visualize their health

Medium, Pictal Health, Katie McCurdy



I’ve spent the last six months starting Pictal Health, a company to help patients tell their health stories visually. During this time I’ve been piloting a service that includes a few steps: talking and listening to patients and getting their full story, preparing visuals based on what I heard (usually a timeline and sometimes a symptom diagram on a body shape), and helping them feel prepared for their next doctor appointment.


New IPCC special report faces stark emissions reality

Axios, Andrew Freedman



On Sunday evening eastern time, the U.N. Intergovernmental Panel on Climate Change is scheduled to release its special report on the risks and benefits of limiting global warming to 1.5°C, or 2.7°F, above preindustrial levels.

Why it matters: The report is expected to contain sobering findings about how difficult it will be to meet the 1.5-degree target, which is an aspirational goal contained in the Paris Agreement on Climate Change. Every country in the world — except the U.S. — intends to honor the 2015 agreement, and the report will help inform negotiators in the next round of climate talks, set for December.


The untold story of Stripe, the secretive $20bn startup driving Apple, Amazon and Facebook

Wired UK, Stephen Armstrong



Patrick and John Collison have democratised online payments – and reshaped the digital economy in the process


Volunteer scientists study flowers to battle climate dread

High Country News, James C. Hu



Most hikers on the Reflection Lakes trail have their cameras pointed at Mount Rainier; Karen Sy, however, had her back to the imposing mountain. Instead, she examined a patch of spindly, tufted plants that look like Dr. Seuss’ truffula trees. In any case, Rainier looked hazy; choking wildfire smoke had pushed air quality readings into the “unhealthy” range for days. Luckily, Sy had come prepared with an N-95 respirator mask, so she could concentrate on writing down her observations: Western anemones, in their fruiting stage — typical for mid-August.

She’d braved the smoky weather for MeadoWatch, a program that enlists volunteers to collect wildflower data on hikes at Mount Rainier National Park. Three volunteers at a different plot told me that they find the program rewarding, in part because it provides an opportunity to inject much-needed scientific data into political discussions about climate change. Rather than agonizing over struggling animal species and changing ecosystems, recording observations feels like a proactive step to stave off climate dread.


Google, Intel and Microsoft team up w/CSAIL on new data-driven initiative

MIT CSAIL



Recent years have seen an explosion in the creation of machine learning models for everything from self-driving cars to social media feeds. Despite the success of these models at perception and simple prediction, they have yet to have a larger impact on traditional enterprise computing and data processing applications.

Applying machine learning inside the enterprise is the ambition behind a new initiative launched yesterday at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) with industry collaborators Google, Intel and Microsoft.

Dubbed the “Data Systems and Artificial Intelligence Lab” (DSAIL), the effort will focus on investigating how machine learning can be used to improve the performance of data-processing systems.

 
Events



Causal inference in machine learning with Columbia University

Meetup, NYC Artificial Intelligence & Machine Learning



New York, NY Thursday, November 15, starting at 6:30 p.m., Two Sigma (100 Avenue of the Americas). Speaker: Yixin Wang, a PhD candidate in the Statistics Department of Columbia University. [rsvp required, open on 11/1]


Blockchain in Journalism: Promise and Practice

by The Tow Center for Digital Journalism & Brown Institute for Media Innovation



New York, NY Friday, October 19, starting at 1 p.m., Columbia Journalism School (2950 Broadway, Pulitzer Hall). [free, registration required]


Jupyter Day in the Triangle

Project Jupyter



Chapel Hill, NC November 13, starting at 8 a.m., Carolina Club (150 Stadium Drive). [$$]


UW eScience Institute Guest Seminar with Greg Wilson

UW eScience Institute



Seattle, WA October 16, starting at 12 noon, eScience Institute (3910 15th Ave NE). Speaker: Greg Wilson, co-founder of Software Carpentry. [free]


Internet2 Technology Exchange

Internet2



Orlando, FL October 15-19. “a premier technical event in the global R&E community, convening the community’s technology visionaries—including chief technologists, scientists, engineers, architects, operators and students from around the U.S. and the globe.” [$$$$]

 
Deadlines



Towards AI for Collaborative Open Science (TACOS-19)

Stanford, CA March 25-27 at Stanford University. Deadline for submissions is November 2.
 
Tools & Resources



Comet

Comet



Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects.
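For a sense of what experiment tracking looks like in code, here is a minimal sketch using Comet's Python SDK (comet_ml); the API key, project name, and logged values are placeholders, and the exact calls should be checked against Comet's current documentation.

    # Minimal experiment-tracking sketch with Comet's Python SDK (comet_ml).
    # The API key, project name, and metric values below are placeholders.
    from comet_ml import Experiment

    experiment = Experiment(api_key="YOUR_API_KEY", project_name="demo-project")
    experiment.log_parameters({"learning_rate": 0.01, "batch_size": 32})

    for epoch in range(5):
        train_loss = 1.0 / (epoch + 1)   # stand-in for a real training loop
        experiment.log_metric("train_loss", train_loss, step=epoch)

    experiment.end()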


A new molecular programming language: CRN++

Tech Xplore, Ingrid Fadelli



A team of researchers at The University of Texas at Austin has created CRN++, a new language for programming deterministic (mass-action) chemical kinetics to perform computations. In their paper, pre-published on arXiv, the researchers outline this new language and build a compiler that translates CRN++ programs into chemical reactions.
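CRN++ itself compiles programs into chemical reactions, but the core idea, computation carried out by deterministic mass-action kinetics, can be illustrated without the language. The sketch below is plain Python with SciPy, not CRN++: it integrates the reactions A -> C and B -> C, whose steady-state concentration of C is the sum of the initial concentrations of A and B.

    # Not CRN++: a plain ODE illustration of computing with mass-action kinetics.
    # The reactions A -> C and B -> C (rate constant k) drive [C] toward [A]0 + [B]0,
    # i.e., the network "computes" an addition.
    from scipy.integrate import solve_ivp

    k = 1.0

    def mass_action(t, y):
        a, b, c = y
        return [-k * a, -k * b, k * a + k * b]

    a0, b0 = 3.0, 4.0
    sol = solve_ivp(mass_action, (0.0, 20.0), [a0, b0, 0.0])
    print(f"[C] at t = 20: {sol.y[2, -1]:.3f} (expected {a0 + b0})")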


DXR, An Immersive Visualization Toolkit

Ronell Sicat and Jiabao Li with Hanspeter Pfister



DXR is a toolkit for rapidly prototyping data visualizations in XR (augmented, mixed, and virtual reality). A visualization in DXR is a collection of Unity game objects whose properties such as position, color, and size are mapped to data attributes. This mapping can be specified interactively at runtime via a graphical user interface (GUI) or via a high-level programming interface, inspired by Polestar and Vega-Lite, respectively. DXR is extensible, allowing the use of most Unity game objects for custom marks and channels.
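DXR's specifications live inside Unity, but the declarative style it borrows from Vega-Lite, binding data fields to visual channels such as position, color, and size, has a recognizable shape. The Python dictionary below is only an illustration of that grammar-of-graphics pattern, not DXR's actual file format; the data source and field names are hypothetical.

    # Illustration of a Vega-Lite-style declarative mapping (not DXR's real schema).
    # Each visual channel of a mark is bound to a data field and a scale type.
    spec = {
        "data": {"url": "cars.json"},            # hypothetical data source
        "mark": "sphere",                         # the geometry used as a mark
        "encoding": {
            "x":     {"field": "horsepower", "type": "quantitative"},
            "y":     {"field": "mpg",        "type": "quantitative"},
            "z":     {"field": "weight",     "type": "quantitative"},
            "color": {"field": "origin",     "type": "nominal"},
            "size":  {"field": "cylinders",  "type": "ordinal"},
        },
    }

    for channel, enc in spec["encoding"].items():
        print(f"{channel:>5} <- {enc['field']} ({enc['type']})")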


Exascale Deep Learning for Climate Analytics

arXiv, Computer Science > Distributed, Parallel, and Cluster Computing; Thorsten Kurth et al.



We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.
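To put the scaling figures in perspective: parallel efficiency is usually computed as the achieved aggregate throughput divided by the number of GPUs times a single-GPU (or single-node) baseline. Assuming that definition, which the excerpt does not spell out, the reported numbers imply roughly these per-GPU baselines:

    # Back-of-envelope reading of the reported scaling numbers, assuming
    # efficiency = sustained_throughput / (n_gpus * per_gpu_baseline).
    runs = {
        "Tiramisu on Piz Daint (P100)": {"pf_s": 21.0, "gpus": 5300, "eff": 0.790},
        "DeepLabv3+ on Summit (V100)":  {"pf_s": 325.8, "gpus": 27360, "eff": 0.907},
    }

    for name, r in runs.items():
        per_gpu_tf = r["pf_s"] * 1000 / (r["gpus"] * r["eff"])  # PF/s -> TF/s per GPU
        print(f"{name}: implied per-GPU baseline of about {per_gpu_tf:.1f} TF/s")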


Turn the web into a database: An alternative to web crawling/scraping

Mixnode News Blog, Nariman Jelveh



Mixnode allows you to think of all the web pages, images, videos, PDF files, and other resources on the web as rows in a database table; a giant database table with trillions of rows that you can query using the standard Structured Query Language (SQL). So, rather than running web crawlers/scrapers you can write simple queries in a familiar language to retrieve all sorts of interesting information from this table of live data.


Building a Robot Umpire with Deep Learning Video Analysis (Part Two)

The Hardball Times, Roger Cheng



As it turns out, it is possible to perform meaningful analysis when you have an unbalanced data set. If the training and test sets have the same proportion of balls and strikes, the process behind the training stage and test stage does not change. What changes is the notion of how the “accuracy” and “success” of the results are defined. In particular, the conventional definition of “accuracy” (number of correct classifications divided by number of test examples) is now less useful. With a balanced data set, the baseline accuracy is 50 percent because guessing randomly between ball and strike, or declaring all test pitches balls or all strikes, would yield an accuracy of 50 percent. Now, given that about 68 percent of the pitches in the data set are balls, I could declare all the test pitches to be balls and have myself a 68 percent accuracy. But this is a terrible idea!

Instead of relying upon basic accuracy, it is better to evaluate the system based on more advanced metrics such as F1 score and AUC (area under the curve). In my previous article, I calculated the F1 score, but it happened to be the case that it was almost equal to the basic accuracy. This time around, that won’t be true.
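The trap Cheng describes is easy to reproduce on synthetic labels with roughly his 68/32 ball/strike split: a “classifier” that always calls ball scores about 68 percent accuracy yet has no skill, which F1 and AUC make obvious. A small sketch with scikit-learn (synthetic data, not his pitch data):

    # Synthetic illustration of why raw accuracy misleads on a 68/32 class split.
    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    rng = np.random.default_rng(0)
    y_true = (rng.random(10_000) < 0.32).astype(int)   # 1 = strike (~32%), 0 = ball (~68%)

    # "Always call ball": accuracy looks decent, but F1 for the strike class is 0
    # (scikit-learn will warn that it is ill-defined) and AUC is a coin flip.
    y_pred_all_balls = np.zeros_like(y_true)
    constant_scores = np.full(len(y_true), 0.5)

    print("accuracy:", accuracy_score(y_true, y_pred_all_balls))
    print("F1 (strike class):", f1_score(y_true, y_pred_all_balls))
    print("AUC:", roc_auc_score(y_true, constant_scores))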
