Data Science newsletter – September 11, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 11, 2017

GROUP CURATION: N/A

 
 
Data Science News



Data Visualization of the Week

Twitter, Pew Research Center




Far from actual storms, UW scientists provide critical hurricane data

Milwaukee Journal Sentinel, Meg Jones



“It came off Africa and went west-northwest, but then oddly it dipped west-southwest,” said Derrick Herndon, associate researcher in the Tropical Cyclone Research Group at the UW-Madison Space Science and Engineering Center.

“That worried us because those storms are strong in intensity and they usually head toward the United States.”

As a result of their research, the [University of Wisconsin] Tropical Cyclone Research Group was the first to notice Irma — before it became a tropical storm, before it earned a nickname and before it became a Category 5 hurricane. The UW scientists have been analyzing its intensity ever since, just as they did earlier with Hurricane Harvey.


Schill announces interdisciplinary data science initiative

University of Oregon, Around the O



The University of Oregon is launching a new interdisciplinary initiative in data science and tapping one of its top researchers, biology Professor Bill Cresko, to lead its development.

President Michael Schill announced the initiative at the September meeting of the UO Board of Trustees.

The cross-disciplinary effort will bring together existing faculty members and recruit new ones across schools and colleges to create new educational, training and research programs at the UO and will be integral to the Knight Campus for Accelerating Scientific Impact. Over the long term, it could also lead to new degree programs for students.


Massive genetic study shows how humans are evolving

Nature News & Comment, Bruno Martin



“If a genetic variant influences survival, its frequency should change with the age of the surviving individuals,” says Hakhamanesh Mostafavi, an evolutionary biologist at Columbia University in New York City who led the study. People who carry a harmful genetic variant die at a higher rate, so the variant becomes rarer in the older portion of the population.

Mostafavi and his colleagues tested more than 8 million common mutations, and found two that seemed to become less prevalent with age. A variant of the APOE gene, which is strongly linked to Alzheimer’s disease, was rarely found in women over 70. And a mutation in the CHRNA3 gene associated with heavy smoking in men petered out in the population starting in middle age. People without these mutations have a survival edge and are more likely to live longer, the researchers suggest.
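The survival logic in the quote can be illustrated with a crude check: compute how often a variant appears in each age bracket, then fit a least-squares slope of that frequency against age. The counts below are invented for illustration only, and the study's actual statistical model, run over millions of variants, is more involved than this sketch.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical toy counts keyed by the midpoint age of each bracket:
# (carriers of the variant, people genotyped). Invented for illustration,
# NOT data from the study.
TOY_COUNTS: Dict[int, Tuple[int, int]] = {
    45: (300, 1000),
    55: (280, 1000),
    65: (250, 1000),
    75: (200, 1000),
}

def carrier_frequency(counts: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Fraction of genotyped people in each age bracket who carry the variant."""
    return {age: carriers / total for age, (carriers, total) in counts.items()}

def frequency_slope(freqs: Dict[int, float]) -> float:
    """Ordinary least-squares slope of carrier frequency against age.

    A negative slope means the variant gets rarer among older survivors,
    the pattern the researchers read as a survival disadvantage.
    """
    ages = list(freqs)
    values = [freqs[a] for a in ages]
    x_bar, y_bar = mean(ages), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, values))
    den = sum((x - x_bar) ** 2 for x in ages)
    return num / den

freqs = carrier_frequency(TOY_COUNTS)
slope = frequency_slope(freqs)
print(f"change in carrier frequency per year of age: {slope:.4f}")  # -0.0033
```

A slope near zero would suggest the variant has no detectable effect on survival; with real data, a test of whether the slope differs from zero would be needed before drawing any conclusion.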


In the Future, Warehouse Robots Will Learn on Their Own

The New York Times, Cade Metz



The robot was perched over a bin filled with random objects, from a box of instant oatmeal to a small toy shark. This two-armed automaton did not recognize any of this stuff, but that did not matter. It reached into the pile and started picking things up, one after another after another.

“It figures out the best way to grab each object, right from the middle of the clutter,” said Jeff Mahler, one of the researchers developing the robot inside a lab at the University of California, Berkeley.

For the typical human, that is an easy task. For a robot, it is a remarkable talent — something that could drive significant changes inside some of the world’s biggest businesses and further shift the market for human labor.


Company Data Science News

At Amazon, robots are already replacing some of the workers who used to lift and pack. Some humans are now employed to babysit the robots, redirecting or resetting them when they run into trouble.



Craig Venter, the billionaire scientist, published a paper in PNAS that Science had originally rejected after blind peer review; at PNAS, the paper was evaluated by reviewers the authors recommended. The authors wrote that they could predict an individual's facial structure from DNA, based on samples from a little over 1,000 people. The controversy mostly boils down to frustration that the paper overstates its findings. There is some agreement that their technique for estimating age is useful and somewhat novel (the estimates were based on the length of telomeres, which are known to shorten with age).
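The age estimate rests on a simple idea: telomeres shorten with age, so a measured telomere length can be mapped back to an age. A minimal sketch, assuming a straight-line relationship and invented training pairs (not the paper's data or its actual model), fits that mapping by ordinary least squares:

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical training pairs: (telomere length in kilobases, age in years).
# Invented for illustration; NOT data from the PNAS paper.
TRAINING: List[Tuple[float, float]] = [(7.5, 20), (7.0, 35), (6.5, 50), (6.0, 65)]

def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of age = intercept + slope * telomere_length."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

def estimate_age(telomere_kb: float, slope: float, intercept: float) -> float:
    """Predict age from a telomere length using the fitted line."""
    return intercept + slope * telomere_kb

slope, intercept = fit_line(TRAINING)
print(f"slope: {slope:.1f} years per kb")                           # slope: -30.0 years per kb
print(f"estimated age: {estimate_age(6.8, slope, intercept):.1f}")  # estimated age: 41.0
```

The negative slope encodes the biology: shorter telomeres map to older ages. Real telomere measurements are noisy, so any such estimate would come with a wide error band.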



Data & Society Research Institute has named a new executive director: Janet Haven, who was at the Open Society Foundations for over a decade. I have worked with Ms. Haven and am delighted that she is taking the day-to-day lead at Data & Society. So is Data & Society's founder and president, danah boyd.



Trunk Club tries to send its members clothing they will actually like enough to keep; if they don't like what's in the trunk, they can simply send it back. What's fascinating, from a data science perspective, is how the company adds social media likes and clicks to its prediction models. Goodness knows I would be baffled if I had to infer my actual wardrobe from my own social media stream.
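One plausible way to fold social signals into such a model is as extra features in a logistic keep-probability score. The sketch below is hypothetical throughout: the feature names and hand-set weights are illustrative assumptions, not Trunk Club's model, and a real system would learn its weights from data.

```python
import math
from dataclasses import dataclass

@dataclass
class ItemSignals:
    """Hypothetical signals about one candidate item for one member."""
    past_keep_rate: float  # share of similar items this member kept before (0 to 1)
    social_likes: int      # member's likes on social posts showing similar items
    clicks: int            # member's clicks on similar items

# Hand-set, hypothetical weights; a real model would learn these from data.
WEIGHTS = {"bias": -1.0, "past_keep_rate": 2.5, "social_likes": 0.10, "clicks": 0.05}

def keep_probability(s: ItemSignals, use_social: bool = True) -> float:
    """Logistic score: estimated probability the member keeps the item."""
    z = WEIGHTS["bias"] + WEIGHTS["past_keep_rate"] * s.past_keep_rate
    if use_social:
        # Social-media signals enter as additional linear terms.
        z += WEIGHTS["social_likes"] * s.social_likes + WEIGHTS["clicks"] * s.clicks
    return 1.0 / (1.0 + math.exp(-z))

item = ItemSignals(past_keep_rate=0.4, social_likes=5, clicks=10)
print(round(keep_probability(item, use_social=False), 3))  # 0.5
print(round(keep_probability(item), 3))                    # 0.731
```

Comparing the two calls shows how much the social features shift the prediction for the same item, which is the kind of ablation a team would run to decide whether those signals are worth keeping.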



Apple launched a new iPhone that costs a lot, has no headphone jack, and relies on facial recognition, which in turn required a chip architecture designed to run neural networks.


U.S. Open: How IBM’s Watson Picks the Highlights

Fortune.com, Jeff John Roberts



Professional sports can deliver several jaw-dropping, jump-out-of-your-chair moments of pure excitement—but also lots of downtime. This makes compiling highlights a chore, especially at events like the ongoing U.S. Open, where TV producers sift through hundreds of hours of tennis play to find the best action.

Well, at least they used to.

This year, the task got much easier since the U.S. Open began relying on IBM’s artificial intelligence tool, Watson, to watch the games and pluck out highlights.

This involves using Watson to monitor games in order to detect evidence of excitement and assign those moments a score. The key indicators are crowd noise, player gestures (such as fist pumps or shouts of “come on”) and critical strategic moments in the match such as break points.
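To make the scoring idea concrete, here is a minimal sketch that combines the three indicators the article names into one number and ranks moments by it. The weights, the 0 to 1 noise scale, and the `Moment` fields are hypothetical; the article does not describe how Watson actually weighs these signals.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Moment:
    label: str
    crowd_noise: float    # hypothetical normalized crowd loudness, 0 to 1
    player_gesture: bool  # e.g. a fist pump or a shout of "come on" was detected
    break_point: bool     # the point was a critical strategic moment

# Hypothetical weights; the article does not say how Watson combines the signals.
W_NOISE, W_GESTURE, W_BREAK = 0.5, 0.3, 0.2

def excitement_score(m: Moment) -> float:
    """Weighted sum of the three indicators; higher means more highlight-worthy."""
    return (W_NOISE * m.crowd_noise
            + W_GESTURE * float(m.player_gesture)
            + W_BREAK * float(m.break_point))

def top_highlights(moments: List[Moment], k: int) -> List[str]:
    """Labels of the k highest-scoring moments, best first."""
    ranked = sorted(moments, key=excitement_score, reverse=True)
    return [m.label for m in ranked[:k]]

moments = [
    Moment("routine hold", 0.2, False, False),
    Moment("break point saved", 0.7, True, True),
    Moment("long rally", 0.9, False, False),
    Moment("ace with fist pump", 0.6, True, False),
]
print(top_highlights(moments, 2))  # ['break point saved', 'ace with fist pump']
```

Note that the loudest moment ("long rally") does not make the cut here, because the gesture and break-point signals outweigh crowd noise alone under these assumed weights.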


Updated: How to collect better data on government programs—and improve privacy, too

Science, ScienceInsider, Jeffrey Mervis



“This is impressive and important work you’ve done,” Representative Paul Ryan (R–WI), speaker of the House of Representatives, told members of the Commission on Evidence-Based Policymaking yesterday during brief remarks at the report’s unveiling on Capitol Hill in Washington, D.C. But there’s more to do, emphasized Senator Patty Murray (D–WA). “A report is only as good as the work that comes from it,” she said, adding that she and Ryan are crafting a bill “to turn several of the nearly two dozen recommendations into law, and to lay down a foundation for even more work to come.”

Murray said the pending legislation, dubbed the Foundations for Evidence-Based Policymaking Act, is focused on the report’s three core ideas: expanding access to the data, ensuring privacy, and strengthening the government’s capacity to evaluate how spending trillions of dollars every year on programs affects the health, education, and economic wellbeing of millions of Americans. Ryan said the bottom line for him is “changing our approach [to government] … to get the results we want and to improve people’s lives.”

 
Deadlines



Young Faculty Award 2018 Research Topics Announced

DARPA has 26 YFA topic areas for 2018. Executive summaries are due October 2, and full proposals are due December 4.
 
NYU Center for Data Science News



Awesome, our #NIPS2017 workshop on “Deep Learning for Physical Sciences” was accepted!

Kyle Cranmer, NIPS 2017



Long Beach, CA, December 8, at NIPS 2017

 
Tools & Resources



DeepMoji: State-of-the-art emotional content analysis using deep learning models pretrained on emojis

GitHub – bfelbo



“DeepMoji is a model trained on 1.2 billion tweets with emojis to understand how language is used to express emotions. Through transfer learning the model can obtain state-of-the-art performance on many emotion-related text modeling tasks.”


Vagrant 2.0

HashiCorp



“Vagrant 2.0 supports provisioning development environments on VirtualBox, VMware, Hyper-V, Docker, AWS, GCP, and more. It can virtualize Windows, macOS, and a dozen other new operating systems. Vagrant 2.0 is paired with Vagrant Cloud for finding and consuming boxes. It has come a long way since Vagrant 1.0 when it only supported VirtualBox, and the community has grown significantly since then.”


AllenNLP – An open-source NLP research library, built on PyTorch

Allen Institute for Artificial Intelligence



“AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop.”


In search of the missing signals

giorgio patrini



An overview of current trends for feature learning in the unsupervised way: regress to random targets for manifold learning, exploit causality to characterize visual features, and in reinforcement learning, augment the objective with auxiliary control tasks and pre-train by self-play. There is so much to learn from unlabeled data and it seems that we have only skimmed the surface of it by only using labels.


Introducing Pytorch for fast.ai

fast.ai, Jeremy Howard



The next fast.ai courses will be based nearly entirely on a new framework we have developed, built on Pytorch. Pytorch is a different kind of deep learning library (dynamic, rather than static), which has been adopted by many (if not most) of the researchers that we most respect, and in a recent Kaggle competition was used by nearly all of the top 10 finishers.

We have spent around a thousand hours this year working with Pytorch to get to this point, and we are very excited about what it is allowing us to do. We will be writing a number of articles in the coming weeks talking about each aspect of this. First, we will start with a quick summary of the background to, and implications of, this decision.


Announcing Yarn 1.0

Facebook Code, Engineering Blog; Burak Yiğit Kaya, Christoph Nakazawa, Maël Nison



We are excited to announce the 1.0 release of the Yarn JavaScript package manager, a major step for the project. In the 11 months since its initial release, Yarn has generated a large following. Currently, there are more than 175,000 projects on GitHub with a yarn.lock file in their root directory. In use by many large and small companies, and across the open source community, Yarn is now responsible for nearly 3 billion package downloads per month. Reduced install times are a big draw for many users. Twitter, as well as Microsoft on outlook.com, have seen improvements of 5x in install times. Expo, Kenzan, and Sentry have reported performance and stability improvements so they can ship code faster. Additionally, Yarn now also comes pre-installed on major continuous integration platforms such as CircleCI, Travis CI, and AppVeyor.


How to conduct a good Programming Interview

Li Haoyi, Haoyi's Programming Blog



“This post will cover some principles and techniques you can use to make the most of the limited time you have together with a candidate, so you can make a firm decision whether or not to give the candidate a job offer to work with you in your company or on your team.”


The Incredible Growth of Python

Stack Overflow blog, David Robinson



In this post, we’ll explore the extraordinary growth of the Python programming language in the last five years, as seen by Stack Overflow traffic within high-income countries. The term “fastest-growing” can be hard to define precisely, but we make the case that Python has a solid claim to being the fastest-growing major programming language.

 
Careers


Full-time positions outside academia

Senior Data Analyst (Nonprofit/Education)



Philanthropy University; Oakland, CA

Data Scientist



REI; Kent, WA

Artificial Intelligence/FutureTech Investigative Reporter



The New York Times; New York, NY

Chief Operating Officer



Center for Open Science; Charlottesville, VA

Tenured and tenure-track faculty positions

Assistant Professor in Environmental Data Science and Informatics



University of California-Santa Barbara, Earth Research Institute; Santa Barbara, CA

Full-time, non-tenured academic positions

Research Assistant – Decision Making



University College London, Institute of Cognitive Neuroscience; London, England
