Data Science newsletter – December 14, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for December 14, 2018

 
 
Data Science News



‘Universal fingerprint’ in cancer DNA points to potential blood test. But it’s not ready for patients

Cancer Research UK


Professor Paul Pharoah, from the University of Cambridge, urges caution, saying it’s unknown if the innovative idea will be useful in a clinical setting, as some news reports suggested.

That’s because using a blood test to confirm that people with advanced cancer have cancer is very different to using that same test to detect the earliest hints of cancer in otherwise healthy people.


Scientists Don’t Stay for Long in Their Jobs Anymore: Study

The Scientist Magazine®, Ashley P. Taylor


These days, about half of scientists who enter a scientific discipline go on to drop out after five years, according to a study published yesterday (December 10) in PNAS. In the 1960s, that half-life for scientists was seven times longer—35 years before they moved on to other professions.

“Entering graduate students should be aware of this, so that they would have realistic expectations and perhaps try to plan their lives accordingly,” study coauthor Staša Milojević, associate professor of informatics at Indiana University, tells Inside Higher Ed.


Artificial intelligence helps predict volcanic eruptions

Science, Paul Voosen


Satellites are providing torrents of data about the world’s active volcanoes, but researchers have struggled to turn them into a global prediction of volcanic risks. That may soon change with newly developed algorithms that can automatically tease from that data signals of volcanic risk, raising the prospect that within a couple years scientists could develop a global volcano warning system.

Without such tools, geoscientists simply can’t keep up with information pouring out of the satellites, says Michael Poland, the scientist-in-charge of the U.S. Geological Survey’s Yellowstone Volcano Observatory in Vancouver, Washington, who was not involved in either study. “The volume of data is overwhelming,” he says.


A monitoring network in the Amazon captures a flood of data

Mongabay, Elizabeth Oliveira


Cameras and microphones are capturing images and sounds of the world’s largest rainforest to monitor the Amazon’s species and environmental dynamics in an unprecedented way.

The Providence Project’s series of networked sensors is aimed at complementing remote-sensing data on forest cover change by revealing ecological interactions beneath the forest canopy.


Changing demographics of scientific careers: The rise of the temporary workforce

Proceedings of the National Academy of Sciences; Staša Milojević, Filippo Radicchi, and John P. Walsh


Contemporary science has been characterized by an exponential growth in publications and a rise of team science. At the same time, there has been an increase in the number of awarded PhD degrees, which has not been accompanied by a similar expansion in the number of academic positions. In such a competitive environment, an important measure of academic success is the ability to maintain a long active career in science. In this paper, we study workforce trends in three scientific disciplines over half a century. We find dramatic shortening of careers of scientists across all three disciplines. The time over which half of the cohort has left the field has shortened from 35 y in the 1960s to only 5 y in the 2010s. In addition, we find a rapid rise (from 25 to 60% since the 1960s) of a group of scientists who spend their entire career only as supporting authors without having led a publication. Altogether, the fraction of entering researchers who achieve full careers has diminished, while the class of temporary scientists has escalated. We provide an interpretation of our empirical results in terms of a survival model from which we infer potential factors of success in scientific career survivability. Cohort attrition can be successfully modeled by a relatively simple hazard probability function. Although we find statistically significant trends between survivability and an author’s early productivity, neither productivity nor the citation impact of early work or the level of initial collaboration can serve as a reliable predictor of ultimate survivability.
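
To give a feel for what the reported half-lives imply, here is a minimal sketch that assumes a constant per-year exit probability. That constant-hazard form is an illustrative assumption, not the hazard function fitted in the paper, and the function names are hypothetical.

```python
def annual_hazard(half_life_years: float) -> float:
    """Constant per-year exit probability that yields the given half-life.

    Assumes (for illustration only) that the hazard does not change with
    career age, so survival after t years is (1 - hazard) ** t.
    """
    return 1.0 - 0.5 ** (1.0 / half_life_years)


def surviving_fraction(hazard: float, years: int) -> float:
    """Fraction of an entering cohort still active after `years`."""
    return (1.0 - hazard) ** years


# Half-lives quoted in the abstract: 35 y (1960s cohort), 5 y (2010s cohort).
for label, half_life in [("1960s", 35), ("2010s", 5)]:
    h = annual_hazard(half_life)
    remaining = surviving_fraction(h, 10)
    print(f"{label}: annual hazard {h:.3f}, still active after 10 y: {remaining:.2f}")
```

Under this simplifying assumption, a cohort with a 5-year half-life retains one quarter of its members after 10 years, while a cohort with a 35-year half-life retains well over half.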


Vanderbilt Data Science Institute launches new master of science program

Vanderbilt University, my VU


Vanderbilt’s Data Science Institute, founded earlier this year, has launched a new master of science program to prepare students for careers in the burgeoning field of data science. Applications for the 2019-20 academic year are now open to students.


Amid Decline in Popularity, Harvard Government Department Turns to New Data and Tech Programs

The Harvard Crimson, Sonia F. Epstein and Sam E. Sharfstein


Harvard’s Government department hopes that two new tracks — in Tech Science and Data Science — will not only widen the scope of what affiliated students study, but offer an antidote to the field’s diminished popularity.


Edward Norton’s start-up EDO wants to bring data science to TV

CNBC, Michelle Castillo


Norton has teamed with poet Daniel Nadler and one of Facebook’s first investors on a project designed to measure in real time the effectiveness of TV ads. The company is called EDO, which stands for Entertainment Data Oracle, and its technology is already being used by ESPN, Turner, CNBC owner NBCUniversal and Paramount, providing insights into which ads viewers like or what parts of movies audiences love.


Will Democrats revive the Office of Technology Assessment in 2019?

FCW, Chase Gunter


Many OTA backers imagine they can get the votes for a revival of the office in some form in 2019.

Rep. Mark Takano, a California Democrat who serves on the House Science, Space and Technology Committee, said he was “optimistic” about the chances next year, pointing out the “dire need for access to unbiased technology experts to move good and effective legislation forward.”

Rep. Ted Lieu, a California Democrat who was just elected to leadership as the co-chairman of the Democratic Policy and Communications Committee, said he wants to bring back OTA. Lieu, who has a computer science degree, pointed to cybersecurity shortcomings in both the private and public sectors, aerospace and medicine as areas of Democratic priority – and areas where OTA will help inform lawmakers.


Effective approach to epidemic containment using link equations in complex networks

Science Advances; Joan T. Matamalas, Alex Arenas and Sergio Gómez


Epidemic containment is a major concern when confronting large-scale infections in complex networks. Many studies have been devoted to analytically understand how to restructure the network to minimize the impact of major outbreaks of infections at large scale. In many cases, the strategies are based on isolating certain nodes, while less attention has been paid to interventions on the links. In epidemic spreading, links inform about the probability of carrying the contagion of the disease from infected to susceptible individuals. Note that these states depend on the full structure of the network, and its determination is not straightforward from the knowledge of nodes’ states. Here, we confront this challenge and propose a set of discrete-time governing equations that can be closed and analyzed, assessing the contribution of links to spreading processes in complex networks. Our approach allows a scheme for the containment of epidemics based on deactivating the most important links in transmitting the disease. The model is validated in synthetic and real networks, yielding an accurate determination of epidemic incidence and critical thresholds. Epidemic containment based on link deactivation promises to be an effective tool to maintain functionality of networks while controlling the spread of diseases, such as disease spread through air transportation networks.


How the IRS Was Gutted

ProPublica, Paul Kiel and Jesse Eisinger


An eight-year campaign to slash the agency’s budget has left it understaffed, hamstrung and operating with archaic equipment. The result: billions less to fund the government. That’s good news for corporations and the wealthy.


The principles of tomorrow’s university – F1000Research

F1000 Research, Daniel S. Katz et al.


In the 21st Century, research is increasingly data- and computation-driven. Researchers, funders, and the larger community today emphasize the traits of openness and reproducibility. In March 2017, 13 mostly early-career research leaders who are building their careers around these traits came together with ten university leaders (presidents, vice presidents, and vice provosts), representatives from four funding agencies, and eleven organizers and other stakeholders in an NIH- and NSF-funded one-day, invitation-only workshop titled “Imagining Tomorrow’s University.” Workshop attendees were charged with launching a new dialog around open research – the current status, opportunities for advancement, and challenges that limit sharing.
The workshop examined how the internet-enabled research world has changed, and how universities need to change to adapt commensurately, aiming to understand how universities can and should make themselves competitive and attract the best students, staff, and faculty in this new world. During the workshop, the participants re-imagined scholarship, education, and institutions for an open, networked era, to uncover new opportunities for universities to create value and serve society. They expressed the results of these deliberations as a set of 22 principles of tomorrow’s university across six areas: credit and attribution, communities, outreach and engagement, education, preservation and reproducibility, and technologies.


How “Kaggle Survey Challenge 2018” Concluded?

Kaggle, Shivam Bansal


Welcome to Kaggle’s second annual Machine Learning and Data Science Survey ― and our first-ever survey data challenge.

This year, as last year, we set out to conduct an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. The survey was live for one week in October, and after cleaning the data we finished with 23,859 responses, a 49% increase over last year!

There’s a lot to explore here.


AlphaFold @ CASP13: “What just happened?”

Mohammed AlQuraishi


I just came back from CASP13, the biennial assessment of protein structure prediction methods (I previously blogged about CASP10.) I participated in a panel on deep learning methods in protein structure prediction, as well as a predictor (more on that later.) If you keep tabs on science news, you may have heard that DeepMind’s debut went rather well. So well in fact that not only did they take first place, but put a comfortable distance between them and the second place predictor (the Zhang group) in the free modeling (FM) category, which focuses on modeling novel protein folds. Is the news real or overhyped? What is AlphaFold’s key methodological advance, and does it represent a fundamentally new approach? Is DeepMind forthcoming in sharing the details? And what was the community’s reaction? I will summarize my thoughts on these questions and more below. At the end I will also briefly discuss how RGNs, my end-to-end differentiable model for structure prediction, did on CASP13.

“What just happened?” was a question put to me in exactly these words by at least one researcher at CASP, and a sentiment expressed by most academics I spoke with. As one myself, I shared it going in and throughout the meeting. In fact I went into CASP13 feeling melancholy (the raw results were out two days prior), although my mood lifted during the meeting due to the general excitement and quality of discussions, and as my tribal reflexes gave way to a cooler and more rational assessment of the value of scientific progress.


Exclusive CEO Interview: Satya Nadella Reveals How Microsoft Got Its Groove Back

Forbes, Alex Konrad


After watching Nadella lead the Redmond, Washington-based giant for two years, GitHub made a surprise move, choosing Microsoft over Google as its acquirer this past June.

It was the latest coup for Nadella, 51, who’s breaking free of Microsoft’s recent past by returning it to its roots under cofounder Bill Gates.

“Bill used to teach me, ‘Every dollar we make, there’s got to be five dollars, ten dollars on the outside,’ ” Nadella tells Forbes, in his first sit-down interview since the $7.5 billion deal closed.

 
Deadlines



2nd Deep Learning and Security Workshop

San Francisco, CA May 23, 2019, co-located with the 40th IEEE Symposium on Security and Privacy. Deadline for submitting papers is December 21.

Syngenta Crop Challenge

“The Intersection of Mathematics, Agriculture and Big Data – Enter the Syngenta Crop Challenge and put your analytical skills to the test for a chance to win up to $5,000.”

Young Researchers: Apply for Seventh Heidelberg Laureate Forum, September 22-27, 2019

“ACM encourages young computer scientists and mathematicians from all over the world to apply for one of the 200 coveted spots to participate in the Heidelberg Laureate Forum (HLF), an annual event. The HLF offers all accepted young researchers the great opportunity to personally meet the winners of the most prestigious prizes in their fields.” Deadline to apply is January 25, 2019.

Call for Papers – 1st Workshop on Fairness, Accountability, Transparency, Ethics, and Society on the Web

San Francisco, CA May 13-14, 2019. “The FATES on the Web 2019 (Fairness, Accountability, Transparency, Ethics, and Society on the Web) is the first edition of a workshop to bring together researchers and enthusiasts concerned with the urgent challenges concerning algorithmic fairness and accountability, transparency, and ethics on data management and social interaction on the web.” Co-located with The Web Conference 2019. Deadline for submissions is January 25, 2019.
 
Tools & Resources



Our learnings from adopting GraphQL

Medium, Netflix TechBlog, Artem Shtatnov and Ravi Srinivas Ranganathan


In an earlier blog post, we provided a high-level overview of some of the applications in the Marketing Technology team that we build to enable scale and intelligence in driving our global advertising, which reaches users on sites like The New York Times, YouTube, and thousands of others. In this post, we’ll share our journey in updating our front-end architecture and our learnings in introducing GraphQL into the Marketing Tech system.

Our primary application for managing the creation and assembly of ads that reach the external publishing platforms is internally dubbed Monet. It’s used to supercharge ad creation and automate management of marketing campaigns on external ad platforms.


NeurIPS 2018 video talk collection

Videoken


10 videos of full-length keynotes from NeurIPS


Academic metasearch engine The Lens: solving the problem of solving problems

Cambia and Queensland University of Technology


The Lens is building an open platform for Innovation Cartography. Specifically, the Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow document collections, aggregations, and analyses to be shared, annotated, and embedded to open the mapping of the world of knowledge-directed innovation. Ultimately, this will restore the role of the patent system as a teaching resource to inspire and inform entrepreneurs, citizens and policy makers.

Within the next two years, we expect to host over 95% of the world’s patent information and link to most of the scholarly literature, creating open public innovation portfolios of individuals and institutions. Using all open source components, we are working to create open schemas by which patent documents can be used to teach and communicate, rather than confuse and intimidate.


Scaling Multi-Agent Reinforcement Learning

The Berkeley Artificial Intelligence Research Blog, Eric Liang and Richard Liaw


We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and how we designed for it in RLlib. Our goal is to enable multi-agent RL across a range of use cases, from leveraging existing single-agent algorithms to training with custom algorithms at large scale.


Etsy’s experiment with immutable documentation

Code as Craft blog, Paul-Jean Letourneau


Etsy has recognized that technical documentation is a mixture of two distinct types: a narrative that explains why a system exists (“why-docs”), and operational details that describe how to use the system (“how-docs”). In trying to overcome the problem of staleness, the crucial observation is that how-docs typically change faster than why-docs do. Therefore the more how-docs are mixed in with why-docs in a doc page, the more likely the page is to go stale.

We’ve leveraged this observation by creating an entirely separate system to hold our how-docs. The FYI system simply allows us to save Slack messages to a persistent data store.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor



University of North Carolina, Shuford Program in Entrepreneurship; Chapel Hill, NC

Internships and other temporary positions

Health Informatics Intern



Verily; Cambridge, MA

Internship | 2019-2020



TotalSim; Brackley, England

Full-time positions outside academia

Data Scientist (machine learning)



The New York Times; New York, NY

Data Scientist – Merch Inventory Algorithms



Stitch Fix; San Francisco, CA

Postdocs

Full-time Postdoctoral Researcher



Centre for East European and International Studies (ZOiS); Berlin, Germany
