Data Science newsletter – October 18, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for October 18, 2018


 
 
Data Science News



Machine Learning, for Science

University of California-Santa Barbara, The UCSB Current



UC Santa Barbara will collaborate on a Department of Energy research center adapting machine learning for use in scientific research


What Emails Reveal About Your Performance At Work

Josh Bersin



Genpact, one of the largest and most successful global professional services firms (78,000 employees generating over $2.6 billion in revenue), has figured this out.

After analyzing months of communication patterns using messaging metadata (data about the messages, not the messages themselves), the company can now statistically prove that certain types of communication behavior directly correlate with business performance. In fact, using employee communication data with a deep-learning model, Genpact can predict “Rockstar” performers with 74% accuracy. (This process works for emails, Slack messages, Skype messages, etc.)

So yes, your communication patterns are strongly correlated with your performance. What did we learn?
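The article does not describe Genpact's model in any detail. As a rough illustration of the general approach — predicting a performance label from per-employee messaging-metadata features — here is a minimal sketch using plain logistic regression on synthetic data; all feature names, weights, and numbers below are invented for illustration and are not Genpact's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical metadata features per employee: messages sent per day,
# median reply latency, share of messages sent outside the employee's team.
# Entirely synthetic data, standardized for simplicity.
n = 500
X = rng.normal(size=(n, 3))

# Assumed "true" relationship used only to generate labels for this demo.
true_w = np.array([1.2, -0.8, 1.5])
y = (X @ true_w + rng.normal(scale=1.0, size=n) > 0).astype(float)

# Plain logistic regression trained by gradient descent (a stand-in for the
# article's deep-learning model, whose architecture is not described).
w = np.zeros(3)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of "top performer"
    w -= lr * (X.T @ (p - y) / n)            # gradient of the log-loss w.r.t. weights
    b -= lr * float(np.mean(p - y))          # gradient w.r.t. bias

acc = float(np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y))
```

Even this toy setup shows why metadata alone can be predictive: the classifier needs only behavioral signals, never message content.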


Project Information Literacy – News Study

Project Information Literacy



The News Study research report presents findings about how a sample of U.S. college students gather information and engage with news in the digital age. Results are included from an online survey of 5,844 respondents and telephone interviews with 37 participants from 11 U.S. colleges and universities selected for their regional, demographic, and red/blue state diversity. A computational analysis was conducted using Twitter data associated with the survey respondents and a Twitter panel of 135,891 college-age people. Six recommendations are included for educators, journalists, and librarians working to make students effective news consumers. To explore the implications of this study’s findings, concise commentaries from leading thinkers in education, libraries, media research, and journalism are included.


Alexa, does AI have gender?

University of Oxford, Research



Professor Gina Neff has been asking questions about bias and balance of power in the development of artificial intelligence (AI) systems. She talks to Ruth Abrahams about the challenges we face in marrying futuristic solutions with values of trust, openness and equality.


UW researchers are changing the way we use satellite observations of Earth

University of Washington, eScience Institute



The University of Washington, along with collaborators from the National Center for Atmospheric Research (NCAR), Anaconda, and Element84, has just been awarded a $1.5 million grant from the National Aeronautics and Space Administration (NASA) to develop new approaches for using satellite observations of Earth. The team will work with the Pangeo Project, a community effort for big data in the geosciences, to develop state-of-the-art open-source tools for cloud-based data analysis.


The Technion and Intel to inaugurate joint Center for Artificial Intelligence

Technion



The Technion – Israel Institute of Technology and Intel Corporation inaugurated a new Center for Artificial Intelligence (AI) yesterday, Monday, October 8th. The Center is chartered with advancing research in AI fields and collaboration between Technion and Intel researchers.

Intel’s Dr. Michael Mayberry, chief technology officer, and Naveen Rao, corporate vice president and general manager of the Artificial Intelligence Products Group, represented Intel at the inauguration of the new AI Center. Prof. Boaz Golany, vice president for External Relations and Resource Development; Prof. Wayne Kaplan, executive vice president for Research; Prof. Nahum Shimkin, dean of the Andrew and Erna Viterbi Faculty of Electrical Engineering; Prof. Dan Geiger, dean of the Computer Science Department; and Prof. Carmel Domshlak, dean of the William Davidson Faculty of Industrial Engineering and Management, represented the Technion.


Economists (and Economics) in Tech Companies

National Bureau of Economic Research; Susan Athey, Michael Luca



As technology platforms have created new markets and new ways of acquiring information, economists have come to play an increasingly central role in tech companies – tackling problems such as platform design, strategy, pricing, and policy. Over the past five years, hundreds of PhD economists have accepted positions in the technology sector. In this paper, we explore the skills that PhD economists apply in tech companies, the companies that hire them, the types of problems that economists are currently working on, and the areas of academic research that have emerged in relation to these problems.


Apple Launches Portal for U.S. Users to Download Their Data

Bloomberg, Technology, Mark Gurman



Apple Inc. on Wednesday began allowing users in the U.S. to download a copy of all of the data that they have stored with the company from a single online portal.

The Cupertino, California-based technology giant previously rolled out that functionality in Europe as part of the European Union’s General Data Protection Regulation rules that kicked in earlier this year. U.S. users will be able to download data such as all of their address book contacts, calendar appointments, music streaming preferences and details about past Apple product repairs. Previously, customers could get their data by contacting Apple directly.


Changing Who Pays for CS Professional Development in the U.S. and Who Controls It: It Has to be Local

Communications of the ACM, blog@CACM, Mark Guzdial



In 2012, the Outliers group at the University of Chicago completed a national landscape survey on the state of computer science (CS) education in the U.S.; the results are still available online. One of their findings was that teacher professional development worked differently in computer science education than in any other subject. In all other subjects, individual school districts ordered professional development and paid for it.


Baidu is joining an American artificial intelligence partnership

Quartz, Dave Gershgorn



Search giant Baidu will be the first Chinese company to join the US-centric Partnership on AI, the organizations announced Oct. 16.

The addition to the two-year-old partnership is an unusual move within the modern narrative concerning AI, where Chinese and American AI advancements are often characterized as an arms race. Baidu, which can be thought of as the Google of China, has far-reaching goals for artificial intelligence technology, from self-driving cars to personal assistants.

Baidu’s fundamental approach to artificial intelligence also makes it an odd candidate for the partnership, which focuses on mitigating potential harms of AI and promoting responsible development of the technology. Baidu has been portrayed in the past as valuing speed of innovation over concern for societal implications, and has been cited by Human Rights Watch as being discriminatory in its job postings.


Reddit CEO Says Tech Scandals Proved Him Right on User Data

Bloomberg, Technology, Sarah Frier



Reddit Inc., the news aggregator and discussion forum website, doesn’t always know the full names, genders or other identifying information about its users. That might not be the most lucrative way to build an advertising business, but Chief Executive Officer Steve Huffman feels vindicated.

Larger technology peers, including Facebook Inc. and Alphabet Inc.’s Google, are facing regulation and government scrutiny over how they use, and profit from, the personal data of billions of people. Reddit, meanwhile, is serving up ads alongside its popular user forums, which are segmented by interest and run the gamut from beauty tips to politics.

Reddit “exists because we didn’t build a business that’s predicated on harvesting and selling your personal data,” Huffman said at Bloomberg’s Sooner Than You Think conference in New York.


Twitter’s Misguided Barriers for Researchers

Bloomberg, Opinion, Kara Alaimo



Here’s the problem: It’s not possible to use Twitter’s advanced search function to pull all tweets related to a certain subject within a date range. For example, when I tried searching recently for use of the #MeToo hashtag during the first three days of the movement, I got a subset of tweets, but not all of them, and there was no way for me to know how many tweets there had been in aggregate. Astonishingly, when I searched for use of the #MeToo hashtag on Oct. 15, 2017 — the first day of the movement — I got zero results.

And Twitter’s policies have made it increasingly challenging and costly for academics like me to access historical tweets in other ways.


Artificial Intelligence Is Learning to Keep Learning

Scientific American, Matthew Hutson



What if you stopped learning after graduation? It sounds stultifying, but that is how most machine-learning systems are trained. They master a task once and then are deployed. But some computer scientists are now developing artificial intelligence that learns and adapts continuously, much like the human brain.

Machine-learning algorithms often take the form of a neural network, a large set of simple computing elements, or neurons, that communicate via connections between them that vary in strength, or “weight.” Consider an algorithm designed to recognize images. If it mislabels a picture during training, the weights are adjusted. When mistakes are reduced below a certain threshold, the weights are frozen at set values.

The new technique splits each weight into two values that combine to influence how much one neuron can activate another. The first value is trained and frozen as in traditional systems. But the second value continually adjusts in response to surrounding activity in the network. Critically, the algorithm also learns how adjustable to make these weights. So the neural network learns patterns of behavior, as well as how much to modify each part of that behavior in response to new circumstances. The researchers presented their technique in July at a conference in Stockholm, Sweden.
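The two-value weights described above can be sketched in a few lines of NumPy: a fixed component that is trained and frozen, plus a learned per-connection plasticity coefficient applied to a running Hebbian trace that keeps updating after deployment. The sizes, rates, and initializations below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # number of neurons in this toy recurrent layer

# First value: trained by ordinary gradient descent, then frozen (here just
# random, standing in for a converged network).
w = rng.normal(scale=0.5, size=(n, n))

# Learned per-connection plasticity coefficients: how adjustable each weight is.
alpha = rng.normal(scale=0.1, size=(n, n))

# Second value: a Hebbian trace that continues to change during use.
hebb = np.zeros((n, n))
eta = 0.1  # trace update/decay rate

def step(x, hebb):
    # Effective weight = frozen part + learned plasticity * current Hebbian trace.
    w_eff = w + alpha * hebb
    y = np.tanh(w_eff.T @ x)
    # Hebbian update: connections between co-active neurons strengthen,
    # so the network keeps adapting to recent activity.
    hebb = (1 - eta) * hebb + eta * np.outer(x, y)
    return y, hebb

x = rng.normal(size=n)
for _ in range(5):
    x, hebb = step(x, hebb)
```

The key design choice is that `alpha` is itself learned during training, so the network learns not just a behavior but how much each connection should keep adapting afterward.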


Donors Pour $583 Million Into Artificial-Intelligence Programs and Research

The Chronicle of Philanthropy, Maria Di Mento



The robots are taking over, and some of the tech billionaires who drove their rise are stepping up to give big to programs focused on artificial intelligence.

Since 2015, nine wealthy donors have given a total of about $583.5 million to nonprofit institutions that are developing new artificial-intelligence tools and studying the effects of A.I. on human life, according to a Chronicle tally of publicly announced gifts.

Of those nine, five are technology moguls. Among them is the late Microsoft co-founder Paul Allen, who died Monday. He gave $125 million in February to his Allen Institute for Artificial Intelligence, a Seattle nonprofit he launched in 2013. His last gift during his lifetime went to a new research program that is studying ways to instill common sense in artificial-intelligence programs to improve problem-solving without human input.


Google to release DeepMind’s StreetLearn for teaching machine-learning agents to navigate cities

TechRepublic, Nick Heath



The StreetLearn environment relies on images from Google Street View and has been used by Google DeepMind to train a software agent to navigate various western cities without reference to a map.

 
Events



UW eScience Institute Lightning Strikes: Successes in Research Computing

UW eScience Institute



Seattle, WA October 23, starting at 9 a.m., Husky Union room 332. “Researchers from across the UW will provide a glimpse into what they are excited about in data science, from fields including genomics, neuroscience, resources for compute power (UW in-house, cloud, national supercomputers), remote sensing, astronomy, molecular engineering, oceanography and student-driven research.”


Ted Underwood at Berkeley

University of California-Berkeley, Digital Humanities at Berkeley



Berkeley, CA November 8, starting at 5:30 p.m., University of California-Berkeley Townsend Center (220 Stephens Hall). “Please join us for discussions around the intersections of data and literature with Ted Underwood, Laura McGrath, Associate Professor of English at the University of Illinois, Urbana-Champaign. Professor Underwood will speak on ‘Modeling Perspective and Parallax to Tell the Story of Genre Fiction.'” [free]

 
Deadlines



Universal Guidelines for Artificial Intelligence

Individual and Organization Sign-ons until October 23.

Visual Analytics in Deep Learning

“We present a survey of the role of visual analytics in deep learning research, noting its short yet impactful history and summarize the state-of-the-art using a human-centered interrogative framework, focusing on the Five W’s and How (Why, Who, What, How, When, and Where), to thoroughly summarize deep learning visual analytics research. We conclude by highlighting research directions and open research problems.”

ICML 2019 Call for Papers

Long Beach, CA. Conference is June 10-15, 2019. The first submission deadline is January 18, 2019.
 
Tools & Resources



Rousillon: Scraping Distributed Hierarchical Web Data

UIST 2018; Sarah E. Chasins, Maria Mueller, Rastislav Bodik



Programming by Demonstration (PBD) promises to enable data scientists to collect web data. However, in formative interviews with social scientists, we learned that current PBD tools are insufficient for many real-world web scraping tasks. The missing piece is the capability to collect hierarchically-structured data from across many different webpages. We present Rousillon, a programming system for writing complex web automation scripts by demonstration. Users demonstrate how to collect the first row of a ‘universal table’ view of a hierarchical dataset to teach Rousillon how to collect all rows. To offer this new demonstration model, we developed novel relation selection and generalization algorithms. In a within-subject user study on 15 computer scientists, users can write hierarchical web scrapers 8 times more quickly with Rousillon than with traditional programming.


The Netflix Media Database (NMDB)

Medium, Netflix TechBlog



This blog post introduces the Netflix Media DataBase (NMDB) — a highly queryable data system built on the Netflix micro-services platform. NMDB is used to persist deeply technical metadata about various media assets at Netflix and to serve queries in near real-time using a combination of lookups as well as runtime computation. NMDB enables developers (such as video streaming researchers) to focus their time on developing insights into media data and crank out awesome data-driven algorithms as opposed to worrying about the task of collecting and organizing data.


Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics | Scientific Data

Nature, Scientific Data; Robert Forkel et al.



The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices. [full text]
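As a rough illustration of what a cross-linguistic word list looks like in tabular form: the column names below (ID, Language_ID, Parameter_ID, Form) follow CLDF conventions, but the rows are invented, and this sketch uses only the standard library rather than any CLDF validation software:

```python
import csv
import io

# A tiny word list in the spirit of CLDF's tabular format. Parameter_ID names
# the comparative concept; Form is the word in the given language.
raw = """ID,Language_ID,Parameter_ID,Form
1,deu,HAND,hand
2,eng,HAND,hand
3,fra,HAND,main
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Group forms by concept -- the basic operation in comparative word-list work,
# and the kind of cross-dataset comparison a shared format makes possible.
by_concept = {}
for r in rows:
    by_concept.setdefault(r["Parameter_ID"], []).append((r["Language_ID"], r["Form"]))
```

The point of a shared specification is exactly this: once every dataset uses the same columns, grouping and comparing across sources becomes a one-loop operation instead of per-dataset cleanup.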


Paper: Skipping the Replication Crisis in Visualization

Eager Eyes blog, Robert Kosara



Steve Haroz and I look at six different sources of problems, from bad study design to misinterpreted results, describe why and how they happen, and what can be done about them. We also discuss a number of ways replications can work, from direct replication (same experiment) to conceptual replications (same phenomenon, but different experiment) and registered reports (which get reviewed before the experiment is run to minimize p-hacking).

One reason why replications are hard to publish in our literature is that they are not considered novel. We therefore also propose a few ways of working replications into papers to make them more palatable for reviewers, though we also argue for a more scientific publication landscape in visualization.

 
Careers


Full-time, non-tenured academic positions

Exoplanet Science Research Positions (1+)



Space Telescope Science Institute; Baltimore, MD

Group Leader in Biostatistics and Systems Biology in Cancer Research



Oslo University Hospital, Department of Cancer Genetics, Institute for Cancer Research; Oslo, Norway

Research Fellow



University of Oxford, Oxford Internet Institute; Oxford, England

Full-time positions outside academia

Data Scientist



Devoted Health; Waltham, MA
