Data Science newsletter – January 16, 2019

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for January 16, 2019


Data Science News

How Corning Makes Super-Pure Glass for Fiber-Optic Cable

WIRED, Backchannel, Susan Crawford


Fiber-optic cable is made in an almost incomprehensibly precise way. It has to be so pure, so clear, that it can transmit light over many dozens of miles without any boosting or encouragement, and without losing any of the information that has been encoded onto that light. To get that clarity, its manufacturers control every micron and every second of the manufacturing process.

Privacy in the age of medical big data

Nature Medicine, W. Nicholson Price II & I. Glenn Cohen


Big data has become the ubiquitous watchword of medical innovation. The rapid development of machine-learning techniques and artificial intelligence in particular has promised to revolutionize medical practice from the allocation of resources to the diagnosis of complex diseases. But with big data comes big risks and challenges, among them significant questions about patient privacy. Here, we outline the legal and ethical challenges big data brings to patient privacy. We discuss, among other topics, how best to conceive of health privacy; the importance of equity, consent, and patient governance in data collection; discrimination in data uses; and how to handle data breaches. We close by sketching possible ways forward for the regulatory system. [full text]

Israel needs national vision for AI or risks falling behind, tech authority says

The Times of Israel, Shoshanna Solomon


Israel Innovation Authority urges government, academia, industry to join forces for advances in artificial intelligence as global race is underway

How Researchers At The University Of Toronto Aim To Help Musicians With Artificial Intelligence

Ludwig Van (Toronto), Anya Wassenberg


Eric Baptiste, the CEO of Society of Composers, Authors and Music Publishers of Canada (SOCAN), has partnered with the U of T’s Department of Computer Science Innovation Lab (DCSIL) on a project designed to benefit Canadian musicians from a different angle — that of being able to track music played around the world in order to collect and distribute the royalties. It’s ironic that, as the ability to proliferate music across the globe has become easier and easier, artist revenues have dwindled. The project will use AI — artificial intelligence — in various ways to hunt down and track music under copyright wherever it is played.

How Algorithms Are Taking Over Big Oil

Forbes, Christopher Helman


A visitor to one of BP’s natural gas fields in Wyoming a few years ago might have noticed an odd sight: smartphones in plastic bags tied to pumps with zip ties. This was an early test of a multistate initiative by the oil giant to link a network of Wi-Fi sensors to an artificial intelligence system—one that now operates the Wamsutter field in Wyoming with far less human oversight than before.

Artificial intelligence has come to the oil patch, accelerating a technical change that is transforming the conditions for the oil and gas industry’s 150,000 U.S. workers. Giant energy companies like Shell and BP are investing billions to bring artificial intelligence to new refineries, oilfields and deepwater drilling platforms. Already, these investments are paying for themselves by helping drill tricky oil wells faster, predict equipment failures and slash fugitive methane emissions.

Princeton collaboration brings new insights to the ethics of artificial intelligence

Princeton University, Office of Communications


The growing use of artificial intelligence in both everyday life and life-altering decisions brings up complex questions of fairness, privacy and accountability. Surrendering human authority to machines raises concerns for many people. At the same time, AI technologies have the potential to help society move beyond human biases and make better use of limited resources.

“Princeton Dialogues on AI and Ethics” is an interdisciplinary research project that addresses these issues, bringing engineers and policymakers into conversation with ethicists, philosophers and other scholars. At the project’s first workshop in fall 2017, watching these experts get together and share ideas was “like nothing I’d seen before,” said Ed Felten, director of Princeton’s Center for Information Technology Policy (CITP). “There was a vision for what this collaboration could be that really locked into place.”

The project is a joint venture of CITP and the University Center for Human Values, which serves as “a forum that convenes scholars across the University to address questions of ethics and value” in diverse settings, said director Melissa Lane.

Science gets shut down right along with the federal government

The Conversation, Angela K. Wilson


When the U.S. government shuts down, much of the science that it supports is not spared. And there is no magic light switch that can be flipped to reverse the impact.

For instance, large-scale instruments like NASA’s Stratospheric Observatory for Infrared Astronomy – the “flying telescope” – have to stop operations. Eventually bringing such instrumentation back up to speed requires over a week. If the shutdown lingers, contingency funds provided to maintain large-scale instruments supported by agencies including NASA, the National Oceanic and Atmospheric Administration and the National Science Foundation will run out and operations will cease, adding to the list of closed facilities.

When I headed NSF’s Division of Chemistry from March 2016 to July 2018, I experienced firsthand two shutdowns like the one the country is weathering now. The 1,800 NSF staff would be sent home, without access to email and without even the option to work voluntarily, until eventually an end to the shutdown was negotiated. As we were unsure how long the shutdowns would run, a lot of time was spent developing contingency plans – and coordinating with many hundreds of researchers about them. Concerns about what will happen to researchers’ day-to-day projects are compounded by apprehension about interruptions to long-term funding.

Twitter’s tampered samples: Limitations of big data sampling in social media

SpringerOpen blog; Jürgen Pfeffer, Katja Mayer & Fred Morstatter


Social networks are widely used as sources of data in computational social science studies, and so it is of particular importance to determine whether these datasets are bias-free. In EPJ Data Science, Jürgen Pfeffer, Katja Mayer and Fred Morstatter demonstrate how Twitter’s sampling mechanism is prone to manipulation that could influence how researchers, journalists, marketeers and policy analysts interpret their data.

How We’ll Forget John Lennon – Our culture has two types of forgetting.

Nautilus, Kevin Berger


A few years ago a student walked into the office of Cesar A. Hidalgo, director of the Collective Learning group at the MIT Media Lab. Hidalgo was listening to music and asked the student if she recognized the song. She wasn’t sure. “Is it Coldplay?” she asked. It was “Imagine” by John Lennon. Hidalgo took it in stride that his student didn’t recognize the song. As he explains in our interview below, he realized the song wasn’t from her generation. What struck Hidalgo, though, was that the incident echoed a question that had long intrigued him, which was how music and movies and all the other things that once shone in popular culture faded like evening from public memory.

Hidalgo is among the premier data miners of the world’s collective history. With his MIT colleagues, he developed Pantheon, a dataset that ranks historical figures by popularity from 4000 B.C. to 2010. Aristotle and Plato snag the top spots. Jesus is third. It’s a highly addictive platform that allows you to search people, places, and occupations with a variety of parameters. Most famous tennis player of all time? That’s right, Frenchman Rene Lacoste, born in 1904. (Roger Federer places 20th.) Rankings are drawn from, essentially, Wikipedia biographies, notably ones in more than 25 different languages, and Wikipedia page views.

Astronomers still can’t decipher the ‘Cow,’ a mysterious explosion in deep space

Science, Daniel Clery


An unusually bright glow in the sky that appeared suddenly last June has got astronomers in a frenzy. After months of study, they still aren’t sure what the object—officially called AT2018cow, but universally referred to as the “Cow”—is. But scientists have some ideas, which they offered here today at the American Astronomical Society meeting. Whatever it is, says astronomer Liliana Rivera Sandoval of Texas Tech University in Lubbock, “It’s super weird.”

Churchill Capital Corp and Clarivate Analytics Announce Merger Agreement

Clarivate Analytics


Clarivate provides comprehensive intellectual property and scientific information, decision support tools and services that enable academia, corporations, governments, and the legal community, to discover, protect and commercialize new ideas, mission-critical content, and brands. Clarivate’s many well‐known brands include Web of Science™, Cortellis™, Derwent Innovation™, Derwent World Patents Index™, CompuMark™, MarkMonitor®, and Techstreet™, among others. Funds affiliated with Onex Corporation (“Onex”) (TSX: ONEX) and Baring Private Equity Asia (“BPEA”) acquired Clarivate in a carve-out transaction from Thomson Reuters in 2016. Since the acquisition, Clarivate has continued to capitalize on its unique legacy and assets while separating from its former parent and positioning the business for success as a standalone entity, which included recruiting a new management team and re-energizing its products and commercial capabilities. As the business completes its multi-year separation from Thomson Reuters this year, it is well positioned for accelerated growth.

As Self-Driving Cars Stall, Players Revive an Old Approach

WIRED, Transportation, Jack Stewart


This step-by-step approach to autonomy, where the machine gradually takes over the work of driving, started to go out of fashion in 2012, when Google’s self-driving project (now Waymo) decided it was safer to go full-robo than to find a way to make the human and computer work together effectively. Much of the auto industry reached the same conclusion over the next few years, even vowing to take the steering wheel and pedals out of their cars.

The mastery to land that moonshot, though, has proven elusive, even in carefully prescribed areas like the Phoenix suburbs where Waymo operates. And so the gradual approach, built on capabilities that are achievable in the near-term, is making something of a comeback, with a focus on the human-machine interaction.

Who are you writing for? The role of community membership on authors’ decisions to publish in open access mega-journals

London School of Economics, Impact of Social Sciences blog


Open Access mega-journals have in some academic disciplines become a key channel for communicating research. In others, however, they remain unknown. Drawing on evidence from a series of focus groups, Jenny Fry and Simon Wakeling explore how authors’ perceptions of mega-journals differ across disciplines and are shaped by motivations associated with the multiple communities they function within.


Industry Leaders prepare for Rice University Oil and Gas Conference in March

Rice University, Ken Kennedy Institute for Information Technology


Houston, TX March 4-6. The conference “will focus on the computational challenges and needs in the Energy industry.” [$$$]


SysML Conference – Call for Demonstration Proposals

“The Conference on Systems and Machine Learning (SysML) 2019 has a Demonstration Track running in parallel with some of the Poster Sessions, March 31st – April 2nd, 2019, in Stanford, California.” Deadline for submissions is February 10.


Portland, OR May 8-9. “Featuring stories about data sharing and data analysis from science, journalism, government, and open source.” Deadline to submit a talk is February 10.

Workshop on Immersive Analytics: Interaction Design and Prototyping for Immersive Analytics

Glasgow, Scotland May 5, a CHI 2019 workshop. Deadline for position paper submissions is February 12.

Machine Learning for Healthcare 2019

Ann Arbor, MI August 8-10 at University of Michigan. “We are calling for papers in two tracks: a research track and a clinical abstract track.” Deadline for submissions is March 29.

Tools & Resources

The Apache Software Foundation Announces Apache® Airflow™ as a Top-Level Project

The Apache Software Foundation


The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).

Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF’s meritocratic process and principles.

Stanford Open-Sources Neural Network Verification Project



A Stanford Intelligent Systems Laboratory (SISL) research group has announced it is open-sourcing its NeuralVerification.jl project, which helps verify deep neural networks’ training, robustness and safety results.

Rxivist: Popular biology pre-print papers ranked

Blekhman Lab at the University of Minnesota.


Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field.

How to prepare your digital life for your inevitable death

Popular Science, Whitson Gordon


No one likes to think about it, but one day, you’re going to die. (Sorry to be the bearer of bad news.) You probably know where you want your money and other physical possessions to go, but you have a lot of digital assets to pass on too. You need to plan for the future of those accounts right now.


Full-time positions outside academia

Chief Data Officer

The Bail Project; Los Angeles, CA

Director ML Platform Design

Capital One; Vienna, VA

Data Architect

JUST Capital; New York, NY

Full-time, non-tenured academic positions

Research Associate (Fixed Term)

University of Cambridge, Centre for Research in Arts, Social Sciences and Humanities; Cambridge, England

Assistant Director, Educational Technology

New York University, Stern School of Business; New York, NY
