Data Science newsletter – March 8, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for March 8, 2019


Data Science News

How artificial intelligence will revolutionize the way video games are developed and played

The Verge, Nick Statt


Putting aside the more morbid military applications of Card’s fantasy game (and the fact that the software ultimately develops sentience), The Mind Game is a solid starting point for a conversation about the future of video games and artificial intelligence. Why are games, and the AI used to both aid in creating them and drive the actions of virtual characters, not even remotely this sophisticated? And what tools or technologies do developers still require to reach this hypothetical fusion of AI and simulated reality?

These are questions researchers and game designers are just now starting to tackle as recent advances in the field of AI begin to move from experimental labs and into playable products and usable development tools. Until now, the kind of self-learning AI — namely the deep learning subset of the broader machine learning revolution — that’s led to advances in self-driving cars, computer vision, and natural language processing hasn’t really bled over into commercial game development. That’s despite the fact that some of these advancements in AI are thanks in part to software that’s improved itself through the act of playing video games, such as DeepMind’s unbeatable AlphaGo program and OpenAI’s Dota 2 bot that’s now capable of beating pro-level players.

NCSA’s Siebel Digital Transformation Institute aims to diversify use of data science

The News-Gazette (Champaign, IL), Julie Wurth


You may not think the words “medieval graffiti” and “data science” belong in the same sentence.

But a project supported by the University of Illinois’ National Center for Supercomputing Applications is helping medieval studies researcher Mia Trentin analyze thousands of images scratched on stone columns of centuries-old European churches.

The graffiti acts as a window into the lives of ordinary people during the Middle Ages, and new computing technologies are helping Trentin and other researchers more precisely interpret the images without having to travel to the sites.

The NCSA hopes to build on that approach through the proposed Siebel Digital Transformation Institute, designed to further “human-centered digital transformation” in all disciplines, in partnership with the University of California-Berkeley.

A Glimpse of A.I.’s Future? MIT-IBM Research Lab Sees Early Progress

Xconomy, Jeff Engel


The capabilities of artificial intelligence technologies have increased significantly in the past decade, but there’s a growing sense that new breakthroughs are needed for the field to continue delivering on its promise.

David Cox and his colleagues have dedicated themselves to identifying and breaking down “the fundamental core barriers” to advancing A.I., he says. In February 2018, IBM (NYSE: IBM) hired Cox—then a Harvard University associate professor—to direct its efforts in a new, joint A.I. research lab with MIT. For its part, IBM is investing $240 million over 10 years into the “MIT-IBM Watson AI Lab.”

Xconomy recently checked in with Cox (pictured above) to see how the first year went. That’s not a lot of time, especially considering the lab’s ambitious goals, but Cox sounds happy with the progress made thus far.

The Nation’s Data Hubs Drive Innovation

SIGNAL Magazine, Kimberly Underwood


“We’ve shifted fundamentally to talk about data science and build capacity in data science, and of course, machine learning comes under that umbrella,” explains Melissa Cragin, executive director of the Midwest Big Data Regional Innovation Hub. “And so while we’re called the Big Data Innovation Hubs, we’re very focused on building capacity in data science, building expertise, access to data-related services and networks related to all things data science.”

That means making available “to all kinds of communities” access to data-related skills, services, tools and opportunities, Cragin states. By developing public/private partnerships and working with groups to leverage these resources, the hubs can help coordinate solutions to “shared grand challenges,” she notes. The hub also is endeavouring to extend data science research and education to undergraduate institutions—including minority-serving institutions—to help add data skills for the developing workforce, she states.

NIH Provides $23 Million for Statewide Translational Research Institute

University of Virginia, UVA Today


The National Institutes of Health have awarded the integrated Translational Health Research Institute of Virginia a five-year grant of nearly $23 million to advance innovative ideas from the point of discovery to implementation in clinical practice and population health.

“iTHRIV” includes the University of Virginia, Inova Health System, Virginia Tech and Carilion Clinic as partners, with the Center for Open Science and UVA’s Licensing & Ventures Group as affiliates. The focus of iTHRIV is “using data to improve health” and leverages data science expertise from across the state.

Special issue on knowledge graphs and semantics in text analysis and retrieval

Information Retrieval Journal; Laura Dietz, Chenyan Xiong, Jeff Dalton, Edgar Meij


Knowledge graphs are an effective way to store semantics in a structured format that is easily used by computer systems. In the past few decades, work across different research communities led to scalable knowledge acquisition techniques for building large-scale knowledge graphs. The result is the emergence of large publicly available knowledge graphs (KGs) such as DBpedia (Lehmann et al. 2014), Freebase (Bollacker et al. 2008), and others. While knowledge graphs are designed to support a wide set of different applications, this special issue focuses on the use case of text retrieval and analysis.

Utilizing knowledge graphs for text analysis requires effective alignment techniques that associate segments of unstructured text with entries in the knowledge graph, for example using entity extraction and linking algorithms (Carmel et al. 2014; Mendes et al. 2011; Blanco et al. 2015). A wide range of approaches that combine query-document representations and machine learning repeatedly demonstrate significant improvements for such tasks across diverse domains (Dalton et al. 2014; Liu and Fang 2015; Hasibi et al. 2015; Xiong and Callan 2015; Raviv et al. 2016; Ensan and Bagheri 2017; Xiong et al. 2017). The goal of this special issue is to summarize recent progress in research and practice in constructing, grounding, and utilizing knowledge graphs and similar semantic resources for text retrieval and analysis applications. The scope includes acquisition, alignment, and utilization of knowledge graphs and other semantic resources for the purpose of optimizing end-to-end performance of information retrieval systems.

Urban groups: behavior and dynamics of social groups in urban space

EPJ Data Science; Matteo Zignani, Christian Quadri, Sabrina Gaito and Gian Paolo Rossi


The tendency of people to form socially cohesive groups that get together in urban spaces is a fundamental process that drives the formation of the social structure of cities. However, the challenge of collecting and mining large-scale data able to unveil both the social and the mobility patterns of people has left many questions about urban social groups largely unresolved. We leverage an anonymized mobile phone dataset, based on Call Detail Records (CDRs), which integrates the usual voice call data with text message and Internet activity information of one million mobile subscribers in the metropolitan area of Milan to investigate how the members of social groups interact and meet in urban space. We unveil the nature of these groups through an extensive analysis, along with proposing a methodology for their identification. The findings of this study concern the behavior of social groups, their structure (size and membership) and their roots in the territory (locations and visit patterns). Specifically, the footprint of urban groups is made up of only a few locations, which the groups visit regularly. Moreover, the analysis of the interaction patterns shows that urban groups need to combine frequent on-phone interactions with gatherings in such locations. Finally, we investigate how their preferences impact the city of Milan, telling us which areas best encourage group get-togethers. [full text]

Caffe Pioneer & AI Infrastructure Director Leaves Facebook

Synced

Facebook AI Infrastructure Director Yangqing Jia is leaving his position with the company, a person familiar with the matter told Synced. The Facebook team confirmed his departure yesterday.

‘The military Google industrial complex’: How Silicon Valley cashes in on war technology

CBC Radio, Yasha Levine


Last Sunday, Microsoft unveiled HoloLens 2, a set of goggles that can project interactive holographic images in front of the user.

The tech giant is pitching it as a tool for education and training in medicine and manufacturing. But another purpose for the tool has some within the company up in arms.

A group of Microsoft workers is protesting the contract their employer signed with the U.S. military to provide the HoloLens technology for use in combat. The deal is worth $480 million US. [audio, 9:20]

Recent advances in biomedical simulations: a manifesto for model engineering

F1000Research; Joseph L. Hellerstein, Stanley Gu, Kiri Choi, Herbert M. Sauro


Biomedical simulations are widely used to understand disease, engineer cells, and model cellular processes. In this article, we explore how to improve the quality of biomedical simulations by developing simulation models using tools and practices employed in software engineering. We refer to this direction as model engineering. Not all techniques used by software engineers are directly applicable to model engineering, and so some adaptations are required. That said, we believe that simulation models can benefit from software engineering practices for requirements, design, and construction as well as from software engineering tools for version control, error checking, and testing. Here we survey current efforts to improve simulation quality and discuss promising research directions for model engineering. [full text]

AI and Machine Learning Invade a New York Art Gallery

The Atlantic, Ian Bogost


The images are huge and square and harrowing: a form, reminiscent of a face, engulfed in fiery red-and-yellow currents; a head emerging from a cape collared with glitchy feathers, from which a shape suggestive of a hand protrudes; a heap of gold and scarlet mottles, convincing as fabric, propping up a face with grievous, angular features. These are part of “Faceless Portraits Transcending Time,” an exhibition of prints recently shown at the HG Contemporary gallery in Chelsea, the epicenter of New York’s contemporary-art world. All of them were created by a computer.

The catalog calls the show a “collaboration between an artificial intelligence named AICAN and its creator, Dr. Ahmed Elgammal,” a move meant to spotlight, and anthropomorphize, the machine-learning algorithm that did most of the work. According to HG Contemporary, it’s the first solo gallery exhibit devoted to an AI artist.

If they hadn’t found each other in the New York art scene, the players involved could have met on a Spike Jonze film set: a computer scientist commanding five-figure print sales from software that generates inkjet-printed images; a former hotel-chain financial analyst turned Chelsea techno-gallerist with apparent ties to fine-arts nobility; a venture capitalist with two doctoral degrees in biomedical informatics; and an art consultant who put the whole thing together, A-Team–style, after a chance encounter at a blockchain conference. Together, they hope to reinvent visual art, or at least to cash in on machine-learning hype along the way.

Scale-free networks are rare

Nature Communications, Anna D. Broido & Aaron Clauset


Real-world networks are often claimed to be scale free, meaning that the fraction of nodes with degree k follows a power law k^(−α), a pattern with broad implications for the structure and dynamics of complex systems. However, the universality of scale-free networks remains controversial. Here, we organize different definitions of scale-free networks and construct a severe test of their empirical prevalence using state-of-the-art statistical tools applied to nearly 1000 social, biological, technological, transportation, and information networks. Across these networks, we find robust evidence that strongly scale-free structure is empirically rare, while for most networks, log-normal distributions fit the data as well or better than power laws. Furthermore, social networks are at best weakly scale free, while a handful of technological and biological networks appear strongly scale free. These findings highlight the structural diversity of real-world networks and the need for new theoretical explanations of these non-scale-free patterns. [full text]
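
As a rough illustration of the power-law form mentioned in the abstract above, the sketch below draws synthetic values from a continuous power law and recovers the exponent α with the standard maximum-likelihood estimate, α̂ = 1 + n / Σ ln(x_i / x_min). This is not the authors' test, which compares several candidate distributions on real degree sequences. The function names, the continuous approximation to integer degrees, and the choice x_min = 1 are all assumptions made for the example.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min.

    Hypothetical helper for illustration; requires alpha > 1.
    """
    rng = random.Random(seed)
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


def estimate_alpha(values, x_min=1.0):
    """Maximum-likelihood exponent for values at or above x_min (continuous case)."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic data with a known exponent, then the recovered estimate.
    data = sample_power_law(20000, alpha=2.5)
    print(round(estimate_alpha(data), 2))
```

An estimate close to the true exponent only shows that the estimator works on data that really is power-law distributed. Deciding whether a real network is scale free also requires comparing the fit against alternatives such as the log-normal, which is the harder question the paper addresses.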

Study identifies thousands of species at risk of extinction by 2070

Yale Daily News, Madison Mahoney


The continual increase in human activity and land use will put more than 1,700 species of birds, mammals and amphibians at risk of extinction by 2070, according to a recent study by Yale ecologists.

“Our study shows that depending on the specific future pathway, land-use changes will have severe consequence for biodiversity,” said Walter Jetz, co-author of the paper and professor of ecology and evolutionary biology and of forestry and environmental studies.

I’m very excited to welcome @tedgreenwald to the @deeplearningai_ team! Ted is a former @WSJ editor, and will be leading a new editorial function.

Twitter, Andrew Ng


Joining AI Foundation

Delip Rao


A few friends and family knew this, but today, I am excited to announce widely that I joined the AI Foundation as their VP of Research. I am overseeing their deep learning/machine learning research roadmaps in speech, language, and vision for generating and detecting artificial content.


The Future of Human-Centered AI

Stanford University, Global Digital Policy Incubator


Stanford, CA April 16, starting at 10 a.m., Stanford University McCaw Hall. [free, registration required]

Baseball Analytics Workshop

Jim Albert and Chad Redmond


Cleveland, OH May 4, starting at 10 a.m., Cleveland State University. “The morning session introduces basic data exploration techniques with the R programming language, a popular tool for data scientists and statisticians. The afternoon session will explore Statcast data which provides detailed pitch-by-pitch data including off-the-bat measures such as launch angle, exit velocity, and spray angle.” [$$]

Columbia University Data Science Day

Columbia University, Data Science Institute


New York, NY April 3, starting at 9 a.m., Columbia University Lerner Hall. “Join us for demos and lightning talks by Columbia researchers presenting their latest work in data science.” [$$$]

!!Con 2019

Erty Seidohl, Julia Evans, Ahmed Abdalla, Emily Xie, and Alicja Raszkowska


New York, NY May 11-12 at AppNexus. “!!Con (pronounced “bang bang con”) 2019 is two days of ten-minute talks (with lots of breaks, of course!) to celebrate the joyous, exciting, and surprising moments in computing.” [registration coming soon]


Agorize – Sports Analytics Challenge

“For the first time, one of the world’s greatest sports teams (Paris Saint-Germain) and one of the world’s most prestigious universities (École Polytechnique) are joining forces to give you the opportunity to enter a competition with unique scientific impact in the field of sports.” Deadline for Stage 1 submissions is April 15.

Tools & Resources

A new tool from Google and OpenAI lets us better see through the eyes of artificial intelligence

The Verge, James Vincent


New research from Google and nonprofit lab OpenAI hopes to further pry open the black box of AI vision by mapping the visual data these systems use to understand the world. The method, dubbed “Activation Atlases,” lets researchers analyze the workings of individual algorithms, unveiling not only the abstract shapes, colors, and patterns they recognize, but also how they combine these elements to identify specific objects, animals, and scenes.

Google’s Shan Carter, a lead researcher on the work, told The Verge that if previous research had been like revealing individual letters in algorithms’ visual alphabet, Activation Atlases offers something closer to a whole dictionary, showing how letters are put together to make actual words. “So within an image category like ‘shark,’ for example, there will be lots of activations that contribute to it, like ‘teeth’ and ‘water,’” says Carter.

Starting With Data Science: A Rigorous Hands-On Introduction to Data Science for Software Engineers

Win-Vector blog, John Mount


Win-Vector LLC is now offering a 4-day on-site intensive data science course. The course targets software engineers familiar with Python and introduces them to the basics of current data science practice. This is designed as an interactive in-person (not remote or video) course.

The course includes lectures, hands-on labs, and optional homework exercises. Students are expected to attend all 4 full days, and will come out with a basic understanding of some of the most important tools for supervised learning in data science.


Full-time positions outside academia

Senior Scientific Services Officer

European Bioinformatics Institute (EMBL-EBI); Hinxton, England

Research Space Scientist, Research AST, Planetary Studies

NASA; Moffett Field, CA

Full-time, non-tenured academic positions

Technical Associate I

Massachusetts Institute of Technology, McGovern Institute for Brain Research; Cambridge, MA

Lecturer in Health Data Science/ Lecturer in Health Informatics and Health Data Analytics

University College London, Institute of Health Informatics; London, England

Research Assistant – Data Science Studies

University of California-Berkeley, Berkeley Institute for Data Science; Berkeley, CA

Lecturer of Statistical and Data Sciences

Smith College; Northampton, MA
