Data Science newsletter – August 21, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 21, 2018


Data Science News

Which outfit looks best? AI set to give shoppers smart styling tips

South China Morning Post, Peta Tomlinson


AI (artificial intelligence) is reshaping the fashion industry, and in a crossover to retail, it’s set to change the shopping experience soon.

Hongkongers were privy to what this will mean in the fitting room when Alibaba Group – the owner of South China Morning Post – and US fashion brand Guess launched a pilot FashionAI concept shop at Hong Kong Polytechnic University (PolyU) in July. The initiative taps technologies ranging from machine learning to computer vision to “learn” from consumers, designers and fashion aficionados within the e-commerce ecosystem, which is also set up to spot fashion trends.

Facebook hires CMU professors: fair or poaching?

Carnegie Mellon University, The Tartan student newspaper, Emma Flickinger


Carnegie Mellon, a leader in AI research, is a natural place for Facebook to look for talent. The university is ranked #1 in the country in AI research, and in the fall, it will be the nation’s first to offer an undergraduate degree in AI. (Facebook will also be upgrading its FAIR lab in Seattle, to be headed by a professor from the University of Washington.) Facebook’s choice to expand its presence in Pittsburgh is a logical one.

The ethics of the move, however, are less straightforward. By siphoning talent from top research universities, Facebook is ‘poaching,’ an increasingly unpopular move. Many believe the shift of research jobs from universities to corporations has highly privatized research, locking valuable scientific work behind non-disclosure agreements. When researchers leave academia for lucrative industry jobs, universities lose out not only on talent and intellectual property, but also on grants and the mentorship of young researchers.

Inside Facebook’s plan to protect the U.S. midterm elections

Recode, Kurt Wagner


“My greatest fear, and I hope I’m wrong, is that the Russians, or maybe it’s the Iranians — they’ve already started working on these things, they’ve already conducted penetrations of campaigns, and they’re getting set to go to the next stage of conducting an infowar at the time that will most hurt the candidates that are in key states,” [Eric Rosenbach] said in an interview with Recode. “A week or a couple days before the actual election day and midterms, [they’ll] carpet bomb the internet using Facebook and Twitter.”

Will Facebook be ready? The company says it’s moving quickly on its plan — which includes a physical war room to monitor the elections from its corporate headquarters in Menlo Park, Calif. — and has promised to double the number of safety and security employees on staff to 20,000 people. Facebook says it’s spending so much money monitoring political ads that it will actually hurt profits.

But Facebook is also running out of time to execute its plan. With the midterms less than three months away, it’s almost go time.

The World Economic Forum warns that AI may destabilize the financial system

MIT Technology Review, Will Knight


Artificial intelligence will reshape the world of finance over the next decade or so by automating investing and other services—but it could also introduce troubling systematic weaknesses and risks, according to a new report from the World Economic Forum (WEF).

Compiled through interviews with dozens of leading financial experts and industry leaders, the report concludes that artificial intelligence will disrupt the industry by allowing early adopters to outmaneuver competitors. It also suggests that the technology will create more convenient products for consumers, such as sophisticated tools for managing personal finances and investments.

But most notably, the report points to the potential for big financial institutions to build machine-learning-based services that live in the cloud and are accessed by other institutions.

Google Employees Are Organizing To Protest The Company’s Secret, Censored Search Engine For China

BuzzFeed News, Caroline O'Donovan


Following a series of crises at Google, disparate worker organizing efforts within the company are coalescing, part of a growing workers’ movement in the tech industry.

Lehigh research team to investigate a “Google for research data”

Lehigh University, Lehigh Engineering


There was a time—not that long ago—when the phrases “Google it” or “check Yahoo” would have been interpreted as sneezes, or perhaps symptoms of an oncoming seizure, rather than as coherent thoughts.

Today, these are key to answering all of life’s questions.

It’s one thing to use the Web to keep up with a Kardashian, shop for ironic T-shirts, argue with our in-laws about politics, or any of the other myriad ways we use the Web in today’s world. But if you are a serious researcher looking for real data that can help you advance your ideas, how useful are the underlying technologies that support the search engines we’ve all come to take for granted?

“Not very,” says Brian Davison, associate professor of computer science at Lehigh University. “They understand web pages, not datasets. And existing dataset search services are cumbersome, focusing on searching descriptions instead of data, and they cater to researchers looking within their own discipline.”

Brian and his Lehigh research team envision a “dataset search engine” that can ultimately assist many kinds of scientists in locating data that they can use to perform exploratory analysis and test hypotheses.

NYU Researchers Awarded $6.3 Million NIH Grant to Create Video Data Library of Infants and Mothers at Play

New York University, News Release


The National Institutes of Health announced a $6,341,419 grant to support the Play and Learning Across a Year (PLAY) project—a collaborative research initiative by 65 researchers from 45 universities across the United States and Canada. PLAY will focus on the behaviors of infants and mothers during natural activity in their homes, providing an unprecedented corpus of data, and using an innovative, transparent approach to science. The data set will consist of fully transcribed and annotated videos, parent report questionnaires, video tours of the home, digital recordings of ambient noise, and detailed demographic information on 900+ infants and mothers from across the United States. This first-of-its-kind corpus will be shareable and searchable with data spanning domains from language to locomotion, gender to gesture, and object play to emotion.

The competing narratives of scientific revolution

Statistical Modeling, Causal Inference, and Social Science blog, Andrew Gelman


[skipping to the end] … We are in the middle of a scientific revolution involving statistics and replication in many areas of science, moving from an old paradigm in which important discoveries are a regular, expected product of statistically-significant p-values obtained from routine data collection and analysis, to a new paradigm of . . . weeelll, I’m not quite sure what the new paradigm is. I have some ideas related to quality control, and when it comes to the specifics of design, data collection, and analysis, I recommend careful measurement, within-person comparisons, and multilevel models. Compared to ten years ago, we have a much better sense of what can go wrong in a study, and a lot of good ideas of how to do better. What we’re still struggling with is the big picture, when we move away from the paradigm of routine discovery to a more continuous sense of scientific progress.

Google just gave control over data center cooling to an AI

MIT Technology Review, Will Knight


Google revealed today that it has given control of cooling several of its leviathan data centers to an AI algorithm.

Over the past couple of years, Google has been testing an algorithm that learns how best to adjust cooling systems—fans, ventilation, and other equipment—in order to lower power consumption. This system previously made recommendations to data center managers, who would decide whether or not to implement them, leading to energy savings of around 40 percent in those cooling systems.

Now, Google says, it has effectively handed control to the algorithm, which is managing cooling at several of its data centers all by itself.

“It’s the first time that an autonomous industrial control system will be deployed at this scale, to the best of our knowledge,” says Mustafa Suleyman, head of applied AI at DeepMind, the London-based artificial-intelligence company Google acquired in 2014.

Understanding urban issues through credit cards

University College London, UCL News


Credit Card Records (CCRs) are currently used to measure similarities in purchasing activity, but for the first time researchers have used the data along with Call Detail Records (CDRs) to understand the daily rhythms of human mobility and communication.

Combining both reveals patterns in citizens’ socio-economic behaviours.

For the study, published today in Nature Communications, researchers used anonymous and aggregated credit card data from a major city, with the results allowing them to group the urban population into six clusters.

Older women dominated the ‘Homemaker’ cluster and tended to have the least expenditure and mobility, with their core transaction being grocery shops. The ‘Commuters’ cluster was mainly men who lived the farthest from the city centre.

Ecosystems Are Getting Greener in the Arctic

Lawrence Berkeley Lab, News Center


In recent decades, scientists have noted a surge in Arctic plant growth as a symptom of climate change. But without observations showing exactly when and where vegetation has bloomed as the world’s coldest areas warm, it’s difficult to predict how vegetation will respond to future warming. Now, researchers at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley have developed a new approach that may paint a more accurate picture of Arctic vegetation and our climate’s recent past – and future.

In a study published online Aug. 20 in Nature Climate Change, the researchers used satellite images taken over the past 30 years to track – down to a pixel representing approximately 25 square miles – the ebb and flow of plant growth in cold areas of the northern hemisphere, such as Alaska, the Arctic region of Canada, and the Tibetan Plateau.

California Finally Starts Its USD 3 Million Medical Interpretation Study

Slator, Eden Estopace


Two years after California lawmakers passed a bill requiring the Department of Health Care Services (DHCS) to conduct a study that would assess the current state of medical interpretation in California, the project is finally getting off the ground. The project had been slow to get started, which drew criticism in the local press.

Student-Built Satellite Telescope Prepares for Space

Caltech, News


After nine years, a student-designed-and-built satellite is being readied for launch into orbit, where it will be a test bed for a new type of space telescope that assembles itself in flight from multiple components.

With telescopes, bigger is better: the larger their primary mirror, the more light they can capture and the better the images they can create. Currently, however, space telescopes are limited in size and must be folded up to fit inside the rockets that launch them into space. Hubble is 2.4 meters in diameter, for instance, and the James Webb Space Telescope will be 6.5 meters in diameter when it launches in 2021. To build a telescope that exceeds 10 meters, scientists and engineers will need to develop new, modular designs that can be sent to space in multiple pieces—even on multiple rockets.

Facebook and NYU School of Medicine launch research collaboration to improve MRI

Facebook Code


Facebook and NYU School of Medicine’s Department of Radiology are excited to announce fastMRI, a new collaborative research project that will investigate the use of artificial intelligence (AI) to make magnetic resonance imaging (MRI) scans up to 10 times faster. If this effort is successful, it will make MRI technology available to more people, expanding access to this key diagnostic tool.

MRI scanners provide doctors and patients with images that typically show a greater level of detail related to soft tissues — such as organs and blood vessels — than is captured by other forms of medical imaging. But they are relatively slow, taking anywhere from 15 minutes to over an hour, compared with less than a second or up to a minute, respectively, for X-ray and CT scans. These long scan times can make MRI machines challenging for young children, as well as for people who are claustrophobic or for whom lying down is painful. Additionally, there are MRI shortages in many rural areas and in other countries with limited access, resulting in long scheduling backlogs. By boosting the speed of MRI scanners, we can make these devices accessible to a greater number of patients.

Amazon in Running to Acquire Landmark Movie Chain

Bloomberg; Anousha Sakoui, Nabila Ahmed and Spencer Soper

Amazon.com Inc. is in the running to acquire Landmark Theaters, a move that would vault the e-commerce giant into the brick-and-mortar cinema industry, according to people familiar with the situation.


2018 Association of Internet Researchers Conference – Transnational Materialities



Montreal, QC, Canada October 10-13. Conference agenda posted. [$$$]

2nd FATREC Workshop: Responsible Recommendation



Vancouver, BC, Canada October 6, held in conjunction with the RecSys 2018 conference. Accepted papers have been posted. [$$$]

CCDS Plans Un-Conference on Democracy, Digital Media, Decision Making, Data Analytics

Syracuse University, School of Information Studies


Syracuse, NY September 14-15. “The purpose of the Un-Conference is to bring together faculty from the Syracuse University community in a space that allows for peer-to-peer learning, collaboration, and creativity around the themes of this year’s event, Democracy, Digital Media, Decision Making, and Data Analytics.” [registration required]

Tools & Resources

No Machine Learning in your product? Start here

Medium, The Lever, Clemens Mewald


The notion of a PM as the product’s CEO, although imperfect, provides a good framework. PMs at Google are technically strong and commonly involved in engineering discussions and designs. Our PM ladder requires PMs at certain levels to “sign off on overall technical direction and architecture” or to be able to “describe, in detail, the full stack and interactions of the product sub-area on a technical level”. However, deep understanding in ML/AI often requires years of applying ML in research and practice. In an AI-first company, does that mean we have to send all of our PMs back to school?

I argue that PMs do NOT need to know the nitty-gritty details of ML. They don’t need to know exactly which algorithm to apply when, or how ML models are deployed to billions of users. Rather, they should rely on their software engineering (SWE) teams and focus on how they can contribute to ML-driven product definition and strategy.

In this post I aim to share some of the best practices on what product owners should know about ML.


GitHub – Microsoft


“Charticulator is a new charting tool that allows you to design charts by interactively specifying constraints.”

Beyond Interactive: Notebook Innovation at Netflix

Medium, Netflix TechBlog


“In this post, we’ll share our motivations and why we find Jupyter notebooks so compelling. We’ll also introduce components of our notebook infrastructure and explore some of the novel ways we’re using notebooks at Netflix.”


Simon Willison


“fix unicode that’s broken in various ways”
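
One common way Unicode text gets broken is mojibake: UTF-8 bytes that were mistakenly decoded as Windows-1252, so that a curly apostrophe shows up as three stray characters. The following is a minimal sketch of reversing that single failure mode using only the Python standard library. The function name `repair_mojibake` is hypothetical and is not taken from the tool quoted above, which may work quite differently and handle many more cases.

```python
def repair_mojibake(text: str) -> str:
    """Undo one specific breakage: UTF-8 bytes wrongly decoded as Windows-1252.

    Hypothetical helper for illustration only. If the text does not
    round-trip cleanly, it is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already correct): leave it alone.
        return text


if __name__ == "__main__":
    broken = "don\u00e2\u20ac\u2122t"  # the three-character garbling of a curly apostrophe
    print(repair_mojibake(broken))
```

Returning the input unchanged on any encoding error is a deliberately conservative choice: text that is already correct, such as "café", fails the round trip and passes through untouched.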


Tenured and tenure track faculty positions

Tenure track faculty at the assistant or associate professor rank to expand interdisciplinary programming in the environment

University of North Carolina; Chapel Hill, NC
