Data Science newsletter – November 3, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for November 3, 2018


Data Science News

Data from a Female Point of View

LinkedIn, Anna Anisin and Amarita Natt


According to Burtch Works (via DSSe), 85% of Data Scientists and 74% of Predictive Analysts are male. Considering the fields that make up those professions, the disconnect becomes even more apparent: only 18% of computer science bachelor’s degrees in 2016 were awarded to women, according to the National Center for Education Statistics. Statistics does better, at almost 43% female, and economics (via the National Science Foundation) falls somewhere around 31%, but consider that overall, women earn 57% of all bachelor’s degrees awarded. Given the stark numbers in education, it’s not surprising that women are underrepresented among the Data Science and Predictive Analytics professions. Of course, that lack of representation means that the questions we ask, the variables we use in models, and even our interpretations of those models are all skewed by selection bias within our profession.

Father of Web says tech giants may have to be split up

Reuters, Guy Faulconbridge and Paul Sandle


“What naturally happens is you end up with one company dominating the field so through history there is no alternative to really coming in and breaking things up,” Berners-Lee, 63, said in an interview. “There is a danger of concentration.”

But he urged caution too, saying the speed of innovation in both technology and tastes could ultimately cut some of the biggest technology companies down to size.

“Before breaking them up, we should see whether they are not just disrupted by a small player beating them out of the market, but by the market shifting, by the interest going somewhere else,” Berners-Lee said.

Arts and Sciences faculty approves new undergrad curriculum

Cornell University, Cornell Chronicle


“The biggest improvement for students under the new curriculum is the more streamlined and meaningful list of distribution requirements, which are broad areas of study students explore when they choose classes,” said Tom Pepinsky, associate professor of government and chair of the college’s Curriculum Review Committee (CRC). “New categories of requirements reflect areas of real faculty and student interest, including data science, global citizenship and social difference.”

Apple, Siri fall further behind Google, Amazon in the smart home

CNET, Ben Fox Rubin


The hardware heavyweight may make the HomePod, but it doesn’t seem all that interested in overtaking its leading rivals.

The Privacy Battle to Save Google From Itself

WIRED, Security, Lily Hay Newman


Over two days during the summer of 2009, experts from inside and outside Google met to forge a roadmap for how the company would approach user privacy. At the time, Google was under fire for its data collection practices and user tracking. The summit was designed to codify ways that users could feel more in control.

Engineer Amanda Walker, then in her third year at Google and now the company’s software engineering manager of privacy infrastructure, jotted down notes on a paper worksheet during one of the summit’s sessions. “HMW: Mitigate Impact of bad Gov’t + 3rd party requests,” she wrote, using shorthand for “how might we.” A few suggestions followed: “Discourage abusive requests. Make privacy measurable/surface rising threats. Industry wide.” It was the seed of what would eventually become Google’s suite of transparency reports that, among other things, disclose government requests for data.

It also was just one of several features the group brainstormed that summer that became a reality.

Amazon’s new goal: Teach 10 million kids a year to code

Associated Press, Joseph Pisani


Amazon wants to get more kids thinking about becoming computer engineers.

The company launched a program Thursday that aims to teach more than 10 million students a year how to code. Amazon said it will pay for summer camps, teacher training and other initiatives to benefit kids and young adults from low-income families who might not have learned to code otherwise. It hopes the programs spur more black, Hispanic and female students to study computer science.

Rice Data Lab Has Students Tackle Real-World Analytics

Campus Technology, Dian Schaffhauser


A new laboratory experience at Rice University is giving students ample opportunities to tackle real-world problems in data science. The Houston institution’s new Center for Transforming Data to Knowledge (D2K Lab) allows learners to work directly with companies, academic labs, government agencies and nonprofits. The lab was founded through a $4 million gift from Rice alumnus Kevin Harvey and his wife Catherine Harvey. It’s being led by Genevera Allen, an associate professor of statistics and electrical and computer engineering at Rice University.

The lab features courses, co-curricular programs and events to link students to people with data problems. For example, in one course, students participate in the D2K “Consulting Clinic,” a three-hour, once-a-week walk-in clinic designed to address “smaller-scale” problems that can be tackled in a few hours or a couple of days. The clinic, which opened in the spring, has a mix of 12 graduate and undergraduate students. Clients can just walk in or make appointments. Small teams of three or four students meet with the client to discuss the problem, and the team suggests a course of action within a few days.

With Student Interest Soaring, Berkeley Creates New Data-Sciences Division

Chronicle of Higher Education, Alexander C. Kafka


The University of California at Berkeley today announced a new Division of Data Science and Information. It is the university’s largest program change in decades and helps secure its status among the country’s top data-science research and training hubs.

“The division will enable students and researchers to tackle not just the scientific challenges opened up by pervasive data, but the societal, economic, and environmental impacts as well,” the university said.

Berkeley is in an elite group with Carnegie Mellon University, MIT, Stanford, and the University of Washington in the caliber and scope of its data-science program, said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, a computer-science professor at the University of Washington, and a tech entrepreneur. In creating the new division, Berkeley is responding to two issues, Etzioni said.

Machine learning spots natural selection at work in human genome

Nature, News, Amy Maxmen


Pinpointing where and how the human genome is evolving can be like hunting for a needle in a haystack. Each person’s genome contains three billion building blocks called nucleotides, and researchers must compile data from thousands of people to discover patterns that signal how genes have been shaped by evolutionary pressures.

To find these patterns, a growing number of geneticists are turning to a form of machine learning called deep learning. Proponents of the approach say that deep-learning algorithms incorporate fewer explicit assumptions about what the genetic signatures of natural selection should look like than do conventional statistical methods.

“Machine learning is automating the ability to make evolutionary inferences,” says Andrew Kern, a population geneticist at the University of Oregon in Eugene. “There is no question that it is moving things forward.”

Google Walkout Organizers Explain Their Demands

The Cut; Claire Stapleton, Tanuja Gupta, Meredith Whittaker, Celie O'Neil-Hart, Stephanie Parker, Erica Anderson, and Amr Gaber


“Last week, the New York Times published an article about Google’s history of harassment, discrimination, support for abusers, and the people whose lives and careers become collateral damage in the process. The article provided a narrow window into a culture we, as Google employees, know well. These stories are our stories. We share them in hushed tones to trusted peers, friends, and partners. There are thousands of us, at every level of the company. And we’ve had enough.”

Deepfake-busting apps can spot even a single pixel out of place

MIT Technology Review, Karen Hao


There are two main ways to deal with the challenge of verifying images, explains [Hany] Farid. The first is to look for modifications in an image. Image forensics experts use computational techniques to pick out whether any pixels or metadata seem altered. They can look for shadows or reflections that don’t follow the laws of physics, for example, or check how many times an image file has been compressed to determine whether it has been saved multiple times.

The second and newer method is to verify an image’s integrity the moment it is taken. This involves performing dozens of checks to make sure the photographer isn’t trying to spoof the device’s location data and time stamp. Do the camera’s coordinates, time zone, and altitude and nearby Wi-Fi networks all corroborate each other? Does the light in the image refract as it would for a three-dimensional scene? Or is someone taking a picture of another two-dimensional photo?

Farid thinks this second approach is particularly promising.

A Conversation With Quoc Le: The AI Expert Behind Google AutoML



In 2016, Le teamed up with a Google resident and published the seminal paper Neural Architecture Search with Reinforcement Learning. The core idea was akin to building blocks: the machine picks the components it needs from a defined space to build a neural network, and then improves its accuracy through trial and error, which is what reinforcement learning provides. The results were promising: the machine-generated models matched the best human-designed models.

Le’s research contributed to the creation of Google Cloud AutoML, a set of tools that enables developers with limited machine learning expertise to train high-quality models. Unsurprisingly, AutoML quickly became a popular topic, with tech giants and startups alike following in Google’s footsteps and betting on the new tech.
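To make the trial-and-error idea concrete, here is a toy sketch, not the method from the paper: a REINFORCE-style controller samples one option per decision from a small search space and shifts probability toward options that earn higher reward. The paper's controller is a recurrent network; this sketch swaps in independent per-decision logits. SEARCH_SPACE, TARGET and proxy_reward are all hypothetical; in real architecture search the reward comes from training each sampled network and measuring its validation accuracy, which is what makes the approach expensive.

```python
import math
import random

# Hypothetical search space: each decision picks one option.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "kernel_size": [3, 5, 7],
    "activation": ["relu", "tanh"],
}

# Hypothetical stand-in for "train the child network, measure accuracy".
# It simply scores how many decisions match a made-up best architecture.
TARGET = {"num_layers": 4, "kernel_size": 5, "activation": "relu"}


def proxy_reward(arch):
    hits = sum(arch[k] == TARGET[k] for k in SEARCH_SPACE)
    return hits / len(SEARCH_SPACE)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(steps=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    # The "controller": one vector of logits per decision.
    logits = {k: [0.0] * len(opts) for k, opts in SEARCH_SPACE.items()}
    baseline = 0.0
    for _ in range(steps):
        # Sample an architecture from the current policy.
        picks = {}
        for k, opts in SEARCH_SPACE.items():
            probs = softmax(logits[k])
            picks[k] = rng.choices(range(len(opts)), weights=probs)[0]
        arch = {k: SEARCH_SPACE[k][i] for k, i in picks.items()}
        reward = proxy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # REINFORCE: gradient of log-softmax w.r.t. logits is onehot - probs.
        for k, i in picks.items():
            probs = softmax(logits[k])
            for j in range(len(probs)):
                grad = (1.0 if j == i else 0.0) - probs[j]
                logits[k][j] += lr * advantage * grad
    # Report the most likely architecture under the learned policy.
    return {
        k: opts[max(range(len(opts)), key=lambda j: logits[k][j])]
        for k, opts in SEARCH_SPACE.items()
    }
```

Calling `search()` returns the controller's preferred architecture. Subtracting a running baseline from the reward is a standard way to reduce the variance of REINFORCE updates; without it every sampled choice would be reinforced whenever the reward is positive.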


The Poetics of Information Symposium

NYU, Gallatin


New York, NY November 16, starting at 9:30 a.m., NYU, The Jerry H. Labowitz Theatre for the Performing Arts. “a one-day symposium bringing together writers, artists, scholars, and critics to consider the place of information as an aesthetic, conceptual, and creative force in contemporary culture. Topics to be addressed include vastness and repetition as aesthetic principles, algorithms and the creation and consumption of music, conceptual writers in their engagement with information, and the place of computation in literary studies and creation.” [RSVP required]


ACH2019 Conference

Pittsburgh, PA “The inaugural Association for Computers and the Humanities (ACH) conference will take place on July 23-26, 2019.” Deadline for proposals is November 10.

Call for proposals to heighten diversity and inclusion within the Julia community

“We are pleased to announce that Julia Computing will be offering small grants to fund efforts to heighten diversity and inclusion within the Julia community, using funds graciously provided by the Alfred P. Sloan Foundation.” Deadline to apply is November 25.

AI Impact Challenge

“ is issuing an open call to organizations around the world to submit their ideas for how they could use AI to help address societal challenges. Selected organizations will receive support from Google’s AI experts, grant funding from a $25M pool, credit and consulting from Google Cloud, and more.” Deadline for applications is January 22, 2019.

Workshop on Technology and Consumer Protection (ConPro ’19)

San Francisco, CA May 23, 2019, “The Workshop on Technology and Consumer Protection (ConPro ’19) will explore computer science topics with an impact on consumers. This workshop has a strong security and privacy emphasis, with an overall focus on ways in which computer science can prevent, detect, or address the potential for technology to deceive or unfairly harm consumers.” Deadline for submissions is January 23, 2019.

Tools & Resources

End-to-end Music Source Separation – Audio samples

Jordi Pons


“By operating directly over the waveform, these models take into account all the information available in the raw audio signal, including the phase. Our results show that waveform-based models can outperform a recent spectrogram-based deep learning model. Namely, a novel Wavenet-based model we propose and Wave-U-Net can outperform DeepConvSep, a spectrogram-based deep learning model. This suggests that end-to-end learning has a great potential for the problem of music source separation.”
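The point about phase can be illustrated with a small sketch (not the authors' models): a circularly shifted copy of a signal is a different waveform, yet its DFT magnitudes are identical, so a magnitude-only representation cannot tell the two apart while the complex coefficients, which carry phase, can. Real spectrogram models use a short-time Fourier transform over many frames; the single 8-sample DFT and the sample values here are arbitrary stand-ins.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for a toy signal."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Arbitrary example samples; the shifted copy is a different waveform.
signal = [0.0, 1.0, 0.5, -0.3, 0.0, -1.0, 0.2, 0.4]
shifted = signal[3:] + signal[:3]  # circular shift by 3 samples

spec_a, spec_b = dft(signal), dft(shifted)

# Magnitudes (what a magnitude spectrogram keeps) are the same...
mags_a = [abs(c) for c in spec_a]
mags_b = [abs(c) for c in spec_b]
# ...but the complex coefficients differ, because the phases differ.
coefficients_differ = any(abs(a - b) > 1e-6 for a, b in zip(spec_a, spec_b))
```

A circular shift multiplies each coefficient by a unit-magnitude phase factor, which is why only the phase changes.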

Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform

Facebook Research; Jason Gauci, Edoardo Conti, et al.


In this paper we present Horizon, Facebook’s open source applied reinforcement learning (RL) platform. Horizon is an end-to-end platform designed to solve industry applied RL problems where datasets are large (millions to billions of observations), the feedback loop is slow (vs. a simulator), and experiments must be done with care because they don’t run in a simulator. Unlike other RL platforms, which are often designed for fast prototyping and experimentation, Horizon is designed with production use cases as top of mind. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. We also showcase real examples of where models trained with Horizon significantly outperformed and replaced supervised learning systems at Facebook.
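Counterfactual policy evaluation estimates how a new policy would have performed using only data logged under the old one. One textbook estimator is inverse propensity scoring (IPS), which reweights each logged reward by how much more or less likely the new policy is to take the logged action. The sketch below uses made-up logs and a hypothetical `LoggedStep` record; it is not Horizon's API, and Horizon's own evaluation workflows may rely on other estimators.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedStep:
    context: str
    action: str
    propensity: float  # probability the logging policy gave to `action`
    reward: float


def ips_estimate(logs: Sequence[LoggedStep],
                 target_prob: Callable[[str, str], float]) -> float:
    """Average of rewards reweighted by target_prob / logging propensity."""
    total = 0.0
    for step in logs:
        weight = target_prob(step.context, step.action) / step.propensity
        total += weight * step.reward
    return total / len(logs)


# Hypothetical logs from a uniform-random logging policy over actions a, b.
LOGS = [
    LoggedStep("x", "a", 0.5, 1.0),
    LoggedStep("x", "b", 0.5, 0.0),
    LoggedStep("y", "a", 0.5, 1.0),
    LoggedStep("y", "b", 0.5, 0.5),
]


def greedy_target(context: str, action: str) -> float:
    """Hypothetical new policy: always 'a' in context x, always 'b' in y."""
    preferred = {"x": "a", "y": "b"}
    return 1.0 if preferred[context] == action else 0.0


estimate = ips_estimate(LOGS, greedy_target)
```

Here the estimate is 0.75, which matches the average reward the target policy would earn on these two contexts (1.0 in x, 0.5 in y). IPS is unbiased when the logged propensities are correct and nonzero wherever the target policy acts, but its variance grows as the two policies diverge.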

New guide: Adding to a GitHub Pages site

Software Sustainability Institute


“Niall Beard, from the University of Manchester, has written a guide for the Software Sustainability Institute on ‘Adding to a GitHub Pages site.’”


Full-time positions outside academia

Data Scientist – Weather Radar

Garmin; Olathe, KS

Senior Software Engineer

Space Telescope Science Institute; Baltimore, MD

Tenured and tenure-track faculty positions

Media Arts Professors

University of Nebraska-Lincoln, Johnny Carson Center for Emerging Media Arts; Lincoln, NE
