Data Science newsletter – July 6, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 6, 2018

GROUP CURATION: N/A

 
 
Data Science News



Company Data Science News

Vint Cerf penned an op-ed arguing that we need to pay attention to the “brittle” decision making and potentially “biased” training data of autonomous decision making systems. His remedy is also mine: insert a healthy dose of nuanced, human, techno-ethical thinking.



Google’s Home and Chromecast technologies are leaking location data according to Tripwire researcher Craig Young. Google initially declined to address the problem, but caved under pressure from the infosec community and will release a patch by the end of July.



Facebook’s regulatory problems are increasing. In addition to the Justice Department and the Federal Trade Commission, the Federal Bureau of Investigation and the Securities and Exchange Commission are now investigating the company. [Watch out: The SEC has a very broad mandate, fb.] The company also clearly released its political ad targeting bot before the data science was on lock. Advertisements for lawn care services and bikini waxes were removed for using the word “bush,” which is also the last name of two former Presidents. The bag of words approach is seriously unsuited to this task…
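To see why, here is a toy sketch (not Facebook’s actual system — the blocklist and function names are hypothetical) of keyword-based filtering: a bag-of-words filter looks only at which tokens appear, so it cannot distinguish a shrub from a surname.

```python
# Hypothetical blocklist entry for illustration only.
BANNED_WORDS = {"bush"}

def flags_ad(text: str) -> bool:
    """Flag an ad if any token matches the blocklist, ignoring all context."""
    tokens = text.lower().split()
    return any(token.strip(".,!?\"'") in BANNED_WORDS for token in tokens)

# Both a lawn-care ad and a political ad trip the same filter:
print(flags_ad("Spring special: we trim every bush in your yard"))  # True
print(flags_ad("George H. W. Bush served as the 41st president"))   # True
```

Any context-aware approach (even simple bigrams, let alone a sequence model) would separate these two uses; a pure bag of words cannot.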



Adobe is jumping on the Canadian bandwagon (sleigh? dogsled?) and opening an AI lab in Toronto. They are still looking for a director, if you want to join them and work on problems like detecting photoshopped images.



The Royal Bank of Canada formed a cybersecurity partnership with Ben-Gurion University to the tune of $2 million. The partnership will focus on adversarial neural nets in the context of cybersecurity. The vulnerability of neural nets to adversarial attack is the theme of this issue of the newsletter.



Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot have a new paper in Communications of the ACM that takes up the vulnerability of classification algorithms to deliberately manipulated inputs. There’s a superb image of a sheep/goat/dog that should be the mascot of team image recognition.



Let’s hope JP Morgan gets a sociotechnical researcher on their new RoarData team of data science superstars being hired to “predict the future of everything.” (What could go wrong??) Only the extremely talented need apply. Access to lots and lots of data is promised.



Apple is replacing keyboards on MacBook Pros and rebuilding Maps.



Baidu unveiled a new chip family, the Kunlun, designed for AI and cloud computing; the 818-100 variant targets inference.


AI for Social Good: A Conversation with Jake Porway

Babson Blogs, Creating Social Value Blog, Alex Corindia



Jake Porway is the founder and executive director of DataKind, a nonprofit that uses data science and AI in the service of humanity. DataKind brings top data scientists together with leading social impact organizations, corporations, and governments to tackle their toughest challenges. Engagements can be anything from weekend hackathons to multi-month projects. DataKind has done over 250 projects to date helping organizations enhance their impact using technology.

Jake is a New York Times R&D Lab alum, has worked at Google and Bell Labs, and is a PopTech Social Innovation Fellow and a National Geographic Emerging Explorer. A recognized leader in the Data for Good Movement, he has spoken at IBM, Microsoft, Google, and the White House. He holds a BS in Computer Science from Columbia University and an MS and PhD in Statistics from UCLA. Jake founded DataKind in hopes of creating a world in which every social organization has access to data capacity to better serve humanity.

This past winter, he spoke to a group of Babson students through the college’s Net Impact chapter. Recently, I sat down with Jake to discuss the future of DataKind and what role Babson students might play.


Goodhart’s Law, or Weaponizing the Study of Culture

Simon DeDeo, Axiom of Chance blog



Our lab is particularly interested in scientific discovery, knowledge creation, mental generativity, synergistic intellectual cooperation and, as part of that, cases where we see an increase in the power of the individual to determine her fate and exercise her talents. We call it cultural flourishing as shorthand, and spend a lot of time looking at systems that may or may not embody it.

One question we get asked from time to time is whether or not our findings can help us predict things about the future of an institution. The answer, we believe as good Baconians, is yes. We think we’re hooking into some underlying realities. But I’m reluctant to tell them how we find it.

That’s because bad things, potentially very bad things, can happen when you put institutions to the test.

Let’s begin with the easy cases: university rankings and Goodhart’s Law.


Want a job in data science? You might have to take a standardized test when applying

Chicago Tribune, Ally Marotti



Data analytics is becoming more vital for businesses, and data scientists are in high demand. But the emerging field is broad, and some companies say they have struggled to find job candidates whose skills fit their needs.

To better define the tools their employees need to properly analyze big data, executives from eight companies have created a data science standards board. The companies, which include The New York Times, Spotify and several firms with a Chicago presence, plan to establish a standardized test for job applicants.

Eventually, companies across industries could use the test, executives involved in creating the board said. They equated it to a bar exam or certified public accountant test for data professionals. But one expert cautioned that the companies could miss out on top-notch employees if the standardized testing isn’t done right.


Red-hot planet: All-time heat records have been set all over the world during the past week

The Washington Post, Jason Samenow



From the normally mild summer climes of Ireland, Scotland and Canada to the scorching Middle East, numerous locations in the Northern Hemisphere have witnessed their hottest weather ever recorded over the past week.

Large areas of high pressure, or heat domes, scattered around the hemisphere led to the sweltering temperatures. The Canadian Broadcasting Corporation reports the heat is to blame for at least 34 deaths in southern Quebec, mostly in and near Montreal, which endured record high temperatures.


Data sharing, an environmental issue

Orange



“The future does not predict, it prepares,” says the philosopher Maurice Blondel. Used in many areas to analyse human activities, data could occupy a central place in safeguarding the planet in the long term by transforming decision-making. Nicolas de Cordes, VP Marketing Anticipation at Orange, and Emmanuel Letouzé, Director and Co-founder of the non-profit think tank Data-Pop Alliance, reflect on the issue further.


[1806.10474] The challenge of realistic music generation: modelling raw audio at scale

arXiv, Computer Science > Sound; Sander Dieleman, Aäron van den Oord, Karen Simonyan [from Google DeepMind]



Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio domain. It has been shown that autoregressive models excel at generating raw audio waveforms of speech, but when applied to music, we find them biased towards capturing local signal structure at the expense of modelling long-range correlations. This is problematic because music exhibits structure at many different timescales. In this work, we explore autoregressive discrete autoencoders (ADAs) as a means to enable autoregressive models to capture long-range correlations in waveforms. We find that they allow us to unconditionally generate piano music directly in the raw audio domain, which shows stylistic consistency across tens of seconds.


IBM’s New Do-It-All Deep-Learning Chip

IEEE Spectrum, Samuel K. Moore



IBM’s new chip is designed to do both high-precision learning and low-precision inference across the three main flavors of deep learning


Just How Transparent Can a Criminal Justice Algorithm Be?

Slate, Future Tense, Stephanie Wykstra



Algorithms that make predictions about future behavior based on factors such as a person’s age and criminal history are increasingly used—and increasingly controversial—in criminal justice decision-making. One of the big objections to the use of such algorithms is that they sometimes operate out of the public’s view. For instance, several states have adopted a tool called COMPAS developed by the company Northpointe (now called Equivant), which claims the algorithm is proprietary and refuses to share crucial details of how it calculates scores.

In a striking contrast, the Pennsylvania sentencing commission has been very transparent. Legislation passed in 2010 tasked the commission with developing a risk assessment instrument for use by judges at sentencing “as an aide in evaluating the relative risk that an offender will reoffend and be a threat to public safety,” and to help identify candidates for alternatives to incarceration. Since 2010, the commission has released more than 15 reports detailing the development of the algorithm and has held 11 public hearings to invite feedback. The commission has also altered its proposal over time in response to the community’s input. For example, the Defender Association of Philadelphia and other organizations argued in 2017 that the use of past arrest record as an input factor would be likely to exacerbate racial bias, and this concern was a factor in the commission’s decision to switch to using convictions rather than arrests.


How Smart TVs in Millions of U.S. Homes Track More Than What’s On Tonight

The New York Times, Sapna Maheshwari



The growing concern over online data and user privacy has been focused on tech giants like Facebook and devices like smartphones. But people’s data is also increasingly being vacuumed right out of their living rooms via their televisions, sometimes without their knowledge.

In recent years, data companies have harnessed new technology to immediately identify what people are watching on internet-connected TVs, then using that information to send targeted advertisements to other devices in their homes. Marketers, forever hungry to get their products in front of the people most likely to buy them, have eagerly embraced such practices. But the companies watching what people watch have also faced scrutiny from regulators and privacy advocates over how transparent they are being with users.

Samba TV is one of the bigger companies that track viewer information to make personalized show recommendations. The company said it collected viewing data from 13.5 million smart TVs in the United States, and it has raised $40 million in venture funding from investors including Time Warner, the cable operator Liberty Global and the billionaire Mark Cuban.


Trust the machines? Funds run by artificial intelligence

Associated Press, Alex Veiga



A computer can trounce a human chess master and solve complex mathematical calculations in seconds. Can it do a better job investing your money than a flesh-and-blood portfolio manager?

Investors willing to test that question can do so with a couple of exchange-traded funds, or ETFs, that leave the investment decisions to a computer’s so-called artificial intelligence, or AI.

ETF Managers Group and Ocean Capital Advisors launched an AI-powered fund last month dubbed the Rogers AI Global Macro ETF (BIKR) that invests primarily in single-country ETFs. The fund’s AI sifts through millions of data points from countries around the globe and uses what it learns to determine how best to allocate the fund’s holdings. (Humans carry out the trades, however.)


UK joins forces with France on artificial intelligence and data

ComputerWeekly, Liz Evenstad



The UK and French governments have agreed on a series of collaborative measures, aiming to boost the countries’ digital economies.

This includes the UK’s Alan Turing Institute signing a deal with France’s DATAIA institute to collaborate on several research areas, including transparency in designing and implementing algorithms, sharing expertise and hosting joint workshops.


Groupe PSA and Inria create an OpenLab dedicated to artificial intelligence

Automotive World (news release)



Groupe PSA and Inria today announced the creation of an OpenLab dedicated to artificial intelligence. The studied areas will include autonomous and intelligent vehicles, mobility services, manufacturing, design development tools, the design itself and digital marketing as well as quality and finance.

“Artificial intelligence will quickly become an efficiency factor for the group. The OpenLab will work on artificial intelligence algorithms enabling autonomous vehicles to drive in complex environments for example. It will also work on predictive maintenance, powertrain design optimisation and the modelling of complex systems such as cities, to offer mobility services adapted to people’s needs” said Carla Gohin, Groupe PSA’s Vice President for Research and Advanced Engineering.


European Parliament: Suspend Privacy Shield unless US fully complies

iapp, Jennifer Baker



While the U.S. was busy celebrating Independence Day July 4 with barbecues and fireworks, the European Parliament was debating the future of the Privacy Shield deal. The conclusion? Today, Parliament voted for its suspension.

The non-binding resolution passed by 303 votes to 223, with 29 abstentions, and calls on the executive arm of the EU, the European Commission, to suspend the data-sharing deal “unless the U.S. is fully compliant” by Sept. 1.

Privacy Shield is the “gentlemen’s agreement” that came into force in 2016 after Safe Harbor was struck down. Like its predecessor, the arrangement allows the transfer of personal data from the EU to U.S. companies that have promised to adhere to European data protection standards.


Take A Tour of HOMEChem

YouTube, National Press Foundation



HOMEChem is the focus of a scientific collaboration of 15 university professors and 60 researchers into the chemistry of the indoors.

 
Events



Convergence: The Future of Policing

Caveat



New York, NY July 12, doors open at 6:30 p.m., presented by Caveat (21 A Clinton St.) [$$]


Future of Identity

Meetup, The Hive Data



San Francisco, CA July 11, starting at 6 p.m., Capital One (201 3rd St.). [free, registration required]

 
Deadlines



Call for Abstracts – DELVE2018

Thessaloniki, Greece September 26. Digital Epidemiology and Surveillance, a CCS2018 satellite workshop. Deadline for abstract submissions is July 13.
 
Tools & Resources



EUA publishes Roadmap on Research Assessment

European University Association



European University Association “has published a Roadmap on Research Assessment in the Transition to Open Science. The main objective is to raise awareness and support the EUA membership with the development of research assessment approaches that focus on research quality, potential and future impact, and that take into account Open Science practices.”


A Tutorial Introduction to Kubernetes

Ok I Give Up blog, Ulas Turkmen



“In this tutorial, I want to document my journey of learning Kubernetes, clear up some points that tripped me as a beginner, and try to explain the most important concepts behind how it works. There is absolutely no claim of completeness; Kubernetes is way too big for a blog tutorial like this.”

 
Careers


Full-time positions outside academia

Researcher and Advocate on Human Rights and Artificial Intelligence



Human Rights Watch; Palo Alto, CA

Full-time, non-tenured academic positions

Research Associate in High-Order Computational Fluid Dynamics (2)



Imperial College London; London, England
