Data Science newsletter – June 25, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 25, 2018


Data Science News

Study: Charts change hearts and minds better than words do

The Washington Post, Wonkblog, Christopher Ingraham


A poll taken after the 2016 presidential election found, for instance, that more than half of people who voted for Donald Trump incorrectly believed that President Barack Obama was born in Kenya. Ten years earlier, a Scripps Howard/Ohio University survey found that just over half of Democrats wouldn’t rule out the possibility that “people in the federal government either assisted in the 9/11 attacks or took no action to stop the attacks because they wanted [the] United States to go to war in the Middle East.”

How do you combat misperceptions like these? That’s the subject of a new paper published in the Journal of Elections, Public Opinion and Parties by Brendan Nyhan of Dartmouth College and Jason Reifler of the University of Exeter.

Through survey experiments, Nyhan and Reifler arrived at a surprising answer: charts. “We find that providing participants with graphical information significantly decreases false and unsupported factual beliefs.” Crucially, they show that data presented in graphs and illustrations does a better job of fighting misperceptions than the same information presented in text form.

Google Home’s data leak proves the IoT is still deeply flawed

Wired UK, Matt Burgess


Tripwire security researchers found the Google Home and Google Chromecast could leak location data through unauthorised network connections. The IoT’s security issues run much deeper.

Bias detectives: the researchers striving to make algorithms fair

Nature, Rachel Courtland


“What concerns me most is the idea that we’re coming up with systems that are supposed to ameliorate problems [but] that might end up exacerbating them,” says Kate Crawford, co-founder of the AI Now Institute, a research centre at New York University that studies the social implications of artificial intelligence.

With Crawford and others waving red flags, governments are trying to make software more accountable. Last December, the New York City Council passed a bill to set up a task force that will recommend how to publicly share information about algorithms and investigate them for bias. This year, France’s president, Emmanuel Macron, has said that the country will make all algorithms used by its government open. And in guidance issued this month, the UK government called for those working with data in the public sector to be transparent and accountable. Europe’s General Data Protection Regulation (GDPR), which came into force at the end of May, is also expected to promote algorithmic accountability.

Why Women Don’t Code

Quillette, Stuart Reges


Ever since Google fired James Damore for “advancing harmful gender stereotypes in our workplace,” those of us working in tech have been trying to figure out what we can and cannot say on the subject of diversity. You might imagine that a university would be more open to discussing his ideas, but my experience suggests otherwise.

For the last ten months I have been discussing this issue at the Allen School of Computer Science & Engineering where I work. I have tried to understand why Damore’s opinions generated such anger and have struggled to decide what I want to do in response. As a result of my attempts to discuss this, our mailing list known as ‘diversity-allies’ is now a moderated list to prevent “nuanced, and potentially hurtful, discussion.” Instead, I have been encouraged to participate in face-to-face meetings that have often been tense, but which have helped me to understand where others are coming from.

I embarked on this journey because I worry that tech companies and universities are increasingly embracing an imposed silence, in which one is not permitted to question the prevailing wisdom on how to achieve diversity goals. I intend to fight this imposed silence and I encourage others to do the same. We can’t allow the Damore incident to establish a precedent.

Why don’t women code? A UW lecturer’s answer draws heat

The Seattle Times, Katherine Long


For thousands of undergrads at the University of Washington, Stuart Reges is the man who introduces them to computer science — a senior lecturer who teaches entry-level coding in a packed lecture hall every quarter, and has won awards for his teaching.

Now Reges has waded into one of tech’s most volatile issues by arguing that the number of women going into the industry has stagnated because women simply aren’t interested in computer science.

His 4,580-word essay, headlined “Why Women Don’t Code,” is nuanced and difficult to summarize in a few sentences. But it provoked an immediate backlash at the UW, where there has been a long-running effort to increase the number of women and people of color in computer science and engineering.

Why it’s Exhausting to be a Woman in Tech/CS

Margo Seltzer, Misinformation blog


So, what has happened in the past 24 hours:

  • I spent an hour double and triple checking data that demonstrated clearly that several years ago, both I and another female colleague were being paid $20,000 less per year than our male peers (i.e., colleagues with pretty much identical credentials, years since degree, experience, etc). I do not know what the status is today, because salaries are a huge secret at my current institution; a practice that allows this kind of thing to happen.
  • Talked with a junior woman about a complicated situation.
  • Read a thoughtful blog post by someone I believe is well-meaning, attributing the lack of gender diversity to “women just don’t want to do CS.” (This is admittedly a gross over-simplification of his argument, but I’ll come back to this later.)
  • Read a wonderful response to said blog post.
  • In response to my forwarding of items 3 and 4, read this post about gender and board gaming (who knew?).
  • Continued my slog through the 350-page National Academies report on sexual harassment in Science, Engineering, and Medicine in Academia.
  • Agreed to write a letter of recommendation for a woman Ph.D., who has been working in industry at a research lab for six years, has been promoted twice, and still hasn’t been given a green card.
  • Wrote a letter of recommendation for a female colleague, being nominated for an award by another female colleague. Why is it that men virtually never ask me to write award letters for women? Think about that: under 20% of my colleagues are women, but it is almost always the women who think to nominate women for awards.

    Adobe set to join rush of foreign giants opening AI labs in Canada

    The Globe and Mail, Sean Silcoff


    Adobe Systems Inc. is the latest foreign technology giant planning to open an artificial intelligence lab in Canada.

    The Silicon Valley software giant, best known for document-creation products Photoshop and Acrobat, says it is looking for a Toronto-based AI lab director to “establish a cutting-edge research lab in artificial intelligence, that will both push the state-of-the-art and have a profound impact on Adobe’s products,” according to a job posting on LinkedIn. An Adobe spokesperson declined to elaborate on the company’s plans, saying it does “not comment on rumours or speculation.”

    U.S. Supreme Court Bolsters Mobile-Phone Privacy Rights

    Bloomberg Politics, Greg Stohr


    Law enforcement officials generally need a warrant to get mobile-phone tower records that show someone’s location over an extended period, the U.S. Supreme Court ruled in a decision that bolsters digital privacy rights.

    Voting 5-4, the court ruled Friday in favor of Timothy Ivory Carpenter, who said prosecutors violated the Constitution when they obtained four months of phone data and used it at trial to show he was near the sites of a string of armed robberies. Chief Justice John Roberts joined the court’s liberals in the majority.

    The ruling could have a far-reaching impact. Prosecutors seek phone-location information from telecommunications companies in tens of thousands of cases a year. Prosecutors in most parts of the country had been able to demand that data from mobile-phone carriers without showing the “probable cause” required to get a warrant.

    German president visits Palo Alto, accompanied by CHP officers in bow ties

    Palo Alto Daily Post, Allison Levitsky


    On Tuesday, Steinmeier attended an event on the proliferation of artificial intelligence at the Stanford Faculty Club, then met briefly with former Stanford political science professor Condoleezza Rice, whom he knows from when she served as U.S. secretary of state under President George W. Bush.

    Steinmeier then had dinner in San Francisco with Gov. Jerry Brown, Heinz said. Yesterday morning, Steinmeier met with representatives of trade unions, with whom he spoke about the future of work and how unions operate in Silicon Valley.

    Facial recognition technology is here to stay. And we should all probably get over it

    Quartz, Dave Gershgorn


    For the past few years, the world’s biggest tech companies have been on a mission to put artificial-intelligence tools in the hands of every coder. The benefits are clear: Coders familiar with free AI frameworks from Google, Amazon, Microsoft, or Facebook might be more inclined to someday work for one of those talent-starved companies. Even if they don’t, selling pre-built AI tools to other companies has become big business for Google, Amazon, and Microsoft.

    Today these same companies are under fire from their employees over who this technology is being sold to, namely branches of the US government like the Department of Defense and Immigration and Customs Enforcement. Workers from Google, Microsoft, and now Amazon have signed petitions and quit in protest of the government work. It’s had some impact: Google released AI ethics principles and publicly affirmed it would not renew its contract with the Department of Defense in 2019. Microsoft told employees in an email that it wasn’t providing AI services to ICE, though that contradicts earlier descriptions of the contract on the company’s website, according to Gizmodo.

    This debate, playing out in a very public manner, marks a major shift in how tech companies and their employees talk about artificial intelligence. Until now, everyone has been preaching the Gospel of Good AI: Microsoft CEO Satya Nadella has called it the most transformational technology of a generation, and Google CEO Sundar Pichai has gone even further, saying AI will have comparable impact to fire and electricity.

    S3 Partners Works With Nasdaq to Track Real-Time Short Interest



    Nasdaq’s portal will allow S3’s real-time analytics to monitor short interest changes between exchange reporting periods.

    Financial technology and analytics firm S3 Partners announced it has finalized an agreement with Nasdaq to provide their listed companies with real-time short interest analytics. Nasdaq’s portal will provide access to S3’s tools to allow companies to stay apprised of changes between exchange reporting periods.

    In addition to real-time short interest data, Nasdaq-listed companies will also have access to analytics for long-selling and short-selling price indicators and lending and borrowing rates. The real-time short interest forecasts will provide data on a percentage-of-float, per-share and notional basis instantly, rather than after the traditional three-week wait for publication.
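
For readers unfamiliar with the terms, the three ways of expressing short interest mentioned above can be sketched in a few lines of Python. This is only an illustration using the common textbook definitions (shares sold short, shares short divided by float, and shares short times price); the class name, field names and all figures are hypothetical and are not taken from S3 Partners or Nasdaq.

```python
from dataclasses import dataclass


@dataclass
class ShortInterestSnapshot:
    """Hypothetical snapshot of short interest for one stock."""
    shares_short: int   # shares currently sold short (per-share basis)
    float_shares: int   # shares available for public trading
    price: float        # current share price in dollars

    def percent_of_float(self) -> float:
        # Shares short expressed as a percentage of the float.
        return 100.0 * self.shares_short / self.float_shares

    def notional(self) -> float:
        # Dollar value of the short position at the current price.
        return self.shares_short * self.price


# Made-up figures, for illustration only.
snapshot = ShortInterestSnapshot(
    shares_short=2_000_000, float_shares=50_000_000, price=25.0
)
print(snapshot.shares_short)        # per-share basis
print(snapshot.percent_of_float())  # percentage of float
print(snapshot.notional())          # notional (dollar) basis
```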

    DNA Barcodes That Reliably Work: A Game-Changer for Biomedical Research

    University of Texas at Austin, UT News


    In the same way that barcodes on your groceries help stores know what’s in your cart, DNA barcodes help biologists attach genetic labels to biological molecules to do their own tracking during research, including how a cancerous tumor evolves, how organs develop or which drug candidates actually work. Unfortunately, with current methods, many DNA barcodes have a reliability problem much worse than your corner grocer’s. They contain errors about 10 percent of the time, making interpreting data tricky and limiting the kinds of experiments that can be reliably done.

    Now researchers at The University of Texas at Austin have developed a new method for correcting the errors that creep into DNA barcodes, yielding far more accurate results and paving the way for more ambitious medical research in the future.

    The team — led by postdoctoral researcher John Hawkins, professor Bill Press and assistant professor Ilya Finkelstein — demonstrated that their new method lowers the error rate in barcodes from 10 percent to 0.5 percent, while working extremely rapidly. They describe their method, called FREE (filled/truncated right end edit) barcodes, today in the journal Proceedings of the National Academy of Sciences.
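
To give a feel for what correcting a barcode error means, here is a minimal Python sketch of nearest-codeword decoding with edit distance. It is not the FREE method described in the paper; it is a generic textbook approach, and it assumes the barcode has already been isolated from the surrounding read. The codebook, the function names and the `max_errors` parameter are all hypothetical.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def decode(read: str, codebook: Sequence[str], max_errors: int = 1) -> Optional[str]:
    """Return the single codeword within max_errors edits of read, else None."""
    best, best_dist, tie = None, max_errors + 1, False
    for code in codebook:
        dist = edit_distance(read, code)
        if dist < best_dist:
            best, best_dist, tie = code, dist, False
        elif dist == best_dist and dist <= max_errors:
            tie = True  # two codewords equally close: ambiguous read
    return None if tie else best


# Hypothetical codebook whose codewords are far apart from one another.
CODEBOOK = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]

print(decode("AAACAA", CODEBOOK))  # one substitution
print(decode("AAAAA", CODEBOOK))   # one deletion
print(decode("ACGTAC", CODEBOOK))  # too many errors to assign
```

The design point is that codewords must be chosen far enough apart that a read with a small number of errors is still closer to its true barcode than to any other.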


    Events

    BankAI 2018

    American Banker


    Austin, TX September 27-28. “Black boxes begone: Moving forward with AI that’s transparent, explainable and ethical” [$$$$]

    PyBay 2018

    SF Python


    San Francisco, CA Workshops: August 16-17. Talks: August 18-19. “3rd Annual Regional Python Conference for the Community by the Community” [$$$]

    Tools & Resources

    Microsoft Research Open Data

    Microsoft Research


    “A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences. Download or copy directly to a cloud-based Data Science Virtual Machine for a seamless development experience.”

    egeria | Open Metadata and Governance

    Mandy Chessell


    Egeria provides the Apache 2.0 licensed open metadata and governance type system, frameworks, APIs, event payloads and interchange protocols to enable tools, engines and platforms to exchange metadata in order to get the best value from data whilst ensuring it is properly governed.

    Scoping a data science project — part 1

    Medium, Elise Huard


    In some blessed, rare situations, a data science project starts off with a crystal clear goal in mind. Here is the data, this is what we want to know, can we find out what we want to know or estimate?

    It’s much more common, unfortunately, to have a stakeholder/customer who has access to a lot of data, and expects some valuable, but only very vaguely defined insights. A crucial and non-trivial part of starting up a data science project is to find out what exactly we can and should do in the context of the project — let’s call this the discovery phase.

    My goal in this blog post and the next one is to provide some pointers that could help others get through this difficult stage.

    A Beginner’s Guide to Data Engineering — The Series Finale

    Medium, Robert Chang


    “At Airbnb, we have more than 100+ contributors who authored Airflow pipelines. This makes enforcing ETL best practices, upholding data quality, and standardizing workflows increasingly challenging. Luckily, one of the antidotes to complexity is the power of abstraction. This principle, of course, is no exception when it comes to data engineering.”


    Full-time positions outside academia

    Qualitative Researcher–AI Design

    Capital One; McLean, Va
