Data Science newsletter – August 20, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 20, 2018


Data Science News

Can Electronic Health Data Solve Medicine’s Reproducibility Problem?

Columbia University, Irving Medical Center


“If you parse the medical literature you find that it’s basically a data dredging machine,” Hripcsak says. “There’s publication bias, when authors and editors tend to publish things that match the answers they were looking for, or that advance their careers, or that benefit the journal.”

With massive troves of electronic health data now available, Hripcsak and his colleagues think the time is right to change the way observational studies are conducted. Instead of performing one study and publishing (or not) one result at a time, electronic health data now allow researchers to answer thousands of questions at once.

“By disseminating all the findings, we not only provide new results on a large scale but also prevent the publication bias we see in the literature,” says Hripcsak. “The reader then sees all the results, not just the ones favored by the authors or editors, and other researchers can see a whole body of work to better judge if the methods are operating as they should.”
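The selective-publication effect Hripcsak describes can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method his group uses. It generates many hypothetical studies of an effect whose true size is zero, then "publishes" only those whose positive estimate lies more than 1.96 standard errors from zero. The function names `simulate_studies` and `publish_selectively` are hypothetical.

```python
import random
import statistics


def simulate_studies(n_studies: int, true_effect: float = 0.0,
                     se: float = 1.0, seed: int = 0) -> list[float]:
    """Return one noisy effect estimate per hypothetical study."""
    rng = random.Random(seed)  # fixed seed so the toy run is reproducible
    return [rng.gauss(true_effect, se) for _ in range(n_studies)]


def publish_selectively(estimates: list[float], se: float = 1.0,
                        z_crit: float = 1.96) -> list[float]:
    """Keep only 'significant' positive results, mimicking publication bias."""
    return [e for e in estimates if e / se > z_crit]


estimates = simulate_studies(10_000)
published = publish_selectively(estimates)

print(f"all results:       n={len(estimates)}, mean={statistics.mean(estimates):.3f}")
print(f"published results: n={len(published)}, mean={statistics.mean(published):.3f}")
```

Averaged over every result, the estimates center near zero, while the selectively published subset suggests a sizable effect that does not exist. In this toy setup, releasing every result, as the quote proposes, is what removes the distortion.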

Report Proposes Recommendations and New Framework to Speed Progress Toward Open Science

The National Academies


While significant progress has been made in providing open access to scientific research, a range of challenges — including the economics of scientific publication and cultural barriers in the research enterprise — must be overcome to further advance the openness of science, says a new report from the National Academies of Sciences, Engineering, and Medicine. The report recommends coordinated action from the academic community and other research stakeholders, and the use of an “open science by design” framework to foster openness throughout the research process.

Open science aims to ensure the free availability of scholarly publications, the data that result from research, and the methodologies, including code or algorithms, that were used to generate those data. The National Academies were asked to provide guidance to the research community as it builds strategies for achieving open science.

The research enterprise has already made significant progress toward open science and is realizing a number of benefits, the report notes.

DNI Coats Names New IARPA Director

Office of the Director of National Intelligence


Director of National Intelligence Dan Coats announced today the selection of Dr. Stacey Dixon to be the next director of the Intelligence Advanced Research Projects Activity.

“Stacey brings extraordinary knowledge and experience to the position and I’m certain that she will maintain IARPA’s high bar for technical excellence and relevance to intelligence priorities,” said Coats. “I look forward to her continued work in delivering breakthrough capabilities to partners throughout the national security community.”

A university is outfitting living spaces with thousands of Echo Dots

TechCrunch, Brian Heater


Soon, Saint Louis University students won’t be able to avoid Amazon’s near-ubiquitous smart speakers. The university announced this week a plan to outfit living spaces with 2,300 Echo Dots. The devices are set to be deployed by the time classes start, later this month.

SLU is quick to note that it’s “the first college or university in the country to bring Amazon Alexa-enabled devices, managed by Alexa for Business, into every student residence hall room and student apartment on campus.” It’s certainly not the first to adopt Amazon’s smart speakers, but it’s among the largest deployments of this sort.

While the product has become a mainstay in plenty of American homes, it does seem like an odd choice for dorms and student campuses. SLU has worked with Alexa for Business to create 100 custom questions, including “What time does the library close tonight?” and “Where is the registrar’s office?”

Silicon Valley Writes a Playbook to Help Avert Ethical Disasters

WIRED, Gear, Arielle Pardes


Silicon Valley is having its Frankenstein moment. The monsters of today are the billion-dollar companies we’ve come to depend on for everything from search results to car rides; their creators, blindsided by what these platforms have become. Mark Zuckerberg hadn’t realized, back when he launched Facebook from his Harvard dorm room, that it would grow to become a home for algorithmic propaganda and filter bubbles. YouTube didn’t expect to become a conspiracy theorists’ highlight reel, and Twitter hadn’t anticipated the state-sponsored trolling or hate speech that would define its platform.

But should they have? A new guidebook shows tech companies that it’s possible to predict future changes to humans’ relationship with technology, and that they can tweak their products so they’ll do less damage when those eventual days arrive.

The guide, called the Ethical OS, comes out of a partnership between the Institute for the Future, a Palo Alto-based think tank, and the Tech and Society Solutions Lab, a year-old initiative from the impact investment firm Omidyar Network.

Invisible Institute Relaunches the Citizens Police Data Project

The Intercept, Jamie Kalven


Today the Invisible Institute, in collaboration with The Intercept, releases the Citizens Police Data Project 2.0, a public database containing the disciplinary histories of Chicago police officers. The scale of CPDP is without parallel: It includes more than 240,000 allegations of misconduct involving more than 22,000 Chicago police officers over a 50-year period. The data set is complete for the period 2000 to 2016, substantially complete back to 1988, and includes some data going back as far as the late 1960s.

As Nvidia expands in artificial intelligence, Intel defends turf

Reuters, Stephen Nellis


“For the next 18 to 24 months, it’s very hard to envision anyone challenging Nvidia on training,” said Jon Bathgate, analyst and tech sector co-lead at Janus Henderson Investors.

But Intel processors already are widely used for taking a trained artificial intelligence algorithm and putting it to use, for example by scanning incoming audio and translating that into text-based requests, what is called “inference.”

Intel’s chips can still work just fine there, especially when paired with huge amounts of memory, said Bruno Fernandez-Ruiz, chief technology officer of Nexar Inc, an Israeli startup using smartphone cameras to try to prevent car collisions.

Infographic: The Data Scientist Shortage



The supply of information stored digitally and accessible to web users worldwide currently amounts to around 4.4 zettabytes, or 4.4 trillion gigabytes. Most of that information was produced in the last two years, which is more than all the world’s civilizations produced in the whole of recorded history before then. Statistics like these point to a promising career in data science for anyone with the skills and interest to pursue the field in the 21st century, whether one starts in the first year of university or changes tracks midway.
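The zettabyte-to-gigabyte equivalence quoted in the paragraph above can be checked in a few lines. This sketch assumes decimal (SI) prefixes, which is how the "4.4 trillion gigabytes" figure works out; under binary prefixes (ZiB and GiB) the ratio is 2^40, roughly 1.1 trillion, so the two readings differ by about 10 percent.

```python
# Decimal (SI) prefixes: 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes,
# so one zettabyte is 10**12 (one trillion) gigabytes.
ZB_IN_BYTES = 10**21
GB_IN_BYTES = 10**9


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert decimal zettabytes to decimal gigabytes."""
    return zb * (ZB_IN_BYTES // GB_IN_BYTES)


gb = zettabytes_to_gigabytes(4.4)
print(f"4.4 ZB = {gb:,.0f} GB")  # 4.4 ZB = 4,400,000,000,000 GB

# Binary prefixes differ: 1 ZiB / 1 GiB = 2**70 / 2**30 = 2**40, about 1.1 trillion.
binary_ratio = 2**70 // 2**30
print(f"1 ZiB = {binary_ratio:,} GiB")  # 1 ZiB = 1,099,511,627,776 GiB
```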

Given the current employment crisis featured in the infographic below, developed by our friends at the University of California, Riverside, even individuals who took programming and technical courses in high school could be thrust into more demanding positions in the workplace.

D.E. Shaw Launches Machine Learning Unit

Institutional Investor, Amanda Cantrell


Multistrategy hedge fund firm D.E. Shaw Group has formed an independent machine learning research and development group, the company said Tuesday. The group will run in parallel with the quantitatively oriented firm’s longstanding efforts in the area.

The Machine Learning and Research Group will be run by Pedro Domingos, who joins as a managing director.

AI Is the Future—But Where Are the Women?

WIRED, Business, Tom Simonite


For all their differences, big tech companies agree on where we’re heading: into a future dominated by smart machines. Google, Amazon, Facebook, and Apple all say that every aspect of our lives will soon be transformed by artificial intelligence and machine learning, through innovations such as self-driving cars and facial recognition. Yet the people whose work underpins that vision don’t much resemble the society their inventions are supposed to transform. WIRED worked with Montreal startup Element AI to estimate the diversity of leading machine learning researchers, and found that only 12 percent were women.

That estimate came from tallying the numbers of men and women who had contributed work at three top machine learning conferences in 2017. It suggests the group supposedly charting society’s future is even less inclusive than the broader tech industry, which has its own well-known diversity problems.

At Google, 21 percent of technical roles are filled by women, according to company figures released in June. When WIRED reviewed Google’s AI research pages earlier this month, they listed 641 people working on “machine intelligence,” of whom only 10 percent were women. Facebook said last month that 22 percent of its technical workers are women. Pages for the company’s AI research group listed 115 people earlier this month, of whom 15 percent were women.

An interview with Meredith Broussard

Medium, NYU Center for Data Science


Meredith Broussard is an affiliate faculty member at the Moore-Sloan Data Science Environment at the NYU Center for Data Science and an assistant professor at the Arthur L. Carter Journalism Institute of New York University. Her research focuses on artificial intelligence in investigative reporting, with an emphasis on using data analysis for social good. On July 30, 2018, Broussard discussed her latest book, “Artificial Unintelligence: How Computers Misunderstand the World,” with Sabrina de Silva, Content Writer at the NYU Center for Data Science. See below for the interview.

Building a Community of Changemakers in AI

Medium, AI4ALL


Once high schoolers take their first step into the AI field by graduating from an AI4ALL summer program at one of our six sites, they become AI4ALL alumni and join our thriving community that is nearly 250 strong. These young women and men range in age from high school sophomore to college sophomore, and they hail from around the US and around the world, from places like Canada, Thailand, and Saudi Arabia.

To continue building on their diverse and multidisciplinary interests in AI, we offer them access to resources and programming like: grant funding to support their own outreach or research initiatives, connection with mentors and peers, and opportunities to participate in field trips, internships, and conferences. Read on to see how AI4ALL alumni have taken these opportunities and made them their own.


Midwest Big Data Hub Digital Agriculture

Midwest Big Data Hub


Lincoln, NE September 20-21. “A one-day workshop on [unmanned aircraft systems] in agriculture on the use of UAS systems and basic data acquisition and processing.” [registration required]

2018 NumFOCUS Project Forum for Core Developers and Critical Users



New York, NY September 24-25. “A unique opportunity for critical users of NumFOCUS open source projects to directly interact with the core developers.” [$$$]


Apply to Demo: NYC Media Lab Demo Expo

“NYC Media Lab will select 100 of the most innovative demos from across the City’s university ecosystem to participate in the NYCML’18 Demo Expo.” The expo takes place September 20 at The New School. The deadline for submissions is September 6.

Tools & Resources

More efficient security for cloud-based machine learning

MIT News


In a paper presented at this week’s USENIX Security Conference, MIT researchers describe a system that blends two conventional techniques, homomorphic encryption and garbled circuits, in a way that helps neural networks run orders of magnitude faster than they do with conventional approaches.

The researchers tested the system, called GAZELLE, on two-party image-classification tasks. A user sends encrypted image data to an online server evaluating a CNN running on GAZELLE. After this, both parties share encrypted information back and forth in order to classify the user’s image. Throughout the process, the system ensures that the server never learns any uploaded data, while the user never learns anything about the network parameters. In tests, GAZELLE ran 20 to 30 times faster than state-of-the-art systems while reducing the required network bandwidth by an order of magnitude.
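To make the homomorphic-encryption half of this idea concrete, here is a minimal sketch of a server computing a weighted sum over a client’s encrypted inputs without decrypting them. It swaps in textbook Paillier encryption with deliberately tiny primes, chosen only for brevity. It is not the encryption scheme GAZELLE itself uses, it offers no real security, and the garbled-circuit half (used for the nonlinear steps) is omitted entirely. The names `encrypt`, `decrypt`, and `encrypted_dot` are hypothetical, and the sketch assumes non-negative integer inputs and weights whose dot product is smaller than `N`.

```python
import math
import random

# Toy Paillier parameters: these primes are far too small for real security.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1 (Python 3.8+)


def encrypt(m: int, rng: random.Random) -> int:
    """Client side: Enc(m) = g^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: m = L(c^lambda mod N^2) * mu mod N, where L(u) = (u - 1) // N."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N


def encrypted_dot(enc_x: list[int], weights: list[int]) -> int:
    """Server side: compute Enc(sum(w_i * x_i)) from ciphertexts and plaintext weights."""
    acc = 1  # 1 is a valid encryption of 0 (it is Enc(0) with r = 1)
    for c, w in zip(enc_x, weights):
        # Enc(a) * Enc(b) = Enc(a + b) and Enc(a)^w = Enc(w * a), all mod N^2.
        acc = (acc * pow(c, w, N2)) % N2
    return acc


rng = random.Random(42)
x = [3, 1, 4]  # the client's private inputs
w = [2, 7, 1]  # the server's private weights
enc_x = [encrypt(v, rng) for v in x]
result = decrypt(encrypted_dot(enc_x, w))
print(result)  # 17
```

The server only ever handles ciphertexts, and the client only sees the final sum, which loosely mirrors the privacy split described above for a single linear step.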

How FiveThirtyEight’s House Model Works

FiveThirtyEight, Nate Silver


We’ve been publishing election models for more than 10 years now, and FiveThirtyEight’s 2018 House model is probably the most work we’ve ever put into one of them. That’s mostly because it just uses a lot of data. We collected data for all 435 congressional districts in every House race since 1998, and we’ve left few stones unturned, researching everything from how changes in district boundary lines could affect incumbents in Pennsylvania to how ranked-choice voting could change outcomes in Maine.
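Ranked-choice voting can change who wins because ballots for eliminated candidates transfer to voters’ next choices. The following is a minimal sketch of a simplified, generic instant-runoff count. It is not FiveThirtyEight’s code, and official rules handle ties, exhausted ballots, and other edge cases in ways this sketch does not. The function name `instant_runoff` and the example ballots are hypothetical.

```python
from collections import Counter


def instant_runoff(ballots: list[list[str]]) -> str:
    """Return the winner under a simplified instant-runoff (ranked-choice) count.

    Each ballot lists candidates in preference order. Ties for last place are
    broken alphabetically here, an arbitrary simplification.
    """
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot toward its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        if not tally:
            raise ValueError("all ballots are exhausted")
        active = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > active or len(remaining) == 1:
            return leader  # majority of active ballots, or last one standing
        # Eliminate the weakest candidate (zero-vote candidates count as 0).
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)


ballots = 4 * [["A", "B", "C"]] + 3 * [["B", "C", "A"]] + 2 * [["C", "B", "A"]]
plurality_winner = Counter(b[0] for b in ballots).most_common(1)[0][0]
print(plurality_winner, instant_runoff(ballots))  # A B
```

In this example A leads on first choices, but once C is eliminated, C’s ballots transfer to B, who then holds a majority of the nine ballots.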

Not all of that detail is apparent upon launch. You can see the topline national numbers, as well as a forecast of the outcome in each district. But we’ll be adding a lot more features within the next few weeks, including detailed pages for each district. You may want to clip and save this methodology guide for then. In the meantime, here’s a fairly detailed glimpse at how the model works.

Don’t Do This in Production

Stephen Mann


The department that built the product had recently come into existence, and they hired a team of developers without having a technical person on staff to vet them. It’s difficult enough for a technical person to vet a developer – I can’t even imagine vetting a candidate without having a technical background. They hired the first developer, and he vetted the second developer, and so on until they had a development team.

If you’re lucky enough for your first developer to have significant experience and a desire to mentor, then you’re golden. If you’re unlucky, however – and it’s very easy to be unlucky at something like this – then you may end up with a very fast moving team that builds very fragile software.

I Like Julia Because It Scales and Is Productive: Some Insights From A Julia Developer

The Winnower, Christopher Rackauckas


In this post I would like to reflect a bit on the Julia programming language. These are my personal views, and I have spent more than a year developing many packages for Julia. After roaming around many different languages, including R, MATLAB, C, and Python, Julia is finally a language I am sticking with. In this post I would like to explain why. I want to go back through some thoughts about the current state of the language, who it’s good for, and what changes I would like to see. My opinions have changed a lot since I first started working on Julia, so I’d like to share the changed mindset one develops after using the language deeply.


Internships and other temporary positions

Pre-PhD Early-career Scholars

Carnegie Mellon University, Lab for Social Minds; Pittsburgh, PA

Full-time positions outside academia

Artificial Intelligence & Machine Learning Engineer

STATS; Chicago, IL
