Data Science newsletter – July 27, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 27, 2018

GROUP CURATION: N/A

 
 
Data Science News



Using DCNNs to Differentiate Computer Generated Graphics & Photographic Images

Medium, NYU Center for Data Science



Suzanne McIntosh, CDS affiliated faculty member and Clinical Associate Professor of Computer Science at NYU, worked with Qi Cui of Nanjing University of Information Science and Technology and the Jiangsu Engineering Centre of Network Monitoring, and with Huiyu Sun of NYU, to explore ways to distinguish computer-generated (CG) graphics from photographic images (PI) using deep convolutional neural networks (DCNNs).
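The excerpt doesn't reproduce the paper's architecture, but the core moves in any DCNN classifier of this kind are the same: convolve, apply a nonlinearity, pool, and squash to a probability. The toy forward pass below illustrates those ingredients in plain NumPy; the 3×3 kernel, 16×16 input, and random parameters are illustrative stand-ins, not details from Cui et al.

```python
import numpy as np

rng = np.random.default_rng(42)

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def tiny_cnn_score(image, kernel, dense_w, dense_b):
    """One conv layer -> ReLU -> global average pool -> sigmoid score.

    A real DCNN stacks many such layers; this shows one pass end to end.
    """
    feat = np.maximum(conv2d(image, kernel), 0.0)  # ReLU feature map
    pooled = feat.mean()                           # global average pooling
    logit = pooled * dense_w + dense_b             # 1-unit dense head
    return 1.0 / (1.0 + np.exp(-logit))            # probability in (0, 1)

# Untrained random parameters, just to exercise the forward pass.
image = rng.random((16, 16))
kernel = rng.normal(size=(3, 3))
score = tiny_cnn_score(image, kernel, dense_w=1.5, dense_b=0.0)
```

In a trained model the score would be read as the probability that the input is CG rather than PI; here it is meaningless, since no training has happened.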


AI Gets Boost From Uncle Sam

EE Times, Rick Merritt



Like most people in high tech these days, Uncle Sam is upping his investment in AI. The Defense Advanced Research Projects Agency (DARPA) announced a new program on Friday that will add an estimated $100 million a year to its current spending on machine learning.

The AI Exploration program will initially spend about $10 million to $20 million to fund a set of feasibility studies meant to generate ideas worth pursuing. These studies are expected to spawn larger projects that will eventually bump DARPA’s total estimated spending on AI in its various forms to as much as $400 million annually.

“We hope we will have a conference next summer focusing on what we will do in AI,” said Steven Walker, DARPA’s director, speaking at an event here on the agency’s Electronics Resurgence Initiative.


What Are Machine Learning Models Hiding?

Princeton CITP, Freedom to Tinker blog, Vitaly Shmatikov



Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries, location traces, and health-care records.

In a recent series of papers, we uncovered multiple privacy and integrity problems in today’s ML pipelines, especially (1) online services such as Amazon ML and Google Prediction API that create ML models on demand for non-expert users, and (2) federated learning, aka collaborative learning, that lets multiple users create a joint ML model while keeping their data private (imagine millions of smartphones jointly training a predictive keyboard on users’ typed messages).
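Federated learning as described here (many clients training a joint model without sharing raw data) is commonly implemented with federated averaging: each client takes a few gradient steps locally, and a server averages the resulting weights, weighted by local dataset size. The sketch below shows that loop for a toy linear model in NumPy; the client counts, learning rate, and synthetic data are invented for illustration and are not from the papers discussed.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps on its own data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Synthetic setup: three clients share a true linear model but keep data local.
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds: only weights travel, never data
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```

The privacy attacks the post describes work precisely because these exchanged weight updates can still leak information about the local data.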


AI is hurting people of color and the poor. Experts want to fix that

CNN, Heather Kelly



Much of the focus on any downsides of artificial intelligence has been on things like crashing self-driving cars and the rise of machines that kill. Or, as CNN commentator Van Jones put it at a discussion on the topic last week, “What about Terminator?”

But many of the researchers behind this technology say it could pose a greater threat to society by adversely impacting the poor, the disenfranchised, and people of color.

“Every time humanity goes through a new wave of innovation and technological transformation, there are people who are hurt and there are issues as large as geopolitical conflict,” said Fei-Fei Li, the director of the Stanford Artificial Intelligence Lab. “AI is no exception.”


How long will the UK stay the world leader in open data?

Medium, Nick M Halliday



Many government officials working with data around the world put some weight on international open data rankings such as the Open Data Barometer or those produced by Open Knowledge International. If their own country goes up in the charts, they like to blow their own trumpets; if it goes down, they gnash their teeth. The same applies in the UK. A few weeks ago I heard someone very senior in UK government digital put up a slide saying ‘hurrah’, the UK is recognised as a world leader.

All these rankings should be viewed with some caution. They each have their own methodological approaches and individual quirks. Fortunately there is a programme of work by a number of these assessment organisations to synchronise their work and develop some more consistent approaches. If you want to know more talk to Danny Lammerhirt at Open Knowledge (easy to find on Twitter).


Government Data Science News

San Francisco is considering copying Mountain View by banning plans for new offices that include cafeterias. This would mean NO FREE MEALS AT BAY AREA STARTUPS. Existing buildings are grandfathered in and could still offer their tech workers free meals. New buildings would have to be catered? Or eat ramen? It’s unclear how these startups will respond. I expect more catering apps to pop up to solve this looming gap in the instant gratification cocoon known as Silicon Valley. The ostensible goal is to get some of those tech workers to walk out and buy lunch at local establishments. Or, at the very least, open an app and get lunch delivered.



Whoa, Canada. Toronto created more tech jobs in 2017 than Seattle, New York, DC, and the Bay Area COMBINED. Ottawa’s growth rate put it as the fastest growing tech market in North America. I would imagine a combination of cost of living, immigration policy, and the specific growth of artificial intelligence positions led to this Canadian surge. It should come as no surprise to readers of this newsletter where I have frequently called out Canadian policies, research, and industry-academic collaborations.

DARPA set up a fund to distribute $1.5b over five years for its Electronics Resurgence Initiative which is mostly going towards chip development for automated processes. Lots of the work will be at U.S. universities.



DARPA also announced a new AI Exploration fund that will add roughly $100 million per year to what it already spends on machine learning research and development. It kicks off with $10 to $20 million for feasibility studies that are expected to eventually grow DARPA’s annual AI spending to around $400 million.



DARPA is essentially a VC for academia, in case that wasn’t already clear.



The US Census Bureau needs to shore up its defenses against cybersecurity threats as the first digital census in 2020 draws closer. Right now, the threats outlined by Mary McCord, former acting assistant attorney general for national security at the Department of Justice, are vague and hand-wavy. Still, her recommendation that the Bureau hire an outside firm to conduct an audit is reasonable. After all, the Census contains information on everyone living in the United States.



The National Institutes of Health has delayed enforcing new guidelines that define basic brain and neurological studies as clinical trials. Many of the studies don’t test treatments or make treatment recommendations so researchers balked at having to pre-register them with clinicaltrials.gov and undertake other tedious bureaucratic efforts. Negotiations about what counts as a clinical trial can – and certainly will – continue until September 2019.


Google AI Chief Jeff Dean’s ML System Architecture Blueprint

Medium, Synced



ML has revolutionized vision, speech and language understanding and is being applied in many other fields. That’s an extraordinary achievement in the tech’s short history and even more impressive considering there is still no dedicated ML hardware.

Back in January, Google AI Chief and former head of Google Brain Jeff Dean co-published the paper A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution with Turing Award winner and computer architect David Patterson. The paper encouraged Machine Learning (ML) experts and computer architects to “work together to design the computing systems required to deliver on the potential of ML.”

At this month’s Tsinghua-Google AI Symposium in Beijing, Dean discussed trends in the kinds of models scientists want to train. Google Brain research scientist Azalia Mirhoseini meanwhile gave a presentation on AutoML with reinforcement learning at the same event.


Law Firms, Artificial Intelligence, And The Fork In The Road

Above the Law, Nicole Black



According to [Patrick Fuller], AI is a technology that has incredible potential and some larger law firms and in-house counsel are already using AI tools very effectively. However, as he explained, as is often the case with all types of newer technology, there are cultural impediments that prevent its immediate adoption: “The biggest obstacles for getting firms to adopt AI tend to be combinations of different factors — most notable are attorney compensation structure, the matter’s fee arrangement, and the practice area as well.”

This idea is examined in the report, and the author highlights how traditional law firm cultural traits can deter technology adoption. This is especially so in the face of the rapid technological advancements of recent years, and the fact that AI adoption necessarily impacts more than just lawyers’ workflows.


The hackers teaching old DNA sequencers new tricks

Nature, Technology Feature, Jeffrey M. Perkel



In a basement storeroom at Stanford University in California, the guts of a dozen DNA sequencers lie exposed — hundreds of thousands of dollars worth of cameras and lasers, optics and fluid controllers, all scavenged from a late-model, next-generation Illumina DNA sequencer called GAIIx. On the floor, the shell of one old instrument sits empty, picked over like a carcass. “I seem like a hoarder,” says Stanford biophysicist William Greenleaf.

But over the past 6 years, this collection has fuelled an effort that has engaged about half of Greenleaf’s 18-member lab team. Whereas most researchers use DNA sequencers to, well, sequence DNA, Greenleaf’s team is one of a small number that has repurposed the devices for an entirely different goal: to study protein and nucleic-acid biochemistry on a massive scale, from macromolecular interactions and RNA folding to enzyme function.

“It’s a revolutionary technology,” says Stanford biochemist Dan Herschlag, who uses it to study interactions between RNAs and other molecules. It provides “deep and broad quantitative information”, he says, “that allows researchers to build more-precise biophysical and cellular models for molecular interactions, and which is also a critical step towards a truly predictive understanding of biological systems”.


MIT researcher warned Amazon of bias in facial recognition software

TheHill, Ali Breland



An MIT researcher warned Amazon CEO Jeff Bezos last month that the company’s controversial facial recognition software is most inaccurate on women and individuals of color.

In a June 25 letter, which was not reported on at the time, Joy Buolamwini, who also founded the Algorithmic Justice League to combat bias coded into algorithms, said the biased technology could exacerbate existing racial biases in policing.
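The kind of audit behind Buolamwini's warning rests on disaggregated evaluation: computing error rates separately for each demographic subgroup rather than reporting a single overall accuracy, which can hide large per-group gaps. A minimal sketch of that bookkeeping, with entirely made-up labels and groups:

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Disaggregated evaluation: accuracy computed separately per subgroup."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

# Illustrative (made-up) labels: a classifier that errs more on group "B".
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
```

Here overall accuracy is 70%, which looks passable until the per-group breakdown reveals 100% on group "A" and 40% on group "B" — exactly the kind of disparity Buolamwini reported for commercial facial analysis systems.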


Health tech flaw: Patients hate going to the doctor

CNBC, Christina Farr



Medical experts say that digital health can’t do much for users who are already sick, or at high risk of a serious medical condition. Many of these companies won’t diagnose disease for regulatory reasons, even if they’re picking up strong signals through sensors and algorithms, so instead they’ll suggest that a user see their doctor.

That’s the case with smart watches from Apple and Fitbit, which are increasingly picking up on serious problems like sleep apnea, a sleep breathing issue, and atrial fibrillation, a heart rhythm disorder, but can’t do much more than steer these patients to medical care.

The glaring problem? Patients do not like going to the doctor.


Vertical Spotlight: Machine Learning for financial fraud

Algorithmia Blog



For every dollar of fraud that financial services companies suffer, they incur $2.67 in costs to their business. With more entry points in the digital age and increasingly sophisticated attackers, tackling fraud manually is quickly fading to irrelevance. Machine learning offers a promising way to automate the process, as well as to surface more nuanced fraud patterns.

This post will walk through the challenges of applying ML models to fraud detection, popular applications, and tradeoffs to think about in model selection.
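As a taste of what applying an ML model to fraud detection can look like at its simplest, here is a hedged sketch: a plain logistic-regression scorer trained by gradient descent on synthetic transaction features. The features, class balance, and threshold are invented for illustration; real systems contend with far messier data and far more severe class imbalance, which is one of the tradeoffs the post discusses.

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (cross-entropy loss)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic transactions: two features (say, amount z-score and velocity
# z-score); fraud cases (~5% of rows) sit in a shifted region of the space.
n = 2000
y = (rng.random(n) < 0.05).astype(float)
X = rng.normal(size=(n, 2)) + np.outer(y, [2.5, 2.5])

w, b = train_logreg(X, y)
scores = sigmoid(X @ w + b)   # per-transaction fraud probability
flagged = scores > 0.5        # transactions routed to manual review
```

In practice the review threshold would be tuned against the $2.67-per-dollar cost asymmetry the excerpt cites, trading missed fraud against analyst workload.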


Your Tweets Are Somehow Worthy Of Scientific Study

FiveThirtyEight, Maggie Koerth-Baker



I’ve been a Twitter user since March 2009. In that time, I’ve posted more than 32,000 messages to the social media site, and I’ve liked more than 6,000 tweets from other people. More than likely, that means I’ve also been a social science guinea pig. But I’m not special: If you’re a habitual Twitter user, your tweets were probably harvested alongside mine.

“If you’re on Twitter and tweeting publicly, you’re part of a data set somewhere,” said Nick Proferes, a professor of information studies at the University of Kentucky. Twitter is a popular research tool, but it’s also a very new one. Social scientists are still debating whether it’s OK to collect and analyze tweets without users’ knowledge — and what the ethical norms of studying publicly available social media data should be.

Researchers use Twitter for a huge range of subjects: measuring how people cope with global crises and their aftermaths, tracking geographic differences in public health and analyzing the behavior of automated “bot” accounts during the 2016 presidential debates, to name a few. At least 25 billion tweets were collected and analyzed in scholarly research published from 2007 to 2012, according to a paper Proferes published with a colleague in 2014. They counted 382 papers in just the first six years that Twitter existed.


How the Science Cloud could pay its way

Science|Business, David Pringle



The best way for the EU to get its bold plan for an open science cloud up and running is to focus initially on meeting the needs of the 725,000 researchers working in higher education.

That’s one of the key conclusions from a new paper published by Science|Business, which says the open science cloud’s top priority should be to make it straightforward and cost-effective for publicly funded EU researchers to meet the growing requirement to open their results and data to other researchers. On the basis that Member States’ science ministries and the EU institutions fund approximately two-thirds of the research conducted in EU universities, the paper estimates the higher education sector could spend up to €2.25 billion a year on services available through the new European Open Science Cloud (EOSC).

Once it is launched in November 2018, the EOSC could evolve into a multi-sided marketplace, akin to Amazon.com or Apple’s App Store, in which the value to each market segment depends on the participation of other market segments. After it reaches a critical mass, the EOSC will, like other marketplaces, be subject to so-called network effects (it grows in value as it attracts more participants).
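One common way to make the network-effects claim concrete is Metcalfe's law, which values a network in proportion to its number of possible participant pairs, n(n−1)/2. The paper doesn't commit to any particular model; this is just one illustrative formalisation of "it grows in value as it attracts more participants":

```python
def metcalfe_value(n, value_per_link=1.0):
    """Toy network-effect model: value scales with the number of
    possible participant pairs, n * (n - 1) / 2 (Metcalfe's law)."""
    return value_per_link * n * (n - 1) / 2

# Doubling participants roughly quadruples value once n is large,
# which is why marketplaces chase critical mass.
small = metcalfe_value(1000)
large = metcalfe_value(2000)
```

Under this model, value grows superlinearly in participation, so the EOSC's early challenge is the cold-start problem: each market segment joins only if the others already have.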

 
Events



Data Science Salon | New York – Applying AI and Machine Learning To Media and Entertainment

Formulated By



New York, NY September 27. “The DataScience.SALON brings together practitioners face-to-face to educate each other, illuminate best practices, and innovate new solutions.” [$$$]


PyData Los Angeles

NumFOCUS, PyData



Los Angeles, CA October 21-23 at University of Southern California. [$$$]

 
Tools & Resources



The five Cs – Five framing guidelines to help you think about building data products.

O'Reilly Radar; Mike Loukides, Hilary Mason and DJ Patil



What does it mean to treat others’ data as you would treat your own? How many data scientists have actually thought about how their own data might be used and abused? And once you know how you’d like to see your data (and others’ data) respected, how do you implement those ideas? The golden rule isn’t enough by itself. We need guidelines to force discussions with the application development teams, application users, and those who might be harmed by the collection and use of data.

Five framing guidelines help us think about building data products. We call them the five Cs: consent, clarity, consistency, control (and transparency), and consequences (and harm). They’re a framework for implementing the golden rule for data. Let’s look at them one at a time.

 
Careers


Full-time positions outside academia

Data Engineer



Oak Ridge National Laboratory; Oak Ridge, TN

Senior Data Engineer



Obsidian Security; Newport Beach, CA

Director of Global Community



DataKind; Brooklyn, NY

Software Engineer



Pivot Bio; Berkeley, CA
Full-time, non-tenured academic positions

Research Software Engineer (Fixed Term)



University of Cambridge, Department of Genetics; Cambridge, England

Assistant Professor in Digital Design



Emerson College, Department of Journalism; Boston, MA
