Data Science newsletter – May 5, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 5, 2017

Data Science News



Disks back from the dead

Nature News & Comment, Monya Baker


In 2012, William Parker and his colleagues went hunting for a data set that tracked the growth of more than 50,000 carefully planted white spruce trees over a decade and a 1,500-kilometre range. They found a reel of computer tape, a relatively modern 3.5-inch diskette and a box of older 5.25-inch diskettes. These contained data from field trials in the late 1970s, which were set up to improve yields of commercial timber. Parker, who is at the Ontario Forest Research Institute in Sault Ste. Marie, Canada, needed ways to evaluate how strategies such as ‘assisted migration’ might preserve forests on a warming planet, and this long-term systematic study was just the thing, he says. “When we found it, it was like, ‘Eureka! Hallelujah! We’ve finally got it!’.”

Not so fast. Parker booted up an old computer, but it could not read the newest disk. No one had equipment to even try the others.

Parker’s IT services referred him to a data-retrieval company. The older disks turned out to be ‘flippies’, double-sided disks written in formats that few drives can read. The specialists were ultimately able to read them using a carefully placed hole puncher, a bit of digital forensics and some programming that converted ancient software to modern spreadsheets.


Integrity starts with the health of research groups

Nature News & Comment, Editorial


Last month, the US National Academies of Sciences, Engineering, and Medicine published a report called Fostering Integrity in Research. Later this month, the 5th World Conference on Research Integrity will be held in Amsterdam. Over the years, universities have followed some funders’ mandates to improve the prevention and investigation of misconduct. Many discussions have been held about unreliable research.

None of these initiatives pays sufficient attention to a specific issue: the research health of research groups and the people who lead them. This includes technical robustness of lab practices, assurance of ethical integrity and the psychological health and well-being of group members. Principal investigators (PIs), the linchpins of the scientific process and of integrity, are under ever-increasing pressures from many sources.


3Q: Refocusing climate research in a new era

MIT News, Lauren Hinkel and Paul O’Gorman


In terms of public discussion and to some extent scientific discussion, the two big questions have been: Is the world warming? And, if so, how much are humans contributing to it? And those are very simple questions that are easy to communicate to someone who’s not a specialist in this area. But those two questions are no longer as interesting, in terms of science, because we know the world is warming, and we know that humans are contributing substantially to it. So our aim for this paper was to try and frame similarly simple questions that catch the imagination, that are easy to understand yet scientifically important, and that speak to the exciting research that’s going on to push forward into the unknown.

And so we came up with – fairly easily actually – three questions: Where does the carbon go? Second, how does the weather change with climate? Third, how does climate influence the habitability of the Earth and its regions? These are not new questions, but we think they encapsulate a lot of what’s important and exciting in current climate research.


Make AI Computing 100 Times Faster

Medium, Synced


The whole industry is going deep in hardware that accelerates the training speed of machine learning models. Google, for example, built the Tensor Processing Unit, or TPU. Microsoft adopted the field programmable gate array, or FPGA. Meanwhile, Intel made a progressive move with the acquisition of Altera, a manufacturer of programmable logic devices, for $16.7 billion.

IBM is seeking collaboration with academia, and last year announced a multi-year cognitive computing research project with the University of Illinois at Urbana-Champaign (UIUC). Undoubtedly, the industry leader came for Professor Wen-Mei Hwu, one of the foremost experts in parallel computing at UIUC, someone with 10 years’ experience in GPU and 30 years’ experience in computer architecture. IBM’s ambitious objective is to increase machine training speed by 100 times.


Artificial Intelligence and Law

Medium, Dr. Michael Malterer


During the past decades, we have been witnessing technology revolutionizing our daily lives, our business models, and our way of thinking. It is therefore somewhat of a riddle how such a sweeping technological advancement could take place without any groundbreaking changes in law. Our society has somehow managed to brush aside most legal issues while otherwise successfully adapting to a way of life in which technology is indispensable.

It is high time things changed, though. With presumably more than 20 billion devices connected to the Internet by 2020, even the most conservative law-makers now agree that the legal status quo is not only outdated but could potentially lead to a vast number of far-reaching conflicts and diminish the positive impact of technology on our society.


Nvidia Lets You Peer Inside the Black Box of Its Self-Driving AI

MIT Technology Review, Will Knight


In a step toward making AI more accountable, Nvidia has developed a neural network for autonomous driving that highlights what it’s focusing on.


Melinda Gates and Fei-Fei Li Want to Liberate AI from “Guys With Hoodies”

Backchannel, Jessi Hempel


These two female technologists discuss the promises of artificial intelligence — and how to diversify the field.


New math techniques to improve computational efficiency in quantum chemistry

DOE/Sandia National Laboratories


Researchers have developed new mathematical techniques to advance the study of molecules at the quantum level. Mathematical and algorithmic developments along these lines are necessary for enabling the detailed study of complex hydrocarbon molecules that are relevant in engine combustion.


Deep Learning Helps Scientists Keep Track of Cell’s Inner Parts

University of Toronto, Donnelly Centre


High throughput screens of image-based data allow a direct view of proteins’ whereabouts in the cell but the lack of fast and accurate analysis tools has been a bottleneck. Scientists reveal DeepLoc, a deep learning algorithm that is faster and more accurate than the human eye and brings analysis time down from months to hours.


Computers learn to understand humans better by modelling them

Aalto University


Researchers from Aalto University, University of Birmingham and University of Oslo present results paving the way for computers to learn psychologically plausible models of individuals simply by observing them. In a newly published conference article, the researchers showed that just by observing how long a user takes to click menu items, one can infer a model that reproduces similar behavior and accurately estimates some characteristics of that user’s visual system, such as fixation durations.


2 ex-Googlers are starting Humu, a machine learning company that wants to ‘make work better’

VentureBeat, Ken Yeung


Less than a year after departing Google as its head of human resources, Laszlo Bock has become an entrepreneur. On Monday, he announced that he’s working on Humu, a startup designed to help us work better. He’s joined by Wayne Crosby, Google’s director of engineering who recently resigned. While details are sparse, the company plans on using science, machine learning, and “a little bit of love” to improve our jobs.


A.I. That Predicted Kentucky Derby Superfecta Reveals 2017 Bets

Inverse, Graham Templeton


The betting world has spent the past several weeks obsessing over its predictions for this Saturday’s Kentucky Derby, but those who pay attention to artificial intelligence held back their money until Thursday, when a San Francisco A.I. firm released its predictions.

That’s because, in 2016, Unanimous A.I. used a technology called “swarm intelligence” to coordinate a group of racing fans to correctly predict the Kentucky Derby superfecta (the first four places, in order). The swarm beat 540-to-1 odds, along with the most-trusted handicappers in the world.


Machine Learning: An Applied Econometric Approach

Journal of Economic Perspectives, Sendhil Mullainathan and Jann Spiess


Machines are increasingly doing “intelligent” things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.


Government Data Science News

NYU Data Services offered up its set of government data rescue best practices, with information from the Environmental Data & Governance Initiative (EDGI) and the Data Refuge group at the University of Pennsylvania.

Sandia National Lab researchers have taken molecular computational modeling down to quantum levels. And Los Alamos National Lab infectious disease modelers released new outbreak models with reduced computational and data requirements, better suited for real world use cases.

The recent Health Datapalooza meeting in Washington DC provided its annual update and forecast on advances in government health data (more from Bloomberg).

Open Knowledge International released the 4th Global Open Data Index, titled Advancing the State of Open Data Through Dialogue.

Jacqueline Poh, founding chief executive of Singapore’s Government Technology Agency, describes her country’s citizen access plan for government data to The Guardian.


Possible cheating uncovered in popular Harvard computer class

The Boston Globe, Travis Andersen and Brian MacQuarrie


Multiple Harvard students, possibly more than 60, have been suspected of cheating in a wildly popular computer science course and had their cases referred to the college’s Honor Council, a step that can lead to expulsion in the most extreme cases.

David Malan, the lead professor in the introductory course, confirmed that multiple students in the fall semester had been referred to the council, which is composed of students, faculty, and administrators who recommend penalties for academic dishonesty.


Facebook is hiring 3000 new content monitors for a job AI cannot do

Popular Science, Kelsey D. Atherton


Mark Zuckerberg announced that Facebook is hiring 3,000 people to work on its community operations team, which reviews images, videos, and posts that users report. These new hires will join the 4,500 existing employees in an effort to minimize the reach of future events like the shooting of Robert Godwin. It’s a considerable-but-essential investment for Facebook, but it leads us to a basic question: Can’t this job be automated?

The scale of this labor is vast: Facebook is hiring more people than work in the combined newsrooms of the New York Times, the Wall Street Journal, and the Washington Post. Facebook isn’t saying at this time if the jobs will be employees or contractors, and if they’ll be based in the United States or abroad.


The AI revolution is making game characters move more realistically

Ars Technica, Kyle Orland


When we talk about artificial intelligence in games, we usually picture smarter or more realistic enemies that don’t come off as mindless automatons. New research, though, is showing how an AI powered by a neural network could revolutionize the way player avatars animate realistically through complicated game environments in real time.

Phase-Functioned Neural Networks for Character Control is a fundamentally new way of handling character animation that will be presented at the ACM’s upcoming SIGGRAPH conference this summer. In most games, character animation is handled through “canned,” pre-recorded motion capture. This means an average player will see precisely the same motion cycle repeated thousands of times in a single play-through. “Our system works completely differently,” University of Edinburgh researcher Daniel Holden told Ars in a recent interview.


AT&T Labs working to combine drone video footage with artificial intelligence monitoring

FierceWireless, Mike Dano


The research unit at AT&T is working to combine two of what are perhaps the hottest buzzwords in the tech industry: drones and artificial intelligence.

Specifically, AT&T Labs is researching ways to take video footage from a drone flying around an AT&T cell tower and use AI to analyze that footage for signs of rust, corrosion or other defects on the tower. The result would save AT&T the time and money it takes to have engineers physically climb cell towers to do inspections (thanks to the drone), as well as the time and money it takes to have engineers review video from a drone inspecting a tower (thanks to the artificial intelligence).

“We are leveraging over 130 AT&T patents and 20 years of video analytics and machine research,” explained Art Pregler, the unmanned aircraft systems (UAS) program director at AT&T. “We are using algorithms … the algorithm can learn what it is that it’s looking at and take appropriate actions.”


Cornell CIS and Adobe collaboration creates AI photo tool

Cornell Chronicle


There may be a new cool tool for image editing software in the future. If you’re a fan of making your photo into a Monet or Warhol, there’s now a way to make changes to a photograph by transferring the style and other elements from another photograph.

Computer science professor Kavita Bala, doctoral student Fujun Luan, and Adobe collaborators Sylvain Paris and Eli Shechtman have released a paper detailing their new Deep Photo Style Transfer. The paper explains how the researchers have augmented style transfer, transposing the look of one photo onto another using neural networks to make sure the details of the original image are preserved.

 
Events



MIT Sloan CIO Symposium

MIT Sloan School of Management


Cambridge, MA May 24 [$$$]

 
Deadlines



Ford-Mozilla Open Web Fellowships

The Open Web Fellows program – a collaboration between Ford Foundation and Mozilla – is an international leadership initiative that brings together technology talent and civil society organizations to advance and protect the open Web. Deadline for applications is May 21.

National Science Foundation-funded Data Consortium Fellows

Fellows can apply for up to $2000 in participant support for a new collaboration that results in a short (500+ word) white paper detailing a new research question, method, or finding in creative learning analytics and play. Deadline for applications is May 31.

ICDM 2017 | Call for Papers

New Orleans, LA The IEEE International Conference on Data Mining (ICDM) has established itself as the world’s premier research conference in data mining. Deadline for submissions is June 5.

IARPA, NIST Hosting Face Recognition Challenge to Improve Accuracy

Do you have the most accurate unconstrained face recognition algorithm? Deadline for submissions is June 15.

2nd Workshop on Visualization for the Digital Humanities

Phoenix, AZ Part of IEEE VIS 2017 in October. The purpose of this workshop is to propose new research directions in visualization for the digital humanities, to familiarize the visualization research community with the problems faced by digital humanities researchers, and to foster future collaboration between visualization and digital humanities research. Deadline for submissions is July 21.
 
Tools & Resources



bqplot: Bringing more interactivity to data

Tech at Bloomberg, Dhruv Madeka


“bqplot allows anyone to build fully interactive web applications in Python, using surprisingly few lines of code. With about a dozen lines of code, for example, bqplot can generate a map of U.S. electoral results by county.”


ParlAI, a framework for training and evaluating AI models on a variety of openly available dialog datasets

Facebook, Jason Weston


“ParlAI complements FAIR’s existing text research efforts like FastText, our quick and efficient text classification tool, and CommAI, our framework for developing artificial general intelligence through incremental tasks that test increasingly more complex skills.”


Using external C++ functions with PyStan & radial velocity exoplanets

Dan Foreman-Mackey


In this blog post, I’ll demonstrate how to use a custom C++ function in a Stan model using the Python interface PyStan. This was previously only possible using the R interface RStan (see an example here) so I hacked PyStan to make this possible in Python as well. You’ll need to install the ‘feature-external-c++’ branch of my fork of PyStan to make this work. This is unlikely to get merged into PyStan because there is a hope that this will one day be a feature of the core Stan language, but if you’re impatient and need this now, read on!


How the TensorFlow team handles open source support

O'Reilly Radar, Pete Warden


“Open-sourcing is more than throwing code over the wall and hoping somebody uses it. I knew this in theory, but being part of the TensorFlow team at Google has opened my eyes to how many different elements you need to build a community around a piece of software.”

 
Careers


Full-time positions outside academia

Remote engineering and design roles



Kolide; Boulder, CO

Head of Research



Spotify; New York, NY

Senior Full Stack Developer



Lilt; Munich, Germany


Tenured and tenure track faculty positions

Professorship in Cancer Genomics



Ecole Polytechnique Fédérale de Lausanne; Lausanne, Switzerland
