Data Science newsletter – June 16, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 16, 2017

GROUP CURATION: N/A

 
 
Data Science News



Deal or no deal? Training AI bots to negotiate

Facebook Code, Engineering Blog; Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra



Researchers at Facebook Artificial Intelligence Research (FAIR) have open-sourced code and published research introducing dialog agents with a new capability — the ability to negotiate.

Similar to how people have differing goals, run into conflicts, and then negotiate to come to an agreed-upon compromise, the researchers have shown that it’s possible for dialog agents with differing goals (implemented as end-to-end-trained neural networks) to engage in start-to-finish negotiations with other bots or people while arriving at common decisions or outcomes.


What Computer Modeling Can Tell Us about Religion

Boston University, Research



“I don’t even like computers,” Connor Wood confesses. So why is this doctoral student in religion studies so enthusiastic about the computer simulation displayed on his laptop, resembling the result of an explosion at a spaghetti factory and an adjacent paint store—a tangle of curved, intersecting, multicolored lines?

Because those lines represent the new frontier of computer-simulated religion studies being explored by Wood (GRS’16) and other researchers under Wesley Wildman, a Boston University School of Theology (STH) professor of philosophy, theology, and ethics. The answers they seek might help explain things from religion’s benefits (potentially better mental health) to its evils (violence in the name of God).


Apple is quietly working on turning your iPhone into the one-stop shop for all your medical info

CNBC, Christina Farr



Imagine turning to your iPhone for all your health and medical information — every doctor’s visit, lab test result, prescription and other health information, all available in a snapshot on your phone and shared with your doctor on command.


Hard Questions: How We Counter Terrorism

Facebook, Monika Bickert



In the wake of recent terror attacks, people have questioned the role of tech companies in fighting terrorism online. We want to answer those questions head on. We agree with those who say that social media should not be a place where terrorists have a voice. We want to be very clear how seriously we take this — keeping our community safe on Facebook is critical to our mission.

In this post, we’ll walk through some of our behind-the-scenes work, including how we use artificial intelligence to keep terrorist content off Facebook, something we have not talked about publicly before. We will also discuss the people who work on counterterrorism, some of whom have spent their entire careers combating terrorism, and the ways we collaborate with partners outside our company.

Our stance is simple: There’s no place on Facebook for terrorism.


DeepMind’s neural network teaches AI to reason about the world

New Scientist, Daily News, Matt Reynolds



The world is a confusing place, especially for an AI. But a neural network developed by UK artificial intelligence firm DeepMind that gives computers the ability to understand how different objects are related to each other could help bring it into focus.

Humans use this type of inference – called relational reasoning – all the time, whether we are choosing the best bunch of bananas at the supermarket or piecing together evidence from a crime scene. The ability to transfer abstract relations – such as whether something is to the left of another or bigger than it – from one domain to another gives us a powerful mental toolset with which to understand the world. It is a fundamental part of our intelligence, says Sam Gershman, a computational neuroscientist at Harvard University.


The Science of Data Science

Hopkins Bloomberg Public Health Magazine



Four million students. That’s the number of enrollments in the data science program taught by associate professor Jeff Leek, PhD, MS, alongside fellow Biostatistics professors Roger Peng, PhD, MS, and Brian Caffo, PhD, MS. They’ve been teaching the program since its 2014 inception on Coursera, a site that offers massive open online courses (MOOCs).

What’s a statistician to do with data from 4 million students? Analyze it, naturally. Leek and assistant scientist Leah Jager, PhD, MS, in collaboration with biostatistics PhD student Leslie Myint, started randomly assigning different quizzes to see if giving different sets of explanations or instructions would nudge students toward one interpretation of data over another. It did. In one experiment, students were asked if a study about smoking and lung cancer demonstrated correlation or causation—the correct answer being correlation. However, when the professors added a post-hoc explanation of the study results, more students incorrectly characterized it as a causal analysis. “People may misinterpret the data if you are not careful about the language you use,” Leek concludes.


Facebook’s Director Of AI Research On Why It’s Critical To Take Career Risks

Forbes, Paulina Guditch



As a 25-year-old Computer Science PhD student in France, Yann LeCun chose to study a field that was in its infancy: machine learning. Many skeptics questioned his decision to pursue machine learning, an application of artificial intelligence that focuses on teaching machines to learn through data and algorithms.

“I knew exactly what I wanted to work on, but no one I could find in France was working on machine learning at the time,” LeCun said. “The technique I was working on (deep learning) was being invented.”

Deep learning is an application of machine learning that deals with larger data sets and more complex algorithms. Systems that use deep learning are sometimes referred to as deep neural networks because they are modeled after the human brain.

Over a farm-to-table meal at brunchwork at Alley Chelsea, LeCun recalled, “I thought I’d figured something out that nobody else had figured out. I was really convinced. It turns out I was not mistaken, but I could’ve been.”


Adding data science to the Berkeley faculty toolkit

University of California-Berkeley, Division of Data Sciences



In June, more than 30 faculty and instructors across a range of disciplines – from English to Sociology to Neuroscience to Physics – took a week out of their summer to explore how to incorporate tools developed for data science into their own teaching.

The faculty members attended the intensive week-long Data Science Pedagogy & Practice workshop hosted in early June by the Data Science Education Program. Held at the Academic Innovation Studio in Dwinelle Hall, this second offering of the summer course for faculty was the next step in building out Berkeley’s ecosystem of interdisciplinary data science courses across departments. Its goal was to make it possible to offer more such courses in the future and to tailor existing courses to draw on the existing Foundations of Data Science (Data 8) course, which is taken by more than a thousand students each year.


NVIDIA Describes AI’s Critical Role in Self-Driving Cars to Key Senate Committee

The Official NVIDIA Blog



Artificial intelligence is the key to unlocking the challenge and promise of self-driving cars, NVIDIA told Congress today.

In testimony before a packed hearing of the U.S. Senate Committee on Commerce, Science and Transportation, Rob Csongor, vice president and general manager of the company’s Automotive business, said that in the years ahead AI will enable self-driving cars that save tens of thousands of lives, provide mobility to the disabled, improve urban design, and save vast amounts of unproductive time.


An artificial intelligence from Microsoft’s Maluuba got a record Ms. Pac-Man score.

Slate, Jacob Brogan



The classic Atari 2600 version of Ms. Pac-Man is brutally unforgiving, partly because it’s so unpredictable. In one recent emulator-assisted session, I managed a mere 26,000 points before being swallowed up by the game’s erratic antagonists. My score pales before those of the game’s true masters, but even they must cede their titles to the game’s new champion, an artificially intelligent system developed by Maluuba, a Microsoft-owned startup. Maluuba’s program managed to accumulate an astonishing 999,990 points—more than any player, human or machine, has managed before.


Why Data Science Argues against a Muslim Ban

Scientific American Blog Network, Eric Siegel



Some people have relied on “common sense” to brand an entire religion as dangerous—but rigorous analysis proves they’re wrong


Technology unlocks mold genomes for new drugs

Northwestern University, Northwestern Now



Scientists believe there are thousands or even millions of fungal molecules waiting to be discovered, with enormous health, social and economic benefits. The new technology systematically identifies powerful bioactive molecules from the microbial world — honed through millennia of evolution — for new drug leads. These small molecules could lead to new antibiotics, immunosuppressant drugs and treatments for high cholesterol, for example.

For four years, Kelleher has collaborated with Nancy P. Keller, the Robert L. Metzenberg and Kenneth B. Raper Professor of Mycology at Wisconsin, and colleagues at Intact Genomics in St. Louis on developing the technology, called FAC-MS (Fungal Artificial Chromosomes with Metabolomic Scoring).


New Ph.D. Program Highlights Growing Importance of Machine Learning

Georgia Institute of Technology, College of Computing



Eight Georgia Tech schools partner to offer advanced degree in emerging field of machine learning

The Georgia Institute of Technology has been approved to offer a new advanced degree program for the emerging field of machine learning.

In a unanimous vote, the Board of Regents of the University System of Georgia approved Georgia Tech’s request to establish a Doctor of Philosophy in Machine Learning.


How a crippling shortage of analysts let the London Bridge attackers through

The Guardian, Mark Townsend



Last Tuesday, in the wake of the latest terror atrocity to strike Britain, the former head of MI5 Dame Stella Rimington recalled just how primitive intelligence gathering used to be. Addressing a conference of security officials in west London – four miles from London Bridge where the terror attack had taken place three days earlier – Rimington recounted an anecdote about how her spy training in the 1970s involved infiltrating a local pub to eavesdrop on targets.

Over the four decades since then, intelligence gathering within Britain’s security services has evolved beyond comparison. Eking out a lead is no longer an issue – instead extraordinary volumes of information are relentlessly harvested electronically. The worry, according to experts, is whether they are acquiring too much.


Finding Novel Ways to Trade on Sentiment Data

Tech at Bloomberg



In some early experiments, Bloomberg’s Quantitative Research team devised sentiment-driven strategies that could beat the market by double digits. The researchers caution that traders in the ‘messier’ real world should not expect to do quite as well due to transaction costs and other inefficiencies. “We wanted to show some ideas for how sentiment can be used to make money,” says Xin Cui, another quantitative researcher on the team. “What we have shown is that the data has value on its own. It’s up to the portfolio managers to integrate the sentiment signal with their existing proprietary strategies.”

Verma recently presented some of his group’s findings at The Trading Show in Chicago on May 17th. He’ll also be speaking at AI, Machine Learning and Sentiment Analysis Applied to Finance in London on June 28th.


“All of Us” Precision Medicine Program Enters Beta Testing

National Institutes of Health



I’m happy to announce that we’ve begun enrolling our first participants as beta testers of the All of Us Research Program. This is a major milestone in our progress to date, and one of many more to come for our startup research consortium that came together less than a year ago.

The mission of the All of Us Research Program is to accelerate health research and medical breakthroughs, enabling individualized prevention, treatment, and care for all of us.


Great paper? Swipe right on the new ‘Tinder for preprints’ app

Science, ScienceInsider, Dalmeet Singh Chawla



If you’re tired of swiping left and right to approve or reject the faces of other people, try something else: rating scientific papers. A web application inspired by the dating app Tinder lets you make snap judgments about preprints—papers published online before peer review—simply by swiping left, right, up, or down.

Papr brands itself as “Tinder for preprints” and is almost as superficial as the matchmaker: For now, you only get to see abstracts, not the full papers, and you have to rate them in one of four categories: “exciting and probable,” “exciting and questionable,” “boring and probable,” or “boring and questionable.” (On desktop computers, you don’t swipe but drag the abstract.) The endless stream of abstracts comes from the preprint server bioRxiv.


The Robot-Human Alliance – Call it Multiplicity: diverse groups of people and machines working together.

Wall Street Journal, Commentary, Ken Goldberg



Humans have been beaten by a machine again. Last month the world’s top-ranked player of Go, an abstract strategy board game, lost to Google’s AlphaGo program. This marked a major achievement for machine learning, given the game’s complexity.

What might be more important, though, went less noticed: Around the same time, some of the world’s top Go players began competing alongside the AlphaGo program in human-machine teams. The players learned new strategies by studying AlphaGo’s previous games and said working with the program gave them new confidence. This suggests a different way to think about the coming era of artificial intelligence.


The evolution of data center networks

O'Reilly Radar, Brian Anderson and Dinesh Dutt



I recently sat down with Dinesh Dutt, chief scientist at Cumulus Networks, to discuss how data centers have changed in recent years, new tools and techniques for network engineers, and what the future may hold for data center networking. Here are some highlights from our talk.


Online Courses May Worsen Academic Outcomes, Especially for At-Risk Students

Pacific Standard, Elena Gooray



Many non-traditional students turn to, and are recruited for, online degree programs on the promise of greater flexibility and lower fees. But research released by the Brookings Institution last week suggests online courses provide the worst academic outcomes for the students who most need extra support. Looking at enrollment and performance data from the for-profit DeVry University, Eric Bettinger and Susanna Loeb of Stanford University report that students who took a class online received grades 0.44 points lower on a 4.0 grade point average scale than students who took the same class in person. That means the online course takers would receive a C on average, whereas the in-person students would receive a B-minus.


Andreessen Horowitz leads $23 million round in Instabase

CNBC, Jordan Novet



Andreessen Horowitz, one of Silicon Valley’s most prominent venture capital firms, has placed a bet on a start-up called Instabase that’s quietly building a web service where data scientists and less technical users can work with data, CNBC has learned.

Last month the start-up disclosed that it had raised $23.2 million, and a spokesperson from Andreessen Horowitz confirmed it was the lead investor.

The idea is that people would dump data into Instabase to make it more accessible to colleagues both inside and outside their organization. For instance, salespeople could use the product to store and access website visitor data without requiring engineers to write queries to a visitor database. Engineers can also use it to store queries to back-end databases, which their colleagues can then explore and use.


Behold, the Largest Simulated Universe Ever Made

VICE, Motherboard, Daniel Oberhaus



25 billion galaxies populated with 2 trillion particles will help calibrate a satellite on the hunt for dark matter.

Researchers at the University of Zurich have used a supercomputer to create the largest ever virtual universe, which is populated with some 25 billion galaxies generated from 2 trillion particles. The simulation will be used to calibrate Euclid, a satellite being developed by the European Space Agency that will be deployed in 2020 to investigate the nature of dark matter and dark energy.


Tweet of the Week

Twitter, chris rodley



 
Events



Kepler Exoplanet Week, June 17-23

NASA



Locations throughout California. Beginning June 17, join us for a week-long celebration of NASA’s Kepler and K2 missions and the community of scientists who are using the data to further our understanding of our place in the universe. [free]

 
Tools & Resources



Altair

Brian Granger and Jake VanderPlas



Altair is a declarative statistical visualization library for Python, based on Vega-Lite.

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.
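For readers who want a feel for the declarative API, here is a minimal, hypothetical sketch (the data frame and its column names are invented for illustration):

```python
# Minimal Altair sketch; the toy data and column names are illustrative.
import altair as alt
import pandas as pd

# A small, made-up data frame.
cars = pd.DataFrame({
    "horsepower": [130, 165, 150, 140, 198],
    "mpg": [18.0, 15.0, 18.0, 16.0, 15.0],
})

# Declare *what* to plot, not *how*: a scatter plot of mpg against horsepower.
chart = alt.Chart(cars).mark_point().encode(
    x="horsepower",
    y="mpg",
)

# In a Jupyter notebook, the chart renders inline; under the hood Altair
# compiles this declaration to a Vega-Lite specification.
chart
```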


Adversarially Regularized Autoencoders for Generating Discrete Structures

arXiv, Computer Science > Learning; Junbo (Jake) Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun



Generative adversarial networks are an effective approach for learning rich latent representations of continuous data, but have proven difficult to apply directly to discrete structured data, such as text sequences or discretized images. Ideally we could encode discrete structures in a continuous code space to avoid this problem, but it is difficult to learn an appropriate general-purpose encoder. In this work, we consider a simple approach for handling these two challenges jointly, employing a discrete structure autoencoder with a code space regularized by generative adversarial training. The model learns a smooth regularized code space while still being able to model the underlying data, and can be used as a discrete GAN with the ability to generate coherent discrete outputs from continuous samples. We demonstrate empirically how key properties of the data are captured in the model’s latent space, and evaluate the model itself on the tasks of discrete image generation, text generation, and semi-supervised learning.
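The abstract stays high-level, so here is a rough, hypothetical PyTorch sketch of the setup it describes (not the authors' implementation), with the pieces named in the paper: a discrete-sequence encoder and decoder, a generator that maps noise into the continuous code space, and a critic whose adversarial training regularizes that space. All sizes, names, and the toy sampling routine are illustrative.

```python
# Rough sketch of the idea described in the abstract, NOT the authors' code.
# Assumes PyTorch; all dimensions and module names are illustrative.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a batch of discrete token sequences to continuous code vectors."""
    def __init__(self, vocab_size, emb_dim=64, code_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, code_dim, batch_first=True)

    def forward(self, tokens):            # tokens: (batch, seq_len) int64
        _, h = self.rnn(self.emb(tokens))
        return h.squeeze(0)               # (batch, code_dim)

class Decoder(nn.Module):
    """Maps a code vector back to per-position token logits."""
    def __init__(self, vocab_size, code_dim=32, seq_len=16):
        super().__init__()
        self.vocab_size, self.seq_len = vocab_size, seq_len
        self.proj = nn.Linear(code_dim, seq_len * vocab_size)

    def forward(self, code):
        return self.proj(code).view(-1, self.seq_len, self.vocab_size)

class Generator(nn.Module):
    """Maps noise into the code space so GAN samples can be decoded."""
    def __init__(self, noise_dim=16, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, code_dim))

    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Scores codes; adversarial training pulls encoder codes and generator codes together."""
    def __init__(self, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, code):
        return self.net(code)

# Training (omitted here) alternates (1) a reconstruction step on encoder+decoder
# and (2) adversarial steps in which the critic separates encoder codes from
# generator codes while the generator tries to fool it, smoothing the code space.

def sample_discrete(gen, dec, n=4, noise_dim=16):
    """Use the model as a 'discrete GAN': decode generator samples, then argmax."""
    with torch.no_grad():
        codes = gen(torch.randn(n, noise_dim))
        return dec(codes).argmax(dim=-1)  # (n, seq_len) token ids

if __name__ == "__main__":
    vocab = 100
    gen, dec = Generator(), Decoder(vocab)
    print(sample_discrete(gen, dec))      # untrained, so output is noise
```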

 
Careers


Full-time positions outside academia

Director Applied Research and Data Science



Adidas USA; Portland, OR

Head of Data Science and Analytics



BBVA Compass; San Francisco, CA

Senior Director, Quantitative Research & Data Science



Thomson Reuters; San Francisco, CA

Lead Counsel, Privacy



Facebook; Menlo Park, CA

Lead Engineer



The Coral Project; New York, NY
Postdocs

17.05.31 Postdoctoral Fellow/Research Fellow in Molecular Simulation



The Australian National University, College of Medicine, Biology and Environment; Canberra, Australia
