Data Science newsletter – November 25, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for November 25, 2016


 
 
Data Science News



Reinforcement learning with unsupervised auxiliary tasks

Google DeepMind, Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki


from November 17, 2016

Our primary mission at DeepMind is to push the boundaries of AI, developing programs that can learn to solve any complex problem without needing to be taught how. Our reinforcement learning agents have achieved breakthroughs in Atari 2600 games and the game of Go. Such systems, however, can require a lot of data and a long time to learn, so we are always looking for ways to improve our generic learning algorithms.

Our recent paper “Reinforcement Learning with Unsupervised Auxiliary Tasks” introduces a method for greatly improving the learning speed and final performance of agents. We do this by augmenting the standard deep reinforcement learning methods with two main additional tasks for our agents to perform during training.
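
Schematically (our gloss, not the paper's exact notation), the agent keeps the usual deep RL objective and adds weighted losses for the auxiliary tasks, all trained on one shared network:

    \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{RL}} + \sum_i \lambda_i \, \mathcal{L}_{\text{aux}, i}

The paper's two main auxiliary tasks are pixel control (learning to change the pixels on screen) and reward prediction (predicting upcoming rewards from recent frames); each provides a pseudo-supervised signal that shapes the shared representation.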


Telling Mosquitoes Apart With a Cellphone

The New York Times


from November 21, 2016

Simple cellphones can tell one type of mosquito from another by their hums, which may be useful in fighting mosquito-borne diseases, according to new research from Stanford University.

Calling their project “Shazam for Mosquitoes,” after the phone app that identifies music, students from the university’s Bio-X institute showed that common cellphones could record mosquito wing beats accurately enough to distinguish, for example, Culex mosquitoes, which spread West Nile virus, from Aedes mosquitoes, which spread Zika.

Even older flip phones, which are still used in parts of Africa, are sensitive enough to do the job.
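
As a back-of-the-envelope illustration of the signal processing involved (our sketch, not the Stanford team's actual pipeline), the core step is estimating the dominant frequency of a recording and comparing it against known wing-beat ranges. In R, with a synthetic tone standing in for a real phone recording:

    # Minimal sketch: estimate the dominant frequency of an audio signal.
    # A real pipeline would load phone audio (e.g., via the tuneR package);
    # here a ~600 Hz sine wave stands in for a recorded hum.
    fs <- 8000                          # assumed sample rate (Hz)
    t  <- seq(0, 1, by = 1 / fs)
    x  <- sin(2 * pi * 600 * t)         # stand-in mosquito hum

    n     <- length(x)
    spec  <- Mod(fft(x))^2              # power spectrum
    freqs <- (seq_len(n) - 1) * fs / n  # frequency of each FFT bin
    half  <- seq_len(n %/% 2)           # keep the non-aliased half
    freqs[half][which.max(spec[half])]  # ~600 Hz for this synthetic signal

Distinguishing species would then amount to matching the estimated frequency (and its harmonics) against reference ranges for, say, Culex and Aedes.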


‘Atlas of the Underworld’ reveals oceans and mountains lost to Earth’s history

Science, Latest News


from November 22, 2016

Earth has a bad habit of erasing its own history.

At intersections of tectonic plates worldwide, slabs of ocean crust dive into the mantle, part of the continuous cycle that not only drives the continents’ drift, but also fuels the volcanism that builds up island chains like Japan and mountains like the Andes. The disappearance of these slabs, called subduction, makes it difficult to reconstruct oceans as they existed hundreds of millions of years ago, as well as the mountains flanking them. “Every day, we’re losing geologic information from the face of the Earth,” says Jonny Wu, a geologist at the University of Houston in Texas. “It’s like losing pieces of broken glass as you’re trying to put it together again.”

But geoscientists have begun to pick up these pieces by peering into the mantle itself, using earthquake waves that pass through Earth’s interior to generate images resembling computerized tomography (CT) scans. In the past few years, improvements in these tomographic techniques have revealed many of these cold, thick slabs as they free fall in slow motion to their ultimate graveyard—heaps of rock sitting just above Earth’s molten core, 2900 kilometers below.


A Name Directory for the Ocean Floor

Eos, Vaughan Stagpoole, Hans Werner Schenke, and Yasuhiko Ohara


from November 22, 2016

An increasing number of ships now routinely survey our oceans, and we are learning more about the seafloor and its features. The rapid developments in multibeam sonar technology and the deployment of these instruments on remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs) mean that some features only tens of meters in relief are now being mapped and named.

To ensure that these features don't get renamed on subsequent cruises, scientists turn to a reliable tool: the General Bathymetric Chart of the Oceans (GEBCO) Gazetteer online interactive map. This map is supplemented with other online resources, including naming guidelines, links to proposal forms, and a glossary.


Bruce Schneier: ‘The Internet Era of Fun and Games Is Over’

The Daily Dot


from November 22, 2016

Internet pioneer Bruce Schneier issued a dire proclamation in front of the House of Representatives’ Energy & Commerce Committee Wednesday: “It might be that the internet era of fun and games is over, because the internet is now dangerous.”

The meeting, which focused on the security vulnerabilities created by smart devices, came in the wake of the Oct. 21 cyberattack on Dyn that knocked Amazon, Netflix, Spotify, and other major web services offline.

Schneier’s opening statement provided one of the clearest distillations of the dangers posed by connected devices I’ve seen. It should be required viewing. He starts around the 1:10:30 mark in the livestream below, but we’ve also transcribed most of his remarks.


This new study may explain why peer review in science often fails

Vox, Julia Belluz


from November 23, 2016

If there’s any part of the research enterprise that can elicit rage amongst scientists, it’s peer review.

Academics vet the work of their peers — for free, in their spare time — in a process that is supposed to weed out junk science before it's published. But researchers say the task is thankless and slows down the publication process. To make matters worse, this cornerstone of the scientific method has surprisingly little evidence for its effectiveness, and much about how it works remains a mystery.

In an effort to better understand peer review, researchers have been trying to study the process itself. And one such new study may help explain why peer review fails, and why it may not ensure quality in science. Its main finding is that a small minority of researchers are shouldering most of the burden of peer review.


Patent Office Joins Open Source Movement

Nextgov.com


from November 21, 2016

Citizens can now browse the source code for an app that tracks the trademark process, part of a broader federal effort to share more software code with the public.

The U.S. Patent and Trademark Office has published the source code on GitHub for the app, which notifies a trademark applicant every time the status changes.


With Gift, Carnegie Mellon Scholars Seek To Better Define Artificial Intelligence Ethics

NPR, All Tech Considered


from November 21, 2016

This month, a law firm gave Pittsburgh’s Carnegie Mellon University $10 million to explore the ethics of artificial intelligence — or AI. This comes after industry leaders recently joined together to form the group called the Partnership on Artificial Intelligence to Benefit People and Society.

Peter Kalis is chairman of the law firm K&L Gates. He says technology is dashing ahead of the law, leading to questions that were never taken seriously before, such as: what happens when you make robots that are smart, independent thinkers and then try to limit their autonomy?


The best ways to combat bias from Airbnb to eBay

Tim Harford


from November 23, 2016

Six months ago, tech entrepreneur Rohan Gilkes tried to rent a cabin in Idaho over the July 4 weekend, using the website Airbnb. All seemed well, until the host told him her plans had changed: she needed to use the cabin herself. Then a friend of Rohan’s tried to book the same cabin on the same weekend, and his booking was immediately accepted. Rohan’s friend is white; Rohan is black.
This is not a one-off. Late last year, three researchers from Harvard Business School — Benjamin Edelman, Michael Luca and Dan Svirsky — published a working paper with experimental evidence of discrimination. Using fake profiles to request accommodation, the researchers found that applicants with distinctively African-American names were 16 per cent less likely to have their bookings accepted. Edelman and Luca have also published evidence that black hosts receive lower incomes than whites while letting out very similar properties on Airbnb. The hashtag #AirbnbWhileBlack has started to circulate.

Can anything be done to prevent such discrimination? It’s not a straightforward problem. Airbnb condemns racial discrimination but, by making names and photographs such a prominent feature of its website, it makes discrimination, conscious or unconscious, very easy.

“It’s a cheap way to build trust,” says researcher Michael Luca. But, he adds, it “invites discrimination”.


Oxford University to teach Data Science for IoT

EE Times


from November 24, 2016

Responding to the global shortage of data scientists, Oxford University's CPD Centre will launch a new course, Data Science for the Internet of Things (IoT), in January next year.


Big news for Montreal!

Ivado


from November 21, 2016

Google will expand its academic support for deep learning at U. Montreal's MILA (core to the IVADO AI-OR hub), offering Focused Research Awards to seven MILA faculty at U. Montreal and neighbouring McGill U.: Pascal Vincent, Aaron Courville, Christopher Pal, Doina Precup, Joelle Pineau, Simon Lacoste-Julien, and Laurent Charlin. Furthermore, Google will renew Yoshua Bengio's Focused Research Award. Taken together over a period of three years, these awards bring Google's donation to MILA (bridging U. Montreal, Polytechnique Montreal, HEC Montreal, and McGill U.) to US$3.375M, i.e., about C$4.5M, or C$1.5M per year.

In addition, Google has decided to open a deep learning & AI research group here in Montreal, linked to its Google Brain group in Mountain View, California. This group will be led by Hugo Larochelle, a pioneer of the deep learning community and a PhD graduate of Yoshua Bengio's lab at U. Montreal. This is a great example of reverse brain drain, since Hugo will be coming back from the US to Montreal.


Amazon wants to stream live sports

Marketplace


from November 22, 2016

Amazon is exploring a new frontier: live sports. The tech giant is in talks with the NFL, the NBA and others for the rights to stream live games, according to a report from the Wall Street Journal. Why is Amazon eyeing the world of live sports and what does the potential move mean for plain old broadcasting? [audio, 1:58]


Unlocking the Potential of Open Data through Community Engagement

Data-Smart City Solutions, Jennifer Angarita


from November 21, 2016

From exploring neighborhood maps to generating analysis and building applications, citizens, businesses, nonprofits and technologists can all benefit from open data. However, much of the recent focus on open data has centered on the role of technology and overlooked community members and users themselves.

Harnessing the power of open data necessitates meaningful engagement with diverse external stakeholders like journalists, community-based organizations, and local residents. While many cities may strive to engage diverse communities through open data, in practice, cities often struggle to inclusively and systematically collaborate with external partners.

Recently, Cambridge, MA led an in-depth analysis and released a first-of-its-kind report exploring how cities can better collaborate with open data users to increase the impact of civic innovation. The report, Amplifying Civic Innovation: Community Engagement Strategies for Open Data Collaborations, presents numerous findings on how local municipalities can support inclusive and sustained collaborations with different users of open data.


MXNet – Deep Learning Framework of Choice at AWS

Werner Vogels, All Things Distributed blog


from November 22, 2016

We have concluded that MXNet is the most scalable framework. We believe that the AI community would benefit from putting more effort behind MXNet. Today, we are announcing that MXNet will be our deep learning framework of choice. AWS will contribute code and improved documentation as well as invest in the ecosystem around MXNet. We will partner with other organizations to further advance MXNet.


NVIDIA, Carnegie Mellon Create S.A.R.A., the Social Robot

Datamation


from November 23, 2016

Using NVIDIA technology, researchers from the ArticuLab at Carnegie Mellon University have created a Socially Aware Robot Assistant (S.A.R.A.). Beyond the fact that this robot can read emotions (so it will likely be reasonably good at telling when you are lying), it also means that the camera on your laptop, phone, or monitor will suddenly have a more important use than taking selfies.


GE Wants To Be The Next Artificial Intelligence Powerhouse

Fast Company, Sean Captain


from November 15, 2016


When you hear the term “artificial intelligence,” you may think of tech giants Amazon, Google, IBM, Microsoft, or Facebook. Industrial powerhouse General Electric is now aiming to be included on that short list. It may not have a chipper digital assistant like Cortana or Alexa. It won’t sort through selfies, but it will look through X-rays. It won’t recommend movies, but it will suggest how to care for a diesel locomotive. Today, GE announced a pair of acquisitions and new services that will bring machine learning AI to the kinds of products it’s known for, including planes, trains, X-ray machines, and power plants.

The effort started in 2015 when GE announced Predix Cloud—an online platform to network and collect data from sensors on industrial machinery such as gas turbines or windmills. At the time, GE touted the benefits of using machine learning to find patterns in sensor data that could lead to energy savings or preventative maintenance before a breakdown. Predix Cloud opened up to customers in February, but GE is still building up the AI capabilities to fulfill the promise. “We were using machine learning, but I would call it in a custom way,” says Bill Ruh, GE’s chief digital officer and CEO of its GE Digital business (GE calls its division heads CEOs). “And we hadn’t gotten to a general-purpose framework in machine learning.”


Facebook Messenger is using AI behind the scenes with machine learning algorithms

Medium, Sachim


from November 23, 2016

Facebook Messenger is using AI behind the scenes with machine learning algorithms. It will get smarter as more people use it. However, who will use it is a big question.

Maybe millennials like to shop this way.

I can't believe that Messenger can easily beat deals found by humans. Maybe after a few years of learning, things will change.


Experian solicits input from Big Data Hubs Community

South Big Data Hub, Hubbub! blog


from November 23, 2016

Experian is soliciting input from the Big Data Innovation Hubs regarding our community’s data needs. To support academic research and new use cases that may involve industry, Experian has shared information on data sets they maintain.


PIDapalooza – Revenge of the Nerds

The Scholarly Kitchen, Alice Meadows


from November 21, 2016

PIDapalooza, the first ever festival for scholarly persistent identifiers, set out to make PIDs – and the nerds who create, develop, and use them – cool! And, to a large extent, I think it succeeded. From the offbeat location (Reykjavik in November, anyone?) to the music festival vibe and fast-paced program (three tracks of mostly half-hour sessions interspersed with plenaries), it certainly felt very different from other scholarly communications conferences I’ve attended.

But what about the content? What happens when you put 120 or so PID aficionados in a room together? What did we talk about? Which topics generated the most buzz and why?


Storytelling in Virtual Reality: Hamlet takes the (virtual) stage in new production

NYU, Tandon School of Engineering


from November 18, 2016

William Shakespeare’s famous play Hamlet has been adapted thousands of times: on the stage of the Globe Theatre, in high school productions, and even on the TV show The Simpsons. Now, in the 400th year since Shakespeare’s death, a team of artists, actors, engineers, and developers are transporting the canonical text into virtual reality in a new production titled To Be with Hamlet. Conceived by Javier Molina, a graduate and current adjunct professor of Integrated Digital Media at NYU Tandon, the production is the first live performance of theater in social virtual reality (VR). Molina is also the director of the Virtual Reality Lab at MAGNET (Media and Game Network), the NYU program that promotes intersectional education and collaboration between technology and culture.

Though VR is a new experience for many, Molina said that familiarity with the Bard’s play would allow audiences to easily relate to the production. The performance focuses on the fifth scene of Act I, which features the encounter between Hamlet and his father’s ghost. The team has been tirelessly creating the virtual skeletons and avatars of the actors playing Hamlet and the Ghost — Zachary Koval and Roger Casey, respectively.


Helping patients share EHR data with research

NIH, ONC


from March 20, 2016

S4S (Sync for Science) is a collaboration among researchers (Harvard Medical School Department of Biomedical Informatics), electronic health record vendors (Allscripts, athenahealth, Cerner, drchrono, eClinicalWorks, Epic, McKesson), and the United States federal government (Office of the National Coordinator for Health IT, Office of Science and Technology Policy, and National Institutes of Health).


SocInfo16 Twitter NodeXL SNA Map and Report for Saturday, 19 November 2016

NodeXL Graph Gallery, Marc Smith


from November 19, 2016


What will be the first blockchain applications for pharma and healthcare?

MedCity News


from November 22, 2016

While many possible uses of blockchain in healthcare are still years away, the applications that would directly affect patients seem closer than previously thought. For instance, a patient-centered database on the blockchain could be successfully implemented in less than two years.

Blockchain, for the uninitiated, is the underlying technology for bitcoin and other crypto-currencies.

That was one of the main ideas to come out of the first Pharma Blockchain Bootcamp held in November by DisruptiveRx, a group that organizes events focused on innovations in the pharma field. All of the speaker discussions at the daylong conference “looked at patients taking control of their data on the blockchain and using it as ‘leverage’ to manage their healthcare,” said Maria Palombini, managing partner of DisruptiveRx, in a post-conference interview.


NuTonomy to Test a Robotaxi in Boston

IEEE Spectrum, Philip E. Ross


from November 21, 2016

Bostonians will get the chance to hail a self-driving cab by the end of this year, roboride startup NuTonomy announced today. It’s a continuation of a testing program the company began some months ago in Singapore.

 
Events



Dec 1 – Science Literacy: What It Means, Why It Matters, and What We Can Do About It



New York, NY Join us on December 1 at the American Museum of Natural History's Kaufmann Theater for a discussion of current research on these questions. Begins at 10 a.m.

DataKind / Annie E. Casey Foundation DataDive Registration



Washington, DC December 2-4, AGU Conference Center (2000 Florida Ave. NW) [free]

Toronto #DataThon!



Toronto, Ontario, Canada Saturday, December 3, starting at 8, Paytm Labs (220 Adelaide St West) [free]

The Internet of Things: Impacting Scientific Data and Information Flows



Washington, DC Friday, December 9, starts at 9 a.m., National Academies of Sciences, Engineering & Medicine, Keck Center (500 Fifth St., NW) [$$$]

Athletic Aesthetic



Los Angeles, CA Second Spectrum presents Athletic Aesthetic, an exhibition exploring the intersection of sports and art. December 10-11, 108 S. San Pedro

Putting Deep Learning into Production: Expert Talks | Hands-On Workshop



San Francisco, CA Starts at 9:30 a.m., Capital One (201 3rd St, 5th Floor)

Artificial Intelligence and Data Science event



London, England Newsweek is to co-host an Artificial Intelligence and Data Science event at the Barbican in London, March 1-2. [$$$$]

IEEE Tech for Humanity Series, SXSW Interactive



Austin, TX March 11-16 [$$$$]
 
Tools & Resources



Introducing DataChats: A New Video Series

DataCamp


from November 22, 2016

At DataCamp, we're big fans of learning via expert interviews, fireside chats with key community contributors, and good data science talks in general. That is why we are proud to launch a new video series: DataChats.


Improving Variational Approximations

Andrew Miller


from November 23, 2016

Nick Foti, Ryan Adams, and I just put a paper on the arXiv about improving variational approximations (a short version was accepted early to AABI 2016). We focused on one problematic aspect of variational inference in practice — that once the optimization problem is solved, the approximation is set and there isn't a straightforward way to improve it, even when we can afford some extra compute time.

Markov chain Monte Carlo methods have a simple solution — run the chain for more steps and the posterior approximation will get better and better. Variational inference (VI) methods, on the other hand, typically pre-specify some class of approximating distributions, and then optimize the VI objective. When that pre-specified class of distributions (the variational family) doesn’t include the neighborhood around the target distribution, the resulting VI solution will still be sub-optimal, in that there will be a non-zero KL-divergence between the approximation and the target. This will result in biased posterior estimates.
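
For reference, the standard setup described above, in our notation: variational inference selects the member of an approximating family \mathcal{Q} that is closest in KL divergence to the posterior, equivalently the maximizer of the evidence lower bound (ELBO):

    q^\ast = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \mathrm{KL}\left( q(\theta) \,\|\, p(\theta \mid x) \right) = \operatorname*{arg\,max}_{q \in \mathcal{Q}} \mathbb{E}_q\left[ \log p(x, \theta) - \log q(\theta) \right]

If p(\theta \mid x) lies outside \mathcal{Q}, the KL divergence at the optimum is strictly positive, which is precisely the bias the authors set out to reduce.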


The rOpenSci geospatial suite

rOpenSci


from November 22, 2016

Geospatial data – data embedded in a spatial context – is used across disciplines, whether in history, biology, business, tech, or public health. Along with community contributors, we're working on a suite of tools to make working with spatial data in R as easy as possible.
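
For a taste of the suite, a minimal sketch using geojsonio, one of its packages (our example, not one from the post):

    # Convert plain R objects to GeoJSON with geojsonio.
    library(geojsonio)

    # A longitude/latitude pair becomes a GeoJSON Point
    geojson_json(c(-99.74, 32.45))

    # A data frame of coordinates becomes a FeatureCollection of Points
    df <- data.frame(long = c(-99.74, -99.75), lat = c(32.45, 32.46))
    geojson_json(df, lat = "lat", lon = "long")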


America’s Next Topic Model

KDnuggets, Lev Konstantinovskiy


from November 23, 2016

Topic modeling is a great way to get a bird's-eye view of a large document collection using machine learning. Here are three ways to use the open source Python tool Gensim to choose the best topic model.


Reading data from google sheets into R

Opiniomics


from November 22, 2016

Reading data from Google Sheets into R is something you'd imagine should be really simple, but it often is anything but. However, the googlesheets package goes a long way toward solving this problem.

Let’s crack on with an example.
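
In that spirit, a minimal sketch (not the post's exact example; it assumes a sheet titled "gapminder" is visible to your Google account):

    # Read a Google Sheet into a data frame with the googlesheets package.
    library(googlesheets)

    gs_auth()                        # browser-based OAuth on first use; token is cached
    sheet <- gs_title("gapminder")   # register a sheet you can access, by its title
    df    <- gs_read(sheet)          # read the first worksheet into a data frame
    head(df)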


Deploying R, RStudio and Shiny applications on Ubuntu Server

R-bloggers, guillotantoine


from November 22, 2016

In this post, we are going to see how to deploy R, RStudio, and Shiny apps on a virtual server.

The post won't focus on the Shiny programming side, nor on Ubuntu administration; however, the tutorial is self-contained. Just by following the steps, you'll get a working server and Shiny app.
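
Once the server pieces are in place, a minimal app is enough to verify the deployment end to end. A sketch (assuming a default Shiny Server install, which serves apps from /srv/shiny-server/):

    # app.R -- the smallest useful Shiny app: a slider driving a histogram.
    library(shiny)

    ui <- fluidPage(
      sliderInput("n", "Observations:", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )

    server <- function(input, output) {
      output$hist <- renderPlot(hist(rnorm(input$n), main = "Random draws"))
    }

    shinyApp(ui, server)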

 
Careers


Tenured and tenure track faculty positions

Assistant Professor in Data Science



DePaul University; Chicago, IL
Internships and other temporary positions

Helen Fellowship



American Museum of Natural History; New York, NY
Postdocs

Rebanks Postdoctoral Research Fellowship



Royal Ontario Museum; Toronto, Ontario, Canada
Full-time positions outside academia

Supervisory Information Technology Specialist (DataMgt)



Office of the Chief Data Officer, Department of Commerce; Washington, DC
