Data Science newsletter – October 12, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for October 12, 2016

 
 
Data Science News





Driverless Cars Need Just One Thing: Futuristic Roads

Medium, Backchannel, Brent Skorup


from October 10, 2016

… Building roadside sensors along our streets and highways so that cars can navigate them sounds, initially, like an expensive pipe dream. But a series of advances are coinciding to make such a system possible. Fifth Generation wireless, or “5G,” is a wireless broadband standard that’s currently under development and will improve on our current networks in ways that will be useful for driverless vehicles. A 5G system will include denser wireless networks (with facilities every few hundred meters in cities) speeding up connections. 5G will also have broadcast capabilities, allowing networks to distribute large amounts of mapping data without time lags.




Why AI Makes It Hard to Prove That Self-Driving Cars Are Safe

IEEE Spectrum


from October 07, 2016

… Google researchers once tried identifying dumbbells with an artificial neural network, a common machine learning model that mimics the neurons in the brain and their connections. Surprisingly, the trained model could identify dumbbells in images only when an arm was attached.

Other problems with safety verification, [Phillip] Koopman says, include training and testing the algorithm too much on similar data; it’s like memorizing flash cards and regurgitating the information on an exam.




Launched: A Synthetic Biology Factory for Making Weird New Organisms

IEEE Spectrum


from October 10, 2016

Raising glasses of genetically modified beer, the synthetic biologists at Ginkgo Bioworks celebrated the launch of a new automated lab last month. By applying engineering principles to biology, and with the help of some nifty robotic equipment, Ginkgo has created a factory for churning out exotic lifeforms, the likes of which have never before been seen on this planet.

The home brew was an example of the potential applications of synthetic biology, a new field that builds on recent progress in genetic assembly methods. Scientists can now manufacture snippets of synthetic DNA and slip them into organisms, giving those critters strange new capabilities.




Open Source, Open Data: So much frustrate. Send a goat barrage?

Our fearlessly rational colleague Carly Strasser wrote a piece explaining how much interest there is in supporting open source software. Oh hey, hundreds of thousands of users who don’t pay a dime. Or contribute time. And how hard it is to find stable, ongoing financial support for it. As if in response, an open source contributor who goes by nayafia on GitHub posted a handy guide to financial support for open source.

One of many such projects is the OpenTrials API featured at the World Health Summit in Madrid last week. This is the open data bright spot. Shine on, OpenTrials.

Writing for MIT Technology Review, Jamie Condliffe reminds us that “we all enable a surveillance society by simply choosing to use social networks” and are thereby participating in a grey, often unintended practice of opening our data that has been used by police departments to target social activists. Star Ying and Tyrone Grandison largely concur with the sentiment that worrying about personal privacy is futile in a paper they will present in November. Both PhDs and practicing data scientists, they apply current legal guidelines to a typical personal dataset to argue that “big data privacy is near impossible.”

This: in case you’ll feel better by sending yourself an SMS goat barrage.




Cybathlon: Battle of the bionic athletes

BBC News


from October 10, 2016

Bionic arms, robotic legs, powered exoskeletons, brain-controlled computer interfaces and supercharged wheelchairs – all took centre stage to compete at this weekend’s Cybathlon.

Dubbed as the world’s first “bionic Olympics”, you might assume this to be a mini-Paralympics with some cutting-edge technology added to the mix.

But the focus of the event was less about athletics – and you could argue, less about competition – and more about what the future could hold for those with disabilities.




Automatic chemical design using a data-driven continuous representation of molecules

arXiv, Computer Science > Learning; Alán Aspuru-Guzik et al.


from October 07, 2016

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This generative model allows efficient search and optimization through open-ended spaces of chemical compounds. We train deep neural networks on hundreds of thousands of existing chemical structures to construct two coupled functions: an encoder and a decoder. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to the discrete representation from this latent space. Continuous representations allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the design of drug-like molecules as well as organic light-emitting diodes.
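
As a rough illustration of what "interpolating between molecules" in a latent space can mean, the sketch below linearly interpolates between two latent vectors. The vectors, the `interpolate` helper, and the choice of linear interpolation are assumptions made for illustration; the abstract does not specify the interpolation scheme, and a trained decoder would still be needed to turn each point back into a molecule.

```python
from typing import List


def interpolate(z_a: List[float], z_b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from z_a to z_b, endpoints included."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at z_a, 1.0 at z_b
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical 3-dimensional latent codes for two molecules (not real data).
z_start = [0.0, 1.0, -2.0]
z_end = [1.0, 0.0, 2.0]
path = interpolate(z_start, z_end, steps=5)
```

Each vector in `path` would be passed to the decoder; the intermediate points are candidates for structures "between" the two known molecules.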




With bells on! – Study of the Sea-Surface Microlayer and Air-Sea Boundary

Schmidt Ocean Institute


from October 11, 2016

The boundary layer of the ocean, where water touches air, can be thinner than a millimeter, but you would not know it judging from the amount of equipment involved in its study: unmanned airborne vehicles, remotely controlled catamarans, free floating chambers, aerosol sensors, CTD profilers, drifting buoys… the list goes on. There is much to be learned, and the team is sparing no effort to do so.




These are three of the biggest problems facing today’s AI

The Verge, James Vincent


from October 10, 2016

Machine learning in 2016 is creating brilliant tools, but they can be hard to explain, costly to train, and often mysterious even to their creators. Let’s take a look at these challenges in more detail:

1. First you get the data, then you get the AI




The Next Wave of Deep Learning Applications

The Next Platform, Nicole Hemsoth


from September 14, 2016

In the last two weeks alone we have seen research that breaks new ground in each of the following domains via neural networks and advanced machine learning frameworks. The listing below provides just a few select examples of the wave that hit the publication shores since the summer.

1. Advanced Melanoma Screening and Detection




Inside the Cybathlon, where even paraplegics can feel the adrenaline rush of competitive sport

The Conversation, Ana Matran-Fernandez


from October 11, 2016

The Cybathlon is the world’s first athletics event where the competitors are those living with severe disabilities such as paralysis or the loss of limbs, supported by cutting-edge assistive technology such as prosthetics and implants developed by university research teams. The Cybathlon, which has just taken place, aims to harness competition to spur on the technology that could offer huge improvements to millions of people’s lives.

The event’s six disciplines are based on the competitors’ physical needs, including agility courses for those with bionic arms and legs, obstacle courses including flights of stairs for wearable exoskeletons, powered wheelchair races, and a bike race for paralysed competitors using electronic muscle stimulation to move their legs.

BrainStormers, our team from the University of Essex Brain-Computer Interface and Neural Engineering Lab, competed in the final category: a videogame challenge played by paralysed parathletes using a brain-computer interface.




We need to learn a lot more about what’s stressing whales, study emphasizes

Science, ScienceInsider


from October 11, 2016

Human-produced noise in the ocean is likely harming marine mammals in numerous unknown ways, according to a comprehensive new report from the National Academies of Sciences, Engineering, and Medicine. That’s because there are insufficient data to determine how the ill effects of noise created by ships, sonar signals, and other activities interact with other threats, including pollution, climate change, and the loss of prey due to fishing. The report, which was sponsored by several government agencies and released on 7 October, provides a new framework for researchers to begin exploring these cumulative impacts.

“There’s a growing recognition that interactions between stressors on marine mammals can’t right now be accurately assessed,” said Peter Tyack, a marine mammal biologist at the University of St Andrews in the United Kingdom, in a webinar on the report. Tyack also chaired the committee that prepared the study, “Approaches to Understanding the Cumulative Effects of Stressors on Marine Mammals.”




Japanese Robotics Giant Gives Its Arms Some Brains

MIT Technology Review


from October 07, 2016

Fanuc, a company that produces robot arms for factories, is trying to get them to learn on the job.




Technology Will Replace Many Doctors, Lawyers, and Other Professionals

Harvard Business Review; Richard Susskind and Daniel Susskind


from October 11, 2016

Faced with the claim that AI and robots are poised to replace most of today’s workforce, most mainstream professionals — doctors, lawyers, accountants, and so on — believe they will emerge largely unscathed. During our consulting work and at conferences, we regularly hear practitioners concede that routine work can be taken on by machines, but they maintain that human experts will always be needed for the tricky stuff that calls for judgment, creativity, and empathy.

Our research and analysis challenges the idea that these professionals will be spared. We expect that within decades the traditional professions will be dismantled, leaving most, but not all, professionals to be replaced by less-expert people, new types of experts, and high-performing systems.

We conducted around 100 interviews, not with mainstream professionals but with leaders and new providers in eight professional fields: health, law, education, audit, tax, consulting, journalism, architecture, and divinity. Our focus was on what has actually been achieved at the cutting edge. We also immersed ourselves in over 800 related sources — published books, internal reports, and online systems. We found plenty of evidence that radical change in professional work is already under way.

 
Events



NYU Computer Science Department Colloquium: Structured Prediction and Deep Learning



New York, NY; Andrew McCallum from University of Massachusetts Amherst, Friday October 21 at 11:30 a.m., Warren Weaver Hall 1302
 
Deadlines



Call for Papers ICWSM-17

deadline: Conference

Deadline for abstract submission is Friday, January 6, 2017. ICWSM-17 is May 15-18 in Montreal.

 
NYU Center for Data Science News





[1610.03017] Fully Character-Level Neural Machine Translation without Explicit Segmentation

arXiv, Computer Science > Computation and Language; Jason Lee, Kyunghyun Cho, Thomas Hofmann


from October 10, 2016

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We also observe that the quality of the multilingual character-level translation even surpasses the models trained and tuned on one language pair, namely on CS-EN, FI-EN and RU-EN.
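
To make the length-reduction step concrete, here is a minimal sketch of strided max-pooling over a sequence of per-character feature vectors. The function name `max_pool_1d`, the toy features, and the stride of 2 are assumptions for illustration, not the paper's configuration; the convolutional layers that would produce the features are omitted.

```python
from typing import List


def max_pool_1d(features: List[List[float]], stride: int) -> List[List[float]]:
    """Take the element-wise max over non-overlapping windows of `stride` positions."""
    if stride < 1:
        raise ValueError("stride must be positive")
    pooled = []
    for start in range(0, len(features), stride):
        window = features[start:start + stride]  # the last window may be shorter
        pooled.append([max(column) for column in zip(*window)])
    return pooled


# Toy input: 6 character positions with 2 features each (made-up values).
char_features = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.3], [0.8, 0.1], [0.5, 0.5], [0.0, 0.7]]
segments = max_pool_1d(char_features, stride=2)
```

With a stride of 2, the six character positions shrink to three segment vectors, which is the sense in which pooling shortens the source representation before later layers process it.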

 
Tools & Resources



Agile Data Science: Building Full-Stack Data Analytics Applications with Spark

O'Reilly Media books, Russell Jurney


from October 11, 2016

Building analytics products at scale requires a deep investment in people, machines, and time. How can you be sure you’re building the right models that people will pay for? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Spark.


Introducing the Team Data Science Process from Microsoft

Microsoft Technet, Cortana Intelligence and Machine Learning Blog


from October 11, 2016

Are you building a data science team but unsure how to make the team productive? Are you concerned that the lack of collaboration or consistent processes could hinder project success? Are you doing too many routine data science tasks manually? Do you face challenges capturing or reusing knowledge from data initiatives across your teams?

Microsoft is happy to introduce the Team Data Science Process (TDSP) – a methodology and set of practices for collaborative data science. TDSP is designed to help you fully realize the promise of data science for your business, and addresses each of the issues above.


Amazon Web Services — a practical guide

GitHub – open-guides


from October 12, 2016

A lot of information on AWS is already written. Most people learn AWS by reading a blog or a “getting started guide” and referring to the standard AWS references. Nonetheless, trustworthy and practical information and recommendations aren’t easy to come by. AWS’s own documentation is a great but sprawling resource few have time to read fully, and it doesn’t include anything but official facts, so omits experiences of engineers. The information in blogs or Stack Overflow is also not consistently up to date.

This guide is by and for engineers who use AWS. It aims to be a useful, living reference that consolidates links, tips, gotchas, and best practices. It arose from discussion and editing over beers by several engineers who have used AWS extensively.


New Version of Clarifai API Actually Offers Developers Bespoke Machine Learning

ProgrammableWeb


from October 10, 2016

The latest version of Clarifai’s image recognition API could be one of those. ProgrammableWeb hasn’t tested the API so it cannot vouch for how well it works. But the general idea is that Clarifai’s API can be used to train the Clarifai cloud how and what to recognize in certain images in a way that’s specific to an organization. For example, for one organization (perhaps one that is a used car brokerage), the image of a car might be recognized by Clarifai’s image recognition technology as a car of a certain year, manufacturer, and model. But to another organization, that same image might be interpreted for specific information about specifics of the tires on the vehicle. Out of the box, Clarifai’s image recognition technology might not do either very well. But, because the API can be leveraged to teach the engine how to recognize certain images based on what the organization sees in them, the machine can then learn for each organization how to respond the next time it sees those images and ones like them. Clarifai refers to these machine-learned rules as “models” which in turn are essentially collections of what’s known as “concepts.”


The Care and Feeding of Your Scientist Collaborator

Simply Statistics blog, Roger Peng


from October 09, 2016

This post is a followup to Elizabeth Matsui’s previous post for scientists/clinicians on collaborating with biostatisticians. Elizabeth and I have been working for over half a decade and I think the story of how we started working together is perhaps a brief lesson on collaboration in and of itself. Basically, she emailed someone who didn’t have time, so that person emailed someone else who didn’t have time, so that person emailed someone else who didn’t have time, so that person emailed me, who as a mere assistant professor had plenty of time! A few people I’ve talked to are irked by this process because it feels like you’re someone’s fourth choice. But personally, I don’t care. I’d say almost all my good collaborations have come about this way. To me, it either works or it doesn’t work, regardless of where on the list you were when you were contacted.

 
Careers


Full-time, non-tenured academic positions

Head of Education



Santa Fe Institute; Santa Fe, NM

Tenured and tenure track faculty positions

Professor – Open Rank, BioFrontiers



University of Colorado; Boulder, CO
