Data Science newsletter – February 1, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 1, 2017


 
 
Data Science News



Concerned about Internet of Things security?

Princeton Center for Information Technology Policy, Freedom to Tinker blog, Sam Ransbotham



In our recent research at MIT Sloan Management Review, we found that only 34% of the respondents felt that they needed to improve their IoT data security. If you are trying to decide if the glass is full or empty, that glass seems two-thirds empty to me.

The research included responses from 1,480 executives, managers, and IT professionals working in a wide variety of industries. It focused on the perspective of organizations, not security professionals, and tried to understand their challenges and opportunities associated with the Internet of Things.


Hungry penguins help keep car code safe

BBC News



Hungry penguins have inspired a novel way of making sure computer code in smart cars does not crash.


Friday Eye Candy: Comparing a Square Mile of the World’s Famous Cities

Planetizen, James Brasuell



On his personal blog, Geoff Boeing, an urban planning PhD candidate at the University of California, Berkeley, has created a series of illustrations that allows easy comparison of the urban forms of cities as disparate as Rome and Irvine.

Inspired by Great Streets by Allan Jacobs, Boeing used Python and the OSMnx package he created for his dissertation. “With OSMnx we can download a street network from OpenStreetMap for anywhere in the world in just one line of code,” explains Boeing.


Really Big Data At Walmart: Real-Time Insights From Their 40+ Petabyte Data Cloud

Forbes, Bernard Marr



Walmart, the world’s biggest retailer with over 20,000 stores in 28 countries, is in the process of building the world’s biggest private cloud to process 2.5 petabytes of data every hour.

To make sense of all of this information, and put it to work solving problems, the company has created what it calls its Data Café – a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters.

Here, over 200 streams of internal and external data, including 40 petabytes of recent transactional data, can be modelled, manipulated and visualized. Teams from any part of the business are invited to bring their problems to the analytics experts and then see a solution appear before their eyes on the nerve centre’s touch screen “smart boards”.


Turo Surprisingly Became a Car-Sharing Start-Up for Dope Rides

Inverse, James Grebey



If you had a swanky, expensive car, you might not be eager to hand off the keys to a total stranger looking to open it up or slow roll their high school reunion. You might be protective. You might be reluctant. And you might be convinced to change your mind. That is the premise at the core of Turo, a car-sharing startup with a presence in more than 4,500 cities that specializes in dope rides. Turo is building a community of car enthusiasts psyched to let other people take the wheel, if only because the money helps justify having a sweet whip.


Yale team uses Google analytical tool to gauge vaccine effectiveness

Yale University, YaleNews



Using a statistical method initially developed by Google, a Yale School of Public Health-led research team has devised a novel way to better analyze the impact of vaccines. The research was published in the Proceedings of the National Academy of Sciences. … A team led by Daniel Weinberger, assistant professor in the Department of Epidemiology of Microbial Diseases, used a method called “synthetic controls,” which had not previously been applied in an epidemiological context, to analyze the impact of the pneumococcal vaccine.


How AI, IoT and cybersecurity will fuel growth for tech and media companies

World Economic Forum, Patrick Forth



In their recent report, “Unleashing Technology, Media, and Telecom with Digital Transformation”, the Boston Consulting Group shines a light on three trends in particular: artificial intelligence (AI), the Internet of Things (IoT), and cybersecurity.

Understanding these trends may give companies in the sector a competitive advantage. AI can help companies automate their customer support or network operations. Internet-connected devices can help technology vendors improve business processes and increase efficiencies in manufacturing, transportation, logistics, and utilities companies. And a new focus on cybersecurity can help TMT companies prevent breaches, to which they are particularly vulnerable. While AI, IoT and cybersecurity have not actively fueled growth for TMT companies in the past, they are all now active targets of venture funding and will undoubtedly shape future performance.


Can Artificial Intelligence Silence Internet Trolls?

Fortune, Jeff John Roberts



A new strategy promises to tame the trolls and reinvigorate civil discussion on the Internet. Hatched by Jigsaw, an in-house think tank at Google’s parent company, Alphabet, the tool relies on artificial intelligence and could solve the once-impossible task of vetting floods of online comments.

To explain what Jigsaw is up against, chief research scientist Lucas Dixon compares the troll problem to so-called denial-of-service attacks in which attackers flood a website with garbage traffic in order to knock it off-line.

“Instead of flooding your website with traffic, it’s flooding the comment section or your social media or hashtag so that no one else can have a word, and basically control the conversation,” says Dixon.


Paralyzed patients communicate thoughts via brain-computer interface

Reuters, Kate Kelland



“The striking results overturn my own theory that people with complete locked-in syndrome are not capable of communication,” said Niels Birbaumer, a neuroscientist at Switzerland’s Wyss Center for Bio and Neuroengineering, who co-led the study.

The trial, published in the journal PLOS Biology on Tuesday, involved four patients with amyotrophic lateral sclerosis (ALS) – a progressive motor neuron disease that destroys the part of the nervous system responsible for movement.


Has Technology Ruined the Radiology Profession?

KQED Future of You, Bob Wachter



In 2000, only 8 percent of U.S. hospitals had some version of a game-changing computer technology called the Picture Archiving and Communications System, or PACS. By 2008, more than three out of four did.

Because radiology was the first medical specialty to computerize, what has happened to it — at once shocking and, in retrospect, entirely predictable — is our canary in the digital coal mine, its experience offering important lessons for patients, clinicians and health care systems.


A Mystery AI Just Crushed the Best Human Players at Poker

WIRED, Business



Another game just fell to the machines.

Yesterday, after 20 days of play at a casino in Pittsburgh, an AI built by two Carnegie Mellon researchers officially defeated four top players at no-limit Texas Hold ‘Em, a particularly complex form of poker that relies heavily on long-term betting strategies and game theory. Over the past twenty years, machines have topped the best humans at checkers, chess, Scrabble, Jeopardy!, and even the ancient game of Go. But no AI had ever beaten the best at such an extreme game of “imperfect information,” a game where certain elements, such as the cards on the table, are hidden. Among humans, no-limit Hold ‘Em requires a certain degree of intuition, not to mention luck.


Four Questions For: Randy Olson

Gigaom, Byron Reese



You believe that automated machine learning is going to transform data science. What major changes do you expect to see in your lifetime?

I can easily see automated machine learning transforming the field of data science within the next 5 years. In essence, most advancements in data science, AI, and machine learning lately (for example, AlphaGo dominating the world’s best Go players) have been made by large teams of researchers and engineers meticulously designing and optimizing complex machine learning systems that specialize in one particular problem. Automated machine learning seeks to augment and perhaps one day replace those teams of researchers with AI systems that follow the same design and optimization process, but aren’t constrained by human biases. As an added bonus, automated machine learning systems won’t be limited by human needs such as food and sleep, so they can work as long as there’s computer hardware and electricity.

 
Events



ICWSM-17 – Workshop Program



Montreal, Quebec, Canada May 15 [$$$]
 
Deadlines



Understanding researcher needs and values related to software

The UC Berkeley Library and the California Digital Library are investigating researchers’ perceptions, values, and behaviors with regard to software generated as part of the research process. “If you are a researcher, it would be greatly appreciated if you could spare 10-15 minutes to complete the survey.”

Data and Analytics Camp

Miami, FL April 3-7. Wellstone Action’s Data and Analytics Camp equips progressive leaders with the cutting-edge data and analytics skills necessary to support electoral and issue-based organizing. Cost is $1800. Deadline to apply is Friday, February 24.

Call for Applications – Adversarial Design: Pushing Back on the Internet of Things

Princeton, NJ Workshop is April 5-7. Deadline to apply is Friday, February 24.
 
Tools & Resources



PathNet: Evolution Channels Gradient Descent in Super Neural Networks

arXiv, Computer Science > Neural and Evolutionary Computing; Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra



For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C).
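
For readers who want a feel for the "tournament selection" step the abstract describes, here is a minimal toy sketch in Python. It is not the authors' implementation: there is no neural network here, the layer and module counts are made-up toy sizes, the mutation rule is an assumption, and `toy_fitness` is a hypothetical stand-in for "performance measured according to a cost function". It only shows the selection loop: two pathways are compared, and the weaker one is overwritten by a mutated copy of the stronger one.

```python
import random

NUM_LAYERS = 3          # assumed toy size, not taken from the paper
MODULES_PER_LAYER = 10  # assumed toy size
ACTIVE_PER_LAYER = 3    # modules a pathway may use in each layer (assumed)


def random_pathway(rng):
    """A pathway picks a few module indices in every layer."""
    return [[rng.randrange(MODULES_PER_LAYER) for _ in range(ACTIVE_PER_LAYER)]
            for _ in range(NUM_LAYERS)]


def mutate(pathway, rng, rate=0.2):
    """Copy a pathway, re-drawing each module index with probability `rate`.
    The mutation rule is an assumption made for this sketch."""
    return [[rng.randrange(MODULES_PER_LAYER) if rng.random() < rate else m
             for m in layer]
            for layer in pathway]


def tournament_step(population, fitness, rng):
    """Binary tournament: the fitter of two random pathways overwrites the
    other with a mutated copy of itself. Returns (winner_idx, loser_idx)."""
    i, j = rng.sample(range(len(population)), 2)
    if fitness(population[j]) > fitness(population[i]):
        i, j = j, i
    population[j] = mutate(population[i], rng)
    return i, j


def toy_fitness(pathway, target):
    """Hypothetical fitness: counts module choices that match a hidden
    target pathway. A real system would train and evaluate the network."""
    return sum(m == t
               for layer, target_layer in zip(pathway, target)
               for m, t in zip(layer, target_layer))


def evolve(steps=2000, pop_size=16, seed=0):
    """Run repeated tournaments; return best fitness before and after."""
    rng = random.Random(seed)
    target = random_pathway(rng)
    population = [random_pathway(rng) for _ in range(pop_size)]
    fit = lambda p: toy_fitness(p, target)
    start_best = max(map(fit, population))
    for _ in range(steps):
        tournament_step(population, fit, rng)
    return start_best, max(map(fit, population)), population
```

Because the winner of each tournament is never overwritten, the best fitness in the population cannot decrease from one step to the next in this sketch.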


Julia – A Fresh Approach to Numerical Computing

Microsoft, Cortana Intelligence and Machine Learning Blog



The Julia language provides a fresh new approach to numerical computing, where there is no longer a compromise between performance and productivity. A high-level language that makes writing natural mathematical code easy, with runtime speeds approaching raw C, Julia has been used to model economic systems at the Federal Reserve, drive autonomous cars at University of California Berkeley, optimize the power grid, calculate solvency requirements for large insurance firms, model the US mortgage markets and map all the stars in the sky.

It would be no surprise then that Julia is a natural fit in many areas of machine learning. ML, and in particular deep learning, drives some of the most demanding numerical computing applications in use today. And the powers of Julia make it a perfect language to implement these algorithms.


[1701.07875] Wasserstein GAN

Statistics > Machine Learning; Martin Arjovsky, Soumith Chintala, Léon Bottou



We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.
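
As a small aside on the distance the paper is named after, the sketch below computes the Wasserstein-1 ("earth mover's") distance between two equal-sized one-dimensional samples. This is not the WGAN training algorithm, which works with a learned critic on high-dimensional data; it is only a toy illustration, and the function name `wasserstein_1d` is our own. For equal-sized 1-D samples with equal weights, the distance reduces to the mean absolute difference between the sorted values.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    Assumes every point carries the same weight, so the optimal transport
    plan simply pairs the sorted values of one sample with the other's.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


real = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 5.0 for x in real]
print(wasserstein_1d(real, shifted))  # 5.0
```

The value grows smoothly as the two samples move apart, even when they do not overlap at all, which may help explain why a loss based on this distance can give a readable training curve.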


KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

UC Berkeley, AMPLab



Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements. We present KeystoneML, a system that captures and optimizes end-to-end large-scale machine learning applications for high-throughput training in a distributed environment with a high-level API. This approach offers increased ease of use and higher performance over existing systems for large-scale learning. We demonstrate the effectiveness of KeystoneML in achieving high-quality statistical accuracy and scalable training using real-world datasets in several domains.


Experience Design in the Machine Learning Era

Medium, Fabien Girardin



Traditionally the experience of a digital service follows pre-defined user journeys with clear states and actions. Until recently, it has been the designer’s job to create these linear workflows and transform them into understandable and unobtrusive experiences. This is the story of how that practice is about to change.


Building Voice-Enabled Products With Amazon Alexa

Gigaom, Chris Mohritz



This guide will step you through configuring an Amazon Alexa Skill and Lambda function to control the relay attached to an IoT device — specifically, an Intel Edison-powered starter kit. But with some simple modification to the code, you could make Alexa control nearly anything you like.
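
To give a rough idea of what the Lambda side of such a skill can look like, here is a minimal Python sketch. It is not the code from the guide: the intent names `RelayOnIntent` and `RelayOffIntent` are assumptions that would have to match the skill's own interaction model, and `set_relay` is a hypothetical placeholder for whatever call actually reaches the device. The handler reads the request type from the incoming event and returns a plain-text speech response.

```python
def set_relay(on):
    """Hypothetical placeholder: a real skill would signal the IoT device
    here (for example over a message queue or an HTTP call)."""
    return "on" if on else "off"


def build_response(text, end_session=True):
    """Wrap plain text in the JSON envelope an Alexa custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


# Intent names are assumptions for this sketch; they must match whatever
# is defined in the skill's interaction model.
INTENT_TO_STATE = {"RelayOnIntent": True, "RelayOffIntent": False}


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the Alexa request as `event`."""
    request = event.get("request", {})
    kind = request.get("type")
    if kind == "LaunchRequest":
        # The user opened the skill without asking for anything yet.
        return build_response("Say relay on or relay off.", end_session=False)
    if kind == "IntentRequest":
        name = request.get("intent", {}).get("name")
        if name in INTENT_TO_STATE:
            state = set_relay(INTENT_TO_STATE[name])
            return build_response(f"The relay is now {state}.")
    return build_response("Sorry, I did not understand that.")
```

Swapping `set_relay` for a different action is the kind of "simple modification" that would point the same skill at another device.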


Hard things that look easy

Agile Scientific, Matt Hall



After working on a few data science (aka data analytics aka machine learning) problems with geoscientific data, I think we’ve figured out the 10-step workflow. I’m happy to share it with you now:

  • Look at all these cool problems, machine learning can solve all of these! I just need to figure out which model to use, parameterize it, and IT’S GONNA BE AWESOME, WE’LL BE RICH. Let’s just have a quick look at the data…
  • Oh, there’s no data.
 
Careers



Full-time positions outside academia

Support Scientists



NASA Ames Research Center, Kepler/K2 Science Center; Moffett Field, CA
