Data Science Newsletter – October 31, 2016

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for October 31, 2016


 
 
Data Science News



Ask a Swiss: Highlights from the Moore/Sloan Data Science Environments Summit (MSDSE 2016)

Michael Beyeler, Ask a Swiss blog


from October 28, 2016

The Moore/Sloan Data Science Environments (MSDSE) program is an ongoing effort to enhance data-driven discovery by supporting cross-disciplinary academic data scientists at research institutions across the nation. Halfway through the program, researchers from the University of Washington (UW), New York University (NYU), and University of California, Berkeley (UCB) came together to present their latest research and discuss the potential future of data science at a three-day summit.


The more i-things we come across, the more problems we have #IoT

Last week was a reality check for internet-connected devices: they turn out to have woefully inadequate security. Matthew Garrett, a member of the Free Software Foundation’s board of directors, writes (with panache) that fixing these security issues will not be easy.

As many of us battled the DDoS for our right to tweet early last week (the injustice! the agony!), the US Army publicly hinted that having sensors attached to everything may produce a sensation of drowning in data, which is not the same as the instant insights and analytical edge that is usually part of #IoT promos.

If you are ready for a smart, longish read, Alain Louchez has a superb, broad-minded article on the internet of things. He introduces a new level of analysis – ‘normative analytics’ – alongside the descriptive, diagnostic, predictive and prescriptive analytics categories used by places like Gartner. Normative, or self-adaptive, analytics encompass the additional complexity, scale, and impact of algorithmic decision making by technologies like driverless cars and surgical robots. Still, if the devices are not secured, or spew forth data that nobody in the organization can analyze at all, IoT may operate in crippling overabundance for a while.


NASA Scientists Suggest We’ve Been Underestimating Sea Level Rise

Motherboard, Vice.com


from October 29, 2016

A new study, published this month in Geophysical Research Letters, found evidence that historical sea level records have been off—way off in some areas—underestimating the rise by 5 to 28 percent. Global sea level, the paper concluded, rose no less than 5.5 inches over the last century, and likely closer to 6.7 inches.

The reason for this discrepancy was uncovered by earth scientists at NASA’s Jet Propulsion Laboratory and the University of Hawai’i at Mānoa. By comparing newer climate models with older sea level measurements, the team discovered that readings from coastal tide gauges may not have been as indicative as we thought.


Cleantech Rebrands as ‘Smart City’ to Attract Venture Dollars

BloombergTechnology


from October 28, 2016

After years as a venture capital pariah, cleantech startups are rebranding themselves to capitalize on the quest to build a smart city.

Cleantech companies, whose missions are broadly linked to environmental improvement, are benefiting as municipalities around the world embrace smart cities—where internet-connected devices collect data that can be used to address challenges from energy efficiency to traffic, crime and public health.

The smart city market, expected to balloon to $1.57 trillion by 2020, according to a 2014 report from research and consulting firm Frost & Sullivan, is a way for cleantech startups to revive venture interest after the sector’s reputation suffered from failed experiments in areas like solar panels and biofuels in the 2000s.


Using data to hire, fire, and manage you

Frederick Winslow Taylor committed the original sin of using data-driven thinking to manage humans. He hovered over workers’ shoulders in turn-of-the-century American manufacturing plants with a stopwatch, recording the number of seconds each task on an assembly line took. The need for a managerial class (and possibly the dislike of managers) arose from these stopwatch studies.

Using quantitative analytics to aid human resource management is not new. Jenny Yang, Chair of the U.S. Equal Employment Opportunity Commission, noted that AI could reduce human bias and help employers identify candidates they might not otherwise have found, but only if employers are dedicated to developing systems that do not simply replicate their current employee rosters. That could be tough. Knowing what we don’t know is always a big challenge. Where do employers get training data to help them build a workforce different from, and better than, the one they have? More likely, they will use AI to increase efficiency in applicant screening in ways that replicate existing conscious or subconscious biases.

Maybe Wall Street will figure out how to find hidden talent as banks realize they need more programmers and cybersecurity experts and look to dropouts and tinkerers?

Facebook’s AI team, led by Yann LeCun, relies on meaningful human connections to find its new recruits (i.e., the AI team doesn’t use AI for HR). And Thomas Dinsmore notes that even when companies manage to hire skilled data scientists and programmers, they are often so siloed that their insights do not have broad impact. My organizational sociology skills are still safe from automation!

While it may be extremely disheartening to think that you may not make it past the AI assessment for the next job you want because you’re different from the training set, the creepiest application of automated human resources is the Fraudoscope. [I did not make this name up.] The Fraudoscope is a camera that captures “breathing, pulse, pupil dilation, facial tics” and “subconscious gestures” to automatically determine whether or not people are lying. Historically, lie detection technology has not been reliable, to say the least. The problem: “There’s no signal that’s unique to deception, no matter how you measure it,” says Social Psychology Professor Mark Frank. That may not stop HR departments eager to cut down on insider jobs with the Fraudoscope.


Data Sharing — Is the Juice Worth the Squeeze?

New England Journal of Medicine


from November 02, 2016

The past few years have seen considerable interest in the sharing of patient-level data from clinical trials. There is a clear and logical “ethical and scientific imperative” for doing so, to permit activities ranging from verification of the original analysis to testing of new hypotheses. This interest has resulted in many publications and meetings, attention from the Institute of Medicine, proposed changes in journals’ policies, and enormous effort from pharmaceutical sponsors and other groups to provide access to patient-level data. It is critical that we learn from these early experiences as we move forward.


NVIDIA and FANUC Join Forces to Implement AI Robotics System

ENGINEERING.com


from October 27, 2016

Super sophisticated GPUs from one of the world’s largest chipmakers will be paired with robots from a company known for lights-out manufacturing operations, in which robots build other robots.

What could possibly go wrong?

Recently, NVIDIA and FANUC Corporation announced a collaboration in Japan to implement artificial intelligence (AI) on the FANUC Intelligent Edge Link and Drive (FIELD) system. According to the press release, the goal is to “increase robotics productivity and bring new capabilities to automated factories worldwide.” That seems innocent enough.


Salesforce, Uber, and Amazon Bought AI Startups: Here’s Why

Fortune


from October 28, 2016

At the Machine Learning and the Market for Intelligence conference this week put on by the Rotman School of Management at the University of Toronto, several founders behind artificial intelligence startups that have been acquired by industry heavyweights like Salesforce.com, Uber, and Amazon shared lessons they’ve learned since joining the big-time corporate world.


Helix partners with Mount Sinai, NatGeo for genetic testing platform

MobiHealthNews


from October 27, 2016

Personal genomics startup Helix, which is developing a digital hub for all things related to DNA testing, ranging from direct-to-consumer to doctor-ordered, has announced several partnerships: the Icahn School of Medicine at Mount Sinai, National Geographic and Exploragen will collaborate to deploy Helix’s genetic-testing service, which is powered by DNA sequencing giant Illumina. Additionally, the company secured an investment from Kleiner Perkins Caufield & Byers and recently announced a partnership with genetic information company Invitae.

Helix aims to offer an approachable platform for people who order DNA tests from multiple organizations – National Geographic’s popular ancestry product, Geno 2.0 Next Generation, will launch on the Helix platform in November. Helix will send out the saliva swab kits, sequence the DNA and store the data in the cloud, and National Geographic will look for ancestry markers in the DNA. Additionally, innovation company Exploragen is working on a number of applications that match DNA and the science of taste to create personalized “epicurean experiences,” according to the company.


IoT’s moment of truth — who can secure the data flows?

ZDNet, Collaboration 2.0, Oliver Marks


from October 27, 2016

From a service provider perspective, the Internet of Things currently has two main dimensions, both of which are attributes of larger battles for digital dominance.

The first dimension is the Machine to Machine (M2M) industrial internet, which evolved from heavy equipment telemetrics (the measurement and transmission of data by wire, radio, or other means from remote sources to receiving stations for recording and analysis) and has matured and grown on a linear path alongside ‘traditional’ enterprise IT systems for the last fifteen years. Examples of this are ‘time to failure’ monitoring of all types of rotating-mass heavy equipment, and data flows into and out of ERP and other enterprise software.

The other, newer dimension is the explosion of product innovation enabled by new sensor developments and big data, which allow data to flow between ‘born digital’ devices attached to physical ‘things’ of all sizes and modern digital backbones. It is this newer dimension of our connected world that is causing the recent giant societal waves.


Wall Street’s Frantic Push to Hire Coders

Bloomberg


from October 28, 2016

For almost five years, Gregory Furlong worked 50-hour weeks as a shipping clerk at a Best Buy two miles from his childhood home in Wilmington, Delaware. It was a kind of employment purgatory for a computer obsessive who tinkers with motherboards in his free time.

So last year, Furlong, 30, enrolled in a three-month coding boot camp that uses HackerRank, a web platform that trains and grades people on writing computer code. After earning a top ranking for Java developers globally, Furlong was hired by JPMorgan Chase & Co. in December for its two-year technology training program.

This is Wall Street’s new tech meritocracy. Financial institutions traditionally coveted graduates from Stanford and other big-name schools and people already working in Silicon Valley. But that system tends to overlook good programmers from other schools or gifted dropouts, according to recruiters.


Broadband Providers Will Need Permission to Collect Private Data

The New York Times


from October 27, 2016

Federal officials approved broad new privacy rules on Thursday that prevent companies like AT&T and Comcast from collecting and giving out digital information about individuals — such as the websites they visited and the apps they used — in a move that creates landmark protections for internet users.

By a 3-to-2 vote, the Federal Communications Commission clearly took the side of consumers. The new rules require broadband providers to obtain permission from subscribers to gather and give out data on their web browsing, app use, location and financial information. Currently, broadband providers can track users unless those individuals tell them to stop.

It was the first time the F.C.C. had passed such online protections.

 
NYU Center for Data Science News



Hogg’s Research: #dsesummit, day 2

David Hogg, Hogg's Research blog


from October 25, 2016

My day started with a long breakfast conversation with Yann LeCun (NYU) about adversarial methods in deep learning. In these methods, a generator and discriminator are trained simultaneously, and against one another. It is a great method for finding or describing complex density functions in high dimensions, and people in the business have high hopes. In particular, it is crushing in image applications. We discussed the problem that is currently on my mind, which is modeling the color–magnitude diagram of stars in Gaia, using one of these adversarial systems, plus a good noise model for the parallaxes. I would love to do that, and it should be much easier than the image problems, because the data are much lower in dimensionality.
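To make the generator-versus-discriminator idea concrete, here is a minimal, hypothetical sketch of adversarial training on a toy 2-D point cloud in Python with PyTorch. This is not Hogg’s code, and it is far simpler than anything a Gaia color–magnitude model would need; the network sizes, learning rates and toy data are illustrative assumptions.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def sample_real(n):
        # Toy stand-in for observed 2-D data (e.g. two photometric quantities).
        x = torch.rand(n, 1) * 4 - 2
        y = x ** 2 + 0.1 * torch.randn(n, 1)
        return torch.cat([x, y], dim=1)

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
    D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        real = sample_real(128)
        fake = G(torch.randn(128, 8))

        # Discriminator step: push real points toward label 1, generated toward 0.
        d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake.detach()), torch.zeros(128, 1))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator step: try to fool the just-updated discriminator.
        g_loss = bce(D(fake), torch.ones(128, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

    # After training, G(torch.randn(n, 8)) samples points that mimic the real cloud.

The same two-player loop, with a physically motivated noise model layered on top, is the kind of machinery the post has in mind for density estimation in low-dimensional data.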


NYU Researchers Innovating with Deep Learning

insideBIGDATA


from October 29, 2016

In the two video presentations below, learn how NYU and Yann LeCun, often called the godfather of deep learning and AI, accepted the university’s first delivery of the NVIDIA DGX-1, an AI supercomputer in a box, in order to innovate in many diverse fields of study. The DGX-1 is a purpose-built system for deep learning and AI-accelerated analytics, delivering performance equal to 250 conventional servers. It comes fully integrated with hardware, deep learning software and development tools, and runs popular accelerated analytics applications.

In the first video, discover how NYU’s top AI team of Yann LeCun, Kyunghyun Cho, and Joan Bruna plans to advance the building of intelligent machines, particle physics models, and medicine with deep learning and AI. [2 videos]

 
Tools & Resources



fuzzy.ai – Easy AI-powered decision making via API

fuzzy.ai


from October 31, 2016

Fuzzy.ai takes your knowledge and intuition about your business and converts that into an algorithm that improves automatically. No training data or data scientists required.
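As a rough illustration of the underlying idea (this is a hand-rolled toy in Python, not the fuzzy.ai API; the variables and 0–100 scales are made up), a fuzzy rule turns a plain-language intuition such as “low usage plus low satisfaction means churn risk” into a graded score rather than a hard threshold:

    def low(x):
        # Membership in "low" on a 0-100 scale: 1.0 at 0, fading to 0.0 at 50+.
        return min(max((50 - x) / 50.0, 0.0), 1.0)

    def churn_risk(usage, satisfaction):
        # Rule: IF usage is low AND satisfaction is low THEN churn risk is high.
        return min(low(usage), low(satisfaction))  # 0.0 (no risk) .. 1.0 (high risk)

    print(churn_risk(usage=20, satisfaction=30))  # 0.4 -- moderately at risk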


tsfresh: Automatic extraction of relevant features from time series

GitHub – blue-yonder


from October 26, 2016

TSFRESH automatically extracts hundreds of features from time series. These features describe basic characteristics of a time series, such as the number of peaks or the average or maximal value, as well as more complex features such as the time reversal symmetry statistic.
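For orientation, here is a minimal usage sketch in Python; the toy data and column names are illustrative, so see the project’s README for the authoritative interface.

    import numpy as np
    import pandas as pd
    from tsfresh import extract_features

    # Two toy series in long format: an id column, a time index and the values.
    df = pd.DataFrame({
        "id":    [0] * 50 + [1] * 50,
        "time":  list(range(50)) * 2,
        "value": np.r_[np.sin(np.linspace(0, 6, 50)), np.random.randn(50)],
    })

    # Result: one row per series id, one column per extracted feature.
    features = extract_features(df, column_id="id", column_sort="time")
    print(features.shape)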


How to Start Learning Deep Learning

Data Science Central, Ofir Press


from September 07, 2016

Due to the recent achievements of artificial neural networks across many different tasks (such as face recognition, object detection and Go), deep learning has become extremely popular. This post aims to be a starting point for those interested in learning more about it.

 
Careers


Full-time positions outside academia

Open Positions at Center for Open Science



Center for Open Science; Charlottesville, VA
