Data Science newsletter – June 13, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 13, 2017


 
 
Data Science News



Industrial IoT is the most important tech scene you haven’t heard of

The Next Web, Sundeep Sanghavi


from

Artificial Intelligence, the technology Hollywood predicts might be the death of us all, has been around for ages. From fictional Skynet to Apple’s Siri, we’ve all been gawping at AI’s unstoppable rise. Recently, although the hype around AI has continued to build, it hasn’t been matched by real-world rollouts. The same happened after the initial buzz caused by the Internet of Things (IoT): the yawns of once-excited investors grew ever more audible as pundits across the web started considering that the fad was coming to an end.

However, this couldn’t be more wrong. Sure, consumer IoT may have slowed down a bit — how many people really need ‘smart’ vacuum cleaners anyway? — but, in factories all over the world, Industrial IoT (IIoT) is booming. Whereas consumer IoT focuses on connecting your cell phone to a coffee maker, IIoT connects industrial equipment in factories, monitors vehicle fleets, and so on.

According to a report from the bean counters at accountancy firm PricewaterhouseCoopers (PwC), business leaders see AI and the IoT as phenomena worth investing in.


AMD and Intel Race Towards High Core Count CPU Future

Tech.pinions, Ryan Shrout


from

As we prepare for a surprisingly robust summer season of new hardware technologies to be released to the consumer, both Intel and AMD have moved in a direction that seems both inevitable and wildly premature. The announcement and pending introduction of high core count processors, those with many cores that share each company’s most modern architecture and design, brings with it an interesting combination of opportunity and discussion. First and foremost, is there a legitimate need for this type of computing horsepower, in this form factor? And second, is this something that consumers will want to purchase?

To be clear, massive core count CPUs have existed for some time, but in the server and enterprise markets. Intel’s Xeon line of products has breached the 20-core count in previous generations, and if you want to dive into Xeon Phi, a chip that uses older, smaller cores, you will find options with over 70 cores. Important for applications that require a significant amount of multi-threading or virtualization, these chips were expensive. Very expensive, crossing the $9,000 mark.

What Intel and AMD have begun is a move to bring these high core count products to consumers at more reasonable price points.
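
None of this code appears in the article, but a toy Python sketch can make the core-count argument concrete: an embarrassingly parallel workload (the task and sizes below are hypothetical) scales with however many cores the operating system reports.

import math
import os
from multiprocessing import Pool

def heavy_task(n):
    """Stand-in for CPU-bound work such as rendering, encoding, or simulation."""
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 64          # 64 independent chunks of work (hypothetical)
    workers = os.cpu_count()         # more physical cores -> more parallel workers
    with Pool(processes=workers) as pool:
        results = pool.map(heavy_task, jobs)
    print(f"Ran {len(jobs)} tasks across {workers} cores")

Workloads that cannot be split into independent pieces like this see little benefit from extra cores, which is exactly the consumer-demand question the article raises.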


This is how Big Oil will die

Medium, Seth Miller


from

Big Oil is perhaps the most feared and respected industry in history. Oil is warming the planet — cars and trucks contribute about 15% of global fossil fuel emissions — yet this fact barely dents its use. Oil fuels the most politically volatile regions in the world, yet we’ve decided to send military aid to unstable and untrustworthy dictators, because their oil is critical to our own security. For the last century, oil has dominated our economics and our politics. Oil is power.

Yet I argue here that technology is about to undo a century of political and economic dominance by oil. Big Oil will be cut down in the next decade by a combination of smartphone apps, long-life batteries, and simpler gearing. And as is always the case with new technology, the undoing will occur far faster than anyone thought possible.


Turning Healthcare Big Data into Actionable Clinical Intelligence

HealthIT Analytics


from

The use cases for big data are proliferating rapidly as organizations move deeper into population health management and accountable care, and consumers are keeping pace with their growing demand for cost-effective services that leverage the convenience of their favorite apps and devices.

But despite the data-driven promises looming just over the horizon, the majority of healthcare organizations still have a great deal of work to do before they can turn their budding big data analytics competencies into truly actionable clinical intelligence.

A chronic lack of direction, exacerbated by deeply entrenched interoperability issues and a widespread inability to secure a qualified data science team, has left organizations in something of a slump. A series of industry surveys from recent months points out significant staffing gaps, frustrating health data exchange roadblocks, and organizational planning deficiencies that are keeping providers from breaking through their data doldrums.


NYC Planning Department launches ‘Labs’ unit to boost innovation

Technical.ly Brooklyn, Tyler Woods


from

A new unit from the NYC Planning Department will aim to make the work of planning the systems of this enormously complicated city easier, more efficient and more open. (The group’s full charter was posted to GitHub, for example.)

The NYC Planning Labs announced itself to the world Monday morning, with a promise to bring state-of-the-art technology and best practices to the Planning Department. Mapping guru Chris Whong will head the unit.


Microsoft Research looks back at a year of successful collaboration with the Big Data Hubs

South Big Data Hub, Hubbub! blog


from

Microsoft Research understands that taking full advantage of big data and new data technologies requires more than developing new tools and technologies. To paraphrase Vani Mandava, director of data science for the research arm of the tech giant, it requires cross-disciplinary research that extends well beyond computer science, and collaboration among domain science experts, computing and data science specialists, and industry leaders in technology and other verticals.

In a post on the Microsoft Research blog, she calls out the Big Data Regional Innovation Hubs as “one such bold opportunity to focus on vertical-specific, yet interdisciplinary research.”


Computer Science: Reprogramming Bias

Princeton Alumni Weekly, Josephine Wolff


from

One goal of artificial intelligence (AI) is to make computers better able to imitate human reasoning and tasks — but there are downsides to teaching machines to mimic humans too closely, according to a paper published in Science in April by Princeton researchers Aylin Caliskan, Joanna Bryson, and Arvind Narayanan. The team of computer scientists drew from billions of sentences taken from sources across the internet to analyze how human biases about gender, race, and other characteristics might appear when these sentences are used to train a machine. They found that many of the biases observed in humans could be transferred to AI technologies and algorithms that learn from these bodies of text. The findings have implications for a machine’s ability to objectively perform language-based tasks such as web searches, translations, and automated résumé scanning.
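
The Science paper measured these effects with a word-embedding association test; the sketch below is not the authors’ code and uses made-up vectors in place of trained embeddings, but it shows roughly how such a bias score can be computed from cosine similarities.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """How much closer word vector w sits to attribute set A than to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Differential association of target sets X and Y with attribute sets A and B."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc, ddof=1)

# Toy stand-ins for trained word vectors (e.g. career vs. family words, male vs. female names).
rng = np.random.default_rng(0)
X, Y, A, B = ([rng.normal(size=50) for _ in range(8)] for _ in range(4))
print(effect_size(X, Y, A, B))   # near zero here, since these random vectors carry no bias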


New Computing System Takes Its Cues from Human Brain

Georgia Institute of Technology, Georgia Tech Research Horizons


from

Some problems are so challenging to solve that even the most advanced computers need weeks, not seconds, to process them.

Now a team of researchers at Georgia Institute of Technology and University of Notre Dame has created a new computing system that aims to tackle one of computing’s hardest problems in a fraction of the time.

“We wanted to find a way to solve a problem without using the normal binary representations that have been the backbone of computing for decades,” said Arijit Raychowdhury, an associate professor in Georgia Tech’s School of Electrical and Computer Engineering.


GOAI: Open GPU-Accelerated Data Analytics

NVIDIA, Parallel Forall blog


from

Recently, Continuum Analytics, H2O.ai, and MapD announced the formation of the GPU Open Analytics Initiative (GOAI). GOAI—also joined by BlazingDB, Graphistry and the Gunrock project from the University of California, Davis—aims to create open frameworks that allow developers and data scientists to build applications using standard data formats and APIs on GPUs. Bringing standard analytics data formats to GPUs will allow data analytics to be even more efficient, and to take advantage of the high throughput of GPUs. NVIDIA believes this initiative is a key contributor to the continued growth of GPU computing in accelerated analytics.
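
The excerpt does not show GOAI’s actual GPU DataFrame interfaces; as a loose illustration of the throughput argument, here is a hedged sketch using the numba CUDA API (assuming an NVIDIA GPU and the numba package) that keeps a “column” resident in GPU memory while it is transformed.

import numpy as np
from numba import cuda

@cuda.jit
def discount_column(prices, rate, out):
    """Apply a discount to every element of a column, one GPU thread per row."""
    i = cuda.grid(1)
    if i < prices.size:
        out[i] = prices[i] * (1.0 - rate)

prices_host = np.random.rand(1_000_000).astype(np.float32)
prices_dev = cuda.to_device(prices_host)        # copy the column to GPU memory once
out_dev = cuda.device_array_like(prices_dev)    # result stays on the GPU

threads = 256
blocks = (prices_host.size + threads - 1) // threads
discount_column[blocks, threads](prices_dev, np.float32(0.1), out_dev)

print(out_dev.copy_to_host()[:5])               # copy back only when results are needed

The point GOAI makes is that if frameworks agree on a standard in-GPU data format, intermediate copies between libraries can be avoided entirely.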


A new way to extend Moore’s law – IBM unveils a transistor that does not leak electrons

The Economist


from

For all the fearsome complexity of computer chips, their basic components are simple. Transistors are nothing more than switches. To turn one on, a voltage is applied to part of it called a gate. This allows electrical current to flow through a channel between the transistor’s input and output. As transistors shrink, though, insulation breaks down and current tends to leak away, reducing the gate’s ability to control the channel. One reason for this is a phenomenon called quantum tunnelling, in which the uncertainty of an electron’s position means it is sometimes found in another part of the transistor without having physically crossed the space in between.

To try to keep things ticking along, chipmakers have been tinkering with the basic design of the transistor itself. In 2012, for example, Intel, the biggest chipmaker of the lot, introduced transistors in which the gate surrounds the channel on three sides, making it better able to impose its will.


DeepMind’s neural network teaches AI to reason about the world

New Scientist, Daily News, Matt Reynolds


from

The world is a confusing place, especially for an AI. But a neural network developed by UK artificial intelligence firm DeepMind that gives computers the ability to understand how different objects are related to each other could help bring it into focus.

Humans use this type of inference – called relational reasoning – all the time, whether we are choosing the best bunch of bananas at the supermarket or piecing together evidence from a crime scene. The ability to transfer abstract relations – such as whether one thing is to the left of another or bigger than it – from one domain to another gives us a powerful mental toolset with which to understand the world. It is a fundamental part of our intelligence, says Sam Gershman, a computational neuroscientist at Harvard University.
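
The DeepMind paper behind this story builds this pairwise reasoning into a “relation network” module; the PyTorch sketch below is a minimal reading of that idea, with layer sizes chosen for illustration rather than taken from the paper.

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sum a small MLP g over every ordered pair of objects, then reason with f."""
    def __init__(self, object_dim, hidden=256, out_dim=10):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * object_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, objects):
        # objects: (batch, n_objects, object_dim), e.g. feature columns from a CNN
        b, n, d = objects.shape
        left = objects.unsqueeze(2).expand(b, n, n, d)    # object i in each pair
        right = objects.unsqueeze(1).expand(b, n, n, d)   # object j in each pair
        pairs = torch.cat([left, right], dim=-1).reshape(b, n * n, 2 * d)
        relations = self.g(pairs).sum(dim=1)              # aggregate over all pairs
        return self.f(relations)

model = RelationNetwork(object_dim=64)
print(model(torch.randn(4, 8, 64)).shape)   # torch.Size([4, 10])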


New momentum for Harvard ideas in language, health, astronomy

Harvard Gazette


from

Investigating how languages emerge and evolve. Using climate-change data to predict dust storms and bacterial meningitis outbreaks in Northern Africa. Understanding whether age-related diseases may stem from a common driver. Determining whether the presence of oxygen can be used to predict life on distant exoplanets.

Harvard scientists are known for pushing boundaries, but the projects funded through the 2017 Star Family Challenge for Promising Scientific Research are poised to take that reputation to new heights.

Created through a gift from James A. Star ’83, the annual challenge funds high-risk, high-reward research that might not receive funding through other programs.


Electronic Health Record Implementation Is Associated With a Negligible Change in Outpatient Volume and Billing

American Journal of Orthopedics; Matthew L. Webb et al.


from

The Health Information Technology for Economic and Clinical Health (HITECH) Act mandated that hospitals begin using electronic health records (EHRs). To investigate potential up-coding, we reviewed billing data for changes in patient volumes and up-coding around the time of EHR implementation at our academic medical center. We identified all new, consultation, and return outpatient visits on a monthly basis in the general internal medicine and orthopedics departments at our center. We compared the volume of patient visits and the level of billing coding in these 2 departments before and after their transitions to ambulatory EHRs. The Pearson χ2 test was used when appropriate. Patient volumes remained constant during the transition to EHRs. There were small changes in the level of billing coding with EHR implementation. In both departments, these changes accounted for minor but statistically significant shifts in billing coding (Pearson χ2, P < .001). However, the 44.7% relative increase in level 5 coding in our orthopedics department represented only 1.7% of patient visits overall. These findings indicate that lay media reports about an association between dramatic up-coding and EHRs could be misleading.
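
For readers unfamiliar with the test named in the abstract, the sketch below runs a Pearson chi-square comparison of billing-code levels before and after EHR adoption on an entirely hypothetical contingency table; it is not the study’s data or code.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of outpatient visits billed at coding levels 3, 4, and 5.
counts = np.array([
    [5200, 3100, 240],   # before EHR implementation (made-up numbers)
    [5050, 3150, 350],   # after EHR implementation  (made-up numbers)
])

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.4g}")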


Where Does It Hurt? New AI Platform Can Help Assess Your Ailments

MediaPost, Larissa Faw


from

After six years in incubator development, KBS Albion and start-up Ada Health have released an artificial intelligence (AI) engine and app to provide health assessments in response to real-time patient symptom data.

Berlin-based health tech startup Ada Health approached London-based KBS Albion to help develop this app.

“Almost all our work is analogous to the work we did for Ada, blending business innovation with brand and product design,” says Adam Lawrenson, ECD, KBS Albion.


Apple just hired the star of Stanford’s digital health efforts

CNBC, Christina Farr


from

Apple has quietly scooped up Dr. Sumbul Desai, the executive director of Stanford Medicine’s center for digital health, who led a groundbreaking telemedicine project there and has been overseeing a project to promote health uses for the Apple Watch.

Desai will serve in a senior role at Apple in the growing health team but will continue to see patients at Stanford, said people familiar with the move.


The End of Human Doctors – The Bleeding Edge of Medical AI Research (Part 2)

Dr Luke Oakden-Rayner


from

Today we continue looking at breakthrough medical deep learning research, and review a major paper from Stanford researchers that reports “dermatologist level classification of skin cancer”, published in January 2017.

As a reminder, a major focus of this dive into the state of the art research will be barriers to medical AI, particularly technical barriers.

This week I thank Andre Esteva, one of the authors of the paper, for answering several questions I had.


Power-Blox Uses Swarm Intelligence to Create Distributed Micro-Energy Grids

The New Stack, Kimberly Mok


from

Another big asset is the system’s built-in artificial intelligence algorithms, which form a kind of “swarm intelligence” that gives it the ability to adapt automatically to fluctuating electrical loads and an ever-changing roster of power sources. Available energy in the swarm is stored in its nodes, and each component within the system is able to regulate itself, adapting to the current state of the overall grid, thanks to this relatively simple set of software code.

Power-Blox’s swarm intelligence feature is apparently inspired by nature itself, and it’s this swarm concept that makes the Power-Blox system autonomous, user-friendly and scalable. By keeping the fundamentals simple, more complex configurations are possible.

“Swarm technology is based upon nature’s approach towards organizing complex structures in a fully decentralized way,” says the company.


Vaxrank: A Computational Tool For Designing Personalized Cancer Vaccines | bioRxiv

bioRxiv; Alex Rubinsteyn, Isaac Hodes, Julia Kodysh, Jeffrey Hammerbacher


from

Therapeutic vaccines targeting mutant tumor antigens (“neoantigens”) are an increasingly popular form of personalized cancer immunotherapy. Vaxrank is a computational tool for selecting neoantigen vaccine peptides from tumor mutations, tumor RNA data, and patient HLA type. Vaxrank is freely available at www.github.com/hammerlab/vaxrank under the Apache 2.0 open source license and can also be installed from the Python Package Index.


Satellites are critical for IoT sector to reach its full potential

TechCrunch, Mohammad Marashi


from

Given the ubiquity of space-based communications, satellite technology will play a critical role in supporting the development of the IoT sector and realizing the full potential of interconnected devices.

This is why satellite operators are investing and collaborating to develop services and hardware that can enable IoT. At the same time, operators are making sure satellite-based solutions can be easily integrated into hybrid networks that combine fiber, wireless networks and satellite to best serve the IoT sector and customers at all levels.


U.S. ranks 4th in open data, with leadership by cities and states helping support the numbers

statescoop, Jason Shueh


from

An analysis finds that federal programs have lost ground, but other levels of government are making strides.

 
Events



Program – Women and Underrepresented Minorities in Natural Language Processing

Association for Computational Linguistics


from

Vancouver, Canada. Sunday, July 30, before the Annual Meeting of the Association for Computational Linguistics. [$$$]


Workshop – Research on Learning about Machine Learning | ICER Conference

ACM International Computing Education Research


from

Tacoma, WA. Monday, August 21, at ICER 2017. Deadline to apply is June 16.


Wearable Tech + Digital Health + NeuroTech Boston

ApplySci


from

Cambridge, MA. September 19. The ApplySci faculty will lead interactive discussions on topics including AI in healthcare, novel sensors, digital pharma companions, brain-computer interfaces, robotics, breakthrough surgical and diagnostic techniques, and meaningful data analysis for health and wellness. [$$$$]

 
Deadlines



Please help us and NSF with this largest-ever survey of CS faculty

This survey is about patterns in faculty hiring, advancement, and productivity in the North American academic system. It is supported by the NSF (SBE 1633791).

Measuring Behavior 2018: Methods and techniques for measuring human and animal behaviour.

Manchester, England. Measuring Behavior 2018 will take place June 5-8, 2018. Deadline for submissions is September 15.
 
Tools & Resources



nel

wikilinks


from

nel is a fast, accurate and highly modular framework for linking entities in documents.


Essential Cheat Sheets for Machine Learning and Deep Learning Engineers

Medium, Kailash Ahirwar


from

Learning machine learning and deep learning is difficult for newbies, and the deep learning libraries themselves can be hard to understand. I am creating a repository on GitHub (cheatsheets-ai) with cheat sheets which I collected from different sources. Do visit it and contribute cheat sheets if you have any. Thanks.


The Data Ecosystem: Taking a Cloud-First approach to IT

Dataconomy, Jaspreet Singh


from

The amount of data that businesses create continues to grow; at the same time, more of it is slipping out of our central data centres and into a mix of Cloud applications, mobile endpoints and remote or branch office file servers. But how can companies keep track of all this data when it is not in one place or even one type of location? The answer lies in taking a “Cloud First” approach to IT.

Taking a “Cloud First” approach to IT doesn’t mean simply lifting everything into public Cloud services like Amazon Web Services or Microsoft Azure then trying to maintain existing management processes. Instead, it requires a new mindset around how to make the most of Cloud architectures while still meeting compliance and security goals.


How A Data Scientist Can Improve His Productivity

dataversioncontrol, Dmitry Petrov


from

Business context, ML algorithm knowledge and intuition all help you to find a good model faster. But you never know for sure what ideas will bring you the best value.

This is why iteration time is a critical parameter in the data science process. The quicker you iterate, the more ideas you can check and the better the model you can build.


Learning to Cooperate, Compete, and Communicate

OpenAI; Ryan Lowe, Igor Mordatch, Pieter Abbeel, Yi Wu, Aviv Tamar & Jean Harb


from

Multiagent environments where agents compete for resources are stepping stones on the path to AGI. Multiagent environments have two useful properties: first, there is a natural curriculum — the difficulty of the environment is determined by the skill of your competitors (and if you’re competing against clones of yourself, the environment exactly matches your skill level). Second, a multiagent environment has no stable equilibrium: no matter how smart an agent is, there’s always pressure to get smarter. These environments have a very different feel from traditional environments, and it’ll take a lot more research before we become good at them.


Cloud Maven website released

University of Washington, eScience Institute


from

“We’re trying to help researchers collaborate,” said Rob Fatland, UW director of cloud and data solutions. “We’re advocating for their success. We offer options they may not realize are available to them,” added Amanda Tan, UW research computing cloud technology lead developer. The pair, who work at the eScience Institute, are the creators of Cloud Maven, a new website devoted to answering any and all questions about cloud computing.


SkyMapper | Southern Sky Survey

The Australian National University


from

Using a specially-built, 1.3-meter telescope at Siding Spring Observatory near Coonabarabran, the SkyMapper Southern Sky Survey is producing a high-fidelity digital record of the entire southern sky for Australian astronomers.

 
Careers


Internships and other temporary positions

Data Curator/Web Scraping Specialist



Saamarthya Foundation; Minneapolis, MN
