Data Science newsletter – June 19, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 19, 2017

GROUP CURATION: N/A

 
 
Data Science News



Can’t Find Analytic Experts? Here’s How Academic Collaboration Can Help Close the Talent Gap

Forbes, OracleVoice, Rich Clayton



An infusion of data-savvy university graduates has not fully replenished the talent shortage, but I believe business investment in higher education will pay dividends.

McKinsey’s reporting on the topic nearly eight years ago indicated higher education increased student supply by approximately 7.5%. This growth is coming from about 700 schools—just 9% of higher-education institutions—and the combined total of now more than 2,000 advanced-analytics programs they offer. More than half of these academic programs were launched in the last six years, yet the talent deficit continues to grow. Why is that?


Leading the way in Data Science

City, University of London



Following on the success of the MSc programme, City is set to introduce a four-year undergraduate MSci Data Science degree in October 2017.


What is computational sociolinguistics? (And who’s doing it?)

Rachael Tatman



I’ve talked about my different research projects quite a bit on this blog (and I’ve got a couple more projects coming up that I’m excited to share with y’all!) but they can seem a little bit scattered. What do patterns of emoji use have to do with how well speech recognition systems deal with different dialects, or with how people’s political affiliation is reflected in their punctuation use? The answer is that they all fall within the same discipline: computational sociolinguistics.

Computational sociolinguistics is a fairly new field that lies at the intersection of two other, more established fields: computational linguistics and sociolinguistics. You’re actually probably already familiar with at least some of the work being done in computational linguistics and its sister field of Natural Language Processing (commonly called NLP). The technologies that allow us to interact with computers or phones using human language, rather than binary 1’s and 0’s, are the result of decades of research in these fields. Everything from spell check, to search engines that know that “puppy” and “dog” are related topics, to automatic translation is the result of researchers working in computational linguistics and NLP.


Earth Is Not in the Midst of a Sixth Mass Extinction

The Atlantic, Peter Brannen



[Doug] Erwin is one of the world’s experts on the End-Permian mass extinction, an unthinkable volcanic nightmare that nearly ended life on earth 252 million years ago. He proposed that earth’s great mass extinctions might unfold like these power grid failures: most of the losses may come, not from the initial shock—software glitches in the case of power grid failures, and asteroids and volcanoes in the case of ancient mass extinctions—but from the secondary cascade of failures that follow. These are devastating chain reactions that no one understands. Erwin thinks that most mass extinctions in earth’s history—global die-offs that killed the majority of animal life on earth—ultimately resulted, not from external shocks, but from the internal dynamics of food webs that faltered and failed catastrophically in unexpected ways, just as the darkening eastern seaboard did in 2003.
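Erwin’s idea that most losses come from the secondary cascade rather than the initial shock can be illustrated with a toy food web in which a consumer dies out once every species it eats is gone. The web, the species names, and the extinction rule below are hypothetical and deliberately simplistic; this is a sketch of the cascade idea, not a model from the article.

```python
# Hypothetical food web: each consumer maps to the set of species it eats.
# Species absent from the dict are producers (they need no prey).
diet = {
    "grazer": {"algae"},
    "filter_feeder": {"algae", "plankton"},
    "small_predator": {"grazer"},
    "large_predator": {"small_predator", "filter_feeder"},
}
species = {"algae", "plankton", "grazer", "filter_feeder", "small_predator", "large_predator"}

def cascade(species, diet, initial_losses):
    """Return (primary, secondary) extinctions after removing `initial_losses`.

    Toy rule: a consumer goes extinct once none of its prey survive.
    Repeat until no further species are lost.
    """
    alive = set(species) - set(initial_losses)
    changed = True
    while changed:
        changed = False
        for s in list(alive):  # iterate over a copy so we can remove while looping
            if s in diet and not (diet[s] & alive):
                alive.remove(s)
                changed = True
    lost = set(species) - alive
    return set(initial_losses), lost - set(initial_losses)

primary, secondary = cascade(species, diet, {"algae"})
print(sorted(primary), sorted(secondary))  # ['algae'] ['grazer', 'small_predator']
```

One initial loss produces two secondary losses here, because the grazer depended on a single prey species while the filter feeder had a backup.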


pulse2percept: A Python-based simulation framework for bionic vision

bioRxiv; Michael Beyeler, Geoffrey M. Boynton, Ione Fine, Ariel Rokem



By 2020 roughly 200 million people worldwide will suffer from photoreceptor diseases such as retinitis pigmentosa and age-related macular degeneration, and a variety of retinal sight restoration technologies are being developed to target these diseases. One technology, analogous to cochlear implants, uses a grid of electrodes to stimulate remaining retinal cells. Two brands of retinal prostheses are currently approved for implantation in patients with late stage photoreceptor disease. Clinical experience with these implants has made it apparent that the vision restored by these devices differs substantially from normal sight. To better understand the outcomes of this technology, we developed pulse2percept, an open-source Python implementation of a computational model that predicts the perceptual experience of retinal prosthesis patients across a wide range of implant configurations. A modular and extensible user interface exposes the different building blocks of the software, making it easy for users to simulate novel implants, stimuli, and retinal models. We hope that this library will contribute substantially to the field of medicine by providing a tool to accelerate the development of visual prostheses.
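The general idea of predicting a percept from an electrode grid can be illustrated with a much cruder "scoreboard" approximation, in which each active electrode is assumed to produce a round Gaussian spot of light at its own position. This is a toy sketch, not pulse2percept’s API or its model (the paper’s model is considerably richer); the grid size, electrode positions, amplitudes, and spot width are all hypothetical.

```python
import math

def scoreboard_percept(electrodes, width=20, height=20, sigma=1.5):
    """Toy percept: a sum of Gaussian blobs, one per active electrode.

    electrodes: list of (x, y, amplitude) tuples in pixel coordinates (hypothetical units).
    Returns a height x width nested list of brightness values.
    """
    image = [[0.0] * width for _ in range(height)]
    for ex, ey, amp in electrodes:
        for y in range(height):
            for x in range(width):
                d2 = (x - ex) ** 2 + (y - ey) ** 2
                image[y][x] += amp * math.exp(-d2 / (2 * sigma ** 2))
    return image

# Two hypothetical electrodes stimulated at different amplitudes.
percept = scoreboard_percept([(5, 5, 1.0), (14, 12, 0.5)])
brightest = max((v, x, y) for y, row in enumerate(percept) for x, v in enumerate(row))
print(brightest)  # the strongest spot sits at the higher-amplitude electrode
```

Clinical reports that restored vision "differs substantially from normal sight" are exactly why real models go beyond round spots like these.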


Developers Who Use Spaces Make More Money Than Those Who Use Tabs

Stack Overflow blog, David Robinson



There were 28,657 survey respondents who provided an answer to tabs versus spaces and who considered themselves a professional developer (as opposed to a student or former programmer). Within this group, 40.7% use tabs and 41.8% use spaces (with 17.5% using both). Of them, 12,426 also provided their salary.

Analyzing the data leads us to an interesting conclusion. Coders who use spaces for indentation make more money than ones who use tabs, even if they have the same amount of experience.
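Comparing coders "with the same amount of experience" amounts to stratifying: bucket respondents by years of experience, then compare median salary between tab and space users within each bucket. A minimal sketch of that idea; the row layout and every number below are invented for illustration and are not Stack Overflow survey data, and the post’s own analysis may use a different method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical respondent rows: (indentation style, years of experience, salary in USD).
rows = [
    ("tabs", 2, 50_000), ("spaces", 2, 58_000),
    ("tabs", 2, 54_000), ("spaces", 2, 60_000),
    ("tabs", 10, 90_000), ("spaces", 10, 101_000),
    ("tabs", 10, 95_000), ("spaces", 10, 99_000),
]

def median_salary_by_experience(rows):
    """Group salaries by (years of experience, style) and return each group's median."""
    groups = defaultdict(list)
    for style, years, salary in rows:
        groups[(years, style)].append(salary)
    return {key: median(vals) for key, vals in groups.items()}

medians = median_salary_by_experience(rows)
for years in sorted({y for _, y, _ in rows}):
    gap = medians[(years, "spaces")] - medians[(years, "tabs")]
    print(f"{years} years: spaces minus tabs median gap = {gap}")
```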


CDC Works to Improve Public Health Data Analytics, Surveillance

HealthIT Analytics



It may seem as if the clinical environment has taken the lead in collecting data and leveraging analytics for population health, but ambulatory providers and hospitals aren’t the only entities involved in surveilling and predicting community health issues.

The public health sphere, including the Centers for Disease Control and Prevention (CDC), is also working diligently to improve public health data analytics and disease surveillance in the United States and around the world, providing stakeholders with critical insights into significant population health concerns. Public health entities face the same technical data aggregation and analytics challenges as care providers, including fragmented data sources, poor interoperability, and difficulty developing algorithms that can accurately predict and prepare stakeholders for a rapid response to disease hotspots.


This is What Happens When You Teach an AI to Name Guinea Pigs

Gizmodo, Rae Paoletta



“I used Andrej Karpathy’s char-rnn, an open-source neural network framework for torch (written in Lua),” Shane told Gizmodo. “I gave the neural network the list of 600+ guinea pig names that the Portland Guinea Pig Rescue assembled for me, and let it train itself to produce more names like the ones on its list. It gradually formed its own internal rules about which letters and letter combinations are the most quintessentially guinea pig.”
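char-rnn is a recurrent neural network, but the core idea of learning "which letters and letter combinations" tend to follow each other can be shown with a much simpler stand-in: a character-level Markov chain (n-gram model). This is plainly not char-rnn, and the training names below are hypothetical, not the Portland Guinea Pig Rescue list.

```python
import random
from collections import defaultdict

def train(names, order=2):
    """Record which character follows each `order`-length context ("^" pads the start, "$" ends)."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name.lower() + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, rng=None, max_len=12):
    """Sample one name by repeatedly drawing a next character seen after the current context."""
    rng = rng or random.Random()
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":  # end-of-name marker
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()

# Hypothetical training names.
names = ["Pumpkin", "Peanut", "Pepper", "Biscuit", "Butterscotch", "Muffin", "Nutmeg"]
model = train(names)
rng = random.Random(0)
print([generate(model, rng=rng) for _ in range(5)])
```

An RNN replaces the fixed two-character context with a learned hidden state, which lets it pick up longer-range patterns than this sketch can.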


Google Says It Will Do More to Suppress Terrorist Propaganda

Bloomberg Technology, Lindsey Rupp and John Lauerman



Alphabet Inc.’s Google says it is creating new policies and practices to suppress terrorism-related videos, a response to U.K. lawmakers who have said the internet is a petri dish for radical ideology.

Google will increase its use of technology to identify extremist and terrorism-related videos across its sites, which include YouTube, and will boost the number of people who screen for terrorism-related content, Google’s General Counsel Kent Walker wrote in an editorial in the Financial Times Sunday. The company will also be more aggressive in putting warnings on and limiting the reach of content that, while not officially forbidden, is still inflammatory.


How to build a smart city: Expert tips on where to start (and pitfalls to avoid)

Smart Cities Dive, Jackie Snow



While there are hundreds of cities with smart initiatives in place, there are even more that have not started on their smart journeys. Because there is no specific blueprint on how to build a smart city, many have developed new tactics for creative development. Google’s Sidewalk Labs is even dropping a smart city project on a 12-acre section of downtown Toronto where they want to experiment with “reimagining cities from the internet up.”

Luckily for Google and the rest of the world, there is a host of guidance, ideas and lessons learned (sometimes the hard way) for cities to follow when starting off. Below are some ideas from experts and business leaders on where and how to start, as well as pitfalls to avoid.


ElementAI raises historic $137.5 million Series A round

Montreal In Technology, Quinn Mason



The funding will allow Element AI to invest in large-scale AI projects internationally, solidifying its position as the largest global AI company in Canada and creating 250 jobs in the Canadian high tech sector by January 2018.

Co-founded by serial entrepreneurs Jean-François Gagné and Nicolas Chapados, Real Ventures and Yoshua Bengio, a co-father of deep learning technology, Element AI aims to bring academic AI innovation to global organizations. Started in October 2016 to empower industry with the massive scale of academic AI innovation Bengio was driving at the world-leading Montreal Institute for Learning Algorithms (MILA), the two groups pioneered a unique, non-exploitative model of academic cooperation they have since replicated at many other institutes.


Ggplot2 is 10 years old: The program that brought data visualization to the masses

Quartz, Dan Kopf



On June 10, 2007, the statistician and developer Hadley Wickham officially released ggplot2, a chart-making system for the statistical programming language R. It would alter the course of his life and the future of data visualization.

“When I created it, I was hoping that maybe a thousand people use it,” Wickham told Quartz. “At the time, that seemed like an incredibly large number.”

In just the last five years, ggplot2 has been downloaded well over 10 million times, with more than 400,000 downloads in the month before this article was published.


At the Intersection of AI, Governments, and Google – Tim Hwang

Y Combinator podcast



Today’s episode is with Tim Hwang. Tim’s the global public policy lead on AI and machine learning for Google. And what that basically means is he interacts with governments to inform Google’s opinions on policy. He also helps educate governments on what things like machine learning actually mean and helps them figure out what the implications might be. So in this episode, Tim walks us through how governments are thinking about AI, and he also shares some thoughts on what the future might look like. Alright, here we go. And so I think that with AI and then policy on AI, you’ve kind of like nested two obscure things that people don’t really know what you’re talking about. [audio, 39:14]


A discussion about AI’s conflicts and challenges

TechCrunch, Natasha Lomas



“The competition for talent at the moment is absolutely ferocious,” agrees Professor Andrew Blake, whose computer vision PhD was obtained in 1983, but who is now, among other things, a scientific advisor to UK-based autonomous vehicle software startup, FiveAI, which is aiming to trial driverless cars on London’s roads in 2019.

Blake founded Microsoft’s computer vision group, and was managing director of Microsoft Research, Cambridge, where he was involved in the development of the Kinect sensor — which was something of an augur for computer vision’s rising star (even if Kinect itself did not achieve the kind of consumer success Microsoft might have hoped).

He’s now research director at the Alan Turing Institute in the UK, which aims to support data science research, which of course means machine learning and AI, and includes probing the ethics and societal implications of AI and big data.


How artificial intelligence can deliver real value to companies

McKinsey Global Institute



Much of the AI adoption outside of the tech sector is at an early, experimental stage. Few firms have deployed it at scale. In a McKinsey Global Institute discussion paper, Artificial intelligence: The next digital frontier?, which includes a survey of more than 3,000 AI-aware companies around the world, we find early AI adopters tend to be closer to the digital frontier, are among the larger firms within sectors, deploy AI across the technology groups, use AI in the most core part of the value chain, adopt AI to increase revenue as well as reduce costs, and have the full support of the executive leadership. Companies that have not yet adopted AI technology at scale or in a core part of their business are unsure of a business case for AI or of the returns they can expect on an AI investment.

However, early evidence suggests that there is a business case to be made, and that AI can deliver real value to companies willing to use it across operations and within their core functions. In our survey, early AI adopters that combine strong digital capability with proactive strategies have higher profit margins and expect the performance gap with other firms to widen in the next three years.


Digital humanities research from RIT getting national attention

Rochester Institute of Technology, RIT News



In the two years since Rochester Institute of Technology began offering a bachelor’s degree in digital humanities and social sciences, the first two graduates of the program received their degrees, and projects involving students, faculty and alumni connected to the new major are already garnering national attention. … One of the major’s first successes came from Ben Eshleman, who earned a bachelor’s degree in applied arts and sciences this May. He won the Joan Nestle Prize from the American Historical Association’s Committee on Lesbian, Gay, Bisexual & Transgendered History for researching and developing a website, Trans Rochester Speaks, an oral history of activists and advocates in Rochester’s transgender community.


Two Government Agencies. Two Different Climate Maps.

FiveThirtyEight, Maggie Koerth-Baker and Rachael Dottle



It turns out, the government produces two hardiness zone maps — one made by the USDA and one made by the National Oceanic and Atmospheric Administration. Both divide the country into segments, each of which represents a 10-degree increment of the average annual minimum temperature. But the underlying data used to build out the zones is different. Those differences are driven by the agencies’ goals, and they affect what the different maps are intended to be used for.
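To make the "10-degree increment" concrete: on the USDA map, each zone spans 10°F of average annual extreme minimum temperature, starting with zone 1 at -60°F, and each zone is split into 5°F "a" (colder) and "b" (warmer) halves. A minimal sketch of that binning, assuming the USDA convention; the function name is our own, and the NOAA-based map discussed in the article may bin differently.

```python
import math

def usda_zone(avg_annual_min_f):
    """Map an average annual extreme minimum temperature (deg F) to a USDA-style zone label.

    Assumes zone 1 starts at -60 F, each zone spans 10 F, and each half-zone spans 5 F.
    Values outside zones 1a-13b are clamped for simplicity.
    """
    offset = avg_annual_min_f + 60            # degrees above the bottom of zone 1
    half_steps = math.floor(offset / 5)       # index of the 5 F half-zone
    half_steps = max(0, min(half_steps, 25))  # clamp to 1a..13b
    zone = half_steps // 2 + 1
    half = "a" if half_steps % 2 == 0 else "b"
    return f"{zone}{half}"

print(usda_zone(-12))  # 5b
```

Because the thresholds are fixed, two maps built from different temperature records can assign the same town to different zones even with identical binning rules.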


Bezos’s Grocery ‘Waterloo’ Is Now His Biggest Opportunity

Bloomberg Technology, Brad Stone



For Bezos, wounded assets in important or growing business categories aren’t challenges to be avoided. They are puzzles to be solved. (See: The Post, Washington.) Grocery is an $800 billion market in the U.S., still largely untouched by the internet and resistant to change. Whole Foods itself has a well-established brand and high-income demographic that maps well to Amazon’s own customer base, and in particular its Amazon Prime subscription service, with an estimated 80 million members.

Over the last few years, Amazon has been quietly laying the groundwork to take over such a physical retail chain. In Seattle it is testing pickup grocery locations, where customers order online and then drive through to collect their totes under a protective canopy.


Earthquakes and the Rainfall Cycle

University of California-Berkeley, Berkeley Seismology Lab, Seismo Blog



So, where is the connection between seismicity and rainfall? It has to do with two things: the weight of water and the strong seasonality of the rainfall in our Golden State. As this last winter clearly showed, more than ninety-five percent of all precipitation in California falls between October and May. Much of it accumulates as snow in the Sierra Nevada. In addition, the ground is soaked by all the rain and retains much of it. In fact, the weight of the snow and the ground moisture is so large that it pushes down the Sierra and its foothills by about half an inch or so. This amount is way too small to be noticed in everyday life. It can, however, be measured with sensitive tools called “geodetic GPS receivers”.


Tweet of the Week

Twitter, Academia Obscura



 
Events



Trans-Atlantic Symposium on ICT Technology and Policy

Technological Leadership Institute, University of Minnesota



Minneapolis, MN, June 19-20. The meeting will cover the spectrum from research assessments via commercialisation opportunities to policy issues in information and communication technologies (ICT) such as 5G networks, Big Data, and the Internet of Things (IoT)/cyber-physical systems (CPS). [$$$]

 
Deadlines



NASA invites YOU to apply to be a NASA Datanaut!

We are looking for a new cohort of 50 members to join our growing Datanaut community and are accepting applicants through June 30, 2017.
 
NYU Center for Data Science News



NYU’s Inaugural Workshop on Data Science for Medicine and Health

NYU Center for Data Science



While data science often plays a major role in medical research, collaboration between both fields is rare at NYU since our research labs are scattered miles apart in Manhattan and Brooklyn. This is all set to change, however, thanks to the inaugural workshop on data science for medicine and health that CDS professors Kyunghyun Cho, Rumi Chunara, and Juliana Freire organized last Friday, along with David Fenyo, Narges Razavian, and Daniel Sodickson from NYU Langone School of Medicine.

The workshop, which gathered faculty, researchers, and graduate students from CDS, Langone, the Courant Institute of Mathematical Sciences, and the Tandon School of Engineering, was a chance for everyone to share their work with others outside of their department.

 
Tools & Resources



Phase-Functioned Neural Networks for Character Control

Daniel Holden



This year at SIGGRAPH I am presenting Phase-Functioned Neural Networks for Character Control. This paper uses a new kind of neural network called a “Phase-Functioned Neural Network” to create a character controller suitable for games. Our controller requires very little memory, is fast to compute at runtime, and generates high-quality motion in many complex situations. We also present a technique for fitting terrains from virtual environments to separately captured motion data. This is used to train our system so it can naturally traverse rough terrains at runtime.
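The "phase function" at the heart of a PFNN generates the network’s weights from the character’s position in its locomotion cycle. As we understand the paper, it is a cyclic cubic Catmull-Rom spline over four stored sets of weights: for a phase p in [0, 2π), the four neighboring sets are blended. A minimal sketch of that blending for scalar values (a real network would apply it elementwise to whole weight matrices); the function name and control values are our own, hypothetical choices.

```python
import math

def phase_function(phase, alphas):
    """Cyclic cubic Catmull-Rom interpolation over four control values.

    phase: position in the locomotion cycle, in [0, 2*pi).
    alphas: four control values, one per quarter of the cycle.
    """
    pos = 4 * phase / (2 * math.pi)
    k1 = int(math.floor(pos)) % 4
    w = pos - math.floor(pos)  # fractional position between control points k1 and k1+1
    a0, a1 = alphas[(k1 - 1) % 4], alphas[k1]
    a2, a3 = alphas[(k1 + 1) % 4], alphas[(k1 + 2) % 4]
    return (a1
            + w * (0.5 * a2 - 0.5 * a0)
            + w ** 2 * (a0 - 2.5 * a1 + 2 * a2 - 0.5 * a3)
            + w ** 3 * (1.5 * a1 - 1.5 * a2 + 0.5 * a3 - 0.5 * a0))

alphas = [1.0, 2.0, 3.0, 4.0]  # hypothetical control values
for p in (0.0, math.pi / 2, math.pi / 4):
    print(round(phase_function(p, alphas), 3))  # prints 1.0, then 2.0, then 1.25
```

The spline passes exactly through each control value at quarter-cycle phases and wraps smoothly from the end of the cycle back to the start, which is what lets the generated weights vary continuously as the character walks.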


How I get gender balance in tech-conference committees

Medium, Lorena A Barba



Technical conferences are notoriously male-dominated in my fields of interest: computational mechanics, high-performance and GPU computing, scientific computing and data science. At last year’s ISC High-Performance Computing Conference in Frankfurt, I served as chair of the PhD Forum (the separate session for doctoral students to present their work). The first part of that job is to recruit the members of the technical committee — and I set myself the goal of a gender-balanced group. This is how I achieved it.


Why Do Academics Write?

Medium, Heather Knight



“Your advisor may or may not help you structure your ideas, and may or may not provide comments on a paper before you submit it. Occasionally, if they are a young professor, your advisor might even participate in the writing. There is an implicit understanding that an advisor will magically transfer knowledge about academic writing into the empty brain of the student (see this comic). But there are no explicit standards.”


Data Science Walkthrough with SQL Server 2017 and Microsoft Machine Learning Services

Cortana Intelligence and Machine Learning Blog



Microsoft Machine Learning Services were a key highlight of our SQL Server 2017 CTP 2.0 release in April this year. They allow Python scripts to run within SQL Server or be embedded in SQL scripts and deployed as stored procedures. This feature essentially brings Python visualization and predictive analytics capabilities close to the data stored within SQL Server. Data scientists can combine the powers of SQL and Python and build end-to-end machine learning solutions with much greater ease.

To accompany the release of CTP 2.0, we have developed a walkthrough that uses Microsoft Machine Learning Services and provides an end-to-end data science solution for predictive modeling. The walkthrough is based on a popular public data set, namely the New York City taxi dataset. It uses a combination of Python code, SQL Server data, custom SQL functions and stored procedures to build a classification model that predicts the probability that a driver will get a tip on a given cab ride. In the end, it deploys the Python model to SQL Server and uses server data to generate scores based on the model.
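Predicting whether a ride is tipped is a binary classification problem. A dependency-free sketch of one common choice, logistic regression fit by batch gradient descent, is below. This is not the walkthrough’s code; the two features (trip_distance and a hypothetical paid_by_card flag) and all rows are invented for illustration.

```python
import math

# Hypothetical rows: (trip_distance in miles, paid_by_card as 0/1); label 1 = tipped.
X = [(1.2, 1), (3.5, 1), (0.8, 0), (5.0, 1), (2.1, 0), (7.3, 1), (0.5, 0), (4.4, 0)]
y = [1, 1, 0, 1, 0, 1, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a ride with features x is tipped."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, (3.0, 1)), 2), round(predict_proba(w, b, (3.0, 0)), 2))
```

In the setup the walkthrough describes, training and scoring of this kind happen next to the data inside SQL Server rather than in a standalone script.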


Learning Deep ResNet Blocks Sequentially using Boosting Theory

arXiv, Computer Science > Learning; Furong Huang, Jordan Ash, John Langford, Robert Schapire



Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct T weak module classifiers, each containing two of the T layers, such that the combined strong learner is a ResNet. Therefore, we introduce an alternative deep ResNet training algorithm, BoostResNet, which is particularly suitable for non-differentiable architectures. Our proposed algorithm merely requires the sequential training of T “shallow ResNets”, which are inexpensive. We prove that the training error decays exponentially with the depth T if the weak module classifiers that we train perform slightly better than some weak baseline. In other words, we propose a weak learning condition and prove a boosting theory for ResNet under the weak learning condition. Our results apply to general multi-class ResNets. A generalization error bound based on margin theory is proved and suggests that ResNet is resistant to overfitting when the network’s weights are l1-norm bounded.
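The "identity loops" credited with stabilizing gradients are the skip connections of a residual block. Writing g_t for the output of block t and f_t for that block’s learned transformation (our notation, not necessarily the paper’s), the recursion, its unrolled form, and the per-block Jacobian are:

```latex
\begin{aligned}
g_{t+1}(x) &= g_t(x) + f_t\big(g_t(x)\big), \qquad g_0(x) = x,\\
g_T(x) &= x + \sum_{t=0}^{T-1} f_t\big(g_t(x)\big),\\
\frac{\partial g_{t+1}}{\partial g_t} &= I + \frac{\partial f_t}{\partial g_t}.
\end{aligned}
```

Because every per-block Jacobian contains the identity, the chain-rule product across T blocks need not shrink toward zero the way it can in a plain deep stack, which is one common intuition for why ResNets train more stably.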

 
Careers


Full-time, non-tenured academic positions

Communications/Program Manager



University of California-Berkeley, Berkeley Institute for Data Science; Berkeley, CA
