Data Science newsletter – October 26, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for October 26, 2016

GROUP CURATION: N/A

 
 
Data Science News



Quantifying urban revitalization

MIT News


from October 24, 2016

For years, researchers at the MIT Media Lab have been developing a database of images captured at regular distances around several major cities. The images are scored according to different visual characteristics — how safe the depicted areas look, how affluent, how lively, and the like.

In a paper they presented last week at the Association for Computing Machinery’s Multimedia Conference, the researchers, together with colleagues at the University of Trento and the Bruno Kessler Foundation, both in Trento, Italy, compared these safety scores for neighborhoods in Rome and Milan to the frequency with which people visited these places, according to cellphone data.


Every Asian American has been asked this question. A computer gives the best answer.

Washington Post, WonkBlog, Jeff Guo


from October 21, 2016

Recently, computer scientists at the University of Rochester tried to teach an algorithm to tell the difference between Chinese, Japanese and Korean faces. They wanted to explore how advancements in artificial intelligence have made it easier for computers to interpret pictures in sophisticated ways. But, intentionally or not, their research taps into the uncomfortable history of how Asians have struggled to fit into American life.

The scientists were inspired by a quiz created by Japanese American web designer Dyske Suematsu. Fifteen years ago, Suematsu decided, half-jokingly, to investigate the stereotype that Asians all look alike. He threw a party in New York City and invited Asian friends. He put their portraits on the Internet and asked strangers to guess their ethnicity.


Machine Learning Veterans Launch ‘Element AI’ – A Montreal Based Artificial Intelligence Startup Factory

Yahoo Finance, PR Newswire


from October 25, 2016

Element AI, an artificial intelligence startup factory, launched today in Montreal. Element AI was founded by serial entrepreneurs Jean-François Gagné and Nicolas Chapados, along with Yoshua Bengio, one of the fathers of deep learning. The company will enable world-class entrepreneurs to tackle hard problems by building products that address their artificial intelligence needs. Element AI brings together cutting-edge researchers and industry leaders looking to assist in implementing an “AI-First” strategy, and opens the possibility for any organization to develop its own artificial intelligence strategy.


University Data Science: Transforming A Campus Near You

Brown University launched its new university-wide Data Science Initiative, as data science reshapes higher education. Outside of the Ivy League, Loyola University in Maryland created a Master’s program in data science.

Even the student newspapers pay attention to data science and related applications. The Cornell Daily Sun highlighted a recent campus lecture on computational linguistics by Stanford professor Daniel Jurafsky. The Brown Daily Herald profiled Computer Science professor Stephanie Tellex and her work on intelligent human-robot interaction.

Support from government, corporations and foundations is making a major difference.

The NSF Big Data Hubs are filling in the spokes. Campus Technology provided an overview of the NSF Hub-Spoke plan earlier this month. The University of Michigan announced its Big Data Spoke award to focus on Advanced Computational Neuroscience.

Mount Sinai’s Icahn School of Medicine is partnering with analytics firm Digital Globe to develop a platform for examining health equity on a planetary scale.

Major foundation efforts are also catalytic, like the Gates Foundation’s transformative gift to the University of Washington Population Health Initiative. And let us not forget the tremendous ongoing support of the Moore Foundation and Sloan Foundation, which produced the Moore-Sloan Data Science Environment Summit in New York this past week.


As Police Facial Recognition Use Expands, Researchers Find Flaws

NPR, All Tech Considered


from October 25, 2016

Nearly half of all American adults have been entered into law enforcement facial recognition databases, according to a recent report from Georgetown University’s law school. But there are many problems with the accuracy of the technology that could have an impact on a lot of innocent people.


Deep learning startup Clarifai raises $30 million

VentureBeat, Jordan Novet


from October 25, 2016

Clarifai, a startup providing an application programming interface (API) that offers a type of artificial intelligence (A.I.) known as “deep learning,” is announcing a $30 million round of funding today.

“The money is really to grow the team,” Clarifai founder and chief executive Matthew Zeiler told VentureBeat in an email. “We have a really exciting roadmap that continues to position us as the independent A.I. company out there, and the only bottleneck on executing on it is the number of people in the company. We plan to grow all functions, from research to engineering to developer evangelists to sales and marketing.”


Big Data Innovation Hubs Selected for NTIS Joint Venture Partnership: “A Major Milestone for the Data Economy”

West Big Data Innovation Hub


from October 25, 2016

The four Big Data Regional Innovation Hubs have been selected by the National Technical Information Service (NTIS) of the U.S. Department of Commerce to enter into a Joint Venture Partnership. Once finalized, this partnership will provide opportunities for collaborations between the Big Data Hubs and NTIS to deliver groundbreaking data projects across federal agencies.

Federal government agencies collect a superabundance of useful data concerning topics such as the economy, population, trade statistics, climate, and more. However, impactful innovation is constrained by barriers to effective data sharing, analysis, and solution-building. Through the NTIS Joint Venture Partnership, the Big Data Hubs will contribute to advancing federal big data priorities, including promoting U.S. economic growth and enabling operational excellence, by directly connecting federal agencies with data science experts in industry, academia and nonprofit organizations to solve substantial challenges in delivering critical public services.


What Do People — Not Techies, Not Companies — Think About Artificial Intelligence?

Harvard Business Review, Leslie Gaines-Ross


from October 24, 2016

In 1942 the author and professor Isaac Asimov introduced his Three Laws of Robotics, one of the most well-known attempts to establish workable rules integrating artificial intelligence, or AI, into society. Since then, many science fiction writers, philosophers, scientists, and others have grappled with the pros and cons of AI.

This attention has only increased. Just this September, five of the largest tech companies teamed up to create a coalition, the Partnership on Artificial Intelligence to Benefit People and Society, to assure people that AI was not about creating killer robots. And earlier this month, under President Obama’s leadership, the White House issued a report, “Preparing for the Future of Artificial Intelligence,” discussing AI’s possible applications and how it is likely to impact society, for better or worse.

What we have heard less of, however, is what everyday consumers think about AI’s potential and pitfalls, about whether AI will help or hurt the world. We decided to ask.


Google: Our Assistant Will Trigger the Next Era of AI

Backchannel, Steven Levy


from October 25, 2016

Fernando Pereira, who leads Google’s projects in natural language understanding, is less excited about his company’s shiny new devices than he is about what will happen when people use them.

“Let me tell you a little bit about The Transition,” he says.

Pereira holds the title of distinguished scientist at Google. Since arriving at the company in 2008 from his post as the chair of the Department of Computer and Information Science at the University of Pennsylvania, he has been at the center of Google’s efforts to answer the question: How do we learn the meaning of text from data?


Digital Today, Cognitive Tomorrow

MIT Sloan


from September 12, 2016

In today’s economy, we are seeing companies, business models, products, and processes undergoing major transformation. Enterprises and governments are rapidly “becoming digital” as they seek to capture the cost savings, agility, and collaboration enabled by cloud, analytics, mobile, and social technologies.

However, digital is not the destination. Rather, it is laying the foundation for a much more profound transformation to come. Within five years, I believe all major business decisions will be enhanced by cognitive technologies.

I sensed the magnitude of the transition for the first time in 2011, when I watched IBM’s Watson system win on “Jeopardy!” At the time, I felt that I was watching history in the making: The technology known as artificial intelligence (AI) was finally moving from the lab into the world.

Why are we seeing this now?


Artificial Intelligence Might Fix The Worst Part Of Flying

Vocativ


from October 25, 2016

Tech startup Evolv wants to use computer vision and machine learning to help you breeze through airport security


[1610.06987] Multitask Learning of Vegetation Biochemistry from Hyperspectral Data

arXiv, Computer Science > Computer Vision and Pattern Recognition; Utsav B. Gewali, Sildomar T. Monteiro


from October 22, 2016

Statistical models have been successful in accurately estimating the biochemical contents of vegetation from the reflectance spectra. However, their performance deteriorates when there is a scarcity of ground truth data for modeling the complex non-linear relationship between the spectrum and the biochemical quantity. We propose a novel Gaussian process based multitask learning method for improving the prediction of a biochemical through the transfer of knowledge from models learned for predicting related biochemicals. This method is most advantageous when there are few ground truth data for the biochemical of interest, but plenty of ground truth data for related biochemicals. The proposed multitask Gaussian process hypothesizes that the inter-relationship between the biochemical quantities is better modeled by using a combination of two or more covariance functions and inter-task correlation matrices. In the experiments, our method outperformed the current methods on two real-world datasets.
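The knowledge-transfer idea behind such a multitask Gaussian process can be sketched in a few lines of NumPy: couple tasks through an inter-task correlation matrix multiplied against a shared input kernel. The RBF kernel, the 2×2 matrix B, and the toy data below are a generic illustration of the technique under those assumptions, not the covariance functions or datasets from the paper.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def multitask_gp_predict(X, y, tasks, Xs, task_s, B, noise=1e-2):
    """Posterior-mean prediction for task `task_s` at points Xs.

    Covariance between observations (x_i, task_i) and (x_j, task_j) is
    B[task_i, task_j] * rbf(x_i, x_j): the inter-task matrix B lets
    plentiful data for one task inform predictions for a related task.
    """
    K = B[np.ix_(tasks, tasks)] * rbf(X, X) + noise * np.eye(len(X))
    Ks = B[task_s, tasks] * rbf(Xs, X)
    return Ks @ np.linalg.solve(K, y)

# Plenty of data for task 0, a single point for the related task 1.
X = np.array([0.0, 1.0, 2.0, 3.0, 1.5])
tasks = np.array([0, 0, 0, 0, 1])
y = np.sin(X) + 0.5 * (tasks == 1)       # task 1 is a shifted task 0
B = np.array([[1.0, 0.9], [0.9, 1.0]])   # strong inter-task correlation

pred = multitask_gp_predict(X, y, tasks, np.array([1.4, 1.6]), 1, B)
```

With only one direct observation of task 1, the correlated task-0 data still shapes the task-1 prediction, which is the regime the abstract highlights.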


AT&T Is Spying on Americans for Profit, New Documents Reveal

The Daily Beast


from October 25, 2016

On Nov. 11, 2013, Victorville, California, sheriff’s deputies and a coroner responded to a motorcyclist’s report of human remains outside of town.

They identified the partially bleached skull of a child, and later discovered the remains of the McStay family who had been missing for the past three years. Joseph, 40, his wife Summer, 43, Gianni, 4, and Joseph Jr., 3, had been bludgeoned to death and buried in shallow graves in the desert.

Investigators long suspected Charles Merritt in the family’s disappearance, interviewing him days after they went missing. Merritt was McStay’s business partner and the last person known to see him alive. Merritt had also borrowed $30,000 from McStay to cover a gambling debt, a mutual business partner told police. None of it was enough to make an arrest.


If we want AI, we have to trust machines to test it

Bloomberg Government, Cameron Leuthy


from October 25, 2016

The Air Force lab is using ALPHA, a program developed at the University of Cincinnati that uses fuzzy logic — a computing approach based on degrees of truth rather than Boolean logic — and rapid processing to consistently surpass the performance of human pilots in simulations.

“Every 6.5 milliseconds ALPHA can take in the entirety of sensor data, organize the data and create a complete mapping of the scenario,” allowing it to make quick changes, according to ALPHA’s creators.

That’s clearly faster than humans can think, making it impossible for people to test whether the results are valid. Machines must examine the results — so, despite the concerns of Carter and others, these tests will put us on the path of trusting machines before we ever field autonomous systems.
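The "degrees of truth" idea behind fuzzy logic can be illustrated with a toy rule: instead of a Boolean test, each condition returns a membership degree in [0, 1], and a fuzzy AND combines them. The membership functions and thresholds below are invented for illustration and are not ALPHA's actual rule base.

```python
def triangular(x, lo, peak, hi):
    """Triangular membership function: degree of truth in [0, 1]."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)

def threat_level(distance_km, closing_speed):
    """Fuzzy rule: threat is high when the contact is close AND fast."""
    close = triangular(distance_km, 0, 0, 20)       # fully true at 0 km
    fast = triangular(closing_speed, 0, 600, 1200)  # peaks at 600 m/s
    # A common choice for fuzzy AND is the minimum of the truth degrees.
    return min(close, fast)

t = threat_level(5.0, 600)  # close (0.75) AND fast (1.0) -> 0.75
```

Where a Boolean rule would flip abruptly at a threshold, the fuzzy version degrades smoothly, which is what lets such systems rank and react to partially true scenarios.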


Developing a Remote Sensing System to Track Marine Debris

Eos


from October 25, 2016

Scientists gathered at a workshop in January 2016 at the University of Hawai‘i at Mānoa in Honolulu with the goal of jump-starting development of such a system that could sense marine debris from space. Participants included researchers, citizen scientists, engineers, and cleanup experts from academic institutions, government and nongovernment organizations, and industry. They represented the United States, Japan, The Netherlands, and the United Nations Environment Programme.

The diverse group concluded that no single sensor or platform can “see” all debris varieties, and therefore the observational system must consist of different sensors, each specialized for various types of debris in different environments. Thus, traditional optical imagery can be used for the largest debris in selected cloud-free areas; synthetic aperture radar imagery and interferometry are suited for all-weather detections and drift measurements; imaging spectroscopy can detect low concentrations of subpixel-size plastic particles; and Raman spectroscopy, which penetrates below the surface, is useful for measuring debris sunk in shallow seas or suspended in the upper ocean.

 
NYU Center for Data Science News



Steinhardt Music & Audio Research Lab Receives NSF Grant to Study Bird Calls

NYU News


from October 24, 2016

Juan Pablo Bello, associate professor of music technology and director of the Music and Audio Research Lab (MARL) at the NYU Steinhardt School of Culture, Education and Human Development, was awarded a grant of over $600,000 by the National Science Foundation to research and refine emerging machine listening technologies to identify different species of birds during migration.

 
Tools & Resources



pandas 2.0 Design Documents

GitHub – pandas-dev


from October 26, 2016

These are a set of documents, based on discussions started in December 2015, to assist with discussions around changes to pandas’s internal design. The changes are intended to better accommodate the evolving needs of the growing Python data userbase and to help ensure that pandas remains a relevant and important project in the future.

We also will use this place to collect ideas for things to remove (such as deprecated features, but possibly other things) from the library that don’t necessarily depend on the internal implementation.


HindSight: Encouraging Exploration and Engagement in Data Visualization

Medium, Evan Peck


from October 24, 2016

If you love data visualization as much as we do, you probably believe that it can be a powerful tool for reasoning and understanding. Given the recent success of data journalism, you may also believe that data visualization makes complex topics more accessible to everyone, converting indecipherable sheets of numbers into the visual language of our brain. It seems that every day, a new visualization paints a compelling portrait of the world in 2016.

But hiding under this optimism, a depressing trend is emerging in the research surrounding data exploration and understanding: People aren’t exploring data.


Google Drive CLI Client

GitHub – prasmussen


from September 26, 2016

gdrive is a command line utility for interacting with Google Drive.


Microsoft releases beta of Microsoft Cognitive Toolkit for deep learning advances

Microsoft Next blog, Allison Linn


from October 25, 2016

Microsoft has released an updated version of Microsoft Cognitive Toolkit, a system for deep learning that is used to speed advances in areas such as speech and image recognition and search relevance on CPUs and NVIDIA® GPUs.


Building an efficient neural language model over a billion words

Facebook Code, Engineering Blog, Edouard Grave, Justin Chiu, Armand Joulin


from October 25, 2016

Facebook AI Research (FAIR) designed a novel softmax function approximation tailored for GPUs to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax (more details in the paper), circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. This approach further reduces the computational cost by leveraging the specificities of modern architectures and matrix-matrix vector operations both at train and test time. This makes it particularly suited for GPUs whereas most previous approaches, such as hierarchical softmax, NCE, and importance sampling, have been designed for standard CPUs.
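The clustering idea can be sketched with a toy two-level example in NumPy: frequent words get their own slots in a small "head" softmax, while all rare words share a single cluster slot whose mass is redistributed by a second, smaller softmax. The cutoff, dimensions, and random weights below are arbitrary illustrations of the general approach, not FAIR's configuration or code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_softmax_prob(h, W_head, W_tail, cutoff):
    """Probability over a vocabulary split by word frequency.

    Words [0, cutoff) are scored directly by the head; rarer words share
    one "tail cluster" slot in the head, and their probability is the
    cluster probability times a softmax over the smaller tail matrix, so
    most steps never touch the full vocabulary-sized projection.
    """
    head = softmax(W_head @ h)   # cutoff frequent words + 1 cluster slot
    tail = softmax(W_tail @ h)   # distribution within the rare cluster
    return np.concatenate([head[:cutoff], head[cutoff] * tail])

rng = np.random.default_rng(0)
d, cutoff, n_rare = 8, 5, 20            # 25-word toy vocabulary
h = rng.normal(size=d)                  # hidden state from the LM
W_head = rng.normal(size=(cutoff + 1, d))
W_tail = rng.normal(size=(n_rare, d))

p = adaptive_softmax_prob(h, W_head, W_tail, cutoff)
# p is a proper distribution over all 25 words.
```

Because frequent words dominate training batches, most examples only pay for the small head matrix, which is the source of the expected-cost savings the post describes.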


Chain Launches Open Source Blockchain Developer Platform

Blockchain News


from October 24, 2016

A week after R3 announced that Corda would be made open source and folded into Hyperledger, Chain, a provider of Blockchain technology solutions, has decided to release Chain Core Developer Edition, a free and open source version of its distributed ledger platform that enables organizations to issue and transfer assets on permissioned Blockchain networks.

 
Careers


Full-time positions outside academia

Quantitative Researcher, Growth
Facebook, Data and Analytics; Menlo Park, CA

Full-time, non-tenured academic positions

Bioinformatics analyst and a scientific software engineer for computational biology
Harvard University, Eddy/Rivas laboratory; Cambridge, MA

Postdocs

Post-Doc, Research Assistant Professor in Complex Systems
Northwestern University, Northwestern Institute on Complex Systems (NICO); Evanston, IL
