Data Science newsletter – June 15, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 15, 2017

GROUP CURATION: N/A

 
 
Data Science News



There’s An AI Revolution Underway And It’s Happening In Canada

Forbes, Bijan Khosravi


from

Watch out U.S.—there’s a new startup hotbed in town and it’s coming from our neighbors to the north. I recently spent a week in Toronto and Montreal working with new startups and can’t believe the passion and energy I felt there. The DNA of Canadian entrepreneurs is charged with technical competency, commitment to creating value and a drive to make a difference.

Unfortunately the Canadian startup scene has been plagued with a lack of investment dollars and experience, resulting in few mergers and acquisitions and major IPOs. The conditions to create the next unicorn have never been met. Canada, a place where education in subjects like math has always been strong, hasn’t seen the startup success of other countries. Canada has historically been light on the presence of major VCs, and the investments that do happen are small, so most startups are forced to sell early and cheap rather than grow and increase in value. Consider, for example, that investment dollars from 2011 to 2015 were flat, running around $2B. That is not a recipe for success.


The Smartphone Psychiatrist

The Atlantic, David Dobbs


from

Frustrated by the failures in his field, Tom Insel, a former director of the National Institute of Mental Health, is now trying to reduce the world’s anguish through the devices in people’s pockets. [audio, 46:42]


Tim Cook says Apple’s car project is ‘the mother of all AI projects’

Twitter, Romain Dillet


from

In a very brief interview with Bloomberg, Apple CEO Tim Cook confirmed that the company has been working on an Apple car. This isn’t the first time Apple talked about the company’s ongoing work in the car industry.

Cook described Apple’s car project as the combination of three different factors — self-driving cars, electric vehicles and ride-sharing. Those three things together make the car industry interesting again.

More interestingly, Cook said that Apple is more focused on self-driving technology and not necessarily the car itself. “We’re focusing on… what we talked about we’re focusing on publicly is… We’re focusing on autonomous systems,” he said.


The future of education is plain text

Simply Statistics blog, Jeff Leek


from

I was recently at a National Academy meeting on Envisioning the Data Science Curriculum. It was a fun meeting and one of the questions that came up was what kind of infrastructure do we need to enable shared curricula, compatibility across schools, and not reinventing the wheel. My answer to this question was that we need lecture notes stored in plain text files (like rmarkdown files) and data stored in csv files with direct links.

I am not nearly the first person to make this argument. Lorena Barba explained why her MOOC doesn’t use video almost two years ago now. More recently a blog post pointed out some advantages of plain text code files for data analysis. I’m sure there are more that I don’t know about.

I think that the future of education is in plain text documents (not just for data science) and that the future of data storage is in simple csvs for all but the most complicated data sets. Why?
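
A minimal sketch, not from Leek’s post, of what “data stored in csv files with direct links” buys in practice; the URL is a hypothetical placeholder, and pandas is just one of many tools that can read such a file:

# Any environment that can fetch a URL and parse csv can load the course data.
# The direct link below is a hypothetical placeholder, not a real course file.
import pandas as pd

DATA_URL = "https://example.edu/ds-course/week1/measurements.csv"

df = pd.read_csv(DATA_URL)   # same one-liner locally, on a server, or in a notebook
print(df.head())             # quick check that the plain-text file parsed as expected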


The secret origin story of the iPhone

The Verge, Brian Merchant


from

This month marks 10 years since Apple launched the first iPhone, a device that would fundamentally transform how we interact with technology, culture, and each other. Ahead of that anniversary, Motherboard editor Brian Merchant embarked on an investigation to uncover the iPhone’s untold origin. The One Device: The secret history of the iPhone, out on June 20th, traces that journey from Kenyan mines to Chinese factories all the way to One Infinite Loop. The following excerpt has been lightly condensed and edited. [longform]


Facebook needs to be more open about its effect on democracy

The Guardian, John Gallacher and Monica Kaminska


from

Social media plays a huge role in elections. But while Twitter allows access to its data, Facebook’s secrecy means the extent of its influence may never be known


EA wants to use machine learning to create real-time game narratives

Gamasutra, Chris Kerr


from

EA boss Andrew Wilson has been speaking to Glixel about the publisher’s plan for the next five years.

Now, it’s not uncommon to hear some of the industry’s biggest names wax lyrical about the future or the ‘next big thing’ during E3. The whole point of the show, after all, is to get consumers foaming at the mouth about what lies ahead.

But what’s interesting about this particular pitch is the way Wilson described EA’s plan to harness neural networks and machine learning to tell dynamic, engaging, and highly-personal stories.


The Next Frontier: Data Mining Social Media Images

Cornell Tech, News & Views


from

Imagine what you could measure about the world if you had access to all the photos ever taken. This is a question that fascinates Noah Snavely, associate professor of computer science at Cornell Tech. An abundance of tools can mine data from web-based text, but for many years, the computer vision community has thought of images as the ‘dark matter’ of the web.

Snavely is currently working in the Cornell Graphics and Vision Group, where he develops innovative technologies for unlocking this vast data source. But his fascination with computer vision goes back to his student days.


Charles River Analytics Leads Multidisciplinary Team in DARPA’s Explainable Artificial Intelligence Program

Charles River Analytics


from

Charles River Analytics Inc., developer of intelligent systems solutions, is leading a team as part of a Defense Advanced Research Projects Agency (DARPA) program on Explainable Artificial Intelligence (XAI). In the XAI program, DARPA is funding multiple teams to make artificial intelligence more explainable through the interdependent development of machine learning techniques and human-computer interaction designs. The four-year broad agency announcement contract is valued at close to $8 million.

Charles River’s team for the Causal Models to Explain Learning effort, known as CAMEL, includes Brown University, the University of Massachusetts at Amherst, and Roth Cognitive Engineering. The CAMEL team will use the notion of causality as a key concept in creating explanations humans can understand and trust.


DARPA Programs Create Defense Systems for Data, Networks

U.S. DEPARTMENT OF DEFENSE


from

The internet is an infrastructure that supports all aspects of modern life, and the Defense Advanced Research Projects Agency is creating technology to protect U.S. data, networks and national security, DARPA’s acting director said in Baltimore today.

Steven H. Walker spoke to an audience this morning at the 2017 Armed Forces Communications and Electronics Association Defensive Cyber Operations Symposium in Baltimore, noting that the problems the United States faces in the cyber domain are among the nation’s most serious.

“Even a combination of the most advanced technology cannot solve the problems America faces in the cyber domain entirely,” he said, and described DARPA’s efforts to “create technologies that protect our data, our networks and our national security when it comes to these threats.”


[1706.02361] On the Robustness of Deep Convolutional Neural Networks for Music Classification

arXiv, Computer Science > Information Retrieval; Keunwoo Choi, George Fazekas, Kyunghyun Cho, Mark Sandler


from

[…] deepen our understanding of their properties. We analyse and (re-)validate a large music tagging dataset to investigate the reliability of training and evaluation. We perform comprehensive experiments involving audio preprocessing using different time-frequency representations, logarithmic magnitude compression, frequency weighting and scaling. Using a trained network, we compute label vector similarities, which are compared to ground-truth similarity.
The results highlight several important aspects of music tagging and neural networks. We show that networks can be effective despite relatively large error rates in ground-truth datasets. We subsequently show that many commonly used input preprocessing techniques are redundant except for magnitude compression. Lastly, the analysis of our trained network provides valuable insight into the relationships between music tags. These results highlight the benefit of using data-driven methods to address automatic music tagging.
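
For readers unfamiliar with the preprocessing the abstract refers to, here is a rough illustration, not the authors’ code, of the one step the paper finds non-redundant: logarithmic magnitude compression of a time-frequency representation. The filename is a placeholder and the librosa parameters are arbitrary choices.

import numpy as np
import librosa

# Load a mono waveform; "song.wav" is a placeholder path.
y, sr = librosa.load("song.wav", sr=16000)

# Linear-power mel spectrogram (one common time-frequency representation).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=96)

# Logarithmic magnitude compression: convert power to decibels.
log_mel = librosa.power_to_db(mel, ref=np.max)

# log_mel (shape [n_mels, n_frames]) is the kind of input a tagging CNN sees;
# per the abstract, further frequency weighting/scaling adds little beyond this.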


The ‘time machine’ reconstructing ancient Venice’s social networks

Nature News & Comment, Alison Abbott


from

Machine-learning project will analyse 1,000 years of maps and manuscripts from the floating city’s golden age. [longform]


New evidence that all stars are born in pairs

University of California-Berkeley, Berkeley News


from

The new assertion is based on a radio survey of a giant molecular cloud filled with recently formed stars in the constellation Perseus, and a mathematical model that can explain the Perseus observations only if all sunlike stars are born with a companion.

“We are saying, yes, there probably was a Nemesis, a long time ago,” said co-author Steven Stahler, a UC Berkeley research astronomer.

“We ran a series of statistical models to see if we could account for the relative populations of young single stars and binaries of all separations in the Perseus molecular cloud, and the only model that could reproduce the data was one in which all stars form initially as wide binaries. These systems then either shrink or break apart within a million years.”


Facebook hires Siri natural language understanding chief from Apple

VentureBeat, Blair Hanley Frank


from

Apple’s Siri team lost its head of natural language understanding this month. Rushin Shah left his post as a senior machine learning manager on Apple’s virtual assistant to join Facebook’s Applied Machine Learning team, where he’ll be working on natural language and dialog understanding, according to a LinkedIn post.


5 Fintech Companies Using AI to Improve Business

SingularityHub, Alison E. Berman


from

The fintech companies that are successfully leveraging AI today are the ones that have found a very concrete way to apply the technology to an existing business problem. For example, technology such as specialized hardware, big data analytics, and machine learning algorithms are being used in fintech to augment tasks that people already perform.

At the Singularity University Exponential Finance Summit this week, Neil Jacobstein, faculty chair of Artificial Intelligence and Robotics at SU, shared some of the most interesting AI companies in fintech right now.


Inside the Algorithm That Tries to Predict Gun Violence in Chicago

The New York Times, The Upshot blog, Jeff Asher and Rob Arthur


from

Gun violence in Chicago has surged since late 2015, and much of the news media attention on how the city plans to address this problem has focused on the Strategic Subject List, or S.S.L.

The list is made by an algorithm that tries to predict who is most likely to be involved in a shooting, either as perpetrator or victim. The algorithm is not public, but the city has now placed a version of the list — without names — online through its open data portal, making it possible for the first time to see how Chicago evaluates risk.

We analyzed that information and found that the assigned risk scores — and what characteristics go into them — are sometimes at odds with the Chicago Police Department’s public statements and cut against some common perceptions.
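
Because the version of the list on the open data portal is a plain table, the kind of analysis described here can be reproduced with ordinary tools. A hypothetical sketch in Python follows; the filename and column names are placeholders, not the portal’s actual schema.

import pandas as pd

# De-identified export downloaded from the city's open data portal (placeholder name).
ssl = pd.read_csv("strategic_subject_list.csv")

# Distribution of the assigned risk scores.
print(ssl["risk_score"].describe())

# How the average score varies with one input characteristic (placeholder column).
print(ssl.groupby("age_group")["risk_score"].mean().sort_values())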


Democratization of data science and emergence of citizen scientists

The Daily Californian, Melany Dillon


from

The emerging trend of cheap, widely available analytical tools has led to a “democratization” of data science and the rise of the so-called citizen scientist. As more computational software and data science educational material, such as UC Berkeley’s own online data science lectures, becomes available to more people, citizen data scientists — “people … that may have some data skills … and (put) them to work exploring and analyzing data,” according to researcher Alexander Linden — play a larger role. These new data practitioners make data analysis faster, cheaper and often more accurate and can alter the average joe’s relationship to Big Data.

The new amateur scientists allow a greater number of people to work on a single project, increasing the accuracy of research and allowing citizen scientists to help society cope with the dearth of data scientists. Websites such as Kaggle allow this to happen by establishing communities for citizen data scientists to collaborate and compete (check out one of their competitions here). Students at UC Berkeley can explore Kaggle by taking the aptly titled Data Science for Kaggle Decal. UC Berkeley is not the only university aiding the rise of the citizen scientist. Schools like Carnegie Mellon and Cornell University have pioneered programs in data science, and, like UC Berkeley, have provided valuable data science content to the public.


Gray Seals, Once Hunted, Are Thriving In New England

NPR, The Two-Way, Merrit Kennedy


from

“Past surveys based on traditional methods of counting, using occupied aircraft to survey seals on beaches, islands and seasonal ice cover, counted about 15,000 seals off the southeastern Massachusetts coast,” David W. Johnson, a professor of marine conservation ecology at Duke’s Nicholas School of the Environment, said in a statement.

“Our technology-aided aerial survey, which used Google Earth imagery in conjunction with telemetry data from tagged animals, suggests the number is much larger — between 30,000 and 50,000,” he adds, calling it a “conservation success.”

The researchers also pointed to “exponential growth” in the population from a Nova Scotia breeding colony, where animals are known to move south and vice versa. The animals can grow to 10 feet long and live up to 35 years.


Helping or hacking? Engineers and ethicists must work together on brain-computer interface technology

The Conversation, Eran Klein and Katherine Pratt


from

BCIs use brain signals to control objects in the outside world. They’re a potentially world-changing innovation – imagine being paralyzed but able to “reach” for something with a prosthetic arm just by thinking about it. But the revolutionary technology also raises concerns. Here at the University of Washington’s Center for Sensorimotor Neural Engineering (CSNE) we and our colleagues are researching BCI technology – and a crucial part of that includes working on issues such as neuroethics and neural security. Ethicists and engineers are working together to understand and quantify risks and develop ways to protect the public now.


Is there structural racism on the internet?

The Conversation, Charlton McIlwain


from

Internet users have – more or less – complete freedom to travel where they choose. Websites can’t see the color of a user’s skin and police incoming traffic in the same way human beings can and do in geographical spaces. Therefore, it’s easy to imagine that the internet’s very structure – the social environments it produces and the new economies it births – might not be racially segregated the way the physical world is.

And yet the internet does, in fact, appear to be segregated along racial lines. My research demonstrates that websites focusing on racial issues are visited less often and are less visible in search result rankings than sites with different or broader focuses. This phenomenon is not based on anything that individual website producers do. Rather, it appears to be a product of how users themselves find and share information online, a process mediated mostly by search engines and, increasingly, social media platforms.


Andrew Ng: AI Is the New Electricity

WSJ Video


from

Artificial intelligence is set to revolutionize every sector, and will likely eliminate whole categories of jobs in the next few years, said Andrew Ng, a Stanford adjunct professor and former Baidu scientist, at the WSJ D.Live Asia conference.

 
Events



The Hive Think Tank – Behind Every Great Product is a Product Manager

Meetup, The Hive


from

Santa Clara, CA. Two deans from Carnegie Mellon University, Bob Dammon of the Tepper School of Business and Andrew Moore of the School of Computer Science, will be at THE HIVE for a Think Tank event on Thursday, July 6, starting at 6 p.m. [registration required]


Social Theory for Computer Science Education | ICER Conference

Elizabeth Patitsas, ICER 2017


from

Tacoma, WA. A half-day tutorial-style workshop co-located with ICER 2017 on Wednesday, August 16. [$$$]

 
Deadlines



Pushing the Limits of Scene Understanding: LSUN’17 Workshop and Semantic Image Segmentation Challenges

“We invite researchers to participate in a challenge for large-scale scene understanding models as part of the LSUN workshop at this year’s Computer Vision and Pattern Recognition (CVPR) conference.” Deadline for submissions is July 9.
 
Tools & Resources



Workflow systems for science: programming in the large

Software Sustainability Institute (UK), Daniel S. Katz


from

A large amount of today’s computational and data science involves combining the execution of many tasks that each run a model or transform data, to build up a multi-component model or a multi-stage data transformation. Most researchers initially do this manually, and then (if they have any programming experience) eventually move to using shell scripts when the manual process gets too painful. However, shell scripts tend to limit the work to single resources, as they don’t really work well with parallel computing.
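
As a minimal illustration, not from the post, of the gap between a serial shell script and a workflow that can exploit parallel resources: the same “apply a transform to many inputs” job, written so independent tasks can be dispatched to multiple workers. transform() is a stand-in for any per-file model run, and real workflow systems add scheduling, retries and multi-machine execution on top of this idea.

from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def transform(path: Path) -> str:
    # Placeholder for a real model run or data transformation on one input file.
    return f"processed {path.name}"

if __name__ == "__main__":
    inputs = sorted(Path("data").glob("*.csv"))        # hypothetical input directory
    with ProcessPoolExecutor() as pool:                # independent tasks run in parallel
        for result in pool.map(transform, inputs):
            print(result)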


System programming in Rust: beyond safety

Adrian Colyer, the morning paper


from

Balasubramanian et al. want us to switch all of our systems programming over to Rust. This paper sets out the case.

Despite many advances in programming languages, clean-slate operating systems, hypervisors, key-value stores, web servers, network and storage frameworks are still developed in C, a programming language that is in many ways closer to assembly than to a modern high-level language. Today, the price of running unsafe code is high… Why are we still using C?


Activity in Deep Learning Specialized Hardware

Stephen McAnearney, Building 58 blog


from

The best-practice hardware implementation of deep learning seems to be (Nvidia) GPUs for training, but for inference, GPUs tend not to be efficient enough for likely applications. Incumbents and startups alike see this as an opportunity to build specialized inference hardware (although some are also attempting to unseat Nvidia as the leader in training hardware). Below is an aggregation of new hardware approaches for DL.


Bradford Cross’s answer to What is vertical AI?

Quora, Bradford Cross


from

In a recent talk at AI by the bay, I laid out a four-factor definition of what I consider to be a vertical AI startup.

1. Full stack products

Provide a full-stack fully-integrated solution to the end customer problem from the interface that solves for the need all the way down the stack to the functionality, models, and data that power the interface.

 
Careers


Full-time positions outside academia

Legal Counsel – Law Enforcement and Safety



Twitter; San Francisco, CA

Senior Full Stack Node.js Developer, NASA Project Open Data (Senior Software Engineer V)



NASA Ames Research Center; Moffett Field, CA
