Data Science newsletter – April 23, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for April 23, 2018

Data Science News



Google’s New AI Head Is So Smart He Doesn’t Need AI

WIRED, Business, Tom Simonite



Google’s heavy investment in artificial intelligence has helped the company’s software write music and beat humans at complex board games. What unlikely feats could be next? The company’s new head of AI says he’d like to see Google move deeper into areas such as healthcare. He also warns that the company will face some tricky ethical questions over appropriate uses for AI as it expands its use of the technology.

The new AI boss at Google is Jeff Dean. The lean 50-year-old computer scientist joined the company in 1999, when it was a startup less than one year old. He earned a reputation as one of the industry’s most talented coders by helping Google become a computational powerhouse with new approaches to databases and large-scale data analysis. Google colleagues once created a joke website of “Jeff Dean facts,” including his purported role in accelerating the speed of light. Another had it that Dean doesn’t really exist—he’s an advanced AI created by Jeff Dean.


Data Visualization of the Week

Twitter, Design Indaba




Tweet of the Week: How to spot a fake

Twitter, Thoughts of Dog




Chicago sports tech firm Stats looks to bring A.I. to the broadcast booth and sideline

Chicago Tribune, Ally Marotti



The company, which has already started building out its artificial intelligence team, is working to train computers to review game footage and extract statistics, providing new insights for coaches and players and fun facts for broadcasters to relay to fans.

“There’s lots of video footage that exists globally of historic games,” Henderson said. “If we can get computers basically to watch the game and code the game, that enables us to aggregate data sources on a much grander scale than individual humans can.”


New center set to continue MSU’s innovative legacy of turning data into action | Mississippi State University

Mississippi State University



Mississippi State University officials celebrated the opening of a new data center Friday [April 20] that will continue the university’s legacy of innovative data science practices.

Mississippi State’s National Strategic Planning and Analysis Research Center (NSPARC) Data Center contains 3,300 square feet of state-of-the-art technology that can process and store more than 400 terabytes of data, allowing NSPARC to further its mission of using data science to drive human progress. The new facility is located in MSU’s Thad Cochran Research, Technology and Economic Development Park.

“We’re expanding research boundaries like never before,” said Domenico “Mimmo” Parisi, NSPARC executive director. “We are providing students with new opportunities like never before and we are elevating the prominence of the role of universities in our society. We are looking at methods that can expand the use of data in ways that can make a difference in people’s lives.”


Wake Forest Provides a Model for BC to Implement Engineering Program

Boston College, The Heights student newspaper, Jack Goldman



In April 2017, The Heights reported on University plans to construct a state-of-the-art Institute for Integrated Sciences and Society. Last November, BC announced that University Trustee and Apple Senior Vice President of Worldwide Marketing Phil Schiller, BC ’82, and his wife, Kim Gassett-Schiller, pledged a $25 million multi-year gift to the construction of the Institute, which will be named in their honor.

The Schiller Institute is part of a $300 million investment that BC plans to make in the sciences, including the hiring of 22 new faculty and new courses in applied sciences, data science, and global public health. The institute will facilitate an interdisciplinary approach to the sciences, with the goal of integrating science, entrepreneurship, and other fields to help solve global problems in health, the environment, and energy. The building itself will contain new laboratories, maker spaces—a type of collaboration room that will give students “hands-on experiences in prototyping research” according to BC’s website—and some classrooms.

Thanks to these resources, an engineering program could become a reality.


SAP Study Reveals Key Traits of Machine Learning Leaders

SAP News Center



Nearly half (48 percent) of the companies who say they have already benefited from machine learning cite increased profitability as the top benefit they have realized, according to a new study from SAP SE (NYSE: SAP).

A similar share of companies who are already benefiting from machine learning also expect revenue growth of more than 6 percent for the two-year period of 2018-2019, the study showed. The study was conducted by the Economist Intelligence Unit (EIU) and written in discussion with SAP.

“Making the Most of Machine Learning: 5 Lessons from Fast Learners” is based on survey results from 360 senior executives across four geographic regions: North America, Europe, Asia Pacific and Latin America. The study identifies the opportunities, value and implications for companies that look at machine learning in a holistic way. The results also reveal leading companies — called Fast Learners — that are already seeing substantial benefits from machine learning. These benefits span the entire organization and include increased profitability and revenues, greater competitive differentiation, and faster, more accurate and more cost-efficient processes.


Why Police Should Monitor Social Media to Prevent Crime

WIRED, Security, Christopher Raleigh Bousquet



Until recently, companies like GeoFeedia, SnapTrends, and Media Sonar peddled their products from city to city, advertising their ability to prevent crimes and catch perpetrators. However, a 2016 report from the ACLU of California presented a major setback for these companies, revealing that cities were using their products to target words like “#blacklivesmatter” and “police brutality” following the killings of Michael Brown and Freddie Gray.

And the way police departments in these cities accessed this user data is reminiscent of practices that inspired outrage over Cambridge Analytica. Law enforcement agencies partnered with companies that gained access to backend data streams via APIs, a privilege that Facebook, Twitter, and Instagram revoked shortly thereafter.

Yet police departments have quietly continued to employ social media mining, partnering with a number of startups.


Statistical and Machine Learning forecasting methods: Concerns and ways forward

PLOS One; Spyros Makridakis, Evangelos Spiliotis and Vassilios Assimakopoulos



Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.


How Natural Language Inference Models “Game” the Task of Learning

Medium, NYU Center for Data Science



Sam Bowman, Assistant Professor of Linguistics and Data Science, explores how state-of-the-art NLI models rely on annotation artifacts


Anaconda Slithers Into the Data Science Certification Space

GoCertify.com, Ed Tittel



On April 9, Austin, Texas-based Anaconda announced the Anaconda Data Science Certification. The company claims to offer the “most popular Python data science platform,” so this new program could generate a lot of interest.


As Scientists Speak Out About Science, Women and Young Scholars Lead the Way

The Chronicle of Higher Education, Leonard Cassuto



Loud noises are emanating from the laboratory these days, but they’re declamations, not explosions. This month scientists and other advocates for science assembled in cities around the country for the second annual March for Science. The organizers called on people to march for “a future where science is fully embraced in public life and policy.”

Researchers were once content to let their work speak for them. What changed? The politicization of science — partisan debates over climate change and evolution are two prominent examples — has worsened over time. Now scientists want to be heard in the public square.

“Lots of people out there are making reckless, wild claims about what is and isn’t true, and about science itself,” said Naomi Oreskes, a professor of the history of science at Harvard University. “It’s tremendously important for scientists to get out there and explain what they do — to everyone, beginning with school children.”


Pedro Domingos on the Arms Race in Artificial Intelligence

SPIEGEL ONLINE, Christoph Scheuermann and Bernhard Zand



In an interview, best-selling author and machine-learning expert Pedro Domingos discusses the global competition to take the lead in artificial intelligence, the advance of autocrats and the threats modern technology presents to Western democracies.


Selling to America: the radical makeover of Goldman Sachs

FT.com, Ben McLannahan



Bank’s loan book up tenfold since 2012 but, as consumers struggle to pay, will it be punished for piling in to credit so late?


Species around the world are in trouble, but we can help

Microsoft Green Blog, Bonnie Lei



Consider the Pacific Yew tree’s bark. It is now the source of a cancer drug, yet its medical properties were not discovered until 1977. More than that, we know precious little about the full picture of life on Earth. Scientists have observed and described only 1.5 million species of the estimated 10 million on earth. We should be trying to maintain the fullest, most diverse array of species on this planet, to maintain the most resilient overall ecosystem.

To accomplish this, we will need more people using technology – it’s how humanity has solved every major problem we’ve ever faced. Human ingenuity, paired with advanced technology like artificial intelligence, can help us close the knowledge gap, make new discoveries, and more effectively conserve and protect species.

That is our goal in supporting the work of more than 65 AI for Earth grantees across more than 20 countries, many of whom are working to address key threats to global biodiversity. Here are three of these heroes for Earth:

 
Events



Advances in Data Science, 2018

Manchester's Data Science Institute, Cathie Marsh Centre for Social Research



Manchester, England May 21-22. “A two-day meeting to present recent developments in data science.” It “will explore some of the ways that data science can support societal wellbeing.” Organized by Manchester’s Data Science Institute and the Cathie Marsh Centre for Social Research. [$$$]

 
Deadlines



UPTEC School on the Future of Computing

Porto, Portugal June 25-29. Deadline to apply is April 29.

Data for Good Exchange 2018: Our Data for Good? – Preliminary Call for Papers, Posters, Demos & Workshops

New York, NY Event date is September 16. Deadline for submissions is July 8.
 
Tools & Resources



equalAIs – Empowering humans by subverting machines

MIT Media Lab



We believe there should be more public discourse around what choices we want to make as individuals and as a society regarding how our data and images and facial recognition will be used and more tools to make those choices and to express or enforce them. To that end, our work has multiple components, from a FOIA request for information about the federal government’s use of facial recognition to an open sourced technical prototype for changing photographs so that facial recognition systems can’t see faces in them.


Rebuilding Deepdrive in Unreal

DeepDrive



“I noticed early on that the problems in simulation and in the real-world were often similar, but the solutions very different. For example, in imitation learning, you often need to reduce sampling error when training and test data are not independently and identically distributed.”

“In real cars this is dealt with, by ‘pretending’ as if a left or right camera is actually a center camera, and that the car is on the right or left side of the lane.”


Validation and Verification of Machine Learning Detections using Tomnod

DeepCore SDK, Kevin McGee and Joe White



In our blog post entitled Discovering Pattern of Life Activity using Machine Learning, we described how the output from a machine learning algorithm can aid in characterizing human activity over time. We did this by counting all the objects from certain categories like planes, trains and automobiles and then viewing the results in aggregate. This technique gives a solid activity baseline and can be used with other data sources like economic, demographic or market information.

In order to do all of this, we need machine learning models that perform well in all geographic areas. Additionally, we need a generous amount of training data to cover all the challenges we encounter when performing machine learning on satellite imagery. One option to accomplish this would be to have an army of imagery analysts marking our imagery all the time for our areas and objects of interest. Obviously this does not scale well in terms of cost or time. At Radiant Solutions we asked the question, “what if we could get our machine learning model to train itself?” So that’s exactly what we did. With our crowdsourcing platform Tomnod, we’ve developed an end-to-end pipeline to produce highly accurate models, as efficiently as possible.

 
Careers


Postdocs

Postdoctoral researchers (2)



University College Dublin; Dublin, Ireland

Postdoc position in ML at the NIH/NIMH



National Institute of Mental Health; Bethesda, MD

Internships and other temporary positions

Communications and Design Intern



Data-Pop Alliance; Cambridge MA, New York City, or Mexico City

Full-time positions outside academia

GBDX Software Engineer



DigitalGlobe; Herndon, VA
