Lidar, which is like radar but with lasers instead of radio waves, can provide high-detail surveys of the land. The state of Washington is using the tool for beautiful results.
If you’re looking for love, start by cleaning your bathroom. This advice may seem counterintuitive since, let’s be real, romantic pursuits are not typically at the mercy of commode cleanliness (we hope). But, according to hundreds of thousands of people in the US on the dating app Hater, dirty bathrooms are, quite literally, the worst.
The new-age dating app is pretty simple: users react (with love or hate) to thousands of topics, like people who don’t signal, beards, and Beyoncé. The result is a matchmaking service based on mutual likes and dislikes. So, if you’re a hater of dirty bathrooms, you’re in good company. In fact, here are the things that we found are the most universally loved and hated throughout the US.
University Data Science News
Hanna Wallach of Microsoft Research was a co-chair of NIPS, where keynote speaker Kate Crawford helped introduce a new focus on fairness, accuracy, and transparency.
Karen Kelsky launched the hashtag #MeTooPhD and an anonymous survey of people in academia (1,500 responses so far) who have experienced sexual harassment. No names are named, but the survey results are public and searchable by institution and field. The upshot? Many women left academia altogether after being harassed by their supervisors, and others have taken years to recover.
The ACM has a new report on the state of autonomous vehicles, explaining that simulations of driving conditions are required to improve the technology’s ability to respond correctly to predictably unpredictable events like plastic bags blowing across the road, deer jumping in front of vehicles, and icy conditions that simultaneously cause glare and demand extra care.
Yale University has established an annual Day of Data. This year the “Day” was December 1, marking the fifth year for the event. Based on Yale’s own description, the event tilts more toward “tell” than “show.”
When Timnit Gebru was a student at Stanford University’s prestigious Artificial Intelligence Lab, she ran a project that used Google Street View images of cars to determine the demographic makeup of towns and cities across the U.S. While the AI algorithms did a credible job of predicting income levels and political leanings in a given area, Gebru says her work was susceptible to bias—racial, gender, socio-economic. She was also horrified by a ProPublica report that found a computer program widely used to predict whether a criminal will re-offend discriminated against people of color.
So earlier this year, Gebru, 34, joined a Microsoft Corp. team called FATE—for Fairness, Accountability, Transparency and Ethics in AI. The program was set up three years ago to ferret out biases that creep into AI data and can skew results.
The podcast gives Microsoft researchers a platform to share their work with a broad and curious audience, and aims for the perfect balance of professional and personal, technical and human. While the Microsoft Research Podcast is, first and foremost, about the research, it’s also an introduction to the researchers behind the projects. We find out who they are, what makes them tick, what inspires them, what gets them up in the morning, and what keeps them up at night. It’s not just the caffeine.
Every week, I talk to a new researcher and discover the stories behind the technology. The conversations are purposeful, but not scripted. Each interview reflects the unique nature of the subject matter and the person who has dedicated his or her life to it. Each interview reveals something surprising and amazing and each interview has a distinctly different flavor or theme.
from Kellogg Insight, based on the research of Joao Guerreiro, Sergio Rebelo, and Pedro Teles
Earlier this year, Microsoft founder Bill Gates threw his support behind a controversial policy: a robot tax. As workers in many sectors are replaced by machines, the government is losing huge amounts of income tax revenue. Taxing the companies that employ robots, Gates reasoned, could help slow the pace of automation, and the revenue could be used to retrain employees.
Sergio Rebelo, a finance professor at the Kellogg School, had serious doubts when he heard Gates’s argument. For decades, economists have known that taxing so-called “intermediate goods”—goods that are used to make other goods, like the bricks used to build a house, or the robots used to manufacture cars—can make it harder for suppliers to create and sell their products.
“When you do that, you reduce the level of production in the economy,” Rebelo says.
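Rebelo’s point can be made concrete with a toy model (our own illustration, not from the article: the production function, prices, and tax rates below are all assumed for the sketch). A firm uses robots as an intermediate input; a per-unit tax raises the effective cost of that input, so the firm buys fewer robots and produces less.

```python
# Toy model: a firm uses robots R as an intermediate input, with
# output f(R) = R**0.5, output price normalized to 1, robot cost c
# per unit, and a per-unit robot tax t.
# Profit = R**0.5 - (c + t) * R is maximized at R* = (0.5 / (c + t))**2.

def optimal_output(c, t):
    r_star = (0.5 / (c + t)) ** 2   # profit-maximizing robot use
    return r_star ** 0.5            # resulting output

no_tax = optimal_output(c=0.25, t=0.0)     # output with no robot tax
with_tax = optimal_output(c=0.25, t=0.25)  # output once the tax is levied
print(no_tax, with_tax)  # → 2.0 1.0
```

In this sketch, doubling the effective robot cost halves output: exactly the reduction in production that taxing intermediate goods causes.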
“Machine Learning and AI experts from around the world will gather in Long Beach, CA, next week at NIPS 2017 to present the latest advances in machine learning and computational neuroscience. Research from Facebook will be presented in 10 peer-reviewed publications and posters.”
from arXiv, Computer Science > Learning; Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.
This paper presents a large scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in power-law exponents—the “steepness” of the learning curve—yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
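The power-law relationship the paper describes can be sketched numerically. The snippet below is a hedged illustration, not the paper’s methodology: the synthetic data and the 0.35 exponent are our own assumptions. It fits error ≈ a · N^(−b) by linear regression in log-log space, where b is the power-law exponent (the “steepness” of the learning curve).

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit error ≈ a * N**(-b) via linear regression in log-log space.

    Returns (a, b): the prefactor and the power-law exponent.
    """
    slope, intercept = np.polyfit(np.log(train_sizes), np.log(errors), 1)
    return np.exp(intercept), -slope

# Synthetic learning curve: error falls as 2.0 * N**-0.35
# (both constants are made up for this sketch).
sizes = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
errors = 2.0 * sizes ** -0.35

a, b = fit_power_law(sizes, errors)
print(round(b, 2))  # → 0.35
```

On real learning-curve measurements the points scatter around the line, so the fitted exponent is an estimate rather than an exact recovery as it is here.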
So where’s the opportunity for startups? At Homebrew, we’ve focused on ML/AI investing in companies that have two key elements: 1) data that is largely hidden from the incumbents and 2) cost-effective, scalable win-the-market strategies (plain old go-to-market ain’t good enough!). Our view is that the combination of these two elements leads to long-term advantage and a competitive moat that protects against both the incumbents and other startups that enter the market.
Finding data pools that are black boxes to incumbents usually means targeting verticals or use cases where the larger players don’t have access, experience or focus.
Alexandria, VA: Professor Juan de Pablo from the Institute for Molecular Engineering at the University of Chicago will give a lecture on Soft Materials Research in the Era of Machine Learning at the National Science Foundation on Monday, December 11, at 2:00 p.m.
Peyton is a Python library for interacting with the Throne.ai sports-prediction platform. With it, you can download historical and competition data and user ranking data, and submit predictions.
“Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; and automatically organizes a collection of text files by topic.”
“Binder allows researchers to quickly create the computational environment needed to interact with research code and data shared online. To interact with someone else’s work, you simply click a URL and are taken directly to a live environment where you can run the code in the cloud.”
“AWS today launched a new browser-based IDE, AWS Cloud9, at its re:Invent conference in Las Vegas. Cloud9 is actually an IDE the company acquired last year. It’s based on the Ace Editor and the work Cloud9 did on its c9.io IDE.”