Data Science newsletter – March 11, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for March 11, 2019

 
 
Data Science News



AI in schools — here’s what we need to consider

The Conversation, Neha Shivhare


Maybe you heard that the Canadian government has invested millions of dollars in AI research during the past few years and is emerging as one of the global leaders in the field.

But did you know that some companies are developing AI for use in schools, for example in forms such as AI tutoring systems? Such systems can engage students in dialogue and provide feedback in subjects where they need extra help.

As an educational technology researcher, I am interested in how educators apply technological advancements. My concern is improving and facilitating education by holistically combining educational philosophy, psychology, sociology and technology.


Plagiarism-detection firm to be sold for US$1.7 billion

University World News, Chronicle of Higher Education, Beth McMurtrie


Stamping out student plagiarism is big business. How big? Worth US$1.735 billion, to be exact. That’s the price that Advance, a privately held media, communications, and technology company, will pay to purchase Turnitin, the 800-pound gorilla of plagiarism-detection services. Although not the largest ed-tech deal ever made, it is, in the words of one analyst, “massive”.


UI students unveil dog-human ‘translator’ at Engineering Open House

The News-Gazette (Champaign, IL), Julie Wurth


Dug the Dog has nothing on Alma.

Inspired by the adorable character from the movie “Up,” a team of University of Illinois engineering students created their own talking dog for this year’s Engineering Open House.

Yes, Alma is a real dog, who wears a special scarf fitted with electrodes attached to a system that can decode her brain waves and translate them into human speech.


Bay Area police try out controversial AI software

San Jose Mercury News, Bay Area News Group, Ethan Baron


Even the head of a Santa Cruz tech company that sells software to Bay Area police departments admits that using an algorithm to tell cops where and when to patrol raises a host of complicated issues.

With the promise of trying to predict crime before it happens, police departments across the United States are experimenting with artificial intelligence programs like the one from PredPol in Santa Cruz. It’s an evolution of the “hot-spot” crime maps police have been using for decades to guide their patrolling — with 21st century twists that opponents say can reinforce bias and make people less safe.


Five stories from five years of spaceborne precipitation measurements

NASA, Global Climate Change blog, Kasha Patel


On February 27, 2014, a Japanese rocket launched NASA’s latest satellite to advance how scientists study raindrops from space. The satellite, the Global Precipitation Measurement (GPM) Core Observatory, paints a picture of global precipitation every 30 minutes, with help from its other international satellite partners. It has provided innumerable insights into Earth’s precipitation patterns, severe storms, and the rain and snow particles within clouds. It has also helped farmers trying to increase crop yields, and aided researchers predicting the spread of fires.

In honor of GPM’s fifth anniversary, we’re highlighting some of our favorite and most unique Earth Observatory stories, as made possible by measurements taken by GPM.

One of those stories: the second-wettest October ever recorded in Texas.


Predicting the ‘holy grail’ of climate forecasting: A new model and a new public dataset

Microsoft Research


It was crunch time, just as it had been many times before in the preceding weeks. Such is the nature of real-time competition. The yearlong Subseasonal Climate Forecast Rodeo was being sponsored by the Bureau of Reclamation and the National Oceanic and Atmospheric Administration, and teams were tasked with predicting temperature and precipitation in the Western United States for two time spans—two to four weeks out and four to six weeks out. Teams were required to submit their four predictions every two weeks.

Waiting until the day before the submission deadline, as they often did to base their predictions on the most up-to-date information possible, Microsoft Researcher Lester Mackey and his teammates pulled up their data source and found—nothing. The data wasn’t there.

It took some time to reach competition organizers, who told them the data had been moved and provided a new link, but any hope of an extension was extinguished upon making contact. Organizers reminded the group that missing and incomplete information is among the challenges facing real-world forecasters. (I mean, how often have you seen your local meteorologist skip a prediction because of insufficient information?) The disappearing act set the team back 12 hours.


Appen acquires Figure Eight for up to $300M, bringing two data annotation companies together

TechCrunch, Anthony Ha


Appen just announced that it’s acquiring Figure Eight in an all-cash deal that sees Appen paying $175 million upfront, with an additional payment of up to $125 million based on Figure Eight’s performance this year.

Both companies focus on using crowdsourced labor pools to annotate data, which in turn is used to train artificial intelligence and machine learning systems — for example, Figure Eight (formerly known as CrowdFlower and Dolores Labs) says its technology has been used for everything from mapping to stock photography to scanning receipts for expense reports.

Appen, meanwhile, is a publicly-traded company headquartered in Sydney.


Why Data Science Teams Need Generalists, Not Specialists

Harvard Business Review, Eric Colson


… the goal of data science is not to execute. Rather, the goal is to learn and develop profound new business capabilities. Algorithmic products and services like recommendation systems, client engagement bandits, style preference classification, size matching, fashion design systems, logistics optimizers, seasonal trend detection, and more can’t be designed up-front. They need to be learned. There are no blueprints to follow; these are novel capabilities with inherent uncertainty. Coefficients, models, model types, hyperparameters, all the elements you’ll need must be learned through experimentation, trial and error, and iteration. With pins, the learning and design are done up-front, before you make it. With data science, you learn as you go, not before you go.

In the pin factory, when learning comes first, we neither expect nor want the workers to improvise on any aspect of the product, except to produce it more efficiently. Organizing by function makes sense since task specialization leads to process efficiencies and production consistency (no variations in the end product).

But when the product is still evolving and the goal is to learn, specialization hinders our goals in several ways.


One Step Closer to Deep Learning on Neuromorphic Hardware

The Next Platform, Michael Feldman


A group of researchers at Sandia National Laboratories has developed a tool that can cross-train standard convolutional neural networks (CNNs) to a spiking neural model that can be used on neuromorphic processors. The researchers claim that the conversion will enable deep learning applications to take advantage of the much better energy efficiency of neuromorphic hardware, which is designed to mimic the way biological neurons work.

The tool, known as Whetstone, works by adjusting artificial neuron behavior during the training phase to activate only when it reaches an appropriate threshold. As a result, neuron activation becomes a binary choice – either it spikes or it doesn’t. By doing so, Whetstone converts an artificial neural network into a spiking neural network. The tool does this by using an incremental “sharpening process” (hence Whetstone) through each network layer until the activation becomes discrete.
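
The article doesn’t include code, but the sharpening idea is easy to picture. Here is a minimal NumPy sketch, not Sandia’s actual Whetstone implementation: an activation is blended from a smooth sigmoid toward a hard 0/1 spike as a sharpness parameter is raised over the course of training (the function name and blending scheme are illustrative assumptions).

```python
import numpy as np

def sharpened_activation(x, sharpness):
    # Blend a smooth sigmoid (sharpness=0.0) with a hard 0/1 step
    # (sharpness=1.0); gradually raising `sharpness` during training
    # is the gist of the "sharpening" described above.
    smooth = 1.0 / (1.0 + np.exp(-x))
    spike = (x > 0).astype(float)  # the neuron either fires or it doesn't
    return (1.0 - sharpness) * smooth + sharpness * spike

x = np.linspace(-3.0, 3.0, 7)
for s in (0.0, 0.5, 1.0):
    print(f"sharpness={s}:", np.round(sharpened_activation(x, s), 2))
```

At sharpness 1.0 every activation is exactly 0 or 1, which is what lets the trained network be mapped onto spiking neuromorphic hardware.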


This professor is cleaning up tech’s ‘messy data’ problem

SiliconANGLE, Kylie Anderson


Madeleine Udell, assistant professor of operations research and information engineering at Cornell University, is educating a new era of technologists to decode so-called “messy data” with a more effective approach to tech collaboration.

“Oftentimes people only learn about big, messy data when they go to industry,” Udell said. “I’m interested in understanding low dimensional structure in large, messy data sets [to] figure out ways of … making them seem cleaner, smaller and easier to work with.”

Udell spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the recent Stanford Women in Data Science event at Stanford University.
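
Udell’s own methods aren’t detailed in the interview; purely as a generic illustration of what “low dimensional structure” buys you, here is a small truncated-SVD sketch on synthetic data (the data, noise level, and rank are made up for the example).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "messy" data: a rank-3 signal buried in noise.
n, d, k = 200, 50, 3
signal = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
messy = signal + 0.1 * rng.normal(size=(n, d))

# A truncated SVD keeps only the top-k directions, giving a smaller,
# cleaner summary of the same rows.
U, s, Vt = np.linalg.svd(messy, full_matrices=False)
compressed = U[:, :k] * s[:k]      # each row summarized by k numbers
denoised = compressed @ Vt[:k]     # low-rank reconstruction

err = np.linalg.norm(denoised - signal) / np.linalg.norm(signal)
print(f"relative error of the rank-{k} reconstruction: {err:.3f}")
```

Fifty noisy columns collapse to three informative ones, which is the sense in which low-dimensional structure makes a large, messy dataset “seem cleaner, smaller and easier to work with.”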


1 big thing: The precarious rental economy

Axios, Felix Salmon


Everything is a service, these days. Here’s investor Mikal Khoso:

Today you can rent living space flexibly based on your needs (AirBNB, Stoop), commute from that space without ever buying a car (Wheels, Uber, Lyft), rent clothes to fill your closet (Le Tote), rent specific appliances based on your needs (Joymode) and rent the furniture you fill your apartment with (Fernish).

Why it matters: There’s a lot to be worried about in this brave new world where our entire lives are dominated by rentiers, even if investors like Khoso have managed to persuade themselves that everybody wins.


How Memory Became Weaponized

Psychology Today, Matthew Huston


Emerging technology deployed on Internet platforms exploits our own mechanisms of memory to work against us. Both sanity and democracy are losing.


Porfiry

Fathom Info


Deconstructing documents, extracting entities, and making connections. Visualizing what’s already available about Mueller’s Trump/Russia investigation.
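
Fathom doesn’t describe Porfiry’s pipeline in this note; purely as an illustration of what the “extracting entities” step looks like, here is a minimal sketch using spaCy (an assumed tool, not Fathom’s stated stack).

```python
import spacy  # assumes the small English model: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
text = ("Robert Mueller's office indicted Paul Manafort in Washington "
        "over lobbying work performed in Ukraine.")
doc = nlp(text)

# Each recognized entity comes back as a span with a type label
# (PERSON, GPE, ORG, ...), which can then be linked across documents.
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Linking those extracted spans across filings and news reports is what turns a pile of documents into the kind of connection graph the project visualizes.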


Facebook’s Privacy Cake

Stratechery, Ben Thompson


What Zuckerberg announced is quite believable, precisely because it makes perfect sense for Facebook: this is a privacy cake that Facebook can have — and eat it too.


Have We Reached Peak Big?

WIRED, Opinion, Michael J. Joyner


Going “Big” either results in underperformance or brings political or regulatory threat. That is a long way from the death of Big, but there are already hints as to how it could end.

The market may intervene, as appears to be the case with the A380.

The political process might also take care of it. The current political mess and gridlock doesn’t need to last forever. If and when the political class and plutocrats get concerned enough about the pitchforks coming out, maybe Big Tech will be broken up or regulated.

 
Events



Next-Generation Genomics

Nature Conferences


New York, NY August 12-13 at New York University. “This conference will feature advances in both computational and experimental methods, including single cell analysis, genetic engineering, population genetics, and statistical and deep learning.” [$$$]


2019 Annual Arthur M. Sackler Lecture presented by Rodney Brooks

The National Academies of Sciences, Engineering, and Medicine


Washington, DC March 13, starting at 6 p.m., National Academy of Sciences (2101 Constitution Ave NW). [free, registration required]


2019 CSforALL Summit

CSforAll


Salt Lake City, UT October 21-23 at the University of Utah. [save the date]


Space Debris Hackathon

SpaceApps NYC


New York, NY March 30-31 at Columbia University. “The theme is Low Earth Orbit space debris, with the option to work on software, hardware, or policy solutions.” Only Columbia University students can participate in this hackathon.


Special CILVR Colloquium: Oriol Vinyals (DeepMind)

New York University, CILVR


New York, NY March 12, starting at 2 p.m., NYU auditorium 150 (60 5th Ave). Title: “AlphaStar: Mastering the Real-Time Strategy Game StarCraft II.”

 
Deadlines



LingHacks II Application

“Fill out this form by March 16 to grab a spot at LingHacks II, the second iteration of the world’s first computational linguistics hackathon for high school students! We’re so excited to have you join us from March 30-31, 2019 at 42 US in Fremont, California!”

DEEM Workshop at 2019 SIGMOD

Amsterdam, Netherlands. The Data Management for End-to-End Machine Learning workshop is June 30. Deadline for submissions is March 18.

Data Visualization Inventors, Founders, and Developers

In an effort to build a catalog of data visualization inventors, I’ve created this Google Sheet with a list of graphs (based on my Graphic Continuum project) and their inventors. I’m inviting you to help me fill in this list with more graphs, inventor names, and sources. I’ve filled in a few, but there’s a ways to go (and for some, I’ve just listed a random website or Wikipedia).

Workshop on Critical Data Science

“We invite submissions to the ICWSM Workshop on Critical Data Science, held on 11 June 2019 in Munich, Germany at the Thirteenth International AAAI Conference on Web and Social Media (ICWSM-2019).” Deadline for submissions is March 25.

Turing Senior AI Fellowships and Turing AI Fellowships

“Fellowships will cover a broad view of AI, including applications of foundational disciplines across mathematical sciences, statistical sciences, computational sciences and engineering. Consideration may also be given to AI work that interfaces with the life sciences, social sciences or humanities. Fellows will be expected to collaborate as appropriate across disciplines, and will have the opportunity to collaborate with academia, industry, government and the third sector.” Deadline for applications is March 29.

Nokia Bell Labs Prize | Bell Labs Prize

“The Bell Labs Prize is a competition for innovators from participating countries around the globe that seeks to recognize proposals that ‘change the game’ in the field of information and communications technologies by a factor of 10, and provides selected innovators the unique opportunity to collaborate with Bell Labs researchers to help realize their vision.” Deadline to enter is April 26.

Geohackweek 2019

Seattle, WA September 9-13 at University of Washington eScience Institute. Deadline to apply is May 1.

Apply for SNAPP Funding

Our annual call for proposals is now open until May 1, 2019. To apply, please refer to our Request for Proposals.

 
Tools & Resources



AllenNLP

The Allen Institute for Artificial Intelligence


AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop.
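
For readers who haven’t tried it, the typical usage pattern looks roughly like the sketch below; the model archive path is a placeholder rather than a real pretrained-model URL, and the expected input keys depend on which model you load.

```python
# A minimal sketch of the common AllenNLP pattern: load a trained model
# archive and run it on JSON input. The archive path is a placeholder.
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("path/to/pretrained-model.tar.gz")
result = predictor.predict_json({"sentence": "AllenNLP makes NLP experiments easy."})
print(result)
```

The same model archive can be served locally, run in the cloud, or probed from the command line, which is the “run them anywhere” infrastructure the project description refers to.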


Launching TensorFlow Lite for Microcontrollers

Pete Warden's blog


I’ve been spending a lot of my time over the last year working on getting machine learning running on microcontrollers, and so it was great to finally start talking about it in public for the first time today at the TensorFlow Developer Summit. Even better, I was able to demonstrate TensorFlow Lite running on a Cortex M4 developer board, handling simple speech keyword recognition. I was nervous, especially with the noise of the auditorium to contend with, but I managed to get the little yellow LED to blink in response to my command! If you’re interested in trying it for yourself, the board is available for $15 from SparkFun with the sample code preloaded. For anyone who didn’t catch it, here are the notes from my talk.
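
The on-device code is C++, but the usual first step happens in Python: converting a trained Keras model into a TensorFlow Lite flatbuffer that then gets embedded on the board. A rough sketch of that step follows; the tiny model here is a stand-in, not the keyword-spotting network from the talk.

```python
import tensorflow as tf

# Stand-in for an already-trained keyword-spotting model; the shapes and
# layers are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(49, 40)),  # e.g., spectrogram frames
    tf.keras.layers.Dense(8, activation="softmax"),
])

# Convert to a TensorFlow Lite flatbuffer; on a microcontroller this file
# is typically embedded as a C byte array and executed by the TF Lite
# Micro runtime.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("keyword_model.tflite", "wb") as f:
    f.write(tflite_model)
```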

 
Careers


Full-time positions outside academia

Machine Learning Engineer



Civis Analytics; Chicago, IL

Project Coordinator II



Fred Hutchinson Cancer Research Center; Seattle, WA

Frontend Engineer – Machine Intelligence Apps



Instabase; San Francisco, New York City, Boston
