Data Science newsletter – February 6, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for February 6, 2018


 
 
Data Science News



Hate your boss? B12 is designing work without (human) managers

TechCrunch, Danny Crichton



Using research around a concept called flash teams, New York-based B12 is looking to solve the team-based labor marketplace challenge, all the while opening its core infrastructure to the public as an open source project known as Orchestra. The company, the brainchild of CEO Nitesh Banta and CTO Adam Marcus, is targeting website creation and management as its first vertical.

The concept of flash teams originated in the work of a group of computer science researchers at the Stanford HCI Group. The goal of their original 2014 paper was to identify the challenges in using algorithms to build teams, including how to evaluate experts, construct teams with complementary skillsets, and handle organizational behavior to ensure that work was completed satisfactorily.


Teaching machine learning at the University of Alberta

Macleans.ca, Catherine McIntyre



Once known as a mining and energy town, Edmonton is now one of the world’s leading centres for a decidedly different industry—machine learning. The department of computer science at the University of Alberta is playing a leading role in building the city’s high-tech reputation.

A particularly strong suit of the faculty is their artificial intelligence offering. With new money and resources flowing into the area, it’s poised to get even stronger.

In March 2017, the federal government earmarked $125 million for AI training at three Canadian institutions, including the U of A. In July, Google’s artificial intelligence firm DeepMind opened a new lab—its second in the world—in Edmonton, which is run in partnership with the school. The facility offers computer science students and grads an opportunity to intern and hone their AI skills, while luring new talent to the university.

The government may just be discovering U of A’s artificial intelligence prowess, but it’s something Richard Sutton has known for a while.


University study improves interfaces between brain, computers

Brown Daily Herald, Arjun Ganga



The integration of man and machine may seem like science fiction, but the well-funded scientific movement aiming to restore functional independence to patients through the integration of computers is already a reality. A recent University study as part of the BrainGate2 clinical trial builds on this movement and improves the calibration system between brain and machine.

For those with conditions like ALS, brain stem stroke or limb loss, movements that many take for granted can be extremely difficult or even impossible. Brain-computer interfaces, or BCIs, aim to help those with such ailments by translating brain cell signaling into the click of a cursor, the selection of letters on a screen or even the movement of robotic limbs.

One of the challenges involved in the real-time translation of brain activity is syncing an external device to align with the signals of the brain. This complex calibration process involves machine learning — a set of elaborate algorithms — and often a technician, but the recent University study has sped things up.
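The article does not detail the decoder used in the study, but the calibration step it describes amounts to fitting a model that maps recorded neural activity to intended movement. As a generic, hypothetical sketch (all names and numbers below are invented for illustration, not BrainGate's method), a linear decoder can be calibrated by regularized least squares on simulated firing rates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical calibration data: 500 time bins of firing rates from a
# 96-electrode array, each paired with the user's intended 2-D cursor velocity.
n_bins, n_channels = 500, 96
true_W = rng.standard_normal((n_channels, 2)) * 0.1
rates = rng.poisson(5.0, size=(n_bins, n_channels)).astype(float)
velocity = rates @ true_W + rng.standard_normal((n_bins, 2)) * 0.5

# "Calibration": fit a linear decoder by ridge-regularized least squares.
lam = 1.0  # small penalty keeps the fit stable with many correlated channels
W = np.linalg.solve(rates.T @ rates + lam * np.eye(n_channels),
                    rates.T @ velocity)

# "Use": decode intended velocity from a new window of neural activity.
new_rates = rng.poisson(5.0, size=(1, n_channels)).astype(float)
decoded_velocity = new_rates @ W
```

In practice this fitting step is what a technician supervises; speeding up or automating it is the kind of improvement the study reports.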


Here’s Why Alexa Won’t Light Up During Amazon Super Bowl Ad

Bloomberg Technology, Brad Stone



The word “Alexa” is uttered 10 times during the Super Bowl spot, but thankfully, the Amazon Echo in your living room isn’t going to perk up and try to respond. An Amazon spokeswoman is guarded about explaining exactly why, saying only, “We do alter our Alexa advertisements … to minimize Echo devices falsely responding in customers’ homes.”

Bezos and company have evidently been thinking about this problem for a long time, before the Echo was even introduced. A September 2014 Amazon patent titled “Audible command filtering” describes techniques to prevent Alexa from waking up “as part of a broadcast watched by a large population (such as during a popular sporting event),” annoying customers and overloading Amazon’s servers with millions of simultaneous requests.


Carnegie Mellon’s clear view on 5G cloudlets

OpenStack Superuser blog



Emerging edge cloud services such as smart city applications have suffered from a lack of acceptable APIs and services that can dynamically provision and deliver workloads through provider edges. Beyond the caching services that providers already use widely, thanks to the cloud's economies of scale, and a few low-traffic gaming apps, there is little to cheer about.

A talk about the future of edge computing at the OpenStack Summit in Sydney was itself something of an evolving situation: the session, originally scheduled to be presented by Prakash Ramchandran, Rolf Schuster, and Narinder Gupta, ended up being delivered by Mark Shuttleworth of Canonical and Joseph Wang of Inwinstack after visa problems kept the original presenters from traveling.

The good news: it’s only a matter of time before there’s another generational wave of the radio frequency spectrum as the industry moves to 5G. “That will create a new set of possibilities, but it’s also going to create a huge amount of cost,” said Shuttleworth. “It’s going to be very expensive to deploy that next generation radio frequency.” 5G will enable very high speed, very low latency communication.


What Does the ‘Electrification of Everything’ Look Like in America? NREL Plans to Find Out

Greentech Media, Justin Gerdes



While NREL has conducted extensive research on the supply-side implications of rapid deployment of renewable energy in the U.S. power system, it had not yet studied the demand side: extensive electrification of end-use technologies such as light-, medium- and heavy-duty vehicles and residential and commercial heat pumps, said Trieu Mai, NREL senior researcher and leader of the study.

The move to electrification, especially the adoption of electric vehicles, constituted “a hole in the research that we recognized,” said Mai.

The project’s first report, released last week in conjunction with the launch of Electrification Futures Study, compiles cost and performance projections for end-use electric technologies through 2050. The report is considered foundational, said Mai, because it provides the long-term assumptions for technologies to be studied more extensively throughout the series.


If Workers Slack Off, the Wristband Will Know. (And Amazon Has a Patent for It.)

The New York Times, Ceylan Yeginsu



What if your employer made you wear a wristband that tracked your every move, and that even nudged you via vibrations when it judged that you were doing something wrong?

What if your supervisor could identify every time you paused to scratch or fidget, and for how long you took a bathroom break?

What may sound like dystopian fiction could become a reality for Amazon warehouse workers around the world. The company has won two patents for such a wristband, though it was unclear if Amazon planned to actually manufacture the tracking device and have employees wear it.


Shifts in marine microbial populations detected using statistical machine learning

University of Washington, eScience Institute



Oceanography is currently witnessing a data explosion, with new instruments being deployed that collect high frequency, multi-parameter datasets, both remotely, with satellites, and in situ, with new instruments such as the SeaFlow cytometer. As a result, oceanographers have begun to use large-scale statistical machine learning tools to analyze these substantial datasets, thereby accelerating scientific discovery in marine microbial ecology.

The SeaFlow cytometer continuously profiles microbial populations across thousands of kilometers of the ocean surface during research cruises, which can last several weeks. In contrast to conventional flow cytometers that sit on laboratory benches running a single experiment at a time, SeaFlow allows one to seamlessly aggregate samples across multiple data collection campaigns, thus facilitating the identification of regions showing coherent features in microbial populations.


Sequoia Capital plans $8 billion global fund, eyes China investors: sources

Reuters, Julie Zhu and Kane Wu



Sequoia Capital, an early investor in global tech behemoths like Google Inc and Apple Inc, aims to raise up to $8 billion in its largest-ever fundraising and has set sights on Chinese investors, people familiar with the plan told Reuters.

More cash in the bag would help the Silicon Valley venture capital giant diversify its focus from early- and growth-stage investments to pre-IPO funding rounds, at a time when startup valuations are spiking partly due to money pouring in from SoftBank Group Corp’s $93 billion Vision Fund.

“There is so much money now (in the tech sector). You need to have a bigger war chest,” said a Hong Kong-based investment banker familiar with Sequoia’s strategy.


Intelligence agency specializes in phones, computer exploitation

USA Today, Associated Press, Jim Michaels



Smartphones helped terror organizations grow and communicate. Now the devices are contributing to their downfall.

In a nondescript, highly secured building in this Washington suburb, a group of U.S. government technicians and linguists are downloading massive amounts of data from phones, hard drives, CDs and other devices, providing a huge boost to the U.S. intelligence community as it hunts terrorists.

Many of the devices have been captured from battlefields in Iraq and Syria, where the Islamic State has lost virtually all the territory it captured in 2014.


AI4All, created by Google Cloud’s Fei-Fei Li, is pairing tech workers and high school students on AI projects

VentureBeat, Khari Johnson



AI4All, an organization funded by Melinda Gates and Nvidia founder Jensen Huang, launches its first-ever mentorship program today at Oakstop, a coworking space in Oakland, California.

The program will join tech workers from companies like OpenAI, IBM, Ford, and Accenture with high school students underrepresented in AI to work on projects that apply machine learning to solve problems for the benefit of humanity.

AI4All was established by a group that includes Fei-Fei Li, Google Cloud chief scientist and director of Stanford’s AI Lab and Vision Lab, whose passion for democratization of AI was made a central part of the launch of Google’s second-generation tensor processing unit.


The New Dating Requirement: Consuming All of Your Partner’s #Content

The Daily Beast, Taylor Lorenz



‘Be respectful and watch my [Instagram] Story,’ one 20-something demanded. Her relationship prerequisite isn’t unique—but when stories expire in 24 hours, it’s not easy, either.


Why Open Algorithms Matter in the Social Sector

Datassist, Heather Krause



Social sector organizations are often under intense pressure to make decisions more effectively and efficiently. The demand for evidence-based decision-making — and programming — is ever rising. And with good reason. But as we automate more and more decisions, the importance of using open algorithms can’t be overstated.

Why do open algorithms matter so much?

Algorithms used to make social sector decisions must be handled very differently than those employed in the corporate sector. If they’re not, the teams using them run the risk of hurting the very people they want to help.


Scientific Publishing in the Digital Age | ASAPbio

ASAPbio, Bodo M. Stern and Erin K. O’Shea



Life scientists feel increasing pressure to publish in high-profile journals as they compete for jobs and funding. While academic institutions and funders are often complicit in equating journal placement with impact as they make hiring and funding decisions, we argue that one of the root causes of this practice is the very structure of scientific publishing. In particular, the tight and nontransparent link between peer review and a journal’s decision to publish a given article leaves this decision, and resulting journal-specific metrics like the impact factor, as the predominant indicators of quality and impact for the published scientific work. As a remedy, we propose several steps that would dissociate the appraisal of a paper’s quality and impact from the decision to publish it. First, publish peer reviews, whether anonymously or with attribution, to make the publishing process more transparent. Second, transfer the publishing decision from the editor to the author, removing the notion that publication itself is a quality-defining step. And third, attach robust post-publication evaluations to papers to create proxies for quality that are article-specific, that capture long-term impact, and that are more meaningful than current journal-based metrics. These proposed changes would replace publishing practices developed for the print era, when quality control and publication needed to be integrated, with digital-era practices whose goal is transparent, peer-mediated improvement and post-publication appraisal of scientific articles.


In Science, There Should Be a Prize for Second Place

The Atlantic, Ed Yong



Some scientific journals are defusing the fear of getting “scooped” by making it easier for scientists to publish results that have appeared elsewhere.


The AI superstars at Google, Facebook, Apple—they all studied under this guy

Toronto Life magazine, Katrina Onstad



Geoffrey Hinton spent 30 years hammering away at an idea most other scientists dismissed as nonsense. Then, one day in 2012, he was proven right. Canada’s most influential thinker in the field of artificial intelligence is far too classy to say I told you so.

 
Events



AISTATS 2018 Accepted Papers

AISTATS 2018



Playa Blanca, Lanzarote, Canary Islands April 9-11. “Since its inception in 1985, AISTATS has been an interdisciplinary gathering of researchers at the intersection of artificial intelligence, machine learning, statistics, and related areas.” [$$$]


R for Humans, Ensemble Modeling, and Deep learning (with Keras)

Chicago R User Group



Chicago, IL February 15, starting at 5:30 p.m., IBM Innovation Center (71 S. Wacker Dr.) [rsvp required]


2018 Infinite Possibilities Conference

Mathematical Sciences Research Institute



Washington, DC April 14-15, 2018 at Howard University. “The Infinite Possibilities Conference (IPC) is a national conference that is designed to promote, educate, encourage and support women of color interested in mathematics and statistics, as a step towards addressing the underrepresentation of African-Americans, Latinas, Native Americans, and Pacific Islanders in these fields.” [$$]


DataDrivenAg

Donald Danforth Plant Science Center



Arlington, VA February 26-27. “Powered by the NIFA FACT Initiative, a symposium and focused data workshop or “hackathon” gathering experts in plant breeding, genomics, remote sensing and analytics will create a community demonstration of the power of these large and comprehensive datasets to benefit agriculture.” [$$]


Designing a New City Operating System Embracing Technology and Remaking Local Government

NYU Marron Institute of Urban Management



New York, NY Tuesday, February 27, starting at 5 p.m., NYU Wagner (295 Lafayette St, 2nd floor). “Join NYU professor Neil Kleiman and Harvard professor Stephen Goldsmith as they discuss their new book, A New City O/S: The Power of Open, Collaborative, and Distributed Governance.” [rsvp requested]

 
Deadlines



3rd Symposium on the Structure and Mobility of Crime

Paris, France June 12, a NetSci2018 Satellite workshop. Deadline to submit abstracts is March 3.
 
Tools & Resources



An unbiased approach for sifting through big data

Hokkaido University (Japan)



Researchers have developed a complex system model to evaluate the health of populations in some U.S. cities based only on the most significant variables expressed in available data. Their unbiased network-based probabilistic approach to mine big data could be used to assess other complex systems, such as ranking universities or evaluating ocean sustainability. … Matteo Convertino of Hokkaido University in Japan and Joseph Servadio of the University of Minnesota in the U.S. have introduced a novel probabilistic method that allows the visualization of the relationships between variables in big data for complex systems. The approach is based on “maximum transfer entropy”, which probabilistically measures the strength of relationships between multiple variables over time.
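Transfer entropy, the quantity underlying the authors' method, measures how much knowing the past of one variable reduces uncertainty about the next step of another. As a toy illustration of the idea (an empirical-counts estimator for binary series, not the authors' implementation):

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Estimate transfer entropy T(X -> Y) in bits for two binary sequences:
    T = sum over (y_t+1, y_t, x_t) of p(y1,y0,x0) * log2[ p(y1|y0,x0) / p(y1|y0) ]."""
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_t+1, y_t, x_t) counts
    pairs = Counter(zip(y[1:], y[:-1]))            # (y_t+1, y_t) counts
    ctx_full = Counter(zip(y[:-1], x[:-1]))        # (y_t, x_t) counts
    ctx = Counter(y[:-1])                          # y_t counts
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_full = c / ctx_full[(y0, x0)]            # p(y1 | y0, x0)
        p_marg = pairs[(y1, y0)] / ctx[y0]         # p(y1 | y0)
        te += p_joint * np.log2(p_full / p_marg)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 2000)
y = np.roll(x, 1)  # y copies x with a one-step lag, so X strongly "drives" Y
coupled = transfer_entropy(x.tolist(), y.tolist())          # near 1 bit
independent = transfer_entropy(x.tolist(),
                               rng.integers(0, 2, 2000).tolist())  # near 0
```

Unlike correlation, the measure is directional: transfer entropy from Y back to X would be near zero here, which is why it suits mapping which variables drive which in a complex system.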


Feature Launch: Kaggle API

Kaggle



“We’re excited to announce the beta release of the official Kaggle API! The API is made accessible using a command line interface (CLI) implemented in Python. You can now use the CLI to do things like submitting to competitions and downloading datasets. For instructions on using the CLI go to https://github.com/Kaggle/kaggle-api.”
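Per the linked GitHub instructions, use of the CLI boils down to installing the client, placing an API token, and calling subcommands. A typical session might look like the following (the `titanic` competition and file names are placeholders; see the repository for current details):

```shell
# Install the client and put the API token (downloaded from your Kaggle
# account page) where the CLI expects to find it.
pip install kaggle
mkdir -p ~/.kaggle
mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# Download a competition's data, then submit a predictions file.
kaggle competitions download -c titanic
kaggle competitions submit -c titanic -f submission.csv -m "first submission"

# Download a public dataset by its <owner>/<dataset> slug.
kaggle datasets download -d <owner>/<dataset>
```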


[1801.10247] FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling

arXiv, Computer Science > Learning; Jie Chen, Tengfei Ma, Cao Xiao



The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned with the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, we interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to a batched training scheme as we propose in this work—FastGCN. Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference. We show a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.
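A rough sketch of the sampling idea (a toy NumPy re-implementation, not the authors' code): one GCN layer computes relu(Â·H·W), and the expensive aggregation Â·H can be estimated without touching every node by sampling a handful of nodes under a proposal distribution q and reweighting, which is unbiased by the usual Monte Carlo argument:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n nodes, normalized dense adjacency A_hat, features H, weights W.
n, d_in, d_out, n_samples = 100, 16, 8, 20
A_hat = rng.random((n, n))
A_hat /= A_hat.sum(axis=1, keepdims=True)
H = rng.standard_normal((n, d_in))
W = rng.standard_normal((d_in, d_out))

def exact_layer(A_hat, H, W):
    # Full aggregation over all n nodes, then ReLU.
    return np.maximum(A_hat @ H @ W, 0.0)

def sampled_layer(A_hat, H, W, n_samples, rng):
    # Monte Carlo estimate of A_hat @ H: draw n_samples node indices from a
    # proposal q and reweight each sampled column by 1/(n_samples * q).
    # (Uniform q shown here; FastGCN improves on it with importance sampling.)
    q = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=n_samples, p=q)
    est = (A_hat[:, idx] * (1.0 / (n_samples * q[idx]))) @ H[idx]
    return np.maximum(est @ W, 0.0)

full = exact_layer(A_hat, H, W)
approx = sampled_layer(A_hat, H, W, n_samples, rng)
```

The per-layer cost of the sampled version scales with the number of samples rather than with graph size, which is the source of the training speedups the paper reports.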


Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid and Pinot

Medium, Roman Leventov



In this post I want to compare ClickHouse, Druid and Pinot, the three open source data stores that run analytical queries over big volumes of data with interactive latencies.

 
Careers


Full-time positions outside academia

Applied research positions: ML for finance



ISI Foundation; Torino, Italy

Product Manager



B12; New York, NY
Full-time, non-tenured academic positions

Research Scientific Programmer/Analyst



Boston University; Boston, MA

Library Carpentry Coordinator



Data Carpentry and University of California, California Digital Library; Oakland, CA

Open Educational Resources Coordinator



University of Houston; Houston, TX
