Data Science newsletter – June 8, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 8, 2017

 
 
Data Science News



Be Careful Celebrating Google’s New Ad Blocker. Here’s What’s Really Going On.

The Intercept, David Dayen


The company announced that it would establish an ad blocker for the Chrome web browser, which has become the most popular in America, employed by nearly half of the nation’s web users. The ad blocker — which Google is calling a “filter” — would roll out next year, and would be the default setting for Chrome when fully functional. In other words, the normal user sparking up their Chrome browser simply wouldn’t see the ads blocked by the system.

What ads would get blocked? The ones not sold by Google, for the most part.


Human vs Machine: Five epic fights against AI

New Scientist, Feature, Sean O'Neill


The next frontier for AI in games is Starcraft 2, a space-war strategy game played in real time. Why? Its sheer complexity. Players harvest resources, build armies and fight on virtual terrain filled with pinch-points, alleys and strategic high ground. Gone are the neat grids and prescribed moves of chess or Go – Starcraft is chaos. The subtlety of Starcraft is enormous, as we explored back in 2010. That’s why, seven years later, if you ask the co-founder and CEO of DeepMind, Demis Hassabis, what his next target is after Go, he’ll tell you it’s Starcraft.


AI summit aims to help world’s poorest

Nature News & Comment, Declan Butler


In the world’s wealthiest neighbourhoods, artificial intelligence (AI) systems are starting to steer self-driving cars down the streets, and homeowners are giving orders to their smart voice-controlled speakers. But the AI revolution has yet to offer much help to the 3 billion people globally who live in poverty.

That discrepancy lies at the heart of a meeting in Geneva, Switzerland, on 7–9 June, grandly titled the AI for Good Global Summit. The meeting of United Nations agencies, AI experts, policymakers and industrialists will discuss how AI and robotics might be guided to address humanity’s most enduring problems, such as poverty, malnutrition and inequality.

Development agencies are buzzing with ideas, although only a few have reached the stage of pilot experiments. But scientists caution that the rise of AI will also bring societal disruption that will be hard to foresee or manage, and that could harm the world’s most disadvantaged.


Company Data Science News

Uber continues on what appears to be a downward spiral, losing top Kalanick-confidant and Chief Business Officer Emil Michael to resignation over the weekend. Michael follows a wave of departures by ten other top executives and the firing of 20 employees due to inappropriate behavior of one sort or another. Uber’s board, led by Eric Holder, Jr., will release its recommendations in a 7-hour meeting Sunday. Travis Kalanick was asked to take a leave of absence. Uber is a lesson that a great idea plus talented staff is necessary but not sufficient for long-term success. Good management is mandatory.

Lyft will partner with Boston-based self-driving car company Nutonomy to start piloting self-driving vehicles in Boston. Watch out, Uber. Lyft is not going to waste any time charging forward.

Obligatory link to Apple’s WWDC keynote speech. Spoiler: the company rejected the name iAlexa and went with HomePod.

Google is planning a large new site in San Jose that will reportedly take up 6 million square feet of office space near Caltrain’s Diridon Station.

Google’s new ad filter for its Chrome browser will only block unacceptably annoying ads. Who decides what is unacceptably annoying? Google and Facebook, who are the largest members of the Coalition for Better Ads. Chrome users will still see all sorts of (acceptably) annoying ads because that’s the attention toll we pay to surf the web.

Demis Hassabis wants AI to take on StarCraft now that it’s routinely cleaning up on Go. Juxtaposing Google’s ad filter story and Hassabis’s goals for the year is why all the cool kids want to work for DeepMind.

JP Morgan released a fat report about the future of machine learning in finance. The report emphasizes all the people they will need to hire to keep themselves relevant, so if you want to work in finance, make sure you have stats and programming skills.

Facebook has shrunk “the training time of a ResNet-50 deep learning model on ImageNet from 29 hours to one.”

Publons, a company that “allows scientists to track and showcase their peer-reviewing contributions” in order to incentivize scientists to keep producing excellent reviews, was just bought by Clarivate Analytics, which runs Web of Science. “More than 150,000 researchers have signed up, writing more than 800,000 reviews,” according to The Economist. It still feels wrong that a for-profit company is minting so much money off the unpaid labor of authors and peer reviewers. I know it is not costless to provide the infrastructure and archival work to run journals, but I think it would be possible to run this system on a non-profit model.


San Jose confirms talks with Google for massive downtown site

Silicon Valley Business Journal, Janice Bitters


San Jose officials have confirmed longtime rumors that Google is eyeing a massive landing around the Diridon Station area in downtown, where it could take up to 6 million square feet in office and R&D space, making it the company’s largest campus in the country.

The Diridon Station area is envisioned by the city to be a huge, mixed-use development area surrounding several major transportation investments, including the addition of high-speed rail and electrifying the existing Caltrain route, an already popular transit mode that runs north and south along the Peninsula. Ultimately the area is expected to include publicly accessible plazas, ground-level retail, and a greenbelt and park along the Los Gatos Creek.


Apple Just Joined Tech’s Great Race to Democratize AI

WIRED, Business, Tom Simonite


Apple’s iMac updates and new HomePod speaker drew most of the attention at the company’s World Wide Developers keynote. But tucked away in the middle were a short few minutes in which software chief Craig Federighi casually launched Apple into one of the tech industry’s fiercest competitions – the contest to help developers build the next generation of AI-powered applications.

Federighi had already spent some time boasting how Apple was using machine learning to do things like make Siri smarter, or help users rediscover old memories in the Moments photo app. Then he announced that Apple would also be gifting some AI superpowers to developers.

“We want to make powerful machine learning easy for you to incorporate in your apps,” he said. Federighi announced new APIs that help coders building apps for Apple devices do things like recognize faces or animals in photos, or parse the meaning of text.


Kids, AI devices, and intelligent toys

Medium, MIT MEDIA LAB, Stefania Druga and Randi Williams


The dichotomy between machines and living things is narrowing. Today, artificial intelligence (AI) is embedded in all kinds of technology, from robots to social networks. This affects the youngest among us as we see the emergence of an “Internet of Toys.” The trend is what prompted us to explore the impact of those “smart,” interconnected playthings on children. We’ll present our paper, “Hey Google, is it OK if I eat you?: Initial Explorations in Child-Agent Interaction,” at the Interaction Design and Children conference at Stanford University on June 27. This blog post provides a preview of our findings.


Human and computer vision unite to help Microsoft engineers stop traffic deaths

GeekWire, Lisa Stiffler


In the U.S., car crashes kill roughly 40,000 people a year. That’s a loss of life equivalent to four to five 737 airplanes crashing every week.

“In no other setting would this level of fatality go unnoticed,” said Franz Loewenherz, principal transportation planner for the city of Bellevue, Wash.

What makes it even worse is “a lot of these are preventable crashes,” he said. “If we have the data, we can make smart choices about how to deal with this.”

So Loewenherz and the city have teamed up with Microsoft and the University of Washington to form the Video Analytics Towards Vision Zero Partnership. Their plan is to generate that data and use it to improve roadways and prevent collisions between cars, trucks, bikes and pedestrians. Their ultimate goal is eliminating traffic deaths and serious injuries, a target referred to as “vision zero.”
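The 737 comparison in the excerpt above can be checked with quick arithmetic. This is only a rough sketch: the seat range of 150 to 190 per aircraft is an assumption, not a figure from the article, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the "four to five 737s a week" comparison.
DEATHS_PER_YEAR = 40_000           # figure quoted in the article
WEEKS_PER_YEAR = 52
SEATS_LOW, SEATS_HIGH = 150, 190   # ASSUMED seat range for a full 737; not from the article

deaths_per_week = DEATHS_PER_YEAR / WEEKS_PER_YEAR

# Fewer seats per plane means more planes are needed to match the weekly toll.
planes_low = deaths_per_week / SEATS_HIGH
planes_high = deaths_per_week / SEATS_LOW

print(f"{deaths_per_week:.0f} deaths/week = {planes_low:.1f} to {planes_high:.1f} full 737s")
```

Under those assumed seat counts, the result lands in the four-to-five range the article cites.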


NASA Re-Embraces the ‘Worm,’ Its Retro Cool Retired Logo, for New Merchandise

Space.com, Robert Z. Pearlman


The “worm” is back in at NASA — sort of.

Twenty-five years after the space agency unceremoniously retired the worm, its simple but iconic logotype, in favor of returning to its original 1958 graphic insignia, the retro logo has fallen back into NASA’s good graces, at least when it comes to its use on merchandise.


Activists Build Human Rights Abuse Cases With Help From Cellphone Videos

NPR, Joe Palca


A Syrian-born human rights lawyer in Washington, D.C., is collecting those videos, hoping someday they will be used to build criminal cases against the perpetrators of the violence.

But he also faces a major problem: The volume of videos is staggering.

“We have 600,000 videos, and we’re in the process of downloading almost 2,000 videos a day,” says Mohammad Al Abdallah, executive director of the Syria Justice and Accountability Centre, a nonprofit supported by the State Department and a handful of European governments.


When Health Technologists Meet Emergency Medicine Residents

Huffington Post, Kiyan Rajabi


I don’t know how many hundreds of technology challenges I’ve heard from doctors over the years. Among the pain points mentioned most frequently are having sufficient data to tailor individualized treatment plans and to prevent delayed diagnosis.

Last May, Health Tech Masters Students at Cornell Tech and Emergency Medicine residents at NY-Presbyterian aimed to bridge the gap between technologists and clinicians by hosting a first-ever conference of its kind: one that emphasizes the use of mobile devices to conduct medical research.

The ResearchSuite Conference, organized by chief emergency medicine resident Mark Shankar, took place at NY-Presbyterian Hospital in New York City and brought together over 50 physicians and engineers passionate about the emergence of mobile health for medical research studies. The gathering explored the use of ResearchSuite in this collaboration, referring to the suite of tools available to create applications for medical research.


Ohio Supercomputer Center runs largest scale calculation ever

Ohio Supercomputer Center


The Ohio Supercomputer Center recently displayed the power of its new Owens Cluster by running the single-largest scale calculation in the Center’s history.

Scientel IT Corp used 16,800 cores of the Owens Cluster on May 24 to test database software optimized to run on supercomputer systems. The seamless run created 1.25 Terabytes of synthetic data.


Data Frontiers: Subjectivity, Sentiment, and Sense

Brandwatch blog, Seth Grimes


The secret of advanced social insight is that there is no secret.

The data, analytics, solutions, and guidance you need are available and accessible, whatever your industry. Your challenge is to choose the right tools – the right methods and sources and of course the right software – for the job at hand.

That is no small caveat, however. Success is correlated with subject-matter expertise and analysis judgment and an open-to-exploration attitude.

Given the Brandwatch context for this article, you very likely have strong social analysis at your disposal – that’s my take-away from a look at
and the Audiences product at the recent – so we’ll focus on key elements that distinguish advanced social insight tech from foundational varieties: the ability to cope with the complexity of written (and spoken) language and to quantify and exploit sentiment, opinion, and emotion.


J.P.Morgan’s massive guide to machine learning and big data jobs in finance

efinancialcareers, Sarah Butcher


J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.

Titled, ‘Big Data and AI Strategies’ and subheaded, ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant as managers using newer datasets and methods will be able to predict them in advance and to trade ahead of their release.


CECOM Uses Operations Research to Save Lives, Millions and Resources

INFORMS


One of the U.S. Army’s biggest challenges is time. Transporting technical experts to and from combat zones is burdensome and can inhibit the overall success of the mission. At the same time, because technical experts face the same battlefield risks as soldiers, reducing the need for technical personnel on the battlefield puts fewer soldiers at risk.

CECOM has developed a suite of systems, named Virtual Logistics Assistance Representative (VLAR), to maximize self-reliance at combat outposts. The heart and underpinning of VLAR is a set of operations research (O.R.) methods and CECOM’s Causal Bayesian belief networks.


The course of the future – and the technology behind it – Jupyter Notebooks powering Berkeley’s data science curriculum

Berkeley Data Science


To catch a glimpse of the college course of the future, take a look at UC Berkeley’s Foundations of Data Science (Data 8), the pioneering course that introduces statistics, computer science and data analysis to lower-division students of all majors.

During lecture, students can write and edit code on their laptop computers in real time, working on the same data set and code that the professor is discussing in class. Learning in the class is active, and interactive: Students can try their own calculations, using large data sets that are available at the students’ fingertips, and follow up with questions to the professor.

Underlying these classroom innovations are Jupyter Notebooks, a web-based platform through which students can work on projects and submit homework assignments and labs.


Mining the Data Mother Lode

University of Pennsylvania, Penn Medicine News


A mother posting on Facebook about the way her son behaves while playing video games could provide a vital clue for the correct treatment for his epilepsy. This is but one type of social media chatter that is informing data scientists at Penn Medicine’s Health Language Processing Lab (HLP). One of the newest entities with the Penn Institute for Biomedical Informatics, HLP combines social media content with other sources of health information in a unique way aimed at understanding how people use language to communicate health needs.

All sorts of groups across Penn Medicine are harnessing data contained in electronic health records (EHRs) and social media to help improve outcomes. The Abramson Cancer Center, for instance, uses lab tests, radiology visits, and patient-reported symptoms to help lung cancer patients avoid ER visits. Departments across the health system work with data scientists to use finely tuned algorithms to detect complications or underlying health conditions earlier in the continuum of care, and the Center for Digital Health focuses its efforts on how social media intersects with health care, working to determine how posts might help providers detect health problems before urgent care is needed, or even how Twitter might play a role in fighting HIV.


How Can We Optimize AI for the Greatest Good, Instead of Profit?

MIT Technology Review, Jamie Condliffe


How can we ensure that artificial intelligence provides the greatest benefit to all of humanity?

By that, we don’t necessarily mean to ask how we create AIs with a sense of justice. That’s important, of course—but a lot of time is already spent weighing the ethical quandaries of artificial intelligence. How do we ensure that systems trained on existing data aren’t imbued with human ideological biases that discriminate against users? Can we trust AI doctors to correctly identify health problems in medical scans if they can’t explain what they see? And how should we teach driverless cars to behave in the event of an accident?

The thing is, all of those questions contain an implicit assumption: that artificial intelligence is already being put to use in, for instance, the workplaces, hospitals, and cars that we all use. While that might be increasingly true in the wealthy West, it’s certainly not the case for billions of people in poorer parts of the world. To that end, United Nations agencies, AI experts, policymakers and businesses have gathered in Geneva, Switzerland, for a three-day summit called AI for Good. The aim: “to evaluate the opportunities presented by AI, ensuring that AI benefits all of humanity.”


DeepLearning10: The 8x NVIDIA GTX 1080 Ti GPU Monster (Part 1)

SVH, Patrick Kennedy


We have done a number of smaller builds, including DeepLearning01 and DeepLearning02, that we published. While those builds were introductory, aimed at getting your feet wet with frameworks, today we have a different monster: DeepLearning10. This is an 8x NVIDIA GTX 1080 Ti GPU beast and is relatively affordable. We will note that as we are writing the DeepLearning10 piece, DeepLearning11 (a second 8x GPU system) is already up and running, half populated with GPUs. As we have seen these systems deployed in Silicon Valley data centers, they are rarely one-offs and are more often clustered.


Stan Boland: The British tech guru revving up to take on Uber in driverless cars

The Telegraph (UK), James Ashton


Rather than remote-controlling drivers all over the country, Boland’s venture, FiveAI, based in Cambridge and Bristol, will be completely driverless. It is the Holy Grail of motoring that every car maker and technology company including Uber is racing to crack. He expects to be testing on the streets of London in the second half of 2019.

And which part of the capital stands ready to become this new frontier land? “Certainly Croydon is a possibility, and so is Bromley.” He simply couldn’t resist the challenge. “It is probably the most exciting thing happening in engineering and computer science at the moment. It is the application of artificial intelligence and machine learning to solve real world problems,” he says.


Government Data Science News

The new Emmanuel Macron administration is offering climate scientists grants to move to France: up to 1.5m euros for senior researchers and 1m euros for junior researchers. The tagline: Make our planet great again.

18F, an office within the General Services Administration (GSA) of the US federal government, is partnering with procurement officers in Alaska to write their “vision statement, product road map, and statement of work” in an open GitHub repo.

NSF has abruptly discontinued Doctoral Dissertation Improvement Grants for ecologists and biologists. Program officers receive so many DDIG applications that they run out of time for everything else they are expected to do. The Ecological Society of America has written a letter offering to help process grants and requesting that they be continued.


NIH has also been asked by the Trump administration to cut overhead payments to universities to only 10% of the grant, as we reported last week. This would fundamentally alter the symbiotic relationship between federal agencies supporting science and the universities in which scientific research is performed.

At the NIH, a plan to cap grants to large labs has been abandoned. Instead, more grant money has been set aside for younger principal investigators in the Next Generation Researchers Initiative.

Quite timely, then, that Caroline Fiennes has an op-ed in Nature News calling for a science of philanthropy. She notes, “Some grants to academic scientists create so much administration that researchers are better off without them.” She also has evidence suggesting that, “grant size didn’t seem to affect success,” even though we intuitively assume larger grants have bigger impact.

South Wales police have made the first arrest (that we know of) using facial recognition technology. The police are not saying much. The use of machine learning and computer vision in criminal justice is likely to ramp up. ProPublica has critiqued the COMPAS system used to set bail. I predict we will also see gait recognition because it is harder to mask one’s gait by wearing a baseball hat or hoodie. It can also be more computationally intensive to process motion data than data from still images.


Looking at new trends in Distributed Robotics Systems and Society

Robohub, Eduardo Castello Ferrer


It isn’t a secret that distributed robotic systems are starting to revolutionize many applications from targeted material delivery (e.g., Amazon Robotics) to precision farming. Assisted by technological advancements such as cloud computing, novel hardware design, and manufacturing techniques, nowadays distributed robot systems are starting to become an important part of industrial activities including warehouse logistics or autonomous transportation.

However, as many engineers and scientists in this field know, several of the heterogeneous characteristics of these systems that make them ideal for certain future applications — robot autonomy, decentralized control, collective emergent behavior, collective learning, knowledge sharing, etc. — hinder the evolution of the technology from academic institutions to the public sphere.


For the First Time, Astronomers Measure the Mass of a Star Using General Relativity

Discover.com, D-brief, Nathaniel Scharping


For the first time, astronomers have measured the mass of a star by observing the way its mass deforms light passing by it.

It’s an observation that Einstein predicted but thought could never actually happen, due to the incredibly precise alignment between distant astronomical objects it entails. But using modern observing tools, researchers recently found and tracked two distant stars as they lined up almost perfectly.


What are machine learning engineers?

O'Reilly Radar, Ben Lorica and Mike Loukides


We are beginning to hear more companies forming teams of machine learning engineers. This isn’t really a new specialty, as such; as machine learning (and, in particular, deep learning) became the rage in data science circles, data engineers were bound to look to the next step. But what distinguishes a machine learning engineer from a data engineer?

To some extent, machine learning engineers do what software engineers (and good data engineers) have done all along.


AI Influencer Andrew Ng Plans The Next Stage In His Extraordinary Career

Forbes, Peter High


I was curious how his plans have taken shape in the couple of months since the announcement, so I caught up with him at his office at the Gates Computer Science Building at Stanford University. Given how influential his career has been to date, I was curious where he would focus his attention from this point forward. We also covered his recommendations for companies that are nearer to the beginning of the journey of implementing artificial intelligence, the emergence of roles like the chief artificial intelligence officer, and the industries that are most likely to be impacted by AI, as well as his comparison between the business cultures in the United States and China, among a variety of other topics.

 
Events



Sportdata & Performance Forum

Pinnacle Sports & Events Management Limited


Zurich, Switzerland November 27-28, produced by Pinnacle Sports & Events [$$$]

 
Deadlines



Reproducibility in ML Workshop

Sydney, Australia Part of ICML2017. Deadline for submissions is June 17.

Social Media & Social Order | SM⨯SO | International Conference

Oslo, Norway November 30-December 2. Deadline for paper abstracts is August 1.

Call for Papers: Computation+Journalism Symposium

Evanston, IL Symposium is October 13-14. Deadline for submissions is August 1.

GTC DC

Washington, DC Presented by NVIDIA, the GPU Technology Conference (GTC) is the most important event series for GPU developers in the world. Deadline for talk submissions is September 1.

NSF-Simons Research Centers for Mathematics of Complex Biological Systems – Program Solicitation

“The purpose of the NSF-Simons Research Centers for Mathematics of Complex Biological Systems (MathBioSys) is to enable innovative collaborative research at the intersection of mathematics and molecular, cellular and organismal biology, to establish new connections between these two disciplines, and to promote interdisciplinary education and workforce training.” Deadline for proposals is September 29.
 
Tools & Resources



A Beginner’s Guide to Biking to Work

The New York Times, AC Shilton


If that bicyclist whizzing by seems a little happier than the average gridlock-bound car commuter, you’re not imagining it: A recent study found that two-wheeled commuters were happier than their gas pedal-stomping, car-caged peers.

Becoming a bike commuter might seem daunting, but the benefits can be worth it: exercising regularly, saving money, decreasing your carbon footprint, absolving yourself of guilt over that break-room doughnut.

Benefits aside, that happiness bikers had over their driving counterparts? It’s all about control.


Medical Data for Machine Learning

GitHub – beamandrew


“This is a curated list of medical data for machine learning. This list is provided for informational purposes only, please make sure you respect any and all usage restrictions for any of the data listed here.”


TorchCraft Release v1.3-0

GitHub – TorchCraft


“Python support has been tested, and is now in Beta.”


A Research to Engineering Workflow

Dustin Tran


Going from a research idea to experiments is fundamental. But this step is typically glossed over with little explicit advice. In academia, the graduate student is often left toiling away—fragmented code, various notes and LaTeX write-ups scattered around. New projects often result in entirely new code bases, and if they do rely on past code, are difficult to properly extend to these new projects.

Motivated by this, I thought it’d be useful to outline the steps I personally take in going from research idea to experimentation, and how that then improves my research understanding so I can revise the idea.


Your Checklist to Get Data Science Implemented in Production

KDnuggets, Dataiku


For over a year we surveyed thousands of companies from all types of industries and data science advancement on how they managed to overcome these difficulties and analyzed the results. Here are the key things to keep in mind when you’re working on your design-to-production pipeline.


Keras Deep Learning with Apple’s CoreMLTools on iOS 11 – Part 1

Amund Tveit


This is a basic example of training and using a basic Keras neural network model (XOR) on iPhone using Apple’s coremltools on iOS 11. Note that the main point is showing the integration, from a Keras model to having it running in the iOS app, not the particular choice of model. In principle, a similar approach could be used for any kind of deep learning model, e.g. the generator part of a Generative Adversarial Network, a Recurrent Neural Network (or LSTM), or a Convolutional Neural Network.
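For readers unfamiliar with the XOR model the post refers to, here is a minimal sketch of a network small enough to compute it. It is plain Python rather than Keras or coremltools. The weights are set by hand instead of trained, and the names `relu` and `xor_net` are illustrative assumptions, not taken from the post.

```python
def relu(value):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, value)

def xor_net(x1, x2):
    """A 2-2-1 network with hand-set (not trained) weights that computes XOR."""
    h1 = relu(x1 + x2)         # hidden unit 1: sum of the inputs
    h2 = relu(x1 + x2 - 1.0)   # hidden unit 2: active only when both inputs are 1
    return h1 - 2.0 * h2       # output layer cancels the (1, 1) case

truth_table = {(a, b): xor_net(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)}
for (a, b), y in truth_table.items():
    print(int(a), int(b), "->", int(y))
```

A trained Keras model would learn different weights, but the shape of the problem is the same: two inputs, a small hidden layer, one output.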


A neural approach to relational reasoning

Google DeepMind; Adam Santoro, David Raposo, Nick Watters


“In two new papers, we explore the ability for deep neural networks to perform complicated relational reasoning with unstructured data. In the first paper – A simple neural network module for relational reasoning – we describe a Relation Network (RN) and show that it can perform at superhuman levels on a challenging task. While in the second paper – Visual Interaction Networks – we describe a general purpose model that can predict the future state of a physical object based purely on visual observations.”
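The Relation Network idea in the first paper can be summarized as RN(O) = f(sum over all object pairs of g(o_i, o_j)). The sketch below shows that structure in plain Python under loud assumptions: random weights stand in for learned parameters, `g` is a single ReLU layer, and `f` is a linear readout, whereas the paper uses trained multi-layer networks for both. All names and sizes here are hypothetical.

```python
import itertools
import random

rng = random.Random(0)
OBJ_DIM, HIDDEN = 3, 4  # illustrative sizes, not from the paper

# Random weights stand in for the learned parameters of g and f (assumption).
G_WEIGHTS = [[rng.uniform(-1, 1) for _ in range(2 * OBJ_DIM)] for _ in range(HIDDEN)]
F_WEIGHTS = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

def g(o_i, o_j):
    """Pairwise relation function: one ReLU layer over the concatenated pair."""
    pair = o_i + o_j  # list concatenation
    return [max(0.0, sum(w * v for w, v in zip(row, pair))) for row in G_WEIGHTS]

def relation_network(objects):
    """Sum g over every ordered pair of objects, then apply the readout f."""
    total = [0.0] * HIDDEN
    for o_i, o_j in itertools.product(objects, repeat=2):
        total = [t + r for t, r in zip(total, g(o_i, o_j))]
    return sum(w * t for w, t in zip(F_WEIGHTS, total))  # f: linear readout

objects = [[rng.uniform(-1, 1) for _ in range(OBJ_DIM)] for _ in range(5)]
score = relation_network(objects)
reordered_score = relation_network(list(reversed(objects)))
```

Because the pairwise outputs are summed before `f` is applied, the result does not depend on the order in which the objects are presented, so `score` and `reordered_score` agree up to floating-point rounding.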


Welcoming Core ML – Towards Data Science

Medium, Emannuel Carvalho


Core ML supports many machine learning models (neural networks, tree ensembles, support vector machines, and generalized linear models). The model should be in the Core ML model format (models with a .mlmodel file extension).

In our example, we will use a famous model, the VGG16, which is used to classify images, and luckily for us it is one of the pre-trained models that are available in Core ML.


Data Sketching – The approximate approach is often faster and more efficient.

ACM Queue, Graham Cormode


Streaming methods have been adopted in a number of domains, starting with telecommunications but spreading to search engines, social networks, finance, and time-series analysis. These ideas are also finding application in areas using traditional approaches, but where the rough-and-ready sketching approach is more cost effective. Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

This article introduces the ideas behind sketching, with a focus on algorithmic innovations. It describes some algorithmic developments in the abstract, followed by the steps needed to put them into practice, with examples. The article also looks at four novel algorithmic ideas and discusses some emerging areas.
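To make the sketching idea concrete, here is a minimal Count-Min sketch, one well-known example of the approach. The excerpt above does not say which algorithms the article covers, so treat this as an illustration rather than a summary of it. The class name, default sizes, and SHA-256-based hashing are assumptions chosen for clarity, not performance.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter using a fixed-size table of counters."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

stream = ["go"] * 50 + ["chess"] * 20 + ["starcraft"] * 5
sketch = CountMinSketch()
for word in stream:
    sketch.add(word)

estimates = {word: sketch.estimate(word) for word in ("go", "chess", "starcraft")}
print(estimates)
```

The memory used is `width * depth` counters no matter how long the stream is. The trade-off is that estimates can overshoot the true count when items collide, but they never undershoot it.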

 
Careers


Full-time positions outside academia

Data Engineer



PGA Tour; Ponte Vedra Beach, FL
