Data Science newsletter – May 18, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for May 18, 2017


Data Science News

Faster machine learning is coming to the Linux kernel

InfoWorld, Serdar Yegulalp


It’s been a long time in the works, but a memory management feature intended to give machine learning or other GPU-powered applications a major performance boost is close to making it into one of the next revisions of the kernel.

Heterogeneous memory management (HMM) allows a device’s driver to mirror the address space for a process under its own memory management. As Red Hat developer Jérôme Glisse explains, this makes it easier for hardware devices like GPUs to directly access the memory of a process without the extra overhead of copying anything. It also doesn’t violate the memory protection features afforded by modern OSes.

One class of application that stands to benefit most from HMM is GPU-based machine learning. Libraries like OpenCL and CUDA would be able to get a speed boost from HMM. It achieves this in much the same way as other speedups applied to GPU-based machine learning: by leaving data in place near the GPU, operating on it directly there, and moving it around as little as possible.
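The zero-copy idea at the heart of HMM can be pictured, by loose analogy, with Python’s buffer protocol. This is not kernel code, just a sketch of the difference between working on a copy and sharing one view of the same memory:

```python
# Illustrative analogy only: HMM is a kernel feature, but the copy-vs-share
# distinction it removes can be sketched with Python buffers.
data = bytearray(b"weights" * 1_000)

# Without a shared address space, the "device" works on a copy,
# and any changes must be copied back.
copied = bytes(data)          # full copy of the buffer

# With a shared address space (the HMM idea), both sides see
# the same underlying memory, so no copy is needed.
shared = memoryview(data)     # zero-copy view of the same bytes
shared[0:3] = b"NEW"          # the "device" writes...
assert data[0:3] == b"NEW"    # ...and the "process" sees it immediately
```

The copy never observes the write, which is exactly the synchronization overhead HMM is designed to eliminate.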

Imec Tips Novel AI Chips

EE Times, Rick Merritt


The Imec research institute described machine learning accelerators using arrays of resistive and magnetic memory cells rather than neural networks to reduce cost and power. Initial results included an MRAM array that lowered power by two orders of magnitude.

It’s early days for the promising work. Imec is withholding details of the chips’ architecture and their performance until later in the year when it has its patents filed. The research institute started a machine learning group just 18 months ago as part of its ongoing efforts to expand beyond its core work on silicon process technology.

FDA to Create Digital Health Unit

Regulatory Affairs Professionals Society, Zachary Brennan


With ongoing work on guidance related to software as a medical device, and a new unit dedicated to digital health coming to the US Food and Drug Administration’s Center for Devices and Radiological Health (CDRH), the agency is slowly but surely dipping its toe into the rapidly advancing field.

Bakul Patel, associate center director for digital health at FDA, told attendees at MedCon in Cincinnati on Thursday that current work is directed at working through about 1,400 comments on draft guidance on software as a medical device released last October, which is also a priority for the International Medical Device Regulators Forum (IMDRF).

Patel noted that FDA is currently working on “what clinical validation looks like for software,” adding that the 46-page document will likely be simplified to 20 pages when finalized.

School of Medicine Creates Department of Population Health Sciences

Duke University School of Medicine


On Saturday, May 13, the Duke University Board of Trustees approved the creation of a Department of Population Health Sciences within the School of Medicine. The current Center for Population Health Sciences will be elevated to department status, effective July 1, 2017. Lesley Curtis, PhD, director of the center, will serve as interim chair of the new department until a permanent chair is named.

Using Data to Find the True Price of Parking in Boston

Data-Smart City Solutions, Wyatt Cmar


During weekdays in the Seaport District, rates adjust at 8:00 AM, noon, and 4:00 PM to accommodate fluctuating parking demand. At two-month intervals, data scientists in the city’s Department of Innovation and Technology (DoIT) review the information collected by the sensor-equipped smart meters to determine whether to raise or lower daily rates by 50 cents or hold prices constant. According to the parameters of the pilot, spots could become as cheap as a dollar per hour or gradually rise to as much as $4 per hour.

In March, the city announced that rates at the majority of the 550 spaces in the Seaport District had gone down to a dollar per hour. A quarter of meter hours stayed constant at $1.50 per hour, and a quarter went up to two dollars per hour. Drivers can track the rates online at the Performance Parking website.
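The adjustment rule described above can be sketched in a few lines. The occupancy thresholds here are illustrative assumptions, not the city’s actual decision criteria:

```python
# Hedged sketch of the pilot's pricing rule: rates move in 50-cent
# steps and stay within the $1.00-$4.00 per hour bounds of the pilot.
STEP = 0.50
MIN_RATE, MAX_RATE = 1.00, 4.00

def adjust_rate(current, occupancy, low=0.70, high=0.90):
    """Raise, lower, or hold an hourly rate based on observed occupancy.

    The occupancy thresholds are assumed for illustration; the city's
    data scientists make this call from two months of meter data.
    """
    if occupancy > high:
        return min(current + STEP, MAX_RATE)   # demand high: raise
    if occupancy < low:
        return max(current - STEP, MIN_RATE)   # demand low: lower
    return current                             # demand balanced: hold
```

Under these assumed thresholds, a $1.50 meter seeing 95% occupancy would step up to $2.00, while a $1.00 meter can fall no further.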

Twenty years on from Deep Blue vs Kasparov: how a chess match started the big data revolution

The Conversation, Mark Robert Anderson


This was no ordinary game of chess. It’s not uncommon for a defeated player to accuse their opponent of cheating – but in this case the loser was the then world chess champion, Garry Kasparov. The victor was even more unusual: an IBM supercomputer, Deep Blue.

In defeating Kasparov on May 11, 1997, Deep Blue made history as the first computer to beat a world champion in a six-game match under standard time controls. Kasparov had won the first game, lost the second and then drawn the following three. When Deep Blue took the match by winning the final game, Kasparov refused to believe it.

Apple’s New Campus: An Exclusive Look Inside the Mothership

WIRED Magazine, Steven Levy


When [city council member Kris] Wang asked what benefit would come to Cupertino from this massive enterprise, the speaker had a slight edge to his voice as he explained, as if to a child, that it would enable the company to stay in the California township. Otherwise, it could sell off its current properties and take its people with it, maybe to someplace nearby, like Mountain View. That unpleasantness out of the way, the speaker was able to return to the subject of what he would create.

“I think we do have a shot,” he told the council, “of building the best office building in the world.” What he didn’t tell them—during what none of them could have known would be his last public appearance—is that he was not just planning a new campus for the company he cofounded, built, left, returned to, and ultimately saved from extinction. Through this new headquarters, Steve Jobs was planning the future of Apple itself—a future beyond him and, ultimately, beyond any of us.

Deep learning: What’s changed?

SAS Voices, Colin Gray


How does deep learning differ from traditional algorithms?

Let’s take a few examples. A credit scoring model based on logistic regression will typically use around ten to fifteen input parameters, such as age, income, time at address, etc. More complex decision trees or neural networks used to detect fraud may use hundreds of parameters. Deep learning takes this to a whole new level and may use hundreds of thousands or even millions of parameters. This can only really work when there are thousands or even millions of examples to train the models.
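A quick back-of-the-envelope count makes that scale jump concrete. The layer sizes chosen for the deep network below are illustrative assumptions, not taken from any particular model:

```python
# Rough parameter counts for the model classes mentioned above.

def logistic_regression_params(n_features):
    # one weight per input feature, plus a bias term
    return n_features + 1

def dense_network_params(layer_sizes):
    # each fully connected layer: weights (in * out) plus one bias per
    # output unit, summed over consecutive layer pairs
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

print(logistic_regression_params(15))                # credit-scoring scale
print(dense_network_params([10_000, 512, 512, 10]))  # deep-learning scale
```

A 15-input logistic regression has 16 parameters; the assumed four-layer network already has over five million, which is why it needs training examples on a similar scale.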

Hadoop Certification | Cloudera, Hortonworks, Edureka Big Data & Hadoop Certifications

Edureka, Shubham Sinha


Why do you need Big Data Hadoop Certification? If you complete a Big Data Hadoop certification, you are recognized in the industry as a capable and qualified Big Data expert. It gives you an edge and adds value to your resume, which helps in grabbing job opportunities in the field of Big Data & Hadoop.

Google is building AR and VR that knows where you are

Engadget, Nicole Lee


Google made plenty of VR- and AR-related announcements at I/O, its annual developer conference today. Not only will Samsung’s Galaxy S8 and LG’s upcoming flagship be Daydream-compatible, but the company also announced that it’s working on standalone Daydream VR headsets. Enabling the latter is a unique technology called WorldSense, a new set of positional tracking tools that doesn’t require any cameras or complicated setup. This sets the Daydream headsets apart from the competition. The current HTC Vive, for example, needs external room sensors; Google’s upcoming headsets wouldn’t.

This technology is important because it enables the standalone Daydream headsets to track your precise movements in space without the need for cables or an external PC. In this way, it sounds very similar to Project Santa Cruz, which is Oculus’ own standalone VR headset. But while Project Santa Cruz is still in the prototype phase, Google’s standalone Daydream already has partners like HTC and Lenovo working on headsets people can actually buy.

Researchers Use Molecular Dynamics and Machine Learning to Create ‘Hyper-predictive’ Computer Models for Drug Discovery

North Carolina State University, NC State News


Researchers from North Carolina State University have demonstrated that molecular dynamics simulations and machine learning techniques could be integrated to create more accurate computer prediction models. These “hyper-predictive” models could be used to quickly predict which new chemical compounds could be promising drug candidates.

Drug development is a costly and time-consuming process. To narrow down the number of chemical compounds that could be potential drug candidates, scientists utilize computer models that can predict how a particular chemical compound might interact with a biological target of interest – for example, a key protein that might be involved with a disease process. Traditionally, this is done via quantitative structure-activity relationship (QSAR) modeling and molecular docking, which rely on 2- and 3-D information about those chemicals.

How physics and mathematics are helping us understand society

Frontiers Blog, Daisy Hessenberger


From Twitter to politics, the question now is how this new field will develop and whether we can use it to address some of the biggest challenges in our world. “This ‘work-in-progress’ spirit is reflected as well in this volume,” write the Topic Editors. “[T]he call was launched in late 2014 and 10 articles were eventually accepted and published, including reviews—a look behind—, one methods paper, and six original contributions—a look ahead—introducing a broad range of research, from models with a strong analytical flavour to data-driven problems.”

Of course, working in a new field has its pitfalls, and at any crossroads you hope to find signs pointing out the way. This timely collection of peer-reviewed open-access articles, “At the Crossroads: Lessons and Challenges in Computational Social Science”, not only provides examples of CSS but also aims to outline good practice in the field. In “P-Values: Misunderstood and Misused”, authors Bertie Vidgen and Dr. Taha Yasseri from the Oxford Internet Institute “call for a more careful use of statistical tests and show [a] few directions for improvement.”
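One of the misuses that motivates such calls is the multiple-testing trap: run enough tests and “significant” results appear even when every null hypothesis is true. A few lines of arithmetic (not drawn from the paper itself) show how quickly that happens:

```python
# How fast false positives accumulate across independent null tests
# at the conventional per-test threshold of p < 0.05.

def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent null tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20):
    print(n, round(familywise_error_rate(n), 3))

# One standard remedy (Bonferroni correction): shrink the per-test
# threshold in proportion to the number of tests run.
def bonferroni_alpha(n_tests, alpha=0.05):
    return alpha / n_tests
```

At 20 independent tests, the chance of at least one spurious p < 0.05 hit is already about 64% – the kind of pitfall careful use of statistical tests is meant to avoid.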

Apple announces $1 billion data center expansion east of Reno

Reno Gazette-Journal, Anjeanette Damon


Apple easily won approval from the Reno City Council on Wednesday for a new agreement allowing it to take advantage of millions of dollars in sales tax abatements as it works to double the size of its massive data center east of Reno.

Only Councilwoman Jenny Brekhus voted against the agreement, which will allow Apple to buy, rather than lease, land in downtown Reno in order to be eligible for a tax break that will bring its sales tax rate down to 0.5%.

The vote came moments after Apple executive Mike Foulkes announced Apple will invest another $1 billion in expanding its data center at the Reno Technology Park in the Truckee River canyon east of Sparks.

Liquid-crystal and bacterial living materials organize and move in their own way

Penn State University, Penn State News


Smart glass, transitional lenses and mood rings are not the only things made of liquid crystals; mucus, slug slime and cell membranes also contain them. Now, a team of researchers is trying to better understand how liquid crystals, combined with bacteria, form living materials and how the two interact to organize and move.

“One of the ideas we came up with was materials that live,” said Igor S. Aronson, holder of the Huck Chair and Professor of Biomedical Engineering, Chemistry and Mathematics. “Living matter, active matter, may be self-healing and shape-changing and will convert energy to mechanical motion.”

The living material Aronson is exploring using predictive computational models and experiments is composed of a bacterium — Bacillus subtilis — that can move quickly using its long flagella and a nematic liquid crystal — disodium cromoglycate. Liquid crystals as materials sit somewhere between a liquid and a solid. In this case, the molecules in disodium cromoglycate line up in long parallel rows, but are not fixed in place. Capable of moving, they remain oriented in only one direction unless disturbed.

We’re living in a golden age of AI, says Jeff Bezos

World Economic Forum, Futurism, Dom Galeon


For Amazon founder and Blue Origin CEO Jeff Bezos, artificial intelligence (AI) isn’t something to be feared. Instead, it’s to be celebrated. Speaking at the Internet Association’s annual gala in Washington D.C., the renowned tech entrepreneur told the audience that AI is now enjoying a renaissance, as it helps to solve problems previously seen only as science fiction.

“It is a renaissance, it is a golden age,” Bezos said. “We are now solving problems with machine learning and artificial intelligence that were … in the realm of science fiction for the last several decades. And natural language understanding, machine vision problems, it really is an amazing renaissance.”

How AI Could Make Urban Bicycling a Whole Lot Easier

VICE, Motherboard, Michael Byrne


With long curly locks, sandy moustache, and driving cap, Edward Zimmerman looks like the fiddler in a Scottish folk band. To say he looks out of place as a presenter at the GPU Technology Conference is even something of an understatement—Zimmerman would look out of place behind a computer in a public library.

His presentation is likewise out of place. For one thing, its focus is on public transportation infrastructure, a domain that’s not likely to get anyone rich and/or famous. Specifically, Zimmerman, a cyclist himself, wants to fix urban bicycling – or at least make it better and perhaps safer. He wants to do this by optimizing traffic lights for bicycle traffic using deep learning.

Cyclists and drivers are aware of the phenomenon of the “green wave” even if they don’t know it by name. It’s where you get the timing of traffic lights on a city street just perfect and are able to ride a “wave” of green traffic lights through the city.
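The basic green-wave timing reduces to simple arithmetic: offset each light’s green phase by the time a rider takes to reach it. The cycling speed and intersection spacings below are illustrative assumptions (Zimmerman’s deep-learning approach would instead learn timings from observed traffic):

```python
# Back-of-the-envelope green-wave timing for a cyclist on one street.

CYCLIST_SPEED_MS = 5.0  # assumed average riding speed, ~18 km/h

def green_offsets(intersection_distances_m, speed_ms=CYCLIST_SPEED_MS):
    """Seconds after the first green at which each light should turn green."""
    return [round(d / speed_ms, 1) for d in intersection_distances_m]

# Lights at 0 m, 200 m, and 450 m along the street:
print(green_offsets([0, 200, 450]))  # → [0.0, 40.0, 90.0]
```

A rider holding that speed would arrive at each intersection just as it turns green – the “wave” the excerpt describes.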


Dataverse Community Meeting 2017

Harvard IQSS


Cambridge, MA, June 14-16, hosted by Harvard’s Institute for Quantitative Social Science. [free, registration required]

Tools & Resources

Introducing the TensorFlow Research Cloud

Google Research Blog, Zak Stone


To accelerate the pace of open machine-learning research, we are introducing the TensorFlow Research Cloud (TFRC), a cluster of 1,000 Cloud TPUs that will be made available free of charge to support a broad range of computationally-intensive research projects that might not be possible otherwise.

What is auto-sklearn? — AutoSklearn 0.2.0 documentation

GitHub – automl


“auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction.”
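This is not auto-sklearn itself, but the two chores it automates – algorithm selection and hyperparameter tuning – can be sketched with a toy search over invented one-dimensional models, using only the standard library:

```python
# Stdlib-only sketch of what an AutoML tool automates: try several
# (algorithm, hyperparameter) candidates, keep the one that scores
# best on held-out data. The "models" here are toys for illustration.

def mean_model(train, k=None):
    # baseline: always predict the training mean
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def knn_model(train, k=1):
    # k-nearest-neighbour regression on 1-D inputs
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def auto_fit(train, valid):
    """Fit each (algorithm, hyperparameter) pair; keep the lowest error."""
    candidates = [(mean_model, None), (knn_model, 1), (knn_model, 3)]
    def error(model):
        return sum((model(x) - y) ** 2 for x, y in valid)
    return min((make(train, k) for make, k in candidates), key=error)

train = [(0, 0), (1, 2), (2, 4), (3, 6)]      # roughly y = 2x
valid = [(1.5, 3.0), (2.5, 5.0)]
best = auto_fit(train, valid)                  # picks a k-NN candidate
```

auto-sklearn replaces this brute-force loop with Bayesian optimization and meta-learning over the real scikit-learn estimator space, but the contract is the same: hand over the data, get back the best-found fitted model.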

Tips for Building High-Quality Django Apps at Scale

Medium, DoorDash


At DoorDash, most of our backend is currently based in Django and Python. The scrappy Django codebase from our early days has evolved and matured over the years as our business has grown. To continue to scale, we’ve also started to migrate our monolithic app towards a microservices architecture. We’ve learned a lot about what works well and what doesn’t with Django, and hope we can share some useful tips on how to work with this popular web framework.

What are the challenges of building a data team at a startup?

Quora, Kimberly Shenk


Building a data science team is usually difficult because it is a large up-front investment that is hard to properly leverage. The business struggles to understand how the team is providing value, while data scientists also struggle to show their value to the business. Many times this is because data science teams focus on the code and building models with little or no regard for how to get their work into the hands of the business in a timely and useful manner.

I have found that the trick to building a team that provides value to the business is to not focus so much on the modeling.

Fancy things you could build using content negotiation



Our new Content Resolver service is an ideal interface and information source to build integrations. Today, we want to share with you a few potential (and fancy!) integrations one could build using content negotiation and DOI metadata.


Full-time positions outside academia

Software Engineer

Alluvium; Brooklyn, NY
