Data Science newsletter – May 3, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 3, 2018


 
 
Data Science News



Government Data Science News

The Food and Drug Administration is moving ahead with a pilot program to pre-certify companies that make software as a medical device (SaMD). This is a big step by an agency that is struggling to regulate new actors in the health space, whether it’s companies like Fitbit and Apple that make software that could influence healthfulness, or clinicians who would like to prescribe exercise. The program should help the agency get a handle on how to regulate more complex algorithms in the healthcare loop, though I imagine that will be a far more complicated undertaking. Stay tuned.


Public librarians in California and Washington state are being trained to handle open data requests to take the burden off of agency officials. You will still have to use the Freedom of Information Law (FOIL) to obtain agency data that is not public.



Two scientists – Phil Janowicz and Kristopher Larsen – dropped out of their respective races for the US House of Representatives. If you’d like to vote for scientists, find out who is still in the running on 314action.org, a group established after the last presidential election to support scientists trying to get elected.



Cities like Atlanta and Baltimore have been hacked recently, making it more obvious just how important cybersecurity is for functioning civic and commercial activities. In Baltimore, the 311 and 911 systems were down for 17 hours. Most large cities face cyber attacks every single day, according to CityLab.



A bill being considered in Georgia demonstrates how difficult it is to create regulations for cybersecurity. The bill would make it illegal to access a computer or network “without authority”. This means many typical protective activities within the cybersecurity world would become illegal, thereby weakening overall cybersecurity. The intention of the bill is to provide an avenue to punish information thieves who take data that was accidentally left in the clear.

The National Institutes of Health is now actively recruiting 1 million volunteers to provide DNA samples and medical records to improve our understanding of the genetic underpinnings (or lack thereof) that explain longevity, sensitivity to disease, to treatment, and to environmental factors. Sign up here – it’s kind of like being an organ donor without having to give away your organs. Lives will be extended.



Michael Griffin has been promoted to Undersecretary of Research and Engineering at the Defense Department. He will be responsible for DoD research in “hypersonics (for offense and defense); directed energy; machine learning and artificial intelligence; quantum science, including encryption and computing; and microelectronics.”



Michigan, not wanting to relinquish its status as the automotive capital of the US, just clinched a deal with Toyota Research Institute to build a closed autonomous vehicle testing facility. Safety first! (I’m looking at you, Uber.)


House Funding Bill Allocates $1.2B for VA EHR System Next Year

Healthcare Informatics Magazine, Rajiv Leventhal



The House Appropriations Committee has released its Military Construction and Veterans Affairs bill for 2019, including $1.2 billion for the new VA electronic health record (EHR) system.

In total, the legislation provides $96.9 billion in discretionary funding – $4.2 billion above the fiscal year 2018 enacted level. As stated in a press release from the Appropriations Committee, “The bill contains $1.2 billion for the new VA electronic health record system. This will ensure the implementation of the contract creating an electronic record system for VA that is identical to one being developed for DoD [The Department of Defense]. These two identical systems will ensure our veterans get proper care, with timely and accurate medical data transferred between the VA, DoD, and the private sector.”


Alphabet’s AI venture capital firm makes first investment in Canada with BenchSci

The Globe and Mail, Josh O'Kane



Alphabet Inc.’s nascent artificial-intelligence venture fund has picked its first Canadian investment: BenchSci, a Toronto biomedical startup that uses machine learning to scan millions of data points in biomedical research papers, generating searchable results to help shorten the drug discovery process. … The internet-search giant, Mr. Belenzon said, will help BenchSci build best practices and a strong work culture while connecting the startup with potential customers. “They have unique technological expertise to add value to the company,” Mr. Belenzon said in an interview.


Company Data Science News

Facebook lost Jan Koum, an employee, board member, and founder of the Facebook-acquired messaging service WhatsApp. Koum has been frustrated by Facebook’s reliance on online advertising since at least 2012 when he wrote (with WhatsApp co-founder Brian Acton) that he thinks online ads are “a disruption to aesthetics, an insult to your intelligence, and the interruption of your train of thought.” Koum is a billionaire. Let’s hope that is not the minimum financial threshold above which we feel free to take principled action.



Apple is poised to overtake Spotify in streaming and is hiring designers and culture-world aficionados like crazy, aiming to become a high tech music, culture, creativity, and design brand.



Daphne Koller, formerly of Calico, an Alphabet subsidiary focused on “health, well-being, and longevity,” has announced her next big thing: insitro. Insitro will use machine learning to improve drug R&D, an idea that is much harder to do than it is to talk about doing. Koller has the ML chops so this one might be the real deal.



Hilary Mason, formerly CEO of Fast Forward Labs, which was acquired by Cloudera, has been put in charge of her own Machine Learning organizational branch at Cloudera as a result of a larger organizational maneuver. Anupam Singh will head up a new vertical called Analytics. (See how machine learning and analytics are separate! Smart.) I now plan to invest in their stock but I am in no way qualified to give financial advice.



Twitter is telling all users to change their passwords. None were lost or leaked, to the best of their knowledge, but they were stored without hashing or other obfuscation in an internal log along with unmasked email addresses. As a matter of privacy hygiene, let me remind everyone never to use the same password for two accounts, especially not your email or banking accounts.
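For readers wondering what "stored without hashing" means in practice, here is a minimal sketch of salted password hashing using Python's standard library. It is an illustration only, not a description of Twitter's systems; the function names and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor (an assumption for this sketch, not a recommendation).
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) computed with salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

A service that stores only the salt and digest can still check a login attempt with `verify_password`, but a log or database dump would not reveal the password itself.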


Extra Extra

“Everyone is publishing and publishing because that’s where the money in science comes from. But if everyone is publishing and nobody is reading, are we making a contribution? Are we really doing anything important?” Depressing thought, from a scientist who cites it as the root of his clinical depression. Nature received 150 responses to its earlier story on mental illness in academia and is now launching an entire series on it.



A study by UCLA IPAM demonstrates that robots will make better drivers than humans, this time by simulating the cascade effects from that one (human) driver who brakes for no apparent reason.



Handicapper Bill Benter wrote an algorithm to bet on horses with which he has won over $1bn. Saddle up, data scientists.


AI Revives In-Memory Processors

EE Times, Rick Merritt



Startups, corporate giants, and academics are taking a fresh look at a decade-old processor architecture that may be just the thing for machine learning. They believe that in-memory computing could power a new class of AI accelerators that could be 10,000 times faster than today’s GPUs.

The processors promise to extend chip performance at a time when CMOS scaling has slowed and deep-learning algorithms demanding dense multiply-accumulate arrays are gaining traction. The chips, still more than a year from commercial use, also could be vehicles for an emerging class of non-volatile memories.

Startup Mythic (Austin, Texas) aims to compute neural-network jobs inside a flash memory array, working in the analog domain to slash power consumption. It aims to have production silicon in late 2019, making it potentially one of the first to market of the new class of chips.


The Digital Vigilantes Who Hack Back

The New Yorker, Nicholas Schmidle



American companies that fall victim to data breaches want to retaliate against the culprits. But can they do so without breaking the law?


Feeling overwhelmed by academia? You are not alone

Nature, Chris Woolston



More than 150 scientists contacted Nature with their personal stories following coverage of an international survey showing evidence of a mental-health crisis in graduate education (T. M. Evans et al. Nature Biotechnol. 36, 282–284; 2018). To kick off a series on mental health in academia, we talked to five people on the front lines of science who were willing to share their insights and discuss how changes to the culture might help.


Can we teach computers to be digital detectives?

Johns Hopkins Mathematical Institute for Data Science



Imagine standing on the sidewalk of a busy city street, taking in your surroundings. “When you or I look at that scene, we have a task in mind—whether to find a place to eat or shop, the metro station, or a particular person,” says René Vidal, a Johns Hopkins professor of biomedical engineering. We take into account variables such as lighting, weather, and our angle of view. We’re able to distinguish the guy walking his dog from the suspicious one who seems to be following us. We can tell a car that’s pulling into a parking spot from one about to smash into a building.

For all that computers can do better than humans—from playing the stock markets to figuring out the band behind that song stuck in our head—we still have them beat in at least one important way. “We can filter out information that isn’t relevant, or note information that is out of the ordinary,” Vidal says. “Our question is: How can we get computers to do the same?”

Computers can already process massive amounts of data. Vidal’s team is trying to program them to process semantic information—the meaning and relationships among people, objects, environments, and actions. This could help defense and law enforcement agencies quickly figure out what happened in the aftermath of an act of terrorism, and possibly even prevent it from happening in the first place. If computers could simultaneously analyze the contexts and relationships of photographs, videos, audio, text documents, and internet activity—collectively known as “multimodal data”—they could become digital detectives, with faster, more accurate, and greater processing abilities than humans.

That’s why the U.S. Department of Defense issued a call for white papers through its Multidisciplinary University Research Initiative program. In response, Vidal assembled an all-star team of researchers, and in 2017 their proposal won the five-year, $11 million grant for a project called the Semantic Information Pursuit for Multimodal Data Analysis, funded jointly by the Department of Defense and the U.K.’s Engineering and Physical Sciences Research Council.


Software System Award Honors Project Jupyter Team

Lawrence Berkeley Lab



The Project Jupyter team has been honored with an Association for Computing Machinery (ACM) Software System Award for developing a tool that has had a lasting influence on computing. Project Jupyter evolved from IPython, an effort pioneered by Fernando Pérez, an assistant professor of statistics at UC Berkeley and staff scientist in the Usable Software Systems Group in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research Division.

The award and a prize of $35,000 will be presented to the team at the ACM Awards banquet in San Francisco on June 23, 2018.


Cities, CDOs and the Power of Networking

Governing magazine, Stephen Goldsmith and Jane Wiseman



For the past two years, we’ve been able to bring together these data leaders and accelerate their successes via the power of a peer network. They help each other with analytic approaches to tough policy issues like traffic congestion and lead-paint abatement, and they also troubleshoot the operational aspects of their jobs, such as hiring and training staff and managing vendors. They don’t just share results and source code — they share their processes, methods and the pitfalls along the way. They make their job descriptions, RFPs and data-sharing agreement templates available to each other. They even help each other build open-source data tools.

While the CDOs in our Civic Analytics Network are leaders at the forefront of their young field, some of the insights gained from their work are transferrable to all cities, regardless of their data-maturity stage. From the CDOs’ conversations we have distilled a list of important steps that data-savvy local and state governments should undertake:


Cloudera Build Out for Machine Learning, Analytics, and Cloud

RTInsights, Sue Walsh



Cloudera says the goal of the new units is to increase innovation and help their customers transform their most complex data into actionable insights.


AI-powered doctor assistant Suki lands $20M in funding

MobiHealthNews, Laura Lovett



Suki, a startup that makes an AI-powered and voice-enabled digital assistant for doctors, has just landed $20 million in a funding round led by Venrock, with participation from First Round, Social Capital, and Marc Benioff.

The technology was developed to help doctors handle paperwork and update a patient’s EHR. The technology is also designed to gradually personalize itself based on each individual doctor — in fact, the more time the platform spends with doctors, the more it learns about their needs. The digital assistant also can search and retrieve patient data and capture data over time. Its capabilities include updating and charting patients’ EHRs.


A radical new theory proposes that facial expressions are not emotional displays, but “tools for social influence”

The British Psychological Society, Research Digest, Emma Young



You’re at a ten-pin bowling alley with some friends, you bowl your first ball – and it’s a strike. Do you instantly grin with delight? Not according to a study of bowlers, who smiled not at a moment of triumph but rather when they pivoted in their lanes, to look at their fellow bowlers.

That study provided the earliest evidence for a controversial hypothesis, the Behavioural Ecology View (BECV) of facial displays, outlined in detail in a new opinion piece in Trends in Cognitive Sciences. Carlos Crivelli at De Montfort University, Leicester, UK and Alan Fridlund at the University of California, Santa Barbara, put forward the case that facial displays are not universal, “pre-wired” expressions of emotion – a concept supported by 80 per cent of emotion researchers in a recent poll – but are flexible tools for influencing the behaviour of other people.

The same range of specific facial displays has been associated with anger, disgust, fear, joy, sadness and surprise in many different cultures. As Paul Ekman at the University of California, San Francisco – the best-known proponent of the “basic emotions theory” or BET – writes in a blog post: “The capacity for humans in radically different cultures to label facial expressions with terms from a list of emotion terms has replicated nearly 200 times.”


Ancient DNA expands our understanding of evolution

Gordon and Betty Moore Foundation



In the late 19th century the passenger pigeon, once the most abundant bird in North America, and possibly the world, went extinct. Three-to-five billion passenger pigeons once glided across the skies (a remarkably large number for any vertebrate) and so it raises the question: how could such a large population die off never to be seen again?

This is the question Drs. Beth Shapiro and Richard Green pondered. “Why didn’t little tiny populations of this bird survive in some refugial forest somewhere? Why did they just go from billions to none?” Shapiro posited.

Ecologists at the University of California, Santa Cruz Paleogenomics Lab, Shapiro and Green used ancient DNA to try to answer these questions. Ancient DNA is defined as DNA (the carrier of genetic information) isolated from specimens that have been dead for more than 100 years or from biological samples that were not preserved specifically for DNA analyses. The field got its start in 1985 with the extraction and sequencing of DNA from a 150-year-old museum specimen – an extinct subspecies of the zebra. Since then, scientists have been able to reconstruct the complete genomes of several extinct species, such as Neanderthals and mammoths.

 
Events



Rock Health Enterprise Insights Forum

Rock Health



San Francisco, CA June 19. “A one-day event focused on demystifying AI and machine learning in healthcare. This invitation-only gathering brings together senior leaders from major healthcare companies along with select investors, academics, and startup founders.” [$$$, invitation only]


Design@Large: Stuart Geiger (UC Berkeley)

University of California-San Diego, Design Lab



La Jolla, CA May 9, starting at 4 p.m., CSE 1202 on the UCSD campus. Title: The Human Contexts of Computation and Data: Infrastructures, Institutions, and Interpretations. [free]


Draft CarpentryCon 2018 Program

Software Carpentry Foundation



Dublin, Ireland May 30-June 1. “CarpentryCon 2018 is the key community-building and networking event in The Carpentries’ annual calendar of activities.” [$$$]


SmartCities NYC

Jerry Hultin, Raj Pannu, Aarti Tandon



New York, NY May 8-10 at Pier 36. “NYU CUSP Prof @NeilKleiman will present during @SmartCitiesNY, an exciting conference that brings citizen-inspired thinking to the innovations that are transforming cities around the world.” [$$$$]


Feed the Feed – Machine Learning Networking Dinner

Pinterest Labs



San Francisco, CA May 10 starting at 6:15 p.m., Pinterest (808 Brannan St). “An invite-only dinner for engineers, researchers and scientists working on challenging machine learning & ranking problems. Join us over food and drinks to discuss industry trends with other experts in your field and learn about the ongoing machine learning efforts from different teams at Pinterest.” [invitation only]


Philly Data Jawn 2018

CompassRed



Philadelphia, PA June 13, starting at 3 p.m. “We at CompassRed’s Data Lab are working with leaders in Philly’s data science community to herald the return of Data Jawn. On June 13, 2018, from 3:00 p.m. to 7:00 p.m., you’re invited to hear our keynote, attend panels, listen to lightning talks and mingle with other folks who love data as much as we do.” [$$]

 
Deadlines



Genomics Research Hackathon for Rare Kidney Cancer

San Francisco, CA May 18-20. Organized by SVAI Team. Deadline to apply is May 13.

Digital Infrastructure Research RFP

“In 2016, the Ford Foundation funded a report by Nadia Eghbal titled “Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure” that described how the development and maintenance of digital infrastructure often falls to communities of volunteers who take it upon themselves to maintain this infrastructure in their own free time and for little or no money. Unsurprisingly, this leads to significant risks to the open internet and the ability to develop new, innovative research and businesses within it.”

“The Sloan and Ford Foundations would like to fund a set of research projects to further study these dynamics, with an eye toward better understanding the economics, maintenance and sustainability of digital infrastructure.” Deadline for proposals is June 13.

 
Moore-Sloan Data Science Environment News



Financial Stress Testing Powered by Machine Learning

Medium, NYU Center for Data Science



Bud Mishra, Professor of Computer Science and Mathematics at NYU, devises novel method for stress testing and causality analysis.

 
Tools & Resources



Raspberry Pi meets AI: The projects that put machine learning on the $35 board

ZDNet, Nick Heath



“The majority of the following projects use pre-trained, machine-learning models to teach Pi boards about the world around them: schooling robots in how to navigate tricky terrain through to powering early warning systems for car parking attendants.” [slideshow, 14 projects]


Airship: A New Front-End Library for Location Intelligence Apps

CARTO, Steve Isaac



One of several exciting announcements coming out of CARTO Locations Madrid was the launch of Airship, CARTO’s new front-end library for Location Intelligence apps. Airship, along with CARTO.js 4.0 and the new Developer Center, is a part of CARTO’s strategy to empower developers with the tools to build their own custom Location Intelligence applications and solutions to answer today’s varied and complex business questions.

With this library, designers and developers can generate styles, interactive elements, typography, and many more design elements optimized specifically for location applications with minimal coding requirements.


An Introduction to Hashing in the Era of Machine Learning

Medium, Bradfield, Tyler Elliot Bettilyon



New research is an excellent opportunity to reexamine the fundamentals of a field; and it’s not often that something as fundamental (and well studied) as indexing experiences a breakthrough. This article serves as an introduction to hash tables, an abbreviated examination of what makes them fast and slow, and an intuitive view of the machine learning concepts that are being applied to indexing in the paper.

(If you’re already familiar with hash tables, collision handling strategies, and hash function performance considerations; you might want to skip ahead, or skim this article and read the three articles linked at the end of this article for a deeper dive into these topics.)

In response to the findings of the Google/MIT collaboration, Peter Bailis and a team of Stanford researchers went back to the basics and warned us not to throw out our algorithms book just yet. Bailis and his team at Stanford recreated the learned index strategy, and were able to achieve similar results without any machine learning by using a classic hash table strategy called Cuckoo Hashing.
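For the curious, here is a toy sketch of the general cuckoo hashing idea: every key has one candidate slot in each of two tables, so a lookup inspects at most two slots, and an insert that finds its slot occupied evicts the occupant to its slot in the other table. This is not the Stanford team's implementation; the class name, the salted use of Python's built-in `hash()`, and the grow-and-rehash policy are all assumptions made for illustration.

```python
class CuckooHashTable:
    """Toy cuckoo hash table: two tables, two hash functions, evict on collision."""

    def __init__(self, capacity=11, max_kicks=32):
        self.capacity = capacity      # slots per table
        self.max_kicks = max_kicks    # evictions allowed before growing
        self.tables = [[None] * capacity, [None] * capacity]
        self.size = 0

    def _slot(self, which, key):
        # Derive two hash functions by salting hash() with the table index
        # (an illustrative shortcut, not a production-grade hash family).
        return hash((which, key)) % self.capacity

    def get(self, key, default=None):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Place the entry and pick up whatever was sitting in the slot.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                self.size += 1
                return
            which = 1 - which  # the evicted entry tries the other table
        # Too many evictions: enlarge both tables, then re-insert the
        # entry that is still displaced.
        self._grow()
        self.put(*entry)

    def _grow(self):
        old = [e for table in self.tables for e in table if e is not None]
        self.capacity = self.capacity * 2 + 1
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        self.size = 0
        for k, v in old:
            self.put(k, v)
```

Usage is dictionary-like: `t = CuckooHashTable(); t.put("a", 1); t.get("a")`. The appeal is the bounded lookup cost, since `get` never probes more than two slots regardless of how full the table is.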


JavaScript replacements for Python data science tools

Observable, Tom MacWright



Python is an excellent language and ecosystem for data science, and JavaScript in many ways is just catching up now. But it’s catching up quickly.

Here’s a running list of tools and their equivalents in JavaScript. In some cases, it’s not a perfect 1:1 equivalence, but we’re trying to get as close as possible.


Cornell Newsroom Summarization Dataset

Cornell University, Connected Experiences Lab.



“A large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.” (arXiv)


Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

arXiv, Computer Science > Computation and Language; Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush



Neural Sequence-to-Sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work in a five stage blackbox process that involves encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process. The aim is to identify which patterns have been learned and to detect model errors. We demonstrate the utility of our tool through several real-world large-scale sequence-to-sequence use cases.
