The now annual event was launched in 2010 by the Obama administration as a hackathon-style program in which attendees were challenged to develop prototype applications from 30 data sets in 30 days. Today, Health Datapalooza includes presentations from government and private sector healthcare experts, breakout panel sessions that dive into specific areas of interest, and an exhibit hall. The common thread throughout these activities is how health data is being used, and how it could be used, to improve health outcomes. Over the next two days, we’ll be giving you a snapshot of the presentations and topics being covered at this year’s event, with a few specific highlights, as we find out what’s on the horizon for health data technology.
We are pleased to present the 4th edition of the Global Open Data Index (GODI), a global assessment of open government data publication. GODI compares national governments in 94 places across the 15 key datasets that our community has assessed as the most useful for solving social challenges.
For this edition, we received 1,410 dataset submissions, but only 10% of these are open according to the Open Definition. At Open Knowledge International (OKI), we believe it’s important to look further than just the numbers. GODI is not just a benchmark: it can and should be used as a tool to improve open data publication and make data findable, useful, and impactful. This is why we have added a new phase for this edition: the dialogue phase.
By embedding brain tissue in a gel that swells when pumped with water, the team blew up mouse brain tissue to roughly 20 times its original size while preserving the normal structure and connections of neurons and their dendrites.
Using this method, aptly dubbed expansion microscopy (ExM), the team reconstructed a tiny piece of the mouse brain in 3D. Normally, dendrites entangle into a jumbled mess, making it hard to tease apart individual synaptic connections with a conventional light microscope.
Fun bits, ear candy, and long reads
OKCupid has some of the best company blog posts. They just put out a new piece on the pitfalls of A/B testing featuring lines like, “the control group would have a bunch of ignored messages and unreciprocated love”.
David Hasselhoff has a new role in which all his lines were written by a robot. I used to get confused, thinking David Hasselhoff was a robot. I realize I may not have been alone in that confusion.
Eric Horvitz, Microsoft Research Technical Fellow, spoke at Data & Society about “failures of automation in the open world, biased data and algorithms, opacity of reasoning, adversarial attacks on AI systems, and runaway AI.” Horvitz has just been named ‘head of all research’ at Microsoft Research.
Andrew Russell (SUNY-Polytechnic in Utica) and Lee Vinsel (Stevens Institute of Technology) are angling to launch a sub-field of science and technology studies focusing on maintenance and maintainers. They question the all too often unbridled, uncritical celebration of innovation: “Entire societies have come to talk about innovation as if it were an inherently desirable value, like love, fraternity, courage, beauty, dignity, or responsibility. Innovation-speak worships at the altar of change, but it rarely asks who benefits, to what end? A focus on maintenance provides opportunities to ask questions about what we really want out of technologies. What do we really care about? What kind of society do we want to live in? Will this help get us there?”
Love tech company bios? This week we’ve got the story of Venmo, a money-transfer app widely used by 20-somethings to split restaurant bills, cab fares, rent, and utilities. One person pays upfront; everyone else pays her back using Venmo, which is now part of PayPal. In the beginning, as with so many tech companies, mistakes were made. The biggest: no fraud detection whatsoever, in flagrant violation of US law and common sense: “the service launched with virtually no regulatory compliance built into it”. Move fast and break stuff, indeed.
This is absolutely not a ‘fun bit’. A years-old anti-vaccination campaign in Minnesota has kicked off a measles outbreak, with 41 cases reported and more expected in Minnesota’s Somali community. Andrew Wakefield, the one-time doctor whose original ‘vaccines cause autism’ paper has since been retracted (his medical license has also been revoked), traveled to Minnesota three times in recent years to address the Somali population there. He spread his toxic gospel that the measles, mumps, rubella (MMR) vaccine causes autism, a claim that continues to harm that community. Vaccination rates within the Minnesotan Somali community plummeted from 92 percent in 2004, before Wakefield’s visits, to 42 percent in 2014, according to the Minnesota Department of Health. At NYU, an MBA student was diagnosed with mumps last week. The National Hockey League has reported cases of mumps for the past two years, reminding all of us – especially academics in university settings – to make sure we receive the recommended MMR booster in adulthood.
Automation is coming for the jobs of poor people, enriching the innovators who build it (like the readers of this newsletter, I imagine) and the investors backing those innovators. A Bloomberg analysis found, “the difference in annual income between households in the top 20 percent and those in the bottom 20 percent ballooned by $29,200 to $189,600 between 2010 and 2015”. Yes, the gap between the top quintile and the bottom quintile is now more than three times the median US household income ($56,516).
Raj Chetty, David Grusky, et al., in a study of income mobility, show that Americans’ ability to move from one income stratum to another is fading: “rates of absolute mobility have fallen from approximately 90 percent for children born in 1940 to 50 percent for children born in the 1980s.” As an ’80s baby, yeah, an even chance sounds about right.
A consortium of eight UK universities, led by the University of Oxford, has been awarded £3 million by the Engineering and Physical Sciences Research Council (EPSRC) to establish a national high-performance computing facility to support machine learning.
from Science Advances; Leto Peel, Daniel B. Larremore, and Aaron Clauset
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
Tropical forests dominate global terrestrial carbon exchange, but long-term climate variability might affect their ability to take up and store carbon dioxide. To better understand tropical forest carbon dynamics, the Tropical Nutrient Limitation Working Group assembled published datasets to determine how temperature and rainfall interact to control carbon cycling in tropical forests. Their results are featured in a recent Ecology Letters publication.
from EurekAlert! Science News, American Institute of Physics
New computer simulations, described this week in the journal Physics of Fluids, can better probe the underlying physics, potentially leading to more efficient extraction of oil and gas.
The general theme of the presentation is the future of programming and computer science and Peter touches on numerous aspects of these topics.
Peter envisions how our interaction with technology will evolve as intelligent agents become increasingly prevalent in our lives, and how computer science is gradually becoming an empirical science. [video, 1:37:13]
from Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog
There is currently increased attention to the statistical (and replication) crisis in science. Biomedicine and social psychology have been at the heart of this crisis, but similar problems are evident in a wide range of fields. We discuss three examples of replication challenges from the field of social psychology and some proposed solutions, and then consider the applicability of these ideas to clinical neuropsychology. In addition to procedural developments such as preregistration and open data and criticism, we recommend that data be collected and analyzed with more recognition that each new study is a part of a learning process. The goal of improving neuropsychological assessment, care, and cure is too important to not take good scientific practice seriously.
NuTonomy will integrate its software and sensors into the Peugeot 3008, a gasoline-powered crossover SUV. Converted vehicles will undergo real-world testing on the streets of Singapore, where nuTonomy last year launched the world’s first taxi service built around self-driving vehicles.
[Karl] Iagnemma said Peugeot and nuTonomy plan to “get a couple of cars on the road quickly in Singapore this summer, and then scale up rapidly from there.”
from The New York Times, Nick Wingfield and Natasha Singer
Last week, Satya Nadella, the chief executive of Microsoft, slipped on a glove made of cardboard and clenched his hand into a fist, causing a robotic hand with fingers made of drinking straws to mimic his movements.
The glove was one of several engineering projects built in a makeshift laboratory on Microsoft’s campus. The company spent the last year talking to thousands of teachers and designing high-tech experiments that require mostly low-cost parts. It will give the designs to schools for free so teachers can use them in their lesson plans.
The projects are part of a major push the company announced Tuesday at an event in New York to make its products more attractive to school administrators, students and teachers.
from The New York Times, The Upshot blog, Claire Cain Miller
We don’t know how quickly machines will displace people’s jobs, or how many they’ll take, but we know it’s happening — not just to factory workers but also to money managers, dermatologists and retail workers.
The logical response seems to be to educate people differently, so they’re prepared to work alongside the robots or do the jobs that machines can’t. But how to do that, and whether training can outpace automation, are open questions.
In the not-too-distant future, a “planetary” computer will be able to create a computational model of your body, with the ability to run simulations of your health and to anticipate chronic disease before you show any symptoms.
This is the direction we’re headed, according to Larry Smarr, founding director of the California Institute of Telecommunications & Information Technology at UC San Diego. While his expertise lies in computer networks and infrastructure, Smarr has emerged as a de facto leader in quantified health—largely due to his relentless curiosity about his own health. In 2011, Smarr diagnosed his own Crohn’s disease long before he showed any symptoms.
Places that used to be ad-free—from the living room to our friendships—are now becoming sites for ads or surveillance designed to make them more effective.
San Francisco, CA, June 2. The Artificial Intelligence Conference is an annual event where leading AI researchers and top industry practitioners meet and collaborate. [$$$]
The Social Science Research Council’s Digital Culture program is issuing a call to our networks, fellows, and the scholarly community in general for personal stories of government data use.
Como, Italy. The FATREC Workshop on Responsible Recommendation at RecSys 2017 is a venue for discussing questions of social responsibility in building, maintaining, evaluating, and studying recommender systems. The workshop is August 31; the deadline for paper submissions is June 22.
“We’re continuously adding new features to our API to make it easier for developers to integrate with our data and share great local businesses through their apps. Today, we’re releasing access to query our data via GraphQL, a graph query language. This is available immediately through our developer beta program.”
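For readers new to GraphQL, the announcement means a query describing exactly the fields you want is sent as a JSON payload. The sketch below builds such a payload with the standard library only; the query fields (`search`, `business`, `name`, `rating`) are illustrative assumptions, not Yelp’s actual schema, and no request is actually sent:

```python
import json

# Hypothetical GraphQL query against a business-search API of the kind
# Yelp describes. Field names here are illustrative, not Yelp's schema.
query = """
{
  search(term: "coffee", location: "Oakland", limit: 3) {
    business { name rating }
  }
}
"""

# GraphQL requests are typically POSTed as JSON with a single "query" key.
payload = json.dumps({"query": query})

# In practice this payload would be POSTed, with an authorization header,
# to the API's GraphQL endpoint using urllib.request or a similar client.
```

The appeal over a traditional REST endpoint is that the client, not the server, decides which fields come back, so one round trip replaces several.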
A metric-based evaluation gives an NLG system a score by computing how similar its output text is to “gold-standard” reference texts. There are a number of different metrics (including BLEU, METEOR, and ROUGE), which are based on different scoring functions.
I am not a great fan of metric-based evaluation, for reasons I explain below, and would be very skeptical if, for example, I were asked to review an NLG paper that presented only a metric-based evaluation. Nevertheless, I will also give some advice below on best practice for such evaluations.
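To make concrete what these metrics measure, here is a toy sketch of modified unigram precision, the building block BLEU generalizes to higher-order n-grams. This is an illustration of the idea, not any official BLEU implementation:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Modified unigram precision: the fraction of candidate tokens that
    also appear in the reference, with counts clipped so a word cannot be
    rewarded more times than it occurs in the reference."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand)
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return overlap / len(cand) if cand else 0.0

# "sat" is missing from the reference, so 5 of 6 tokens match.
score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
```

The weakness is visible even in this toy: a fluent paraphrase that shares few surface words with the reference scores poorly, which is one reason to distrust metric-only evaluations.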
Over the course of this blog post, I will first contrast transfer learning with machine learning’s most pervasive and successful paradigm, supervised learning. I will then outline reasons why transfer learning warrants our attention. Subsequently, I will give a more technical definition and detail different transfer learning scenarios. I will then provide examples of applications of transfer learning before delving into practical methods that can be used to transfer knowledge. Finally, I will give an overview of related directions and provide an outlook into the future.
A resilient Data Science Platform is a necessity for every centralized data science team within a large corporation. It helps them centralize, reuse, and productionize their models at petabyte scale. We’ve built Algorithmia Enterprise for that purpose.