Data Science newsletter – June 5, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 5, 2017

GROUP CURATION: N/A

 
 
Data Science News



Mary Meeker: Digital health is at “an inflection point”

MobiHealthNews, Jonah Comstock

… In the past the report hasn’t placed a lot of emphasis on the healthcare sector. Not so this year — Meeker headlines her 31-page healthcare section by saying healthcare is at a “digital inflection point”.

The reasons for this inflection point are manifold, according to the report. First of all, more and more healthcare data is being generated thanks to the rising prevalence of wearables and connected health trackers. Other types of data are proliferating as well: more kinds of lab tests are available than ever before, and it's easier than ever to have one's genome sequenced and get health insights from the results.

There's also an increase in what Meeker calls "digitally native health-related data sets", which basically means that more and more people are using health apps, and those apps are creating new kinds of datasets. Even hospital-originated data is increasing with wider use of electronic health records (EHRs). Meeker cites IDC data showing that a typical 500-bed hospital now contains 50 petabytes of data.


Do You Regret Your College Choices?

Gallup, Brandon Busteed

If they had to do it over again, the majority of Americans (51%) who pursued a postsecondary education would change their degree type, institution or major.

This is one of the most significant findings in a massive new research collaboration between Gallup and Strada Education Network.


Google Sprinkles AI on Its Spreadsheets to Automate Away Some Office Work

MIT Technology Review, Tom Simonite

Want to turn boring numbers into a cool chart? Just ask, and Google’s algorithm will do the rest.


Allen School privacy researchers shed light on secret surveillance with SeaGlass

University of Washington, UW CSE News

A team of researchers in the Allen School’s Privacy & Security Research Lab have developed a new system, SeaGlass, which could bring more transparency and accountability to cell-phone surveillance. SeaGlass is capable of detecting anomalies in the cellular network that may indicate the presence of surveillance devices called IMSI-catchers (also known as cell-site simulators or Stingrays), which track individuals through their International Mobile Subscriber Identity by posing as a legitimate cell tower.
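SeaGlass's actual detectors are described by the researchers themselves; the sketch below only illustrates the general idea of flagging cell-network anomalies from repeated sensor scans. Everything in it is a hypothetical simplification, not SeaGlass's method: the `Scan` record, the `flag_anomalous_cells` function, the 20% threshold, and the two heuristics ("transient" cells heard in only a few scans, and a single cell ID seen on more than one channel).

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    """Hypothetical sensor reading: which cell was heard, on which channel, in which scan."""
    scan_id: int
    cell_id: str
    channel: int

def flag_anomalous_cells(scans, min_fraction=0.2):
    """Return {cell_id: [reasons]} for cells that look unusual.

    Two illustrative heuristics (assumptions, not SeaGlass's detectors):
      * "transient": heard in fewer than min_fraction of all scans
      * "multiple channels": the same cell ID seen on two or more channels
    """
    total_scans = len({s.scan_id for s in scans})
    seen_in = defaultdict(set)   # cell_id -> scan_ids in which it was heard
    channels = defaultdict(set)  # cell_id -> channels it was heard on
    for s in scans:
        seen_in[s.cell_id].add(s.scan_id)
        channels[s.cell_id].add(s.channel)

    flags = {}
    for cell, scan_ids in seen_in.items():
        reasons = []
        if len(scan_ids) / total_scans < min_fraction:
            reasons.append("transient")
        if len(channels[cell]) > 1:
            reasons.append("multiple channels")
        if reasons:
            flags[cell] = reasons
    return flags

# Toy data: towers A and B are heard in all ten scans
scans = [Scan(i, "A", 100) for i in range(10)]
scans += [Scan(i, "B", 200) for i in range(10)]
scans.append(Scan(7, "X", 300))  # X is heard only once
scans.append(Scan(3, "A", 101))  # A briefly appears on a different channel
print(flag_anomalous_cells(scans))
# {'A': ['multiple channels'], 'X': ['transient']}
```

In this toy run, the cell heard only once and the tower that briefly switches channel are both flagged, while the stable tower B is not. A real system would also have to rule out benign explanations such as sensor position and ordinary network reconfiguration.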


Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation

Sociology; Matthew L Williams, Pete Burnap, Luke Sloan

New and emerging forms of data, including posts harvested from social media sites such as Twitter, have become part of the sociologist’s data diet. In particular, some researchers see an advantage in the perceived ‘public’ nature of Twitter posts, representing them in publications without seeking informed consent. While such practice may not be at odds with Twitter’s terms of service, we argue there is a need to interpret these through the lens of social science research methods that imply a more reflexive ethical approach than provided in ‘legal’ accounts of the permissible use of these data in research publications. To challenge some existing practice in Twitter-based research, this article brings to the fore: (1) views of Twitter users through analysis of online survey data; (2) the effect of context collapse and online disinhibition on the behaviours of users; and (3) the publication of identifiable sensitive classifications derived from algorithms.


Biodiversity Is More Than Just Counting Species

Scientific American, Rachel Cernansky

Ecologists are increasingly looking at how richness of traits — rather than number of species — helps set the health of ecosystems


You Look Familiar. Now Scientists Know Why.

The New York Times, Nicholas Wade

The brain has an amazing capacity for recognizing faces. It can identify a face in a few thousandths of a second, form a first impression of its owner and retain the memory for decades.

Central to these abilities is a longstanding puzzle: how the image of a face is encoded by the brain. Two Caltech biologists, Le Chang and Doris Y. Tsao, reported in Thursday’s issue of Cell that they have deciphered the code of how faces are recognized.

Their experiments were based on electrical recordings from face cells, the name given to neurons that respond with a burst of electric signals when an image of a face is presented to the retina.


G.M. Wants to Drive the Future of Cars That Drive Themselves

The New York Times, Bill Vlasic

The chief executive of General Motors, an automaker synonymous with Detroit, saw the future of driving not in the Motor City but on the streets of San Francisco.

Mary T. Barra, a G.M. lifer who had worked her way from engineer to the top, was in the back seat of a prototype self-driving electric car as it wound its way through the city’s downtown a year ago.

She wanted to see for herself whether automation was ready to take over from a driver — safely, and on a mass scale. How would it react, for example, when it reached an intersection as a light turned yellow?


Sema4, a Mount Sinai spinout, launches with a focus on genomics

MedCity News, Erin Dietsche

The for-profit startup has been created from numerous parts of Mount Sinai’s Department of Genetics and Genomic Sciences and the Icahn Institute for Genomics and Multiscale Biology.

Pronounced “semaphore,” the company will utilize genomic and clinical data to transform overall clinical diagnostics. By combining everything from predictive modeling to open access data, it aims to be able to better treat and diagnose diseases.


Memorial Sloan Kettering and Philips want to upgrade precision oncology

MedCity News, Erin Dietsche

Memorial Sloan Kettering Cancer Center and Amsterdam-based Philips have launched a research collaboration specifically aimed at cancer.

The organizations will focus on using precision diagnostics to dig deeper into what causes cancer at the single-cell level. By taking such a sharp focus, MSK and Philips will work to uncover the best ways to resolve the differences both in and between tumors.


Opportunity focus: Integrating NOAA’s $553M R&D computer system

Bloomberg Government, Katrice Eborn

The National Oceanic and Atmospheric Administration (NOAA) on May 19 announced plans and a tentative timeline to solicit vendors to integrate its computing system at data centers nationwide.

The agency plans to release a draft solicitation around June 15 for the $553 million, 10-year single-award indefinite-delivery-indefinite-quantity (IDIQ) follow-on contract. NOAA expects to issue a final request for proposals on Oct. 18, with site visits and an industry day organized around those dates.


U.S. Army intel training worth $785 million

Bloomberg Government, Katrice Eborn & Kevin Brancato

The U.S. Army Intelligence Center of Excellence (ICoE), located at Fort Huachuca, Arizona, is seeking a contractor to provide education and training support for intelligence-related military specialties.

The draft solicitation for the Huachuca Training and Support Contract (HTASC), released May 22, gives an overview of the $785 million, seven-year single award, indefinite-delivery/indefinite-quantity (IDIQ) contract, set to begin June 2018. A final solicitation is expected in September 2017, with an award in the third quarter of fiscal 2018.


Open Access to Academic Research

Communications of the ACM, Vint Cerf, Cerf's Up blog

I recently spent a half-day with a group of government agency, publishing industry, and academic institutional representatives to discuss key considerations leading to improved access to academic research results, associated data, and analytic software. The sponsoring organization is the Open Scholarship Initiative (osinitiative.org) and its primary goal is accessibility of research and scholarly output. This was not necessarily about free access as much as making things easily discoverable and accessible.

The world of scientific and academic publishing has grown over time and this is not surprising. The number of narrowly focused academic publications is increasing, partly driven by business models of profit and non-profit institutions and partly by the “publish or perish” dynamic in the academic world. Library budgets have not kept up with the cost of increasing numbers of publications. Changing this dynamic may necessitate revising the metrics of value: favoring quality over quantity. Academic tenure decisions often seem to turn on quantity and perhaps that must change.


Big Data

Communications of the ACM

For our fourth and final Panel in Print, we invited 2014 ACM A.M. Turing Award recipient MICHAEL STONEBRAKER, 2013 ACM Prize recipient DAVID BLEI, 2007 ACM Prize recipient DAPHNE KOLLER, and ACM Fellow VIPIN KUMAR to discuss trends in big data.

 
Events



Announcing the agenda for TechCrunch Sessions: Robotics

TechCrunch

Cambridge, MA An event focused solely on robotics, July 17 at MIT’s Kresge Auditorium. [$$$]


Deep Reinforcement Learning Bootcamp

OpenAI, University of California-Berkeley

Berkeley, CA This two-day bootcamp will teach you the foundations of Deep RL. Deadline for applications is June 16.


Cornell Tech@Bloomberg: The Rise of Angel Investing

Cornell Tech, Bloomberg

New York, NY Featuring Brian S. Cohen, Chairman of New York Angels. Tuesday, January 20, starting at 5:30 p.m., Bloomberg L.P. (731 Lexington Ave). [free, registration required]


COLT 2017

Association for Computational Learning

Amsterdam, The Netherlands July 6-10. Conference is preceded by 1-day tribute to Manfred Warmuth for his 60th birthday. [$$$]


Anticipation and Timing: Where the Neuroscience of Cognition and Aesthetics Meet

Chelsea Music Festival

New York, NY June 16. Chelsea Music Festival, Festival Talks III, at St. Paul’s German Lutheran Church (315 W 22nd St). Lecture by Michael Shadlen starts 6:30 p.m. Musical performance at 7:30 p.m. [$$]

 
Deadlines



Multilayer and Interconnected Networks: Applications

Cancun, Mexico The scope of the satellite meeting is to review the recent advances in the field of multilayer and interconnected networks, focusing in particular on the interplay between structure and dynamics. Deadline for submissions is June 10.

NIPS 2017 Call for Tutorials

Long Beach, CA Tutorials on Monday, December 4. Deadline for tutorial proposal submissions is June 15.

ICML 2017 workshop: Video Games and Machine Learning

Sydney, Australia “This workshop focuses on complex games which provide interesting and hard challenges for machine learning. Going beyond simple toy problems of the past, and games which can easily be solved with search, we focus on games where learning is likely to be necessary to play well.” Deadline for submissions is June 16.

TPOT Automated Machine Learning Competition

“We’re going to challenge you to apply TPOT [an AutoML tool that optimizes machine learning pipelines] to any data science problem you find interesting on Kaggle. If your entry ranks in the top 25% of the leaderboard on a Kaggle problem, we want to see how TPOT helped you accomplish that.”
 
NYU Center for Data Science News



Emergent Language in a Multi-Modal, Multi-Step Referential Game

arXiv, Computer Science > Learning; Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho

Inspired by previous work on emergent language in referential games, we propose a novel multi-modal, multi-step referential game, where the sender and receiver have access to distinct modalities of an object, and their information exchange is bidirectional and of arbitrary duration. The multi-modal multi-step setting allows agents to develop an internal language significantly closer to natural language, in that they share a single set of messages, and that the length of the conversation may vary according to the difficulty of the task. We examine these properties empirically using a dataset consisting of images and textual descriptions of mammals, where the agents are tasked with identifying the correct object. Our experiments indicate that a robust and efficient communication protocol emerges, where gradual information exchange informs better predictions and higher communication bandwidth improves generalization.

 
Tools & Resources



Kubernetes is a big deal, but an overlooked one

Medium, ArchiTECHt, Derrick Harris

“Containers will matter a lot to how companies build, and buy, applications and infrastructure. Kubernetes, in particular, is a project that has the backing and the vision to become a platform for many new applications over the next several years. And while it’s challenging to keep up with everything that’s happening in the container/microservices space, I think it’s a good idea to try, at least enough to get a sense of where things are going.”


Exploring and visualising reef life survey data

Yanir Seroussi

Last year, I wrote about the Reef Life Survey (RLS) project and my experience with offline data collection on the Great Barrier Reef. I found that using auto-generated flashcards with an increasing level of difficulty is a good way to memorise marine species. Since publishing that post, I have improved the flashcards and built a tool for exploring the aggregate survey data. Both tools are now publicly available on the RLS website. This post describes the tools and their implementation, and outlines possible directions for future work.
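The post describes the author's own implementation. As a rough, hypothetical illustration of "increasing level of difficulty", one could treat species that appear most often in survey records as the easiest and split the rest into progressively harder levels. The `build_flashcard_levels` function, the frequency-as-difficulty heuristic, and the toy sightings are assumptions for illustration, not the logic of the RLS tools.

```python
from collections import Counter

def build_flashcard_levels(sightings, n_levels=3):
    """Split species into flashcard levels, most frequently sighted first.

    Hypothetical heuristic: species recorded more often in surveys are
    assumed to be easier to learn, so they go into the earlier levels.
    """
    counts = Counter(sightings)
    # most_common() sorts by count; ties keep first-seen order
    ordered = [species for species, _ in counts.most_common()]
    if not ordered:
        return []
    # Ceiling division so every species lands in some level
    size = -(-len(ordered) // n_levels)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

# Made-up sightings, one entry per recorded observation
sightings = (["species A"] * 5 + ["species B"] * 4 + ["species C"] * 3
             + ["species D"] * 2 + ["species E"] + ["species F"])
print(build_flashcard_levels(sightings))
# [['species A', 'species B'], ['species C', 'species D'], ['species E', 'species F']]
```

Frequency is only one possible proxy for difficulty; visual similarity between species, for example, might matter just as much when choosing which cards to show later.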

 
Careers


Postdocs

Postdoctoral fellowship in Movement & Spatial Ecology



Smithsonian Conservation Biology Institute; Front Royal, VA

Full-time, non-tenured academic positions

Job Opening: Associate Director at Princeton CITP



Princeton University, Center for Information Technology Policy; Princeton, NJ
