Data Science newsletter – November 30, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for November 30, 2019

Data Science News



Why Google’s Move into Patient Information Is a Big Deal

Harvard Business Review, David Blumenthal


Setting the legalities aside for a moment, here are the fundamentals underlying the Ascension-Google relationship: Ascension sits on troves of data accumulated in the course of caring for millions of patients who pass through its facilities. That data used to be locked away in paper records that had to be physically transported and laboriously abstracted to serve any purpose other than the care of an individual patient at a particular place and time. As a result of the near-universal adoption of electronic health records over the last decade, all that information is now stored as electrons that can flow instantly to wherever it’s needed and useful, provided that patients’ privacy is protected.

This has several immediate benefits for patients. One is that their personal histories are always accessible when they get care at Ascension (and possibly elsewhere). Another is that Ascension’s doctors and nurses can potentially learn from the experience of all Ascension’s patients with similar conditions as they care for any individual patient. And by applying search technologies and artificial intelligence, Ascension may also be able in real time to mobilize lessons of the entire scientific literature to bring to bear on individual patients. That literature is so enormous that even the most experienced, specialized clinicians have difficulty keeping up with it. Ascension’s experience may also inform medical research more broadly.

The challenge is that accomplishing these innovative uses of electronic data requires a range of informatics, analytics, and research skills that most health systems don’t possess. One logical approach is for health care organizations like Ascension to partner with third parties that have the necessary capabilities. That’s where Google comes in. It has IT skills, including in the field of artificial intelligence, that Ascension can never hope to equal. And Google has been gobbling up nationally renowned clinician leaders and researchers to create a deep bench in health-care informatics and research.


Malevolent Machine Learning

Communications of the ACM, Chris Edwards


The key issue that has troubled deep learning researchers is why deep learning models seem to be fooled by what appears to humans like noise. Although experiments by James DiCarlo, a professor in neuroscience working at the Massachusetts Institute of Technology (MIT), and others have shown similarities between the gross structure of the visual cortexes of primates and DNNs, it has become clear the machine learning models make decisions based on information the brain either does not perceive, or simply ignores.


Joint statement on EPA proposed rule and public availability of data (2019)

Science, Holden Thorp et al.


As leaders of peer-reviewed journals, we support open sharing of research data, but we also recognize the validity of scientific studies that, for confidentiality reasons, cannot indiscriminately share absolutely all data. Datasets featuring personal identifiers—including studies evaluating genomes of thousands of people to characterize medically relevant genetic variants—are but one example. Such data may be critical to developing new drugs or diagnostic tools but cannot be shared openly; even anonymized personal data can be subject to re-identification, and it has been a longstanding practice for agencies and journals to acknowledge the value of data privacy adjustments. The principles of careful data management, as they inform medicine, are just as applicable to data regarding environmental influences on public health. Discounting evidence from the decision-making process on the basis that some data are confidential runs counter to the EPA stated mission “to reduce environmental risks…based on the best available scientific information” (4).


AstraZeneca, Novoheart to Co-Develop New Heart Model

Xconomy, Melissa Fassbender


AstraZeneca has turned to Novoheart’s “human heart-in-a-jar” technology to create a preclinical model that mimics a form of heart failure for which there is no effective therapy available.


Five Questions With a16z’s Vijay Pande on AI and Making New Drugs

Xconomy, Sarah de Crescenzo


There [are] 200, 250 companies now in this AI/drug design space. Given the prevalence of tools like [open source ML framework] TensorFlow, algorithms in the public domain, and public data, it doesn’t take much to build something just with those off-the-shelf pieces that looks pretty good, especially compared to what people could do before. All of those companies, if they’re using basically the same algorithms, the same tools, and the same data, they’re going to get the same answers as each other. So differentiation is really going to be key …


NYC seeks Algorithm Officer to oversee automation

Smart Cities World, Sarah Wray


New York Mayor Bill de Blasio has signed an Executive Order to establish an Algorithms Management and Policy Officer position within the Mayor’s Office of Operations.

The officer will lead on algorithm policy and develop guidelines and best practices to assist city agencies in their use of algorithms to make decisions.


GE Launches Program to Ease Artificial Intelligence Adoption

HealthIT Analytics, Jessica Kent


GE Healthcare has launched the Edison Developer Program to accelerate artificial intelligence adoption and impact throughout the healthcare industry.

The program is based on Edison, GE Healthcare’s secure intelligence platform. The program will allow providers to gain easy access to AI solutions by directly integrating these tools into clinical workflows.


We need to talk more about the legal issues around training musical AIs

musically, Stuart Dredge


When most people think about the potential legal issues around AI-created music, they tend to think about the output – the music itself, and questions like whether an AI-generated track can attract copyright protection. Sophie Goossens, counsel at law firm Reed Smith, thinks that just as much attention should be paid to the input.

“AI is not born in a vacuum, and AI systems do not appear out of the blue. In most cases, their ability to create is directly proportional to the amount of information they are able to absorb and learn from. This body of information is known as ‘the training set’, and it’s something we should be asking many more questions about,” she says.


2019 Most Clicks: Waitlists and frustrations grow as UNC’s computer science department tries to keep up

The Daily Tar Heel student newspaper, Hannah McClellan and Cynthia Dong


UNC computer science professor Kris Jordan told The Daily Tar Heel that COMP 401 had 139 students on the Google form waitlist as of Sunday. Two other courses in the introductory sequence — COMP 410 and 411 — had Google form waitlists of 112 and 144 people respectively.

“Those numbers are pretty atrocious,” Jordan said.

They are also not an anomaly — they reflect the computer science department’s rapid growth over the last ten years.


Partnership aims to accelerate cell and gene therapy

Harvard Gazette


An innovative new major private-public partnership led by Harvard and MIT aims to accelerate one of the hottest and most promising areas of medical research: cell and gene therapy.

“I just think it’s a huge opportunity,” said Terry McGuire, a founder of the venture capital firm Polaris Partners and a member of Massachusetts Life Sciences Strategies Group, a panel of academic, government, health-care, and industry officials. “Clearly, [it] is the next great frontier, and it’s going to be hugely important, both from a research perspective and from a clinical perspective.”

Announced Monday, the project, which will bring the universities together with leading local hospitals, major corporations and state officials, will create a new, as-yet-unnamed center for advanced biological innovation and manufacturing and remove a big impediment to research. The new facility, expected to open by the end of 2021, seeks to ensure that Massachusetts will remain a leading region globally for life sciences.


Amazon to double seasonal hiring to record 200k workers as business continues to boom

GeekWire, Todd Bishop


Amazon plans to hire 200,000 seasonal workers this year, twice as many as last year, suggesting that it expects a strong holiday shopping season.

Much of the hiring will take place in fulfillment centers, but the increase also reflects the steady expansion of Amazon’s business into other areas that require seasonal workers.


The GovLab and UNICEF Launch Responsible Data for Children (RD4C) Initiative

NYU GovLab, Andrew J. Zahuranec


The GovLab and UNICEF launched the Responsible Data for Children initiative (RD4C) at UNICEF House. Through a brief presentation and panel discussion, UNICEF’s Robert MacTavish and The GovLab’s Andrew Young described the newly released RD4C website, Selected Readings, Synthesis Report, and RD4C Principles, as well as their hopes for how these and future outputs could strengthen data responsibility.

RD4C intends to support best practice in data responsibility; identify challenges and develop practical tools to assist practitioners in evaluating and addressing them; and encourage a broader discussion on actionable principles, insights, and approaches for responsible data management. The project’s three-part research methodology involved an extensive desk review of relevant literature, key informant interviews with UNICEF personnel and others working in the space, and three field observation visits to UNICEF country offices in Romania, Kenya, and Afghanistan.


New center illuminates ‘microscopic universe within each cell’

University of California-Berkeley, Berkeley News


In a video posted online this week, Seeker, a San Francisco-based digital media network focused on science and technology, profiled UC Berkeley’s newest cutting-edge facility, the Advanced Bioimaging Center.

Run by Srigokul “Gokul” Upadhyayula, a newly arrived assistant professor-in-residence of molecular and cell biology, the center is building imaging systems that will provide real-time video of living cells for biologists who want to understand “how life works,” Upadhyayula told Seeker. He still is awestruck at the “microscopic universe inside each cell,” he said.


Cardiovascular Disease Prevention at a Crossroads: Precision Medicine or Polypill?

JAMA Network, Michael J. Joyner and Nigel Paneth


The remarkable decline in mortality from cardiovascular disease (CVD) in the United States since the 1960s has been attributed to a combination of primary prevention measures (smoking cessation, healthful diet, increased exercise), management of elevated blood pressure and lipids, and treatment of CVD when it occurs (revascularization, heart failure drugs). But since 2016, CVD death rates have leveled off and may even be rising slightly, prompting consideration of the next phase of how to address CVD. Currently, prevention options are to advocate broadly for primary prevention and to test for conventional risk factors and assign an estimated risk as the basis for treatment (secondary prevention).


rOpenSci Announces a New Award From The Gordon and Betty Moore Foundation to Improve the Scientific Package Ecosystem for R

rOpenSci, Karthik Ram


“We are pleased to announce that we have received new funding from the Gordon and Betty Moore Foundation. The $894k grant will help us improve infrastructure for R packages and enable us to move towards a science first package ecosystem for the R community.”

 
Events



Data Science at ESPN: A regression-based metric for estimating NBA player contribution

Ramapo College, Data Science Program


Mahwah, NJ December 5, starting at 4 p.m. Speaker: Brian Macdonald, Director of Sports Analytics at ESPN [free]


Summer Institute in Network Science and its Applications

Northeastern University Network Science Institute, Indiana University Network Science Institute


Boston, MA June 22-July 3 at Northeastern University. “Course offerings are loosely divided into two parallel tracks that cover fundamental network science topics (track 1), such as network geometry and dynamics, and special network science applications (track 2), such as biological and social networks.” [save the date]


Introducing Customize, the Food Personalization Summit

The Spoon


New York, NY February 27. “A one-day executive summit that will explore the impact of personalization across the food system.” [$$$]

 
Deadlines



The Carpentries Long-Term Impact Survey is Open

“We are officially opening our Long-Term Impact Survey and would like anyone who has taken a Data, Software, or Library Carpentry workshop between April 2018 and December 2019 to share their story by completing the survey.” Deadline to participate is December 20.
 
Tools & Resources



Coding habits for data scientists

ThoughtWorks, David Tan


Typically, code to train ML models is written in Jupyter notebooks and it’s full of (i) side effects (e.g. print statements, pretty-printed dataframes, data visualisations) and (ii) glue code without any abstraction, modularisation and automated tests. While this may be fine for notebooks targeted at teaching people about the machine learning process, in real projects it’s a recipe for unmaintainable mess. The lack of good coding habits makes code hard to understand and consequently, modifying code becomes painful and error-prone. This makes it increasingly difficult for data scientists and developers to evolve their ML solutions.

In this article, we’ll share techniques for identifying bad habits that add to complexity in code as well as habits that can help us partition complexity.
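To make the contrast concrete, here is a minimal sketch of moving a notebook-style cleaning step into a small function with an automated test. It is an illustration only and is not taken from the ThoughtWorks article; the function name `fill_missing_with_median` and the sample values are hypothetical.

```python
import statistics


def fill_missing_with_median(values):
    """Return a new list where None entries are replaced by the median.

    Hypothetical helper for illustration: it has no side effects
    (no printing, no mutation of the input), so it is easy to test.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("need at least one non-missing value")
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def test_fill_missing_with_median():
    # A small automated test replaces eyeballing a printed dataframe.
    ages = [20, None, 40]
    assert fill_missing_with_median(ages) == [20, 30, 40]
    # The input list is left untouched.
    assert ages == [20, None, 40]


if __name__ == "__main__":
    test_fill_missing_with_median()
```

Because the function returns a new list instead of printing or mutating state, the same check can run in a test suite every time the code changes.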


Local-first software: you own your data, in spite of the cloud

Adrian Colyer, the morning paper blog


“Local-first software: you own your data, in spite of the cloud” Kleppmann et al., Onward! ’19

Watch out! If you start reading this paper you could be lost for hours following all the interesting links and ideas, and end up even more dissatisfied than you already are with the state of software today. You might also be inspired to help work towards a better future. I’m all in :).


How to annotate literally everything

Dmitrii Gerasimov


TLDR: when I read I try to read actively, which for me mainly involves using various tools to annotate content: highlight and leave notes as I read. I’ve programmed data providers that parse them and provide a nice interface to interact with this data from other tools. My automated scripts use them to render these annotations in human-readable and searchable plaintext and generate TODOs/spaced repetition items.

In this post I’m gonna elaborate on all of that and give some motivation, review of these tools (mainly with the focus on open source thus extendable software) and my vision on how they could work in an ideal world.


Do We Really Need a Web API?

Manuel Vila


With Liaison, frontend and backend can communicate directly with the language they are implemented in. The frontend can call a method in the backend without further ceremony. It is just a regular JavaScript method invocation.

The way it works is simple: a frontend class can “inherit” from a backend class. So, when a method is called, if this method is missing in the frontend, the corresponding method is executed in the backend, automatically.
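The forwarding idea described above can be sketched in a few lines. This is not Liaison's API (Liaison is a JavaScript library); it is a rough Python analogy, assuming an in-process stand-in for the backend instead of a real network call. The names `Backend`, `RemoteProxy`, and `Frontend` are hypothetical.

```python
class Backend:
    """Stand-in for server-side code (hypothetical)."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count


class RemoteProxy:
    """Forwards any method missing on the frontend to the backend.

    In a real system the call would be serialized and sent over the
    network; here it is a direct in-process call for illustration.
    """

    def __init__(self, backend):
        self._backend = backend

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # i.e. when the frontend does not define the method itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._backend, name)


class Frontend(RemoteProxy):
    def describe(self):
        # increment() is not defined here, so it runs on the backend.
        return f"count is {self.increment()}"


if __name__ == "__main__":
    ui = Frontend(Backend())
    print(ui.increment())
    print(ui.describe())
```

Methods defined on `Frontend` run locally, while anything missing falls through to the backend object, which mirrors the "inherit from a backend class" behaviour the excerpt describes.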

 
Careers


Full-time positions outside academia

Associate Business Development



DataKind; Brooklyn, NY

Matplotlib Research Software Engineering Fellow



NumFOCUS; Austin, TX

Deputy Director of Instructor Training



The Carpentries; Oakland, CA
Postdocs

Post Doctoral Fellow – Harden



University of Texas, Population Research Center; Austin, TX
