Data Science newsletter – November 4, 2016

Newsletter features journalism, research papers, events, tools/software, and jobs for November 4, 2016

Data Science News

Software Dreams Up New Molecules in Quest for Wonder Drugs

MIT Technology Review

from November 03, 2016

What do you get if you cross aspirin with ibuprofen? Harvard chemistry professor Alán Aspuru-Guzik isn’t sure, but he’s trained software that could give him an answer by suggesting a molecular structure that combines properties of both drugs.

Barba-group reproducibility syllabus

Medium, Hackernoon, Lorena A. Barba

from October 31, 2016

After my short piece, “A hard road to reproducibility,” appeared in Science, I received several emails and Twitter mentions asking for more specific tips — both about tools and documents we use in the group to train the team about reproducibility.

Sure, A.I. Is Powerful—But Can We Make It Accountable?

WIRED, Culture, Clive Thompson

from October 27, 2016

The opacity of machine learning isn’t just an academic problem. More and more places use the technology for everything from image recognition to medical diagnoses. All that decisionmaking is, by definition, unknowable—and that makes people uneasy. My friend Zeynep Tufekci, a sociologist, warns about “Moore’s law plus inscrutability.” Microsoft CEO Satya Nadella says we need “algorithmic accountability.”

Why Bad Genes Aren’t Always Bad News

University of Toronto, Faculty of Medicine

from November 03, 2016

We usually think of mutations as errors in our genes that will make us sick. But not all errors are bad, and some can even cancel out or suppress the fallout of those mutations known to cause disease. While little has been known about this process — called genetic suppression — that will soon change as University of Toronto researchers uncover the general rules behind it.

Teams led by Professors Brenda Andrews, Charles Boone and Frederick Roth of the Donnelly Centre and the Department of Molecular Genetics, in collaboration with Professor Chad Myers of the University of Minnesota-Twin Cities, have compiled the first comprehensive set of suppressive mutations in a cell, as reported in the latest issue of Science. The four researchers are members of the Genetic Networks program of the Canadian Institute for Advanced Research. Their findings could help explain how suppressive mutations combine with disease-causing mutations to soften the blow or even prevent disease.

PEPI Builds Bridges Between Industry and Researchers

South Big Data Hub, Hubbub! blog

from November 02, 2016

The South Big Data Hub’s Program to Empower Partnerships with Industry (PEPI) pairs early career faculty and researchers throughout the South with industry partners and supports their travel to make collaboration possible. The program is co-sponsored by the National Science Foundation (NSF), the Computing Community Consortium (CCC), UnitedHealthCare Group and McKesson Corporation. Through PEPI, the South Hub provided funding to support data-intensive fellowships with industry for early career faculty, research scientists, and postdocs. Each award provided the recipient with up to $15,000 of travel and salary support to pay for their full-time effort for 2 to 5 weeks working at the company site.

Why a Scientist’s Big Break May Be Just Around the Corner – Researchers, have hope: your most successful paper can occur at any point in your career.

Kellogg Insight

from November 02, 2016

Conventional wisdom holds that a scientist’s best work is usually published mid-career, in the sweet spot after they have learned the ropes, but before administrative duties or thoughts of retirement encroach upon research. So is an aging academic with an underwhelming research career a lost cause?

That was a motivating question behind a recent study by Kellogg’s Dashun Wang. “Sometimes when I give talks, I say this is ‘the hope project,’” says Wang, an associate professor of management and organizations. It is hopeful because Wang and colleagues find that a scientist’s most-cited paper is equally likely to pop up at any point in her career.
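The "equally likely at any point" claim can be illustrated with a small simulation. This is a minimal sketch, not the study's method: it assumes each paper's citation count is drawn independently from the same distribution (a log-normal chosen here purely for illustration), and it counts how often the best paper lands in each position of a 10-paper career. The function names, the career length, and the distribution parameters are all hypothetical.

```python
import random
from collections import Counter


def position_of_best_paper(n_papers, rng):
    """Return the 0-based position of the most-cited paper in one simulated career."""
    # Assumption: citation counts are i.i.d. and do not depend on career stage.
    citations = [rng.lognormvariate(1.0, 1.0) for _ in range(n_papers)]
    return max(range(n_papers), key=citations.__getitem__)


def simulate(n_careers=20000, n_papers=10, seed=0):
    """Estimate the share of careers whose best paper falls at each position."""
    rng = random.Random(seed)
    counts = Counter(
        position_of_best_paper(n_papers, rng) for _ in range(n_careers)
    )
    return [counts[i] / n_careers for i in range(n_papers)]


if __name__ == "__main__":
    for i, share in enumerate(simulate(), start=1):
        print(f"paper {i:2d}: {share:.3f}")
```

Under these assumptions every position should receive a share close to 1/10, which is what "equally likely at any point" would look like if impact were unrelated to timing. Real careers differ in length and productivity, so this only conveys the intuition.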

The Competitive Landscape for Machine Intelligence

Harvard Business Review, Shivon Zilis and James Cham

from November 02, 2016

If this year’s landscape shows anything, it’s that the impact of machine intelligence is already here. Almost every industry is already being affected, from agriculture to transportation. Every employee can use machine intelligence to become more productive with tools that exist today. Companies have at their disposal, for the first time, the full set of building blocks to begin embedding machine intelligence in their businesses.

Smoking ‘causes hundreds of DNA changes’

BBC News

from November 03, 2016

Smoking leaves an “archaeological record” of the hundreds of DNA mutations it causes, scientists have discovered.

Having sequenced thousands of tumour genomes, they found a 20-a-day smoker would rack up an average of 150 mutations in every lung cell each year.

Here’s what data science tells us about Hillary Clinton’s emails

The Washington Post, Monkey Cage blog

from November 02, 2016

At Columbia University’s History Lab, social scientists and data scientists have conducted many experiments to discover patterns and anomalies in official secrecy in large collections of declassified documents. We joined with collaborators at Fundação Getulio Vargas in Brazil, Renato Souza and Flavio Coelho, to see whether we could use data science methods to classify State Department communications.

We had two goals: First, find out whether, and to what extent, being classified as “secret” or “confidential” has historically been random or predictable. Second, learn what is normal and what might be considered negligent in how officials manage large numbers of potentially sensitive communications.

Blockchain-enabled open science framework

O'Reilly Radar

from November 03, 2016

In this article, I propose making the process of commercializing preclinical research more reproducible and transparent by basing it on a blockchain. This effort will rely on the blockchain for communication to carry out peer reviews and publicly report the results. The program will be discussed thoroughly in a later section. Let us begin by reviewing three major initiatives currently in place to enhance reproducibility.

The Big Difference Between Facebook and Twitter

Vanity Fair, The Hive blog

from November 03, 2016

The difference between Facebook and Twitter reflects the shift from social media to messaging, which is just the latest in a series of shifts in Silicon Valley. Messaging services are now eclipsing social media companies in terms of user numbers. Asian messaging services like LINE and WeChat have already proven that they can monetize, something social media companies struggled with for a long time. (Arguably, Twitter is still, ten years into its existence, struggling to monetize itself.) The move to messaging, a platform that can be monetized, and away from the more-nebulous category of social media is perhaps part of a movement away from quixotic ideas about growth, and a renewed focus on business fundamentals and a path to profitability.

Events

1st Annual NYU Stern FinTech Conference

New York, NY Wednesday, November 9, at Kaufman Management Center (KMC, 44 West 4th St) [$$$]

BERC Cleanweb Hackathon: Brave New Hacks

Berkeley, CA Berkeley Institute for Data Science, Doe Library, on Friday-Saturday, November 18-19. [$]

Tools & Resources

Airbnb open sources data-science-sharing platform

Computerworld, Sharon Machlis

from November 03, 2016

Airbnb created an internal Knowledge Repo, combining git version control and Markdown templates for reporting results. Airbnb recently open-sourced its Knowledge Repository Beta, seeking contributors to help move the project forward.

Microsoft Offers Free Trials of Data Science Virtual Machine

eWeek

from November 03, 2016

Data scientists and organizations considering a cloud-based platform for their big data analytics needs can now evaluate Microsoft’s Data Science Virtual Machine for several hours without paying a dime.

“You can launch a VM instance with just a few clicks and explore it fully—no credit cards or Azure subscriptions needed,” wrote Microsoft staffers Paul Shealy, a senior software engineer, and Barnam Bora, a program manager, in a blog post. “A test drive lasts eight hours, enough time for you to try several sample solutions or analyze your own dataset.”

[1609.00037] Good Enough Practices in Scientific Computing

arXiv, Computer Science > Software Engineering; Greg Wilson, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, Tracy K. Teal

from October 14, 2016

We present a set of computing tools and techniques that every researcher can and should adopt. These recommendations synthesize inspiration from our own work, from the experiences of the thousands of people who have taken part in Software Carpentry and Data Carpentry workshops over the past six years, and from a variety of other guides. Unlike some other guides, our recommendations are aimed specifically at people who are new to research computing.

Visualization Tools & Books

Keshif

from November 04, 2016

Lots of stuff, hard to describe.

Careers

Postdocs

Postdoctoral and Transition Program for Academic Diversity

New York University; New York, NY

Tenured and tenure track faculty positions

Assistant Professor in Digital Curation

University of Michigan, School of Information; Ann Arbor, MI

Internships and other temporary positions

Baseball Operations Internship

Boston Red Sox; Boston, MA
