CIS faculty members Deborah Estrin (CS/Cornell Tech), Nate Foster (CS), Fred Schneider (CS) as well as Arnaud Sahuguet (Cornell Tech) and David Shmoys (ORIE) have received a $1 million grant from the National Science Foundation’s Cybersecurity Innovation for Cyberinfrastructure (CICI) program. Their project will develop a software framework to support the implementation of data-driven planning applications where individuals will have fine-grained control over the use of their data.
Using a novel approach for computing real-time game strategy, engineers have developed an artificial Ms. Pac-Man player that chomps the existing high score for computerized play. … The record score at the annual Ms. Pac-Man Screen Capture Competition stands at 36,280, but a trio of researchers led by Silvia Ferrari, professor of mechanical and aerospace engineering at Cornell, has produced a laboratory score of 43,720.
This video shows a reinforcement learning agent playing Frogger. The agent was trained to translate its internal state representation into natural language.
from Georgia Tech, Research Horizons magazine, John Toon
Using massive data sets, machine learning, and high-performance computing, health analytics and informatics is drawing us closer to the holy grail of health care: precision medicine, which promises diagnosis and treatment tailored to individual patients. The information, including findings from the latest peer-reviewed studies, will arrive on the desktops and mobile devices of clinicians in health care facilities large and small through a new generation of decision-support systems.
“There are massive implications over the coming decade for how informatics will change the way care is delivered, and probably more so for how care is experienced by patients,” said Jon Duke, M.D., director of Georgia Tech’s Center for Health Analytics and Informatics.
Regulatory oversight is sometimes seen as an impediment to innovation in health care. It’s true, of course, that the Food and Drug Administration is entrusted with making sure new tests and treatments meet a high standard for safety and efficacy. But the agency is acutely aware that saving lives also means getting genuine medical advances to market as fast as possible.
After decades of experience as a physician and researcher and nearly two years at the FDA, I’m convinced that smart regulation can actually foster innovation. And that’s especially true today with the advent of precision medicine, an exciting new approach to health care that aims to tailor care for each individual, based on their unique genetic makeup, environment, and history.
Digital behavioral health company SilverCloud, which makes an online platform of mental health and wellbeing programs for health systems and healthcare organizations, has raised $8.1 million in a round led by B Capital Group, the venture capital firm founded by Facebook co-founder Eduardo Saverin. ACT Ventures, Investec Ventures and AIB Seed Capital fund also contributed.
The SilverCloud platform offers 22 programs for conditions including anxiety, depression, stress, the emotional impact of chronic illnesses, and eating disorders. The modular programs can be accessed via laptop, tablet, or smartphone, and users are encouraged to complete a 40-minute session once per week over an eight- to 10-week period.
Bayview Labs, Seraph Group, and MIT Game Lab are creating Play Labs to accelerate startups in augmented reality, virtual reality, and artificial intelligence.
The accelerator is targeted at students and alumni of the Massachusetts Institute of Technology and will offer funding, facilities, and mentorship resources for selected startups. Applications are open now for the official program launch in summer 2017.
What if data could be collected and searched online in the same way that Wikipedia is? How much benefit could that have for research and problem solving?
That’s the idea behind a new project by NYU Tandon’s GovLab, DataCollaboratives.org.
from Applied Network Science; Shufan Wang, Mariam Avagyan and Per Sebastian Skardal
Today’s colleges and universities consist of highly complex structures that dictate interactions between the administration, faculty, and student body. These structures can play a role in dictating the efficiency of policy enacted by the administration and determine the effect that curriculum changes in one department have on other departments. Despite the fact that the features of these complex structures have a strong impact on the institutions, they remain by and large unknown in many cases. In this paper we study the academic structure of our home institution of Trinity College in Hartford, CT, using the major and minor patterns of graduating students to build a temporal multiplex network describing the interactions between different departments. Using recent network science techniques developed for such temporal networks, we identify the evolving community structures that organize departments’ interactions, as well as quantify the interdisciplinary centrality of each department. We implement this framework for Trinity College, finding practical insights and applications, but also present it as a general framework for colleges and universities to better understand their own structural makeup in order to better inform academic and administrative policy. [full text]
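The construction the abstract describes can be sketched in a few lines. This is an illustrative toy example, not the authors' code: the records, department names, and the use of eigenvector centrality as a stand-in "interdisciplinarity" score are all assumptions for the sake of the sketch.

```python
# Illustrative sketch (hypothetical data, not the paper's method): build one
# department network per graduating class from (year, [majors/minors])
# records, then rank departments by eigenvector centrality within each layer.
from collections import defaultdict
import itertools
import networkx as nx

# Hypothetical student records: (graduation year, declared majors/minors)
records = [
    (2015, ["Mathematics", "Economics"]),
    (2015, ["Computer Science", "Mathematics"]),
    (2016, ["Economics", "Political Science"]),
    (2016, ["Mathematics", "Computer Science"]),
    (2016, ["Computer Science", "Economics"]),
]

# One weighted graph per graduating class: an edge links two departments
# whenever a student combines them, weighted by how often that happens.
layers = defaultdict(nx.Graph)
for year, depts in records:
    g = layers[year]
    for a, b in itertools.combinations(sorted(depts), 2):
        w = g[a][b]["weight"] + 1 if g.has_edge(a, b) else 1
        g.add_edge(a, b, weight=w)

# Eigenvector centrality per yearly layer as a rough centrality score.
for year in sorted(layers):
    cent = nx.eigenvector_centrality(layers[year], weight="weight")
    for dept, score in sorted(cent.items(), key=lambda kv: -kv[1]):
        print(year, dept, round(score, 3))
```

The paper's actual analysis couples the yearly layers into a single temporal multiplex network before computing communities and centralities; the per-layer treatment above is only the simplest starting point.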
As a follow-up to the useR! conference at Stanford last year, the Women in R Task Force took the opportunity to survey the 900-or-so participants about their backgrounds, experiences, and interests. With 455 responses, the recently published results provide an interesting snapshot of the R community (or at least that subset able to travel to the US and who were able to register before the conference sold out). Among the findings (summarized here; check the report for the detailed breakdowns):
33% of attendees identified as women
26% of attendees identified as other than White or Caucasian
The sound of shuffling stacks and splashing piles of colorful chips filled the cavernous poker room of the Rivers Casino. It was noon on a Wednesday, and on one side of the hall, dozens of men at a handful of tables were peering at their cards, placing bets and taking one another’s money. On the other side sat two players, roped off from the rest. They were carefully deploying sophisticated poker strategies and tactics, drawing on the sum of human knowledge about the game. Yet they held no cards and stacked no chips. Their faces were lit by the blue glow of computer screens, and their opponent, an artificial intelligence program running on a brand new Hewlett Packard supercomputer, sat unblinking in a suburb 15 miles away. The two poker pros were playing for more than money. Pride, and the future of poker, was on the line.
Those two players, Jason Les and Daniel McAulay, are part of a four-person team, along with Jimmy Chou and Dong Kim, taking on Libratus, a poker superprogram. The pros and the program are both experts in a type of poker called heads-up no-limit Texas Hold ’em. The game is one-on-one, and each player is dealt two private cards and uses up to five shared, public cards to make the best hand.
Purists will tell you that science is about what scientists don’t know, which is true but not much of a basis on which to develop new cancer drugs. Hence the importance of knowledge: how crucial this mutation or that cell-surface receptor really is to cancer growth. These are the findings that launch companies and clinical trials — provided, of course, that they have been published in research papers in peer-reviewed journals.
As we report in a News story this week, a systematic effort to check some of these findings by repeating an initial batch of five published cancer studies has reported that none could be completely reproduced. The significance of this divergence — how the specific experiments were selected and what the results mean for the broader agenda of reproducibility in research — is already hotly contested.
It’s no different than approaching a scientist who has just given a riveting talk on unpublished work at a conference, according to Christopher “Casey” Brown, a geneticist at the University of Pennsylvania. Because part of an editor’s job is to pursue the latest and greatest research for potential publication (whether presented at a meeting or submitted online), those at plugged-in journals are prospecting preprint servers, seeking standouts among the scads of non-peer-reviewed manuscripts posted for perusal by all.
Brown is one of three “preprint editors” recently recruited by PLOS Genetics to solicit manuscripts posted to arXiv, its life sciences counterpart bioRxiv, and other servers, for peer review and potential publication. “Editors are paying very close attention—at a whole number of journals—to the preprint atmosphere,” he said. “The role of editors is changing a bit, there is more interest in this sort of thing.”
As ASAPbio and individual life scientists respond to FASEB’s statement against the inclusion of preprints in NIH grant applications, more than 600 researchers sign a petition in support of citing these non-peer-reviewed works in such proposals.
Seattle, WA. Please join us for a 2-hour symposium highlighting the accomplishments of some of the University of Washington’s outstanding women scientists and engineers. Friday, January 20, starting at 2:30 p.m. [free]
The Brown University program will give undergraduates an opportunity for exposure and research in the methods of “Applied Topology” in the study of complex data sets. Students must apply by February 17.
This is the second part of my posts on the rstudio::conf. If you’re interested in more general thoughts on the conference and some personal notes, check out my other post. This post gathers, as succinctly and as organized as possible, the practical and technical things I learned at the conference. While I did a whole training day on writing R packages, I haven’t included most of what I learned there. Instead, I’ll be integrating it into a future post on writing my first R package.
The problem of reproducibility is one that data science teams within an organization will have to tackle at some point. However, there is good news! With a little bit of discipline and the right tooling, data science teams can achieve reproducibility. This post will discuss the value of reproducibility and will provide some practical steps toward achieving it.
“A typical bad scenario: Founders know they need a ‘data play’ (or worse, an ‘AI play’); investors and clients keep asking about machine learning (or worse, deep learning). Founder hires a machine learning scientist, often a fresh graduate, and tells them they will build machine learning models. Scientist gets there — there’s no infrastructure, no ETL, data is a mess because nobody tried to do anything with it yet, instrumentation is non-existent except for ops, everything takes forever, engineers are working on their sprints and their patience is wearing thin. Everybody is frustrated” … “This won’t happen to your startup, though, because you thought through the questions above ahead of time.”
from Princeton Center for Information Technology Policy, Freedom to Tinker blog, Yusuf Dahl
The power of geospatial analysis lies in the new ways it provides to look at datasets and the relations among them. It allows you to explore more nuanced questions and discover correlations previously hidden. Used properly, geographic information system (GIS) tools can increase the saliency of a policy issue by expressing your argument visually and often much more effectively. Below is my recent experience in using GIS tools to broaden the audience for my research.