NYU Data Science newsletter – March 17, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 17, 2016


 
Data Science News



Nestle, Medidata to use fitness trackers to study impact of activity, nutrition on adults with joint discomfort

mobihealthnews


from March 10, 2016

Nestle is conducting a two-year clinical study with data startup Medidata in China to assess the impact of physical activity and nutrition on adults with joint discomfort. The study will use Garmin fitness trackers and a mobile app.

“The research program, developed in collaboration with dedicated experts from Nestle research centers in Lausanne and Beijing, is extending Medidata and Nestle Clinical Development Unit’s well-established relationship beyond traditional clinical trials,” Nestle Head of Clinical Operations Stephane Collet said in a statement.

 

New climate science innovation center opens downtown

WLOS, Asheville NC


from March 11, 2016

Asheville may be earning the new title of Climate City with the launch of a new climate science center. U.S. Secretary of Transportation Anthony Foxx was the keynote speaker at the grand opening. [video, pre-roll + 1:46]

 

Mobile data sharing in emergencies – consent, care and control

Linnet Taylor


from March 15, 2016

The Centre for Internet and Society recently released a groundbreaking paper on the practical, legal and ethical implications of using mobile phone data (CDRs, or Call Detail Records) in emergencies, with Liberia’s experience of the recent Ebola epidemic as the case study. Written by Sean Martin McDonald, the paper is brilliant, insightful and well researched, and is creating a much-needed debate in the humanitarian and responsible data communities. This post is a contribution to that debate. It reflects on just one of the ethical issues surfaced by the analysis, that of consent to data-sharing in the context of humanitarian emergency.

 

Mathematicians are geeking out about a bizarre discovery in prime numbers

Quartz, Akshat Rathi


from March 15, 2016

… Researchers studying prime numbers at Stanford University have stumbled upon a new phenomenon. In their study, published on arXiv, they show that consecutive prime numbers try hard not to be similar. That is, they may not be as random as once thought.
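
The result concerns final digits: a prime ending in a given digit is followed by another prime ending in that same digit less often than an even spread across 1, 3, 7 and 9 would suggest. The sketch below is one way to check this on small primes. The function names and the 100,000 cutoff are illustrative choices, not taken from the paper or the Quartz article.

```python
from collections import Counter


def primes_up_to(n):
    """Return all primes <= n using a simple Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def last_digit_pair_counts(primes):
    """Count (last digit, next last digit) over consecutive primes above 5."""
    # Primes above 5 can only end in 1, 3, 7 or 9.
    tail = [p for p in primes if p > 5]
    return Counter((a % 10, b % 10) for a, b in zip(tail, tail[1:]))


if __name__ == "__main__":
    counts = last_digit_pair_counts(primes_up_to(100_000))
    for first in (1, 3, 7, 9):
        # One row per leading digit; repeated digits show the lowest counts.
        row = {second: counts[(first, second)] for second in (1, 3, 7, 9)}
        print(first, row)
```

If last digits were independent, each row would be roughly flat; in practice the repeated-digit entry in each row is visibly smaller.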

 

Weber Collaborates with Columbia’s Data Science Institute

O'Dwyers PR News


from March 15, 2016

Weber Shandwick [a public relations firm] has joined the Affiliates Program at Columbia University’s Data Science Institute to collaborate on data solutions and adopt capabilities that will help the global PR agency improve how it benchmarks audience engagement with brands.

 

The Evolution of the Data Scientist

KDnuggets, Chris Pearson


from March 16, 2016

… what we can clearly see is that there is evolution in the methods, process and technology used by a Data Scientist. Many would contest the true beginnings of statistical modelling, but fewer would argue what that evolutionary lifecycle looks like. From the early days of scratching numbers into papyrus, up to the modern day punching of numbers into a keyboard, Data Science has come a long way. The technology may have changed, the methods may also have changed, but what hasn’t changed going as far back past the industrial revolutions of the 19th and 20th centuries, or past the Renaissance, as far back as the dawn of humankind, is that we’ve always sought to leverage mathematics and statistics to improve the world around us.

 

New guidelines on #OpenAccess to Scientific Publications & Research Data in #H2020

Twitter, Digital for Science


from March 16, 2016

 

Court rulings threaten to upset defences against data breach claims

Financial Times


from March 16, 2016

Until recently, plaintiffs have faced difficulties demonstrating material injury caused by data breaches; however, several recent high-profile cases look likely to make it easier for consumers to bring individual suits and class actions against companies in the event of such attacks.

 

More federal open source appreciated, if it behaves

FCW


from March 14, 2016

Technology companies have welcomed the White House’s recently unveiled policy that seeks to make software code used by federal agencies more open, sharable and reusable, but they’re concerned about a few details.

Federal CIO Tony Scott announced a draft Federal Source Code policy on March 10 that would create a new set of rules for custom code developed by or for the federal government. The proposal is now open for comment on GitHub.

 

CSCW 2016: Beyond the Belmont Principles

LinkedIn, SlideShare, Jessica Vitak


from March 01, 2016

Pervasive information streams that document people and their routines have been a boon to social computing research. But the ethics of collecting and analyzing available—but potentially sensitive—online data present challenges to researchers. In response to increasing public and scholarly debate over the ethics of online data research, this paper analyzes the current state of practice among researchers using online data. Qualitative and quantitative responses from a survey of 263 online data researchers document beliefs and practices around which social computing researchers are converging, as well as areas of ongoing disagreement. The survey also reveals that these disagreements are not correlated with disciplinary, methodological, or workplace affiliations. The paper concludes by reflecting on changing ethical practices in the digital age, and discusses a set of emergent best practices for ethical social computing research.

 
Events



University of San Francisco Data Visualization Speaker Series (DVSS), Miriam Posner



Data visualization would seem to have great potential for humanities research, except for one major problem: humanities scholars don’t have data. Or they don’t think they do. Mention “data” to a scholar of literature, history, or the arts and watch how quickly they tune you out. And yet humanists work with evidence. And they speak of proving their claims. So is this just a problem of terminology? I’ll argue here that our data trouble is more substantial than that. The term “data” seems alien to the humanities not just because humanists aren’t used to computers, but because it exposes some very real differences in the way humanists and scholars from other fields conceive of the work they do.

Monday, March 21, at McLaren Conference Center Rm 250, University of San Francisco, starting at 6 p.m.

 

Data Visualization Clinic No.2



Come to this discussion-based session to learn about best practices, tips, and inspirations for creating versatile visualizations.

If you are currently working with data and have begun working through your data visualizations, bring your graphics to the workshop to showcase your skills or to seek feedback. Or feel free to bring in a favorite visualization you’ve encountered for discussion and technical analysis. Examples of visualizations include plots and charts (network graphs, scatterplots, histograms, line graphs, word clouds, etc.) and maps (choropleth, heat, topographic, thematic, point density, etc.)

The session will begin by dedicating time to understanding the scope and variety of visualizations available; examining best practices and the pitfalls of making good charts, tables, and maps; and building a framework that establishes goals for visualizations. The workshop will then continue with focused analysis of participants’ working visualizations.

Friday, March 25, at 12:00 p.m. in Bobst Library, Rm. 619

 

Advanced Git and GitHub



Modern research involving data analysis increasingly uses programming to increase efficiency and allow for more effective use of data. As code becomes a more and more essential part of research activities, we need to treat it with the same care that we treat other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

This workshop takes a look at advanced usage and collaboration using Git and GitHub, including branches and how to manipulate them with merge and rebase, forks and pull requests, rewriting history with rebase, and possible workflows.

Tuesday, March 29, at 4 p.m., Bobst Library, Rm. 619

More NYU Libraries Training: Data Visualization Clinic No.2 on Friday, March 25.

 
Deadlines



Design Contest (BioVis@Vis)

The contest aims to explore visual representations that communicate, or support the communication of, risks in Bayesian reasoning tasks. Previous studies have found that the ability of people to solve Bayesian problems depends on the way the information is conveyed [2]. In this contest, a special emphasis is laid on the design of visual aids.

For this challenge, we picked two Bayesian reasoning tasks: the classic mammography problem [3] and an example from a prenatal test brochure for Down Syndrome [2]. Participants are asked to design visual solutions for both tasks. We supply a scenario to provide a context to design your visual solutions, and a text and a target question for each task.
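
For readers unfamiliar with this kind of task, the sketch below works through a Bayes' rule calculation of the mammography type. The three input figures (1% prevalence, 80% sensitivity, 9.6% false-positive rate) are the ones commonly quoted in textbook versions of the problem and are assumptions here; the contest materials may use different numbers.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule."""
    # Share of the whole population that has the condition and tests positive.
    true_positives = prevalence * sensitivity
    # Share of the whole population that lacks the condition but tests positive.
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Assumed textbook-style inputs, not taken from the contest description.
    p = posterior_given_positive(0.01, 0.80, 0.096)
    print(f"P(condition | positive test) = {p:.3f}")
```

With these assumed inputs the posterior comes out just under 8%, a result many people find counterintuitive when the same information is presented only as percentages.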

Deadline for submissions is Friday, September 23.

 
CDS News



On learning — and loving — data science

Medium, NYU Center for Data Science


from March 16, 2016

Junbo (Jake) Zhao is an MS in Data Science student from the class of ’16. Prior to studying with us, Jake received a Bachelor’s degree in Electronic Engineering. He also has experience in Computer Vision and Music Information Retrieval. Jake is primarily interested in Large Scale Machine Learning and Big Data.

 
Tools & Resources



Interview with a Data Scientist Tool Developer

yhat blog, Peadar Coyle


from February 18, 2016

I [Peadar Coyle] interviewed one of the core members of the pandas Python library, Masaaki Horikoshi (sinhrks). I was really happy to interview him, and glad to show that data science and software development are really global things. 🙂 I lightly edited his answers at his request because English is not his native language.

 

Working with Elasticsearch 

O'Reilly Media


from October 17, 2015

In this Working with Elasticsearch training course, expert author Radu Gheorghe will teach you how to search, aggregate, analyze, and scale large volume datastores. This course is designed for users that already have a familiarity with the use of a UNIX-like terminal. [video, 2:42]

 

Celebrating Figaro 4.0: What is Structured Factored Inference?

Avi Pfeffer, Practical Probabilistic Programming blog


from March 16, 2016

Figaro 4.0 has just been released, available from http://www.cra.com/figaro. The headline new feature is called “Structured Factored Inference”, or SFI. So I’d like to take this opportunity to explain what SFI is about and why we’re pursuing it.

 
