NYU Data Science newsletter – May 10, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 10, 2016


Data Science News

High failure rates spur universities to overhaul math class

WHYY, Newsworks

from May 05, 2016

… CUNY Start is a program for students entering one of CUNY’s seven community colleges (and two other CUNY colleges that offer associates degrees) who need to raise their scores on math, reading or writing placement exams in order to take for-credit classes in those areas.

But getting students to engage, in this way, is tricky. They’re accustomed to emphasis being placed on answers. So how do you get students onboard?

“Poker face,” Winkler says.

The instructor must present a poker face, Winkler explains, because the goal here isn’t necessarily to get the right answer out of students—it’s to get them thinking and articulating reasons. If Fein hesitates before pivoting back to the board, a student might feel the hint to quickly change an answer.

Open Science Prize announces six team finalists in first phase of competition

National Institutes of Health

from May 09, 2016

Six teams have been selected to advance their product ideas into prototypes to compete for $230,000 in the Open Science Prize, a global science competition to make both the outputs from science and the research process broadly accessible to the public. The finalists, announced at the 7th Health Datapalooza Conference in Washington, D.C., were selected out of 96 multinational, interdisciplinary teams representing 450 innovators from 45 countries. These are the first finalists for this recently launched global prize competition, a collaboration between the National Institutes of Health and the U.K.-based Wellcome Trust with additional funding provided by the Howard Hughes Medical Institute of Chevy Chase, Maryland.

[1605.02019] Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

arXiv, Computer Science > Computation and Language; Christopher E Moody

from May 06, 2016

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors. In contrast to continuous dense document representations, this formulation produces sparse, interpretable document mixtures through a non-negative simplex constraint. Our method is simple to incorporate into existing automatic differentiation frameworks and allows for unsupervised document representations geared for use by scientists while simultaneously learning word vectors and the linear relationships between them.
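The "mixture of topic vectors" idea in the abstract can be illustrated with a minimal sketch. It assumes a softmax is used to map unconstrained per-document weights onto the non-negative simplex, and the names `softmax`, `document_vector`, `doc_weights` and `topic_vectors` are hypothetical; this is not the author's implementation.

```python
import math


def softmax(weights):
    """Map unconstrained weights to non-negative proportions that sum to 1."""
    peak = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def document_vector(doc_weights, topic_vectors):
    """Build a document vector as a weighted sum of topic vectors.

    doc_weights: one unconstrained weight per topic (hypothetical parameter).
    topic_vectors: one dense vector per topic, all of the same length.
    """
    proportions = softmax(doc_weights)  # lies on the simplex
    dim = len(topic_vectors[0])
    return [
        sum(p * topic[i] for p, topic in zip(proportions, topic_vectors))
        for i in range(dim)
    ]
```

With two orthogonal topic vectors and equal weights, the document sits halfway between the topics; pushing one weight far above the other makes the mixture nearly one-hot, which is what makes it readable as "this document is mostly topic k".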

Banking’s New Normal

The New Yorker, James Surowiecki

from May 16, 2016

… Dodd-Frank’s success is important in its own right. But it also teaches us an important lesson about regulation more generally. For decades, the debate over regulation in the U.S. has been dominated by those who believe that, in the words of the Chicago School economist Eugene Fama, “even the best-constructed regulation is bound to fail.” As Fama put it a couple of years ago, “Eventually, the regulators get captured by the people they regulate.” Regulatory capture is always a danger. But the history of financial reform after the crisis shows that it’s not inevitable: if you have well-designed rules, and if regulators have the resources and the public support to enforce them, industry does not always win. Before Dodd-Frank became law, Wall Street lobbied furiously to emasculate it, but the attempt failed. Likewise, the banks’ efforts at softening the bill’s provisions during its implementation have often been unsuccessful. A paper by the political scientists John T. Woolley and J. Nicholas Ziegler looks in detail at the fight over derivatives-trading regulations. “Most of the industry was violently opposed to the new rules,” Ziegler told me. “But a combination of small but very engaged advocacy groups and gutsy regulators made sure they got through.”

No More Magic Algorithms: Cultural Policy in an Era of Discoverability

Medium, Data & Society: Points, Fenwick McKelvey

from May 09, 2016

Call it Discoverability. That’s what Canadian cultural institutions have labelled the new paradigm of broadcasting. Discoverability will be the focus of a summit happening in Toronto this week. While the venue might be local, the summit deserves international attention as a point of intersection between cultural policy and algorithms.

Discoverability refers to the “technological changes [that] have transformed how audiences consume content and made it possible for content from any source to bypass the regulated broadcasting system.” Bypassing the regulated systems poses significant challenges to the historic institutions of Canadian cultural policy like its main media regulatory institution, the Canadian Radio-television and Telecommunications Commission (CRTC). The CRTC has a mandate to “safeguard, enrich and strengthen the cultural, political, social and economic fabric of Canada” according to one of the policy objectives of Canada’s Broadcasting Act.

How do algorithms relate to enriching Canada’s social fabric?

Are Google’s Predictive Analytics Worth Patient Privacy Risks?

HealthITAnalytics

from May 09, 2016

A Google company offering predictive analytics to the NHS is finding itself in hot water after a data-sharing agreement raises questions over patient privacy and data use.

UW researchers shine at CHI

UW CSE News

from May 05, 2016

UW faculty and students are gearing up for the Association for Computing Machinery’s CHI 2016 conference that begins this weekend in San Jose, California. As the top conference for human-computer interaction research, CHI offers a terrific opportunity to showcase the breadth and depth of the University of Washington’s expertise in HCI and design as well as the strength of our interdisciplinary collaborations. UW CSE professor James Fogarty put together a terrific overview of UW-authored papers featured at this year’s conference for the DUB website, including three Best Paper Awards representing the top one percent of submissions.

Siri-creator shows off first public demo of Viv, ‘the intelligent interface for everything’ | TechCrunch

TechCrunch

from May 09, 2016

Today, onstage at TechCrunch Disrupt NY, Siri creator Dag Kittlaus showed off the first public demo of Viv, an AI virtual system that aims to be “the intelligent interface for everything.”

In a live presentation that mentioned the word “paradigm” at least a dozen times, Kittlaus talked a bit about the next wave of computer interaction methods and how Viv would come to “breathe life into the inanimate objects of our life through conversation.”

Events

JuliaCon 2016: Boston, MA.

The third Julia conference will take place June 21st-25th, 2016 at the Massachusetts Institute of Technology in Cambridge, Massachusetts. Expect cutting-edge technical talks, hands-on workshops, a chance to rub shoulders with Julia’s creators, and a weekend in a city known for its historical significance and colonial architecture. Looking forward to seeing you there!

Registration closes on June 10, but be sure to buy your tickets by May 13 to take advantage of the early registration discount!

Tuesday-Saturday, June 21-25, at MIT.

Deadlines

WiMIR Grants to attend ISMIR 2016 – WiMIR


Women in MIR (WiMIR) Grants are being offered for the first time thanks to the generous support of industry partners (Smule, Shazam) to female first or supporting authors of accepted full papers, as well as female first authors of accepted late-breaking demo (LBD) submissions to be presented at ISMIR 2016. Applicants do NOT need to be students to apply for the WiMIR Award.

ISMIR 2016 will take place in New York, NY, Sunday-Thursday, August 7-11.

Deadline to apply is Friday, May 20.

Smarticipation2016 | Workshop at the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp’16)


Smarticipation2016 is the 1st International Workshop on Intelligent Personal Guidance of Human Behavior Utilizing Anticipatory Models.

The workshop takes place in Heidelberg, Germany, on Tuesday, September 13, in conjunction with UbiComp2016.

Deadline for submissions is Tuesday, June 7.

LDAV 2016 – Big Data Analysis and Visualization October 23, 2016 Baltimore, Maryland


In many areas of science, simulations and experiments begin to generate many petabytes of data, with some sciences facing exabytes of data near term. Similarly, the collection of information about the Internet applications and users for a variety of purposes is generating only more data. Our ability to manage, mine, analyze, and visualize the data is fundamental to the knowledge discovery process. That is, the value of data at extreme scale can be fully realized only if we have an end-to-end solution, which demands a collective, inter-disciplinary effort to develop.

Baltimore, MD. This symposium, held in conjunction with IEEE VIS 2016, aims at bringing together domain scientists, data analytics and visualization researchers, and users, and fostering the needed exchange to develop the next-generation data-intensive analysis and visualization technology.

Deadline to submit abstracts is Friday, June 17.

CDS News

Data science students tackle patient readmission rates at the Iowa Business Analytics Competition — Center for Data Science — Medium

Medium, NYU Center for Data Science

from May 06, 2016

Last April, two CDS students, Zewei Liu and Olivia Yang, along with two Stern Business School students, Andrew Hamlet and Troy Manos, won the University of Iowa’s MBA Business Analytics Case Competition. We asked Zewei and Olivia a few questions about their work.

New Book on Verification of Visualization Techniques

NYU Tandon School of Engineering

from May 09, 2016

An Introduction to Verification of Visualization Techniques, Claudio Silva (along with Tiago Etiene of Modelo Inc. and Robert Kirby of the University of Utah), Morgan & Claypool Publishers 2015

As we increase our reliance on computer-generated information, often using it as part of our decision-making process, we must devise tools to assess the correctness of that information. Consider, for example, software embedded on vehicles, used for simulating aircraft performance, or used in medical imaging. In those cases, software correctness is of paramount importance as there’s little room for error. Software verification is one of the tools available to attain such goals. Verification is a well-known and widely studied subfield of computer science and computational science and the goal is to help us increase confidence in the software implementation by verifying that the software does what it is supposed to do.

Tools & Resources

Bioinformatician at large: Managing and Analysing Big Data – Part II

Ewan Birney, Ewan's Blog

from May 10, 2016

This is the second of three blog posts about planning, managing and delivering a ‘big biodata’ project. Here, I share some of my experience and lessons learned in management and analysis – because you can’t have one without the other.

hyperlist: A performant virtual scrolling list utility capable of rendering millions of rows

GitHub – tbranyen

from May 10, 2016

This is a simple component that can be dropped into any JavaScript application and provide a virtual scrolling area that is highly performant and lightweight. With zero dependencies and well under 200 lines of code sans comments, it is easy to parse and use.
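The core idea behind a virtual scrolling list is that only the rows inside the viewport are rendered, however many rows exist. The sketch below shows that calculation in Python for illustration only; it is not hyperlist's code, it assumes fixed-height rows measured in whole pixels, and `visible_range` and its parameters are hypothetical names.

```python
def visible_range(scroll_top, viewport_height, row_height, total_rows, overscan=2):
    """Return (first, last) row indices to render, with `last` exclusive.

    scroll_top, viewport_height and row_height are integer pixel values.
    overscan renders a few extra rows on each side to hide scrolling gaps.
    """
    first = max(0, scroll_top // row_height - overscan)
    # Ceiling division finds the row that contains the viewport's bottom edge.
    bottom_row = -(-(scroll_top + viewport_height) // row_height)
    last = min(total_rows, bottom_row + overscan)
    return first, last
```

For a 300-pixel viewport over 30-pixel rows, this selects about a dozen rows whether the list holds a hundred entries or a million, which is why the rendering cost stays flat as the list grows.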

API Updated – word count languages

Media Cloud

from May 09, 2016

We have updated the api to better support languages in the wc/list word counting end point.

Before this update, the language for stemming and stopwording had to be specified explicitly in the api call. Now if the user does not specify a language for the api call, the api call will detect the languages used by the returned text and use those languages for stemming and stopwording, as specified in the updated api spec.
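The fallback described above (use the requested language if given, otherwise detect it from the text) can be sketched as follows. This is a toy illustration, not the Media Cloud API or its implementation: the stopword lists, the `detect_language` heuristic and the `count_words` function are all hypothetical, only one language is detected per text, and stemming is left out.

```python
from collections import Counter

# Tiny illustrative stopword lists; real lists are far larger.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is"},
    "es": {"el", "la", "y", "de", "es"},
}


def detect_language(text):
    """Guess the language whose stopwords occur most often (toy heuristic)."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def count_words(text, language=None):
    """Count non-stopword tokens, detecting the language when none is given."""
    lang = language or detect_language(text)
    kept = [w for w in text.lower().split() if w not in STOPWORDS[lang]]
    return lang, Counter(kept)
```

Calling `count_words("el gato y el perro")` without a language removes the Spanish stopwords, while passing `language="en"` explicitly keeps them, mirroring the behavior before and after the update.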

The OKCupid dataset: A very large public dataset of dating site users

Open Science Framework

from May 09, 2016

For understanding the dataset, its uses and limitations, read the paper (paper.pdf).

The dataset has two main parts:

The question-level data (question_data.csv).

The case-level data (user_data.csv or parsed_data.rds).

Welcome to OpenML’s documentation! — OpenML 0.2.1 documentation

Matthias Feurer, Andreas Müller, Farzan Majdani, Joaquin Vanschoren and Pieter Gijsbers.

from May 09, 2016

Python interface to OpenML.

The OpenML server can only be accessed by users who have signed up to the OpenML platform. If you don’t have an account yet, sign up now.

Store your private data inside a git repository

Coderwall

from April 10, 2016

git-secret encrypts files and stores them inside the git repository, so you will have all the changes for every commit.

git-secret doesn’t require any deploy operations other than git secret reveal, which automatically decrypts all the required files.

Careers

How a Data Scientist (Who Studied Astrophysics) Ended Up in Fashion

Fashionista

Postdoctoral Position on Cognitive and Neural Modeling of Perceptual Decision Making

Vanderbilt University, Category Laboratory

