NYU Data Science newsletter – February 10, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for February 10, 2016

Data Science News

[1602.02410] Exploring the Limits of Language Modeling

arXiv, Computer Science > Computation and Language

from February 07, 2016

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 24.2. We also release these models for the NLP and ML community to study and improve upon.

ICLR 2016: List of accepted papers:

Igor Carron, Nuit Blanche blog

from February 08, 2016

From Hugo Larochelle’s Twitter: Accepted papers to the ICLR 2016 Conference Track are now available here.

Adam D’Angelo’s answer to Why are Twitter users unhappy about a switch to an algorithmic timeline? – Quora

Quora, Adam D'Angelo

from February 07, 2016

I think there are some very legitimate concerns with a switch to a timeline that’s primarily algorithmic. This is despite my general bias toward an algorithmic feed, having worked on the original Facebook news feed and Quora’s feed, which are not time ordered.

Let’s assume Twitter has run a good a/b test and it shows that people will use Twitter more under the algorithmic treatment, particularly the less engaged users that they need to restart growth. And assume this holds up over the long run so we can ignore recency effects. This outcome wouldn’t be particularly surprising to me – generally if the average stuff a user sees is more relevant, they will use a product more. But it’s not completely straightforward, because people have certain expectations of Twitter and breaking those will come with some costs.

To Keep AI Safe — Use AI

Re/code; Amitai Etzioni and Oren Etzioni

from February 04, 2016

Software is already pervasive in our society, but artificial intelligence software raises unique concerns even among the technological elite. The recent announcement that tech titans, including Elon Musk, have committed $1 billion to an artificial intelligence research center out of concern for what AI may become denotes an important question: Will AI software obey the law of the land and adhere to our ethical standards?

It is true that AI software is increasingly autonomous and potentially self-modifying, but it is our view of AI as a hegemonic, monolithic entity that drives our fear of it. Just as there is no one “software” entity, there will be no one AI entity. So here is a new viewpoint on how AI will be kept in check — more AI.

IMS Health Is Little Known But It’s Getting Rich Off Your Medical Data

Fortune, Adam Tanner

from February 09, 2016

You may never have heard of it, but IMS Health knows an awful lot about your medical history.

A global company based in Danbury, Connecticut, IMS buys bulk data from pharmacy chains such as CVS, doctors' electronic record systems such as Allscripts, claims from insurers such as Blue Cross Blue Shield and from others who handle your health information. The data is anonymized, that is, stripped of the identifiers that identify individuals. In turn, IMS sells insights from its more than half a billion patient dossiers mainly to drug companies.

A New High In Deal Activity To Artificial Intelligence Startups In Q4’15

CB Insights

from February 04, 2016

Advancements in AI were recently spotlighted by AlphaGo, a computer program developed by Google’s DeepMind team. The program — which relies on decision-making algorithms and neural networks — defeated a human European champion at the board game Go in a feat previously believed to be years away.

On the investor side, Jim Breyer of Breyer Capital has said AI will deliver massive returns for investors betting on applications for industries including healthcare and entertainment.

With this in mind, we used CB Insights’ database to look at funding to artificial intelligence startups since 2010.

Journalism as a Professional Model for Data Science

Brian C. Keegan

from February 09, 2016

… I suggest data scientists should look to journalism as we develop a professional identity. Of course journalism isn’t going to replace data science’s close affinities to engineering and academic cultures, but journalists do have decades of experience developing norms and practices to balance their considerable privileges against an enormous amount of scrutiny and risk. These include ombudspeople, a culture of public criticism, as well as ethical guidelines that are developed and enforced at the level of professional societies as well as corporate management. How these manifest into journalists’ professional identities in turn can inform how professional data scientists might develop (and enforce) norms about dealing with sensitive information, navigating conflicting values, developing a shared culture, and persisting through change. This argument draws from Michael Schudson’s (1989) sociology of organizations and occupational ideology that explores the tensions between “journalists’ professed autonomy and decision-making power and…[constraints] by organizational and occupational routines.” Asking a similar question of data scientists, how do they balance the autonomy they have to analyze data and influence decisions against the limitations imposed by managers, methods, and regulations?

[1602.02665] The happiness paradox: your friends are happier than you

arXiv, Computer Science > Social and Information Networks

from February 08, 2016

Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However the relation between popularity and happiness is poorly understood. A Friendship paradox does not necessarily imply a Happiness paradox where most individuals are less happy than their friends. Here we report the first direct observation of a significant Happiness Paradox in a large-scale online social network of 39,110 Twitter users. Our results reveal that popular individuals are indeed happier and that a majority of individuals experience a significant Happiness paradox. The magnitude of the latter effect is shaped by complex interactions between individual popularity, happiness, and the fact that users cluster assortatively by level of happiness. Our results indicate that the topology of online social networks and the distribution of happiness in some populations can cause widespread psycho-social effects that affect the well-being of billions of individuals.
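
As a rough illustration of the Friendship Paradox described in this abstract (not the paper's own method or data), the sketch below counts how many people in a small network have fewer friends than their friends do on average. The function name `friendship_paradox_fraction` and the toy star-shaped network are assumptions made up for this example.

```python
def friendship_paradox_fraction(adj):
    """Fraction of nodes whose degree is below the mean degree of their friends.

    `adj` maps each node to the list of its neighbours (undirected graph).
    Nodes with no friends are skipped, since they have no friends to compare to.
    """
    degree = {node: len(friends) for node, friends in adj.items()}
    connected = [node for node in adj if degree[node] > 0]
    less_popular = 0
    for node in connected:
        mean_friend_degree = sum(degree[f] for f in adj[node]) / degree[node]
        if degree[node] < mean_friend_degree:
            less_popular += 1
    return less_popular / len(connected)


# Hypothetical toy network: one hub connected to four otherwise isolated users.
star = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}
fraction = friendship_paradox_fraction(star)
```

In this toy network the four peripheral users each have one friend (the hub, who has four), so four of the five users are less popular than their friends. Replacing degree with a per-user happiness score would give an analogous check for the Happiness Paradox, though the paper's actual measurement may differ.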

ConEd Taps IBM to Supply New Yorkers With Real-Time Power Data

Bloomberg Business

from February 09, 2016

New York utility owner Consolidated Edison Inc. has chosen International Business Machines Corp. to build a platform that will allow its customers to monitor and control their power demand in real time.

The platform is part of ConEd’s $1.3 billion plan to install millions of smart meters for customers across its service territories that will track their power and gas use. The program will offer consumers access to demand data every 15 minutes, faster than any other service of its kind, according to ConEd.

Events

DataScienceHackathon

Cornell Tech is our meeting spot for a weekend of coding, camaraderie, and creativity. This interdisciplinary & experiential event is open-ended and meant to bring together engineers, business students, designers, entrepreneurs, etc. With a focus on utilizing data science, it’s your turn to produce viable solutions in one of our verticals.

Friday-Sunday, February 12-14, at Cornell Tech, 111 8th Avenue. Kickoff keynote presentation by Claudia Perlich is Friday, 7:15 p.m.

Deadlines

NGS Summer 2016 – Analyzing Next-Generation Sequencing Data

This intensive two week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms (e.g., Nanopore, PacBio). The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on genome and transcriptome assembly, transcript quantitation, mapping, and other topics.

Deadline to apply is Tuesday, March 1.

Tools & Resources

What Have We Learned From This Open Source Project?

Taskwarrior

from February 08, 2016

Here is the collected wisdom that we have gained from running the Taskwarrior project for nine years. It has been rewarding, enjoyable, and sometimes frustrating. We learned a lot about users and Open Source expectations.

Websites, Storage, and Sharing

New York University, University Life

from February 10, 2016

NYU IT works to create a seamless technology environment where conversations and awareness thrive no matter where students and faculty are located. Our sites, storage and sharing services ensure that collaboration thrives amongst a connected, global university community.

Tools such as NYU Box, NYU Drive, Web Publishing and the NYU Events calendar allow members of the community to store and share data for research, archiving, and daily work. Explore the services below, or see the service comparison charts on the right to decide which resource best suits your needs.

Gale-Shapley Algorithm and Designing Matching Markets

Data Science Association

from February 05, 2016

… In honor of the Nobel prizes to Al Roth and Lloyd Shapley, here is a primer on matching theory. Matching is a fundamental property of many markets and social institutions. Jobs are matched to workers, husbands to wives, doctors to hospitals, kidneys to patients.

The field of matching may be said to start with the Gale-Shapley deferred choice algorithm. Here is how it works, applied to men and women and marriage (the algorithm can also work for gay marriage but it’s a little easier to explain and implement with men and women). Each man proposes to his first ranked choice. Each woman rejects any unacceptable proposals but defers accepting her highest-ranked remaining suitor. Each rejected man proposes to his second ranked choice. Each woman now rejects again any unacceptable proposals, which may include previous suitors who have now become unacceptable. The process repeats until no further proposals are made; each woman then accepts her most preferred suitor and the matches are made.
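
The procedure described above (more commonly called deferred acceptance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the primer's own code: the function name `deferred_acceptance`, the dictionary-based preference format, and the sample names are all hypothetical. It handles one proposal at a time rather than in simultaneous rounds, which is a common way to implement the same idea, and anyone missing from a preference list is treated as unacceptable.

```python
def deferred_acceptance(men_prefs, women_prefs):
    """Return a dict mapping each matched man to the woman who accepted him.

    men_prefs / women_prefs map each person to a list of acceptable partners,
    most preferred first.
    """
    # rank[w][m] is m's position in w's list (lower is better).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # next woman each man will try
    held = {}                                # woman -> suitor she is holding
    free = list(men_prefs)                   # men with no proposal being held

    while free:
        m = free.pop()
        if next_choice[m] >= len(men_prefs[m]):
            continue                         # list exhausted: m stays unmatched
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if m not in rank.get(w, {}):
            free.append(m)                   # w finds m unacceptable
        elif w not in held:
            held[w] = m                      # w defers: holds m for now
        elif rank[w][m] < rank[w][held[w]]:
            free.append(held[w])             # w trades up, rejecting old suitor
            held[w] = m
        else:
            free.append(m)                   # w keeps her current suitor

    # No proposals remain, so every held proposal becomes a final match.
    return {m: w for w, m in held.items()}


# Hypothetical example: both men prefer x, who prefers b over a.
men_prefs = {"a": ["x", "y"], "b": ["x", "y"]}
women_prefs = {"x": ["b", "a"], "y": ["a", "b"]}
matches = deferred_acceptance(men_prefs, women_prefs)
```

Here `a` is rejected by `x` in favor of `b` and then proposes to `y`, so the final matches pair `b` with `x` and `a` with `y`.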

A note to academics wondering how to collaborate with companies or trying to work at one

Medium, Behavioral Economics, Kristen Berman

from January 29, 2016

Over the last 3 years I’ve worked to bring social science insights to large and small companies. This includes doing new research within companies and helping disseminate existing research to the hearts and minds of product people.

What did this look like?

It looked like throwing massive conferences to connect behavioral economists and startups, being a founding member of the Google Behavioral economics team and manually connecting top tier academics with companies that may be able to use their research.
