NYU Data Science newsletter – November 18, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 18, 2015

Data Science News



The Acute Pain and Chronic Reward of Public-Facing Work

Trey Causey


from November 17, 2015

Doing public-facing work, be it open source programming or writing, is an open invitation to allow internet commenters into your life. It can introduce anxiety, aggravate depression, induce anger, and more. There are certainly benefits and positives, but they are often more nebulous, revealing themselves over time in the form of “visibility” or “recognition”. The criticisms manifest immediately and acutely.

A number of “I’m quitting my work on open source” posts have recently made the rounds on social media, with a common theme being that the work is often quite literally thankless, that it consumes one’s life without returning rewards, and that it demands one work unpaid, on weekends and at night, to satisfy users… and for what?

 

The Discovery of Statistical Regression

Priceonomics, Dan Kopf


from November 06, 2015

The legendary mathematician Carl Friedrich Gauss considered his alleged discovery of statistical regression “trivial.”

The method seemed so obvious to Gauss that he figured he could not have been the first to use it. He was so sure it had already been discovered that he did not publicly state his finding until many years later, after his contemporary Adrien-Marie Legendre had published on the method. When Gauss suggested that he had used it before Legendre, it set off “one of the most famous priority disputes in the history of science…” Gauss would eventually be given most of the credit as the founder of regression, but not without a fight.

 

Machine Learning and Technical Debt with D. Sculley

Software Engineering Daily


from November 17, 2015

Technical debt, referring to the compounding cost of changes to software architecture, can be especially challenging in machine learning systems.

D. Sculley is a software engineer at Google, focusing on machine learning, data mining, and information retrieval. He recently co-authored the paper Machine Learning: The High Interest Credit Card of Technical Debt. [audio, 33:08]

 

Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance

Todd W. Schneider


from November 17, 2015

The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015. Taken as a whole, the detailed trip-level data is more than just a vast list of taxi pickup and drop-off coordinates: it’s a story of New York. How bad is the rush hour traffic from Midtown to JFK? Where does the Bridge and Tunnel crowd hang out on Saturday nights? What time do investment bankers get to work? How has Uber changed the landscape for taxis? And could Bruce Willis and Samuel L. Jackson have made it from 72nd and Broadway to Wall Street in less than 30 minutes? The dataset addresses all of these questions and many more.

I mapped the coordinates of every trip to local census tracts and neighborhoods, then set about extracting stories and meaning from the data.
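Assigning each pickup or drop-off coordinate to a census tract is a point-in-polygon lookup. The excerpt does not say how the author implemented it, and a pipeline handling 1.1 billion trips would typically lean on a spatial database with a spatial index. As a minimal sketch of the underlying geometric test only, assuming tract boundaries are available as simple (longitude, latitude) polygons, the classic ray-casting algorithm looks like this. The tract IDs and square polygons in `TRACTS` are hypothetical toy data, not real census geometry.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: cast a ray from pt toward +x and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            # (y2 != y1 is guaranteed by the straddle check above).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(pt: Point, tracts: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the ID of the first tract containing pt, or None if no tract does."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(pt, polygon):
            return tract_id
    return None


# Hypothetical toy "tracts": two adjacent unit squares.
TRACTS: Dict[str, Sequence[Point]] = {
    "tract_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "tract_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
```

For example, `assign_tract((1.5, 0.5), TRACTS)` returns `"tract_B"`. The linear scan over every tract is fine for illustration; at real scale one would first filter candidate tracts with a bounding-box or spatial index before running the exact test.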

 

US Neuroscientists Call for Creation of ‘Brain Observatories’

Kavli Foundation, Cell Press


from November 15, 2015

What is the future of the BRAIN Initiative? This national White House Grand Challenge involving more than 100 laboratories in the United States has already made progress in establishing large-scale neuroscience goals and developing shared tools. And now in an Opinion paper published October 15 in Neuron, leading American neuroscientists call for the next step: a coordinated national network of neurotechnology centers or “brain observatories.”

 

Exponential Medicine: The Future of the Quantified Brain

Singularity Hub


from November 16, 2015

… From non-invasive neuromodulation techniques, like transcranial magnetic stimulation (TMS), to optogenetics demos, we saw an impressive lineup at Exponential Medicine last week of leaders using new technologies and research in neuroscience to bridge the gap between consumer and medical brain devices.

Here are a few of our favorite applications.

 

Here’s a glimpse of Cornell’s new tech campus, the future of NY startup scene

VentureBeat, Ruth Reader


from November 17, 2015

For three years, Cornell University has been promising New York City a technology campus to rival Stanford University.

Today, the university and its partners revealed the first glimpse of Cornell Tech, the campus to come. During a press briefing, Cornell Tech’s senior director of capital projects, Andrew Winters, gave an overview of the campus and what still needs to be done before it opens in summer 2017.

 

Evaluating language identification performance

Twitter Blogs, Under the hood


from November 16, 2015

Understanding the content of Tweets is important for many reasons: grasping a user’s interests (which in turn lets us show more relevant content), improving search, and fighting spam. There are many steps involved in a typical natural language processing pipeline, but one of the first and most fundamental steps is language identification — determining the language in which a piece of text is written.

This is generally not a hard problem. Even with a small and simple model (e.g., a list of the most common words in each language), we can achieve near-perfect accuracy when classifying news articles [1], for example. However, Tweets are different from the average news article or web page: they’re very short and use informal language. In practice, they’re different enough that we don’t want to evaluate language classifiers on news articles, because that doesn’t tell us much about their performance on Tweets. What we need, then, is a golden set of language-annotated Tweets to evaluate on. So, how did we go about constructing it?
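The “list of most common words” baseline mentioned above can be sketched in a few lines: count how many of a text’s tokens appear in each language’s list and pick the highest-scoring language. This is a minimal illustration, not Twitter’s classifier; the three tiny word lists in `COMMON_WORDS` and the function names are hypothetical. It also hints at why Tweets are harder than news articles: a short, informal text may contain none of the listed words at all.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical, hand-picked word lists for illustration only; a real model
# would derive much longer lists from large per-language corpora.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "on", "a", "it", "that"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es", "un", "una"},
    "fr": {"le", "la", "de", "et", "les", "est", "un", "une", "sur", "que"},
}


def tokenize(text: str) -> List[str]:
    # Lowercase, then keep runs of Unicode letters (no digits or underscores).
    return re.findall(r"[^\W\d_]+", text.lower())


def identify_language(text: str) -> Optional[str]:
    """Return the language whose common-word list matches the most tokens.

    Returns None when no token matches any list, as happens with many
    very short or informal texts.
    """
    tokens = tokenize(text)
    scores = Counter(
        {lang: sum(1 for t in tokens if t in words) for lang, words in COMMON_WORDS.items()}
    )
    best_lang, best_score = scores.most_common(1)[0]
    return best_lang if best_score > 0 else None
```

For example, `identify_language("El gato está en la mesa de la cocina")` returns `"es"`, while `identify_language("lol ok")` returns `None`. Note that words shared across languages (“la”, “de”, “que”) count for every list containing them, which is one reason such baselines struggle on short inputs.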

 

Cloudera Proposes to Donate Impala and Kudu to The Apache Software Foundation

Cloudera


from November 17, 2015

Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache™ Hadoop® and the latest open source technologies, announced today that it will submit proposals for Impala and Kudu to join the Apache Software Foundation (ASF). By donating its leading analytic database and columnar storage projects to the ASF, Cloudera aims to accelerate the growth and diversity of their respective developer communities.

 
Deadlines



Programming contest taps analytics to mark endangered whales


Analytics software maker MathWorks in Natick, Mass., is sponsoring the competition, along with the U.S. National Oceanic and Atmospheric Administration (NOAA). It’s hosted on Kaggle, a platform for programming contests and analytics competitions.

The public competition is looking for algorithms that cull through aerial photographs and successfully recognize individual whales in the right whale community. [audio, 5:28]

The deadline for submissions is Thursday, January 7, 2016.

 

NYU/CNS-CDS : Faculty Position


The Center for Neural Science (CNS) and Center for Data Science (CDS) at New York University invite applications for an open rank faculty position from candidates with research programs aimed at quantitative analyses and modeling of neural data. We are particularly interested in scholars with cross-disciplinary research interests that complement those of our existing faculty.

The deadline for receipt of applications is Friday, January 15, 2016.

 

Chronicling America: Historic American Newspapers Data Challenge

Challenge.gov

The National Endowment for the Humanities (NEH) encourages contestants to develop data visualizations, web-based tools, or other innovative and interesting web-based projects using the open data found in Chronicling America. There are over ten million pages of digitized newspapers in Chronicling America, published between 1836 and 1922, from towns and cities across the United States. The newspapers illuminate 19th- and 20th-century American life, with stories about politics, sports, shopping, music, food, health, science, movies, and everything in between. Entries should uncover trends, display insights, explore a theme, or tell a story.

The submission deadline is Wednesday, June 15, 2016.
