NYU Data Science newsletter – July 16, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 16, 2015

 
Data Science News



The arXiv as Dataset

Lateral.io blog


from July 12, 2015

The arXiv is a repository of over 1 million preprints in physics, mathematics and computer science. It is truly open access, and the preprints are an excellent dataset for testing out all sorts of language modelling / machine learning prototypes.
What’s available? … There are many tasks for which the arXiv is an ideal dataset. You could use the tags (MSC categories) to train a tagger, for instance, or test out your ideas for summarisation or keyword extraction. We feed the abstracts into our content recommender to provide a way to conceptually browse the arXiv. When reading an abstract, articles with conceptually-related abstracts are surfaced automatically (see earlier post). The arXiv dataset often turns up in the language modelling literature as well, including in a recent paper authored by arXiv founder, Paul Ginsparg, and Alexander Alemi.
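As a rough illustration of the keyword-extraction task mentioned above, the sketch below ranks the words in each abstract by TF-IDF. This is a common baseline, not the method Lateral describes; the function names and the three toy abstracts are invented for illustration and do not come from the arXiv.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(abstracts, k=3):
    """Return the k highest-scoring TF-IDF terms for each abstract."""
    docs = [Counter(tokenize(a)) for a in abstracts]
    n_docs = len(docs)

    # Document frequency: in how many abstracts does each word occur?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    results = []
    for doc in docs:
        total = sum(doc.values())
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in doc.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:k])
    return results


# Toy abstracts, made up for this sketch (not real arXiv data).
toy_abstracts = [
    "we study the spectrum of random matrices and random graphs",
    "we study the training of neural networks for language models",
    "we study the topology of manifolds and knot invariants",
]

keywords = top_keywords(toy_abstracts)
for abstract_keywords in keywords:
    print(abstract_keywords)
```

Words that occur in every abstract ("we", "study", "the") score zero and drop out, while a word repeated within a single abstract rises to the top. A real pipeline would likely add stop-word filtering and a much larger corpus.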

 

Tracking down the Villains: Outlier Detection at Netflix

Netflix Tech blog


from July 14, 2015

It’s 2 a.m. and half of our reliability team is online searching for the root cause of why Netflix streaming isn’t working. None of our systems are obviously broken, but something is amiss and we’re not seeing it. After an hour of searching we realize there is one rogue server in our farm causing the problem. We missed it amongst the thousands of other servers because we were looking for a clearly visible problem, not an insidious deviant.

In Netflix’s Marvel’s Daredevil, Matt Murdock uses his heightened senses to detect when a person’s actions are abnormal. This allows him to go beyond what others see to determine the non-obvious, like when someone is lying. Similar to this, we set out to build a system that could look beyond the obvious and find the subtle differences in servers that could be causing production problems. In this post we’ll describe our automated outlier detection and remediation for unhealthy servers that has saved us from countless hours of late-night heroics.
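The excerpt does not say which algorithm Netflix uses, so the sketch below substitutes a simple, well-known stand-in: a modified z-score based on the median absolute deviation (MAD), which flags a server whose metric sits far from the rest of the fleet. The function name, the threshold of 3.5, and the error-rate figures are assumptions made for illustration only.

```python
from statistics import median


def find_outlier_servers(metrics, threshold=3.5):
    """Return names of servers whose metric is far from the fleet median.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, which is
    robust to the outliers it is trying to find.
    """
    values = list(metrics.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])

    if mad == 0:
        # At least half the fleet reports the same value, so there is no
        # robust scale; treat any server that differs as an outlier.
        return [name for name, v in metrics.items() if v != center]

    return [
        name
        for name, v in metrics.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical per-server error rates (percent), invented for this sketch.
error_rates = {
    "server-a": 1.1,
    "server-b": 0.9,
    "server-c": 1.0,
    "server-d": 1.2,
    "server-e": 7.5,
    "server-f": 1.0,
}

print(find_outlier_servers(error_rates))  # prints ['server-e']
```

The median-based scale matters here: a plain mean and standard deviation would be pulled toward the rogue server, making it look less unusual than it is.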

 

7 Python Tools All Data Scientists Should Know How to Use

Galvanize


from July 14, 2015

If you’re an aspiring data scientist, you’re inquisitive – always exploring, learning, and asking questions. Online tutorials and videos can help prepare you for your first role, but the best way to ensure that you’re ready to be a data scientist is by making sure you’re fluent in the tools people use in the industry.

I asked our data science faculty to put together seven Python tools that they think all data scientists should know how to use. The Galvanize Data Science and GalvanizeU programs both focus on making sure students spend ample time immersed in these technologies. Investing the time to gain a deep understanding of these tools will give you a major advantage when you apply for your first job.

 

Disaster relief efforts show promise of analytics and seemingly unrelated data sources

SAS, The Corner Office blog


from July 08, 2015

As monsoon season begins, many Nepal earthquake victims have shelter over their heads thanks in part to an unlikely intersection of two SAS global development projects.

The first project is with the International Organization for Migration (IOM). IOM is the first responder to any crisis that displaces people. IOM provides temporary shelter and helps coordinate the efforts of other relief agencies that provide food, clean water, medical care and security.

IOM is currently assisting thousands of victims in the earthquake-ravaged areas of Nepal. SAS is helping IOM analyze shelter data to help better allocate resources, based on the work we did with them following Typhoon Haiyan in the Philippines.

 

Climate Change: A Risk Assessment

Centre for Science and Policy, University of Cambridge


from July 13, 2015

We know that climate change is a problem – but how big a problem is it? We have to answer this question before we can make a good decision about how much effort to put into dealing with it.

This report argues that the risks of climate change should be assessed in the same way as risks to national security, financial stability, or public health. That means we should concentrate especially on understanding what is the worst that could happen, and how likely that might be.

 

Can your smartphone’s sensors reveal if you’re depressed?

The Verge


from July 15, 2015

Data gathered by a smartphone’s sensors might one day be able to identify whether someone suffers from symptoms of depression, a study published today in the Journal of Medical Internet Research shows. By tracking average daily phone use and recording GPS data, scientists say they were able to identify people with depressive symptoms with a high level of accuracy. Unfortunately, the small size of the study and missing data points suggest that this method isn’t exactly ready for prime time.

Major depression is one of the most common mental health issues in the US. In 2012, about 16 million adults — or 6.9 percent of all US adults — experienced at least one major depressive episode. That’s why finding new ways of keeping track of patients is so important. Harnessing a phone’s sensors could help identify people who are at risk for depression, and ensure that patients receive treatments more promptly, says David Mohr, a behavioral scientist at Northwestern University and co-author of the study.

 

Machined Learnings: ICML 2015 Review

Paul Mineiro, Machined Learnings blog


from July 14, 2015

This year’s location was truly superlative: the charming northern French city of Lille, where the locals apparently subsist on cheese, fries, and beer without gaining weight. A plethora of vendors and recruiters were in attendance, handing out sweet swag to starving grad students. Honestly it’s hard to feel bad for ML grad students nowadays: getting a PhD in English indicates true selfless love of knowledge, while being a machine learning grad student is more like being a college basketball player.

The conference was not lacking for entertainment: in case you haven’t been paying attention, the enormous success of deep learning has generated some controversy about inventorship. Between Stigler’s Law of Eponymy and Sayre’s Law, this is of course not surprising, but when they announced the deep learning panel would have some of the contesting luminaries together on stage, everybody prepped the popcorn. I hope they videotaped it because it did not disappoint.

As far as trends: first, “deep” is eating everything.

 
Events



TODAY – Come Out & Say, Brooklyn



Physical and real-world games present unique design challenges, from managing large crowds to keeping track of players over large spaces to unpredictable weather and changing city architecture. Join us for Come Out & Say as veteran real-world game designers talk about a tricky design challenge they faced and tried to solve with one of their games. Gain valuable insight into making games from a wide variety of perspectives in this round of fast paced design talks.

Thursday, July 16, at 7 p.m., MAGNET – 2 MetroTech Roadway Brooklyn

 

New England Symposium on Statistics in Sports



The 2015 New England Symposium on Statistics in Sports will be a meeting of statisticians and quantitative analysts connected with sports teams, sports media, and universities to discuss common problems of interest in statistical modeling and analysis of sports data. The symposium format will be a mixture of invited talks, a poster session, and a panel discussion.

Saturday, September 26, at Harvard University Science Center

 

ICHPS 2015



Throughout the last 19 years, the International Conference on Health Policy Statistics (ICHPS), organized by the Health Policy Statistics Section of the American Statistical Association, has played a vital role in the dissemination process of health policy (and health services) statistics. ICHPS provides a unique forum for discussing research needs and solutions to the methodological challenges in the design of studies and analysis of data for health policy research.

Wednesday-Friday, October 7-9, in Providence, RI

 

Digital Compass at IEEE VIS 2015



One of our goals in organizing the Compass activities at the main IEEE VIS conference is to support the next generation of visualization researchers by helping connect junior and more experienced researchers from the scientific visualization, information visualization and visual analytics communities. This starts with connecting those who have attended IEEE VIS before with those who may be new to the conference, and perhaps the community.

Tuesday-Thursday, October 27-29, part of the IEEE VIS 2015 Conference in Chicago

 

Fifth Seattle Symposium in Biostatistics | Biomarkers for Diagnosis, Prognosis, and Therapy Guidance



Biomarkers for Diagnosis, Prognosis, and Therapy Guidance

Saturday-Tuesday, November 21-24, at the Grand Hyatt Seattle

 
Deadlines



The GVG-AI Competition

The GVG-AI Competition explores the problem of creating controllers for general video game playing. How would you create a single agent that is able to play any game it is given? Could you program an agent that is able to play a wide variety of games, without knowing which games are to be played?

Submission Deadlines: Saturday, August 1, for CIG 2015, and Tuesday, September 1, for CEEC 2015

 

IEEE VGTC VPG 2015

The IEEE Visualization and Graphics Technical Committee (VGTC) is sponsoring its second International Data-Visualization Contest to recognize outstanding design in visualization and visual analytics from around the world. The 2015 contest theme is “Perceived vs. Actual Student Interest.” The contest will be administered by the IEEE Visualization Pioneers Group (VPG) under the auspices of IEEE VGTC and in collaboration with ACT, Inc. (ACT is a non-profit organization in the education readiness assessment sector.) The awards will be announced at the IEEE VPG Reception to be held at the IEEE VIS 2015 conference. You are not required to attend either the VIS 2015 conference or the VPG reception to participate in the contest.

Submission Deadline: Friday, August 28

 
CDS News



The race to build “big data machines” in financial investing

O'Reilly Radar, Vasant Dhar


from July 14, 2015

Financial markets emanate massive amounts of data from which machines can, in principle, learn to invest with minimal initial guidance from humans. I contrast human and machine strengths and weaknesses in making investment decisions. The analysis reveals areas in the investment landscape where machines are already very active and those where machines are likely to make significant inroads in the next few years.

 

Deep Learning AI Experts In Demand at Google, Facebook, Twitter, IBM

Re/code


from July 15, 2015

… Once passé, deep learning, the subset of artificial intelligence focused on teaching machines to find and classify patterns in mass quantities of data, is now de rigueur across Google, Facebook, Microsoft, IBM and a host of other Silicon Valley companies. The trend has ignited an expensive race to scoop up scarce talent. And much of that expertise ties back to a cabal-like group of researchers who kept the futuristic field on life support 15 years prior.

In the decades since [Tomi] Poutanen entered Toronto, deep learning fell into what’s often called an “AI Winter” — a period, typical of the ambitious community, where the promise of theory fails to meet practical applications. Financial support dries up and researchers lose interest. Scientists had developed advanced theories of how neural networks operate, but lacked the computing power and data to put them to work.

Three computer scientists, Hinton, Yann LeCun and Yoshua Bengio, apparently missed the memo.

 
