NYU Data Science newsletter – July 18, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 18, 2016


Data Science News

Is it time for authors to leave SSRN?

Authors Alliance

from July 17, 2016

Since we first heard of mega-publisher Elsevier’s acquisition of SSRN, the popular social sciences pre-print and working paper repository, we have expressed concern. Elsevier is not known to be an avid supporter of the open access publishing practices favored by many of our members, and has historically taken a restrictive stance toward author control and ownership of scholarship.

In response, we reached out to Elsevier and to SSRN with a set of principles the service could adopt that would reassure authors that SSRN could continue to be a go-to resource for those looking to refine and share their work. We have since heard back from SSRN: they would not commit to adopting even one of our principles. They offered more general reassurances that their policies would continue as before. We were not satisfied, but we decided to wait and see whether our fears would be borne out.

As feared, it now appears that SSRN is taking up restrictive and hostile positions against authors’ ability to decide when and how to share their work.

Big data and its developer fallout

TechCrunch, Ben Schippers

from July 17, 2016

As the internet social turf wars continue to mature, the land grab is becoming much better understood. With a few companies controlling 95 percent of the social data, the internet is more closed and much more controlled than ever before.

The term (and concept behind) big data has been thrown around a lot over the past 15 months. What I’m referring to here is user data, primarily from social businesses that can be leveraged to build other apps and businesses if done within the confines of a company API.

GE and Microsoft partner to bring Predix to Azure, accelerating digital transformation for industrial customers

Microsoft News Center

from July 11, 2016

GE (NYSE: GE) and Microsoft Corp. (Nasdaq: “MSFT”) today announced a partnership that will make GE’s Predix platform for the Industrial Internet available on the Microsoft Azure cloud for industrial businesses. The move marks the first step in a broad strategic collaboration between the two companies, which will allow customers around the world to capture intelligence from their industrial assets and take advantage of Microsoft’s enterprise cloud applications.

Open Data Reveals $791 Million Error in Newly Adopted NYC Budget

I Quant NY

from July 15, 2016

The headline in a recent NYC press release caught my eye: “MAYOR AND CITY COUNCIL LAUNCH SEARCHABLE OPEN BUDGET FOR NEW YORK CITY”. I was pretty excited. As mentioned in my TED talk, NYC has entombed this data in PDFs for years, making it basically impossible to analyze and understand what is going on. But for the first time, we can actually do things like look at the top spending for each agency. This is a big deal.

The bad news in all of this is that the City only put this year’s budget (FY2017) on the Open Data portal and left all prior years off. This makes it impossible to look for trends and year-to-year changes, which is exactly what is most interesting in a budget – so that was a real disappointment from a progressive administration.

“Topic Modeling: What Humanists Actually Do With It.”

Digital Humanities at Berkeley, Teddy Roland

from July 14, 2016

… Despite its algorithmic nature, it would be a gross mischaracterization to claim that topic modeling is somehow objective or absent interpretation. I will simply emphasize that human evaluative decisions and textual assumptions are encoded in each step of the process, including text selection and topic scope. In light of this, I will focus on how topic modeling has been used critically to work on humanistic research questions.

Topic modeling’s use in humanistic research might be thought of in terms of three broad approaches: as a tool to guide our close readings, as a technique for capturing the social conditions of texts, and as a literary method that defamiliarizes texts and language.

10 years later, ‘inconsequential’ tweets a boon for researchers

University of Rochester, NewsCenter

from July 15, 2016

Twitter is anything but “inconsequential” in terms of data science research and its applications.

Using machine learning, natural language processing and other data science techniques, researchers leverage Twitter’s 300 million followers and the hundreds of millions of tweets they post each day as a kind of distributed sensor network, where each person observes and reports on some aspect of the world, says Henry Kautz, the Robin and Tim Wentworth Director of the Goergen Institute for Data Science at the University of Rochester.

“Each report is very noisy, but the aggregate results can be reliable,” Kautz says. “The approach can be used for health, environmental protection, public safety and many other applications.”
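Kautz's point about noisy individual reports and reliable aggregates can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Rochester group's method: the Gaussian noise model, the simple mean, and the names simulate_reports and aggregate are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_reports(true_value, noise_sd, n_reports, seed=0):
    """Draw n_reports noisy observations of true_value.

    Assumption: each report is the true value plus independent
    Gaussian noise. Real social media signals are messier than this.
    """
    rng = random.Random(seed)
    return [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_reports)]


def aggregate(reports):
    """Combine the reports with a plain arithmetic mean."""
    return statistics.fmean(reports)


if __name__ == "__main__":
    true_value = 10.0
    for n in (10, 1_000, 100_000):
        reports = simulate_reports(true_value, noise_sd=5.0, n_reports=n)
        error = abs(aggregate(reports) - true_value)
        # The error of the mean tends to shrink as n grows.
        print(f"n={n:>6}  error of the aggregate: {error:.3f}")
```

Under this model the spread of any single report stays near noise_sd, while the error of the mean shrinks roughly with the square root of the number of reports. Biased or correlated reports would not average out this way.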

The 7 biggest problems facing science, according to 270 scientists

Vox, Julia Belluz

from July 14, 2016

… we sent scientists a survey asking this simple question: If you could change one thing about how science works today, what would it be and why?

We heard back from 270 scientists all over the world, including graduate students, senior professors, laboratory heads, and Fields Medalists. They told us that, in a variety of ways, their careers are being hijacked by perverse incentives. The result is bad science.

Our nine-point guide to spotting a dodgy statistic

The Guardian, Science

from July 17, 2016

Every statistician is familiar with the tedious “Lies, damned lies, and statistics” gibe, but the economist, writer and presenter of Radio 4’s More or Less, Tim Harford, has identified the habit of some politicians as not so much lying – to lie means having some knowledge of the truth – as “bullshitting”: a carefree disregard of whether the number is appropriate or not.

So here, with some help from the UK fact-checking organisation Full Fact, is a nine-point guide to what’s really going on.

Data Mining/Data Science “Nobel Prize”: 2016 SIGKDD Innovation Award to Philip S. Yu

KDnuggets

from July 15, 2016

ACM SIGKDD is pleased to announce that Philip S. Yu is the winner of its 2016 Innovation Award. He is recognized for his influential research and scientific contributions on mining, fusion and anonymization of big data.

The ACM SIGKDD Innovation Award is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD).

Events

The Power of Projects

Networking and a project portfolio are often cited by data career panelists as what can give you an edge over the competition. In this event we dive into the details of how to design a project that will take your career to the next level, demonstrate exemplars, and provide an opportunity for speed networking. You will leave this event with new ideas, collaborators, and a stronger network.

New York, NY Tuesday, July 26, starting at 6:30 p.m., Columbia University

3rd GESIS Computational Social Science Winter Symposium 2016

Based on the success of the first two events in 2014 and 2015, we have decided to organize a follow-up event.

Cologne, Germany Wednesday-Thursday, November 30 – December 1, at KOMED im Mediapark

Tools & Resources

Inside the Collaboration That Built the Open Source JupyterLab Project

Tech at Bloomberg blog

from July 14, 2016

Researchers and programmers know the Jupyter Notebook as a web-based environment where they can do computational research with native support for code, math and data visualization.

Today at SciPy 2016, Bloomberg joined Continuum Analytics and Project Jupyter to reveal the new JupyterLab platform so that early adopters can help test the alpha release. JupyterLab creates a more desktop-like experience on the Web, rivaling expensive software suites that allow programmers to use familiar tricks like keyboard shortcuts, tabs and configurable editor layouts.

JupyterLab: the next generation of the Jupyter Notebook

Jupyter.org blog

from July 14, 2016

It’s been a long time in the making, but today we want to start engaging our community with an early (pre-alpha) release of the next generation of the Jupyter Notebook application, which we are calling JupyterLab.

Navigating the Range of Statistical Tools for Inferential Network Analysis

American Journal of Political Science; Skyler J. Cranmer et al.

from July 13, 2016

The last decade has seen substantial advances in statistical techniques for the analysis of network data, as well as a major increase in the frequency with which these tools are used. These techniques are designed to accomplish the same broad goal, statistically valid inference in the presence of highly interdependent relationships, but important differences remain between them. We review three approaches commonly used for inferential network analysis—the quadratic assignment procedure, exponential random graph models, and latent space network models—highlighting the strengths and weaknesses of the techniques relative to one another. An illustrative example using climate change policy network data shows that all three network models outperform standard logit estimates on multiple criteria. This article introduces political scientists to a class of network techniques beyond simple descriptive measures of network structure, and it helps researchers choose which model to use in their own research.
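Of the three approaches the abstract names, the quadratic assignment procedure is the simplest to sketch. The version below is a minimal illustration of the basic correlation form only, not the specification used in the article: it correlates two adjacency matrices, then builds a null distribution by relabeling the nodes of one network (permuting rows and columns together). The function names and the five-node example network are hypothetical.

```python
import random


def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permute_nodes(matrix, order):
    """Relabel nodes: apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


def qap_test(a, b, n_perm=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    b_flat = off_diagonal(b)
    observed = pearson(off_diagonal(a), b_flat)
    order = list(range(len(a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = off_diagonal(permute_nodes(a, order))
        if pearson(permuted, b_flat) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical undirected network on five nodes.
    net = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    correlation, p_value = qap_test(net, net, n_perm=500)
    print(f"correlation={correlation:.3f}, p={p_value:.3f}")
```

Permuting node labels rather than individual ties keeps each network's internal structure intact, which is how the procedure respects the interdependence among relationships that the abstract emphasizes.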

Careers

Machine Learning Engineer, Niantic

Niantic

Postdoctoral Fellowships in Humanities and Social Sciences 2017-2020

HASTAC

