NYU Data Science Newsletter – July 20, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 20, 2015


 
Data Science News



Must Watch Data Science Videos from SciPy Conference 2015

Analytics Vidhya


from July 16, 2015

Scientific Computing with Python 2015, popularly known as SciPy 2015, was held recently in Austin, Texas. This six-day conference showcased the best research happening in Python, including some fantastic tutorials on data science. The pain point: the conference's YouTube channel hosts 115 videos, covering topics ranging from bioscience, geoscience, and astronomy to data science. We had to watch every video to come up with this recommended list.

 

The future of the Scientific Python ecosystem

Konrad Hinsen's Blog


from July 16, 2015

SciPy 2015 is over, meaning that many non-participants like myself are now busy catching up with what happened by watching the videos. Today’s dose for me was Jake VanderPlas’ keynote entitled “State of the Tools”. It’s about the history, current state, and potential future of what is now generally known as the Scientific Python ecosystem: the large number of libraries and tools written in or for Python that scientists from many disciplines use to get their day-to-day computational work done.

History is done, the present status is a fact, but the future is open to both speculation and planning, so that’s what I find most interesting in Jake’s keynote. What struck me is that everything he discussed was about paying back technical debt: refactoring the core libraries, fixing compatibility problems, removing technical obstacles to installation and use of various tools. In fact, 20 years after Python showed up in scientific computing, the ecosystem is in a state that is typical for software projects of that age: a bit of a mess. The future work outlined by Jake would help to make it less of a mess, and I hope that something like this will actually happen. The big question mark for me is how this can be funded, given that it is “only” maintenance work, producing nothing fundamentally new. Fortunately there are people much better than me at thinking about funding, for example everyone involved in the NumFOCUS foundation.

 

Scientific computing in Python

John D. Cook


from July 16, 2015

… I used Python for a while before I discovered that there were so many Python libraries for scientific computing. At the time I was considering learning Ruby or some other scripting language, but I committed to Python when I found out that Python has far more libraries for the kind of work I do than other languages do. It felt like I’d discovered a secret hoard of code. I expect it would be easier today to discover the scientific Python stack. (It really is becoming more of an integrated stack, not just a collection of isolated libraries. This is one of the themes in the keynote above.)

When people ask me why I use Python, rather than languages like Matlab or R, my response is that I do a mixture of mathematical programming and general programming. I’d rather do mathematics in a general programming language than do general programming in a mathematical language.
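
A minimal sketch of the point Cook is making, with a hypothetical CSV of distribution parameters: the mathematical step (numerical quadrature via SciPy) sits naturally next to ordinary general-purpose work (file I/O, string formatting) in one Python script.

```python
# Hypothetical sketch: mathematics (SciPy quadrature) mixed with
# general programming (file I/O, string formatting) in one script.
import csv

import numpy as np
from scipy import integrate

def gaussian(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# General programming: parse a (made-up) CSV of distribution parameters.
with open("measurements.csv") as f:
    params = [(float(row["mu"]), float(row["sigma"])) for row in csv.DictReader(f)]

# Mathematical programming: integrate each density over +/- 3 sigma.
for mu, sigma in params:
    mass, _err = integrate.quad(gaussian, mu - 3 * sigma, mu + 3 * sigma,
                                args=(mu, sigma))
    print("mu={:.2f} sigma={:.2f} -> {:.4f} of the mass within 3 sigma".format(
        mu, sigma, mass))
```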

 

Pyxley: Python Powered Dashboards

Stitch Fix Technology – Multithreaded blog


from July 16, 2015

Web-based dashboards are the most straightforward way to share insights with clients and business partners. For R users, Shiny provides a framework that allows data scientists to create interactive web applications without having to write JavaScript, HTML, or CSS. Despite Shiny’s utility and success as a dashboard framework, there is no equivalent in Python. There are packages in development, such as Spyre, but nothing that matches Shiny’s level of customization. We have written a Python package, called Pyxley, not only to simplify the development of web applications, but also to provide a way to easily incorporate custom JavaScript for maximum flexibility. This is enabled through Flask, PyReact, and Pandas.
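
The excerpt names Flask and Pandas as Pyxley's plumbing. As a rough illustration of that underlying pattern (not Pyxley's own API, which is documented in the Stitch Fix post), here is a minimal Flask endpoint that serves a Pandas DataFrame as JSON for custom JavaScript on the page to render; the route name and data are hypothetical.

```python
# Not Pyxley's API -- a minimal sketch of the Flask + Pandas pattern it
# builds on: a server endpoint exposing a DataFrame as JSON so custom
# JavaScript can chart it. The route and data here are made up.
from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

# In a real dashboard this would come from a database or a file.
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"],
                   "orders": [120, 135, 160]})

@app.route("/api/orders")
def orders():
    # orient="records" gives [{"month": "Jan", "orders": 120}, ...],
    # the row-per-object shape most charting libraries expect.
    return jsonify(data=df.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)
```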

 

The key to agile data science: experimentation

O'Reilly Radar, Jerry Overton


from July 15, 2015

… An agile data scientist works in small iterations, pivots based on results, and learns along the way. Being agile doesn’t guarantee that an idea will succeed, but it does decrease the amount of time it takes to spot a dead end. Agile data science lets you deliver results on a regular basis and it keeps stakeholders engaged.

The key to agile data science is delivering data products in defined time boxes — say, two- to three-week sprints. Short delivery cycles force us to be creative and break our research into small chunks that can be tested using minimum viable experiments. We deliver something tangible after almost every sprint for our stakeholders to review and give us feedback. Our stakeholders get better visibility into our work, and we learn early on if we are on track.

 

Big Data In Big Oil: How Shell Uses Analytics To Drive Business Success

LinkedIn, Bernard Marr


from July 13, 2015

… Shell uses fibre optic cables, created in a special partnership with Hewlett-Packard, for these sensors, and data is transferred to its private servers, maintained by Amazon Web Services.

This gives a far more accurate image of what lies beneath. Data from any prospective oil field can then be compared alongside that from thousands of others around the world, to enable geologists to make more accurate recommendations about where to drill.

Production forecasting is one of the first jobs – determining the likely output of a reservoir is key to deciding what resources should be spent on extracting it. When this decision is data-led, operators can have more confidence that this will be done efficiently.

 

How Wal-Mart hired 2,200 developers and made ‘magic’

Yahoo Finance, Business Insider


from July 19, 2015

… In addition to all the web traffic that a company like Wal-Mart has to deal with, there’s all kinds of complex wizardry that goes into wrangling a tremendous amount of data — millions of items, their prices, customer recommendations, warehouse availability — into a website that you can actually search and buy stuff from.

This is why, four years ago, Wal-Mart opened its Walmart Labs division in the heart of Silicon Valley’s Mountain View, not far from Google, LinkedIn, and Yahoo.

It started from Wal-Mart’s acquisition of a 65-person search-and-analytics startup called Kosmix in 2011, but now it employs 2,200 “technologists” — developers, engineers, data experts, and the like — to solve the retailer’s biggest digital problems.

 

Microsoft Research explores ‘Machine Teaching’

SD Times


from July 13, 2015

Microsoft announced it has formed the Machine Teaching Group, a research project to advance “Machine Teaching.”

Machine Teaching is described by Microsoft as the next evolution of machine learning (ML), wherein developers or even users without a data analysis or computer science background can teach a computer to perform tasks. The Machine Teaching Group is a multi-disciplinary effort encompassing researchers in the fields of machine learning, human-computer interaction, visualization, and software engineering, all working toward improving ML and Machine Teaching capabilities.

 

Science special issue on Artificial Intelligence

Science


from July 17, 2015

Although most would agree that the average person is smarter than the average cat, comparing humans and machines is not as straightforward. A computer may not excel at abstract reasoning, but it can process vast amounts of data in the blink of an eye. In recent years, researchers in artificial intelligence (AI) have used this computational firepower on the scads of data accumulating online, in academic research, in financial records, and in virtually all walks of life. The algorithms they develop help machines learn from data and apply that knowledge in new situations, much like humans do. The ability of computers to extract personal information from seemingly innocuous data raises privacy concerns. Yet many AI systems indisputably improve our lives; for example, by making communication easier through machine translation, by helping diagnose illness, and by providing modern comforts, such as your smartphone acting as your personal assistant. This special issue presents a survey of the remarkable progress made in AI and outlines challenges lying ahead.

Many AI systems are designed for narrow applications, such as playing chess, flying a jet, or trading stocks. AI researchers also have a grander aspiration: to create a well-rounded and thus more humanlike intelligent agent. Scaling that research peak is daunting. But triumphs in the field of AI are bringing to the fore questions that, until recently, seemed better left to science fiction than to science: How will we ensure that the rise of the machines is entirely under human control? And what will the world be like if truly intelligent computers come to coexist with humankind?

 
Events



Program | KDD 2015, 10-13 August 2015, Sydney



KDD 2015 is a premier conference that brings together researchers and practitioners from data mining, knowledge discovery, data analytics, and big data. KDD 2015 will be the first Australian edition of KDD and only the second to be held in the Asia-Pacific region.

Monday-Thursday, August 10-13, in Sydney, Australia

 

This American Life’s Audio Hackathon



We are inviting developers, coders, designers, producers, sound designers…anyone who has skills and ideas to offer to join us. We will group you into teams, and after introductory talks by audio professionals, you will participate in a two-day hackathon. Afterward, teams will present their creations to our speakers and to employees from our partners, who will evaluate and critique their work.

Saturday-Sunday, September 19-20, at ThoughtWorks, 99 Madison Ave

Registration Deadline: Saturday, August 15

 
CDS News



Should You Trust Your Money to a Robot?

Big Data journal, Vasant Dhar


from June 18, 2015

Financial markets emanate massive amounts of data from which machines can, in principle, learn to invest with minimal initial guidance from humans. I contrast human and machine strengths and weaknesses in making investment decisions. The analysis reveals areas in the investment landscape where machines are already very active and those where machines are likely to make significant inroads in the next few years.

 
