Data Science newsletter – March 1, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for March 1, 2017


 
 
Data Science News



Artificial Intelligence System to Diagnose Skin Cancer: Interview with Stanford Scientist Andre Esteva

from Medgadget, Conn Hastings

Scientists at Stanford University have developed a deep convolutional neural network that can diagnose skin cancer by examining images of skin lesions. Skin cancer is the most common human cancer, and one in five Americans will be diagnosed with it at some point in their lives. At present, skin cancer is primarily diagnosed through an initial visual assessment by a dermatologist, with additional biopsies and histopathological assessments if a cancerous lesion is suspected.

During development of their technology, the researchers trained their artificial intelligence system using a dataset of almost 130,000 images of different skin cancers.
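
The article does not include implementation details, but the general technique it describes, training a deep convolutional network on labeled lesion images, looks roughly like the minimal sketch below in modern tf.keras. The directory path, image size, and two-class setup are illustrative assumptions, not details of the Stanford system.

```python
# Illustrative sketch only (not the Stanford implementation): fine-tune a
# pretrained CNN on a folder of labeled lesion images. The directory path,
# image size, and two-class setup are assumptions for the example.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(2, activation="softmax")(x)  # e.g. benign vs. malignant
model = Model(inputs=base.input, outputs=outputs)

for layer in base.layers:  # freeze the pretrained backbone for the first pass
    layer.trainable = False
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Expects lesion_images/train/<class_name>/*.jpg -- a hypothetical layout
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "lesion_images/train", target_size=(299, 299), batch_size=32)
model.fit(train_gen, epochs=5)
```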


AI to Highlight Our Biggest Game Developer Conference Yet

from NVIDIA Blog

Artificial intelligence and deep learning are revolutionizing modern computing, and NVIDIA is bringing them to game development.


nextstrain

from Richard Neher and Trevor Bedford

Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. We provide a continually-updated view of publicly available data with powerful analytics and visualizations showing pathogen evolution and epidemic spread. Our goal is to aid epidemiological understanding and improve outbreak response.


Boston Dynamics Officially Unveils Its Wheel-Leg Robot: “Best of Both Worlds”

from IEEE Spectrum, Erico Guizzo and Evan Ackerman

When Boston Dynamics introduced its massively upgraded Atlas last year, we said the robot could “do things we’ve never seen other robots doing before, making it one of the most advanced humanoids in existence.” But now, after seeing the video that Boston Dynamics just released to officially unveil its newest creation, Handle, a sort of Atlas on wheels, we’ll just say it again: Handle can do things we’ve never seen other robots doing before, making it one of the most advanced humanoids in existence.

“Wheels are a great invention,” Marc Raibert, founder and president of Boston Dynamics, tells IEEE Spectrum, adding that Handle, which uses a wheel-leg hybrid system, “can have the best of both worlds.”


Interview with Andreas Mueller, Lecturer at Columbia University and Core Contributor to Scikit-Learn, by Reshama Shaikh

from The Machine Learning Conference

One of our Program Committee members, Reshama Shaikh, recently interviewed Andreas Mueller, a Lecturer in Data Science at Columbia University and core developer of the Python library scikit-learn, about some of his recent work with the scikit-learn open source community.


ROGER: The CyberGIS Supercomputer

from University of Illinois Urbana-Champaign, National Petascale Computing Facility

Resourcing Open Geospatial Education and Research (ROGER) is the world's first CyberGIS supercomputer, designed especially for computationally intensive geospatial data processing and analysis.


Government Data Science News

Patti Brennan is the new Interim Associate Director for Data Science at the National Institutes of Health. She is structuring a plan to “develop a sustainable data science infrastructure” at NIH and wants to know what is on your data science wish list. Eighteen biomedical researchers ran a survey and found that what they need most is more short courses and bioinformatics training.

Montana’s state legislature is considering a bill that would limit the use of biometric data by private entities. The stipulations include written consent before biometric data can be gathered or shared, secure storage, and immediate data destruction once the project is over. Wondering why data destruction matters? Read how Palantir is using Bush-era border surveillance data to aid deportations.

The Civic Analytics Network, a group of Chief Data Officers and analytics principals in large US cities and counties, published eight suggestions for government agencies trying to balance citizen privacy with the value of releasing data openly.

Elsewhere, Joe Hellerstein offered bad advice for metadata standards. On the same day, Neil Wilson of the British Library argued that metadata is the “key to collaboration” in making the case for a National Bibliographic Knowledgebase (NBK). In yet more attention to metadata, Project Svalbard has released all of the data scraped at the #datarefuge events and “create[d] a single metadata dataset…[with] 38GB of metadata, over 30 million hashes and URLs of research data files.”


[1702.07800] On the Origin of Deep Learning

from arXiv, Computer Science > Learning; Haohan Wang, Bhiksha Raj, Eric P. Xing

This paper reviews the evolutionary history of deep learning models. It covers the field from the genesis of neural networks, when associationist models of the brain were first studied, through the models that have dominated the last decade of deep learning research, such as convolutional neural networks, deep belief networks, and recurrent neural networks, and extends to popular recent models such as the variational autoencoder and generative adversarial networks. Beyond reviewing these models, the paper focuses on their precedents, examining how initial ideas were assembled into the early models and how those preliminary models developed into their current forms. Many of these evolutionary paths span more than half a century and branch in many directions. For example, CNNs build on prior knowledge of the biological vision system; DBNs evolved from a trade-off between the modeling power and computational complexity of graphical models; and many contemporary models are neural counterparts of classical linear models. The paper traces these evolutionary paths, offers a concise account of how the models were developed, and aims to provide a thorough background for deep learning. Along the way, it summarizes the key ideas behind these milestones and proposes directions to guide future deep learning research.


Introducing Pinterest Labs

from Medium, Pinterest Engineering

As much as we’ve done, we still have far to go; most of Pinterest hasn’t been built yet. Which is why today we’re announcing Pinterest Labs, where we’ll tackle the most challenging problems in machine learning and artificial intelligence. Labs brings together top researchers, scientists and engineers to work on image recognition, user modeling, recommender systems and big data analytics. Our researchers are embedded throughout Pinterest, allowing our discoveries to affect hundreds of millions of users each day.

Within Pinterest, we have experts with combined decades of experience in AI & ML research, including myself (Stanford associate professor and co-founder of Kosei), Ruben Ortega (former Allen Institute for Artificial Intelligence CTO, A9 CTO & Mechanical Turk Director), Sonja Knoll (former Microsoft Research scientist), and Vanja Josifovski (former Google & Yahoo Research), with more people to come.


Stats Department Expands to Accommodate Upward Trend in Concentrators

from Harvard Crimson, Akshitha Ramachandran

After a statistically significant increase in the number of concentrators in recent years, several professors say Harvard’s Statistics Department is reaching its limit.

The department has grown from just 20 concentrators in 2008 to 196 in 2015, according to department records, and Statistics faculty say they’ve sometimes scrambled to add the necessary teaching, research, and advising infrastructure to accommodate the influx of new students.


Computer scientist Jennifer Widom named dean of Stanford School of Engineering

from Stanford News

Jennifer Widom, a professor of computer science and of electrical engineering at Stanford University for 24 years, has been named dean of the School of Engineering, Provost Persis Drell announced Monday.


Google Research Awards 2016

from Google Research Blog, Maggie Johnson

We’ve just completed another round of the Google Research Awards, our annual open call for proposals on computer science and related topics including machine learning, machine perception, natural language processing, and security. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.


The Unlikely Odds of Making it Big

from The Pudding, Russell Goldenberg & Dan Kopf

According to data from the music event website Songkick, of the 7,000 bands that headlined a small venue (capacity under 700) in the NYC area in 2013, fewer than half even played another show from 2014 to October 2016.


A Facebook-Style Shift in How Science Is Shared

from The New York Times, Mark Scott

Researchers once faced difficulty in getting feedback from peers before publication, and their projects were often closed to outsiders.

The shift toward more open sharing was gradual at first, but it has accelerated in recent years as the cost of cloud computing has plummeted and researchers have become comfortable uploading their work to social media.

That is what Ijad Madisch, who founded the social network ResearchGate with three partners in 2008, had in mind when he ditched his budding scientific research career in Massachusetts to return home to Germany to build his start-up in Berlin’s fast-growing cluster of technology companies.


National Science Foundation funds supercomputer cluster at Penn State

from Penn State University, Penn State News

The Penn State Cyber-Laboratory for Astronomy, Materials, and Physics (CyberLAMP) is acquiring a high-performance computer cluster that will facilitate interdisciplinary research and training in cyberscience and is funded by a grant from the National Science Foundation. The hybrid computer cluster will combine general purpose central processing unit (CPU) cores with specialized hardware accelerators, including the latest generation of NVIDIA graphics processing units (GPUs) and Intel Xeon Phi processors.

“This state-of-the-art computer cluster will provide Penn State researchers with over 3200 CPU and Phi cores, as well as 101 GPUs, a significant increase in the computing power available at Penn State,” said Yuexing Li, assistant professor of astronomy and astrophysics and the principal investigator of the project.


Random forests to save human lives

from O'Reilly Radar, Zachary Flamig and Race Clark

Flash flood prediction using machine learning has proven capable in the U.S. and Europe; we’re now bringing it to East Africa.
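
The post is only teased here, so as a purely hypothetical illustration of the named technique, here is a minimal random-forest classifier on invented tabular features standing in for hydrological inputs; the feature names, data, and labeling rule are all made up and are not from the O'Reilly post.

```python
# Hypothetical illustration only: a random-forest classifier for a binary
# flash-flood label on tabular features; features, data, and labels are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.gamma(2.0, 10.0, n),   # stand-in for rainfall accumulation (mm)
    rng.uniform(0, 1, n),      # stand-in for soil moisture fraction
    rng.uniform(0, 30, n),     # stand-in for basin slope (degrees)
])
y = (X[:, 0] * X[:, 1] > 15).astype(int)  # toy rule standing in for real labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```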


How this startup is using wearables and data science to help pharmaceutical companies

from Built In Austin, Kelly Jackson

Pharmaceutical companies must walk their drugs through a four-phase approval process before they ever hit the market. It’s an incredibly lengthy undertaking (think six to 11 years) that can cost drug developers tens of millions of dollars in clinical trials. Even then, they sometimes find out their drug has unfavorable side effects, forcing them back to square one.

That’s why an Austin startup named Litmus Health is stepping in, using data and technology to help pharmaceutical businesses during the clinical process.


Outgoing RWJF President and CEO Risa Lavizzo-Mourey Appointed Penn Integrates Knowledge Professor

from Robert Wood Johnson Foundation

Risa Lavizzo-Mourey, MD, MBA, who departs the Robert Wood Johnson Foundation this April after nearly 14 years as president and CEO, will join the University of Pennsylvania as the institution’s nineteenth Penn Integrates Knowledge University Professor, effective January 1, 2018. Penn President Amy Gutmann and Provost Vincent Price announced the appointment today.

 
Events



MathFinance Conference

from MathFinance

Frankfurt, Germany April 20-21 [$$$]


Open Data Day Portland

from Max Ogden (Donut.js/Dat), Danielle Robinson (Mozilla Science Fellow / OHSU)

Portland, OR Saturday, March 4, at 9 a.m., OHSU Collaborative Life Sciences Building (2730 SW Moody Ave) [free, registration required]


Understanding Media Studies: “Power Plays with Data” with Zara Rahman and Mimi Onuoha

from The New School, School of Media Studies

New York, NY Monday, March 27, 2017 at 6 p.m., University Center (63 Fifth Ave, Room UL104), speakers: Zara Rahman, Fellow, Data & Society, and Mimi Onuoha, Artist & Research Resident, Eyebeam. [free]


Enabling Precision Medicine: The Role of Genetics in Clinical Drug Development – A Workshop

from National Academies of Science, Engineering, and Medicine

Washington, DC Wednesday, March 8, at 8 a.m., Keck Center of the National Academies of Science, Engineering, and Medicine (500 5th Street NW) [free, registration required]


The Future of Media Conference

from Stanford Graduate School of Business

Stanford, CA Wednesday, March 8, at Stanford Graduate School of Business [$$]

 
Deadlines



Data Visualization Community Survey 2017

You should view this survey as the start of a discussion the community needs to have so that you can get the best support possible in achieving your professional goals. The survey is anonymous, and the results will be released to the public on GitHub.

Conference on Complex Systems 2017

Cancun, Mexico The conference is September 17-22. Deadline for abstract submissions is March 10.
 
NYU Center for Data Science News



Time to infuse our universities with skills for the future: Chetan Dube

from Livemint, Leslie D'Monte

What inspired you to set up IPsoft rather than taking up a job or continuing research?

The research I was doing at NYU centred round modelling of a system engineer’s brains with deterministic finite state machines. I remember a summer afternoon in 1998 suggesting to my adviser, Prof. Dennis Shasha, that given a couple of summers, we should be able to extend our research to one on artificial general intelligence (AGI). Prof. Shasha wisely reminded me that even the father of AI, John McCarthy, gave up on that, stating that the problem turned out to be a lot harder than anticipated. But the seeds of an AGI future had been planted. We were drawn inexplicably and compellingly by a future where man and machine would work together to create a beautiful planet—and IPsoft was born. Here we are today, 18 summers later, finally starting to approach that ever elusive Turing horizon.


9th Data Science Showcase

from Moore-Sloan Data Science Environment

New York, NY Tuesday, March 7, at 4:30 p.m., Kimmel Center (60 Washington Square South) [free, registration required]

 
Tools & Resources



Keras with GPU on Amazon EC2 – a step-by-step instruction

from Medium, Mateusz Sieniawski

As neural networks become more and more complex, we also need better hardware. Our PCs often cannot handle such large networks, but you can relatively easily rent a powerful computer, paid by the hour, through Amazon EC2.

I use Keras, an open source neural network Python library. It's great for beginning the journey into deep learning, mostly because of its ease of use. It is built on top of TensorFlow (though Theano can be used as well), an open source software library for numerical computation. The rented machine will be accessible via the browser using Jupyter Notebook, a web app for sharing and editing documents with live code.
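
As a quick sanity check that a rented GPU instance is set up correctly (this snippet is not from the linked tutorial), one might run a tiny Keras model from a notebook; the sketch below uses tf.keras with the built-in MNIST dataset and prints the GPUs TensorFlow can see.

```python
# Minimal sanity check for a fresh Keras/TensorFlow GPU setup:
# train a small dense network on MNIST and list visible GPUs.
import tensorflow as tf
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

print("GPUs visible:", tf.config.list_physical_devices("GPU"))

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128,
          validation_data=(x_test, y_test))
```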


How to Navigate the Jupyter Ecosystem

from Silicon Valley Data Science, Jonathan Whitmore

Project Jupyter encompasses a wide range of tools (including Jupyter Notebooks, JupyterHub, and JupyterLab, among others) that make interactive data analysis a wonderful experience. However, the same tools that give power to individual data scientists can prove challenging to integrate in a team setting with additional requirements. Challenges stem from the need to peer review code, to perform quality assurance on the analysis itself, and to share the results with management or a client expecting a formal document.

In this post, we’ll be talking through a few tools that help make data science teams more productive.
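
One concrete piece of that workflow, turning a reviewed notebook into a formal document for a client, can be scripted with nbconvert's Python API. The sketch below is a generic example rather than anything from the SVDS post, and "analysis.ipynb" is a placeholder filename.

```python
# Small sketch: export a reviewed notebook to a shareable HTML report using
# nbconvert's Python API. "analysis.ipynb" is a placeholder filename.
from nbconvert import HTMLExporter

body, resources = HTMLExporter().from_filename("analysis.ipynb")
with open("analysis_report.html", "w", encoding="utf-8") as f:
    f.write(body)
```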


Encouraging user help for the Docathon (and beyond)

from Docathon

“It’s not always clear to people how to contribute documentation. At the Docathon there will be many attendees who have experience in coding, but aren’t sure where to begin.”


numjs: Like NumPy, in JavaScript

from GitHub – nicolaspanel

NumJs is an npm/bower package for scientific computing with JavaScript.


Perspective

from Jigsaw

Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. Developers and publishers can use this score to give realtime feedback to commenters or help moderators do their job, or allow readers to more easily find relevant information, as illustrated in two experiments below. We’ll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as “toxic” to a discussion.
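
A hedged sketch of what calling such an API might look like from Python is below; the endpoint URL, request body, and response path follow the Perspective API's public documentation as best recalled, so treat them as assumptions to verify against the current docs, and the API key is a placeholder.

```python
# Rough sketch of requesting a TOXICITY score from Perspective's REST endpoint.
# URL, request shape, and response path are assumptions to check against the
# current documentation; the API key below is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")
payload = {
    "comment": {"text": "You are a wonderful conversationalist."},
    "requestedAttributes": {"TOXICITY": {}},
}
resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"perceived toxicity: {score:.2f}")
```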


What’s Wrong With My Time Series

from Stitch Fix Technology, Multithreaded blog, Alex Smolyanskaya

Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it’s important to know what to expect of different error sources. The trouble is that the usual approach of cross-validation doesn’t work for time series models. The reason is simple: time series data are autocorrelated so it’s not fair to treat all data points as independent and randomly select subsets for training and testing. In this post I’ll go through alternative strategies for understanding the sources and magnitude of error in time series.
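
One widely used alternative, though not necessarily the strategy the post describes, is rolling-origin evaluation, where every fold trains on the past and tests on the future; a minimal sketch with scikit-learn's TimeSeriesSplit on synthetic data follows.

```python
# Rolling-origin evaluation: each fold trains only on earlier observations and
# tests on later ones, respecting the autocorrelation in time series data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
t = np.arange(500)
y = 10 + 0.05 * t + np.sin(t / 7.0) + rng.normal(scale=0.5, size=t.size)
X = np.column_stack([t, np.sin(t / 7.0), np.cos(t / 7.0)])  # toy features

for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    model = Ridge().fit(X[train_idx], y[train_idx])
    err = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: train ends at t={train_idx[-1]}, MAE={err:.3f}")
```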

 
Careers


Internships and other temporary positions

Teaching Fellow – Samuelson Law, Technology & Public Policy Clinic



University of California-Berkeley, School of Law; Berkeley, CA
