NYU Data Science newsletter – August 9, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 9, 2016

GROUP CURATION: N/A

 
Data Science News



Johns Hopkins Discovery Awards given to 24 cross-divisional research teams

Johns Hopkins University, Hub


from August 05, 2016

Twenty-four cross-divisional teams have been awarded Johns Hopkins Discovery Awards to pursue innovative partnerships on collaborative research projects that bridge dozens of fields.

With these awards, teams will be combining expertise in chemistry and electrical engineering to invent new photovoltaic materials; geriatric medicine and musical performance to improve quality of life for dementia patients and their caregivers; and molecular biology, neurobiology, and pathogen identification to develop a diagnostic device for detecting malaria by breath. In total, the 24 teams are composed of 84 faculty members and research scientists from across 10 Johns Hopkins schools and divisions.

 

CMU School of Computer Science – New Faculty Profiles

Carnegie Mellon University, Andrew W. Moore


from August 08, 2016

27 new hires. [pdf]

 

UMD Names Dr. Keith Marzullo as Dean of College of Information Studies

University of Maryland, UMD Right Now


from August 01, 2016

The University of Maryland announces the appointment of Dr. Keith Marzullo as Dean of the College of Information Studies (iSchool). Dr. Marzullo officially begins his position on August 1, 2016.

As Dean, Dr. Marzullo will build upon his extensive background in computer science and cybersecurity to elevate the iSchool’s leadership in information management, libraries and archives, and human-computer interaction.

 

Google teams up with Stanford Medicine for Clinical Genomics innovation

Google Cloud Platform Blog, Sam Schillace


from August 08, 2016

Google Cloud Platform has teamed up with Stanford Medicine to help clinicians and scientists securely store and analyze massive genomic datasets with the ultimate goal of transforming patient care and medical research.

Stanford Medicine ranks as one of the country’s best academic medical centers, and we’re eager to see what can happen when we work together. We anticipate that our contributions of HIPAA-compliant cloud computing, machine learning and data science — combined with Stanford’s expertise in genomics and healthcare — could lead to important advances in precision health, a predictive and preventive approach to healthcare.

This is a great opportunity to bring data science to patient care by combining genomics and traditional health records. Our collaboration is in support of the new Clinical Genomics Service at Stanford Health Care, which aims to sequence and analyze thousands of patients’ genomes. Cloud Platform will allow Stanford scientists and clinicians to securely analyze these massive datasets immediately and scale up painlessly as clinical genomics becomes more commonplace.

 

Facial recognition can now identify you without seeing your face

The Next Web


from August 08, 2016

Facial recognition already poses serious problems for privacy advocates. Used by everyone from law enforcement to churches, the privacy concerns with facial recognition are very real, and they’re about to get a lot worse.

The ability to identify anyone just by analyzing an image of their face creates a severe imbalance of power between the common citizen and the people in charge. The ability to identify those whose faces are blurred or otherwise obstructed destroys that balance entirely. Yet that’s exactly what algorithms like the ‘Faceless Recognition System’ (FRS) aim to do.

FRS was created by researchers at the Max Planck Institute in Saarbrücken, Germany, to identify individuals from imperfect — blurry or otherwise obscured — images. The system trains a neural network on a set of photos containing both obscured and unobscured images, then uses that training to spot identifying similarities in a target’s head and body.
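The core matching idea can be cartooned with a toy sketch, not the paper’s neural network: degrade a known identity’s feature vector and match it back to the closest entry in a gallery of known identities. All names and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for feature vectors of three known identities.
gallery = {name: rng.normal(size=64) for name in ["a", "b", "c"]}

def obscure(x, noise=0.4):
    """Crude stand-in for a blurred or degraded image: additive noise."""
    return x + noise * rng.normal(size=x.shape)

def identify(query):
    """Match an obscured vector to the closest known identity."""
    dists = {n: np.linalg.norm(query - v) for n, v in gallery.items()}
    return min(dists, key=dists.get)

print(identify(obscure(gallery["b"])))
```

Even this crude nearest-neighbor match recovers the identity when degradation is moderate, which is the privacy worry the article raises: obscuring a face does not guarantee anonymity.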

 

The dirtiest little secret about big data: Jobs

ZDNet


from August 07, 2016

Most of the companies we talk to tell us that the biggest frustration they have with big data is the amount of time and resources it takes to do data preparation, cleansing, sorting, scrubbing, and deduplication before the data can be analyzed and put to use.

Unfortunately, a lot of that work falls on data scientists, who hold some of the hottest (and best-paid) jobs in tech right now. According to an Xplenty study, a third of today’s highly paid data wonks spend 50% to 90% of their time cleaning data before they can even analyze it.

As a result, many companies have tried to put machine learning and artificial intelligence to use in doing some of the data sorting and data cleansing. According to a Narrative Science survey, 58% of the companies that have big data solutions deployed have also implemented AI.
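As a concrete illustration of the preparation steps the article lists, here is a minimal pandas sketch (the table and column names are invented) that scrubs, cleanses, and deduplicates a toy set of records:

```python
import pandas as pd

# Toy records with the kinds of problems the article lists:
# inconsistent formatting, missing values, and duplicates.
df = pd.DataFrame({
    "name": ["Ada Lovelace", "ada lovelace ", "Alan Turing", None],
    "signup": ["2016-08-01", "2016-08-01", "2016-07-15", "2016-07-20"],
})

# Scrub: normalize case and whitespace so duplicates actually match.
df["name"] = df["name"].str.strip().str.title()

# Cleanse: drop rows missing required fields.
df = df.dropna(subset=["name"])

# Deduplicate and sort.
df = df.drop_duplicates(subset=["name"]).sort_values("signup")
print(df)
```

Each step is trivial on four rows; the article’s point is that at real-world scale these steps consume most of a data scientist’s time.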

 

Matt Cutts – Session on Aug 9, 2016

Quora, Matt Cutts


from August 09, 2016

Session with Matt Cutts of U.S. Digital Service and Defense Digital Service

 

Anthony Goldbloom: The jobs we’ll lose to machines — and the ones we won’t

TED Talk, TED.com


from August 08, 2016

Machine learning isn’t just for simple tasks like assessing credit risk and sorting mail anymore — today, it’s capable of far more complex applications, like grading essays and diagnosing diseases. With these advances comes an uneasy question: Will a robot do your job in the future? [video, 4:36]

More about jobs:

  • The dirtiest little secret about big data: Jobs (August 07, ZDNet)

How can government make better use of data science? Insights from the first Data Science & Government Conference

UK Government, The Behavioural Insights Team


from August 03, 2016

Earlier this year the Behavioural Insights Team, working together with Oxford’s Blavatnik School of Government and Harvard’s Behavioral Insights Group, hosted a one-day conference exploring the topic of data science and government. The conference attracted 200 participants from across government, industry, and academia.

In his opening remarks, David Halpern, BIT’s CEO and the National Advisor on What Works, observed that the world of data for government has transformed dramatically over quite a short period. Whilst David’s PhD thesis required weeks of work combing through hundreds of consultation documents in a single GP practice to collect data, our paper published earlier this year in the Lancet allowed us to analyse all of the prescriptions data from every GP practice in the UK over a two-year period in a matter of hours. This was thanks to the Health and Social Care Information Centre (HSCIC), which was founded in 2013.

     
Events



Urban Prototyping August Meetup



San Francisco, CA: Join us as we create a place for designers, artists, and technologists to discuss redesigning city streets in partnership with nonprofits, city officials, and peers. — Thursday, August 18, at Gray Area Art & Technology (2665 Mission St), 7 p.m.
 

Findata Day: Big Data Conference: Strata + Hadoop World



New York, NY: Finance is information. From analyzing risk and detecting fraud to predicting payments and improving customer experience, data technologies are transforming the financial industry. And we’re diving deep into this change with a new day of data-meets-finance talks. — Monday-Thursday, September 26-29, at Javits Center (34th St and 11th Ave).
 
Tools & Resources



The Hardest Part About Microservices: Your Data

Christian Posta, Software Blog


from July 14, 2016

Chief among the reasons we attempt a microservices architecture is allowing your teams to work on different parts of the system at different speeds with minimal impact across teams. So we want teams to be autonomous, capable of making decisions about how best to implement and operate their services, and free to make changes as quickly as the business may desire. If we have our teams organized to do this, then the reflection in our systems architecture will begin to evolve into something that looks like microservices.

To gain this autonomy, we need to “shed our dependencies,” but that’s a lot easier to say than do. I’ve seen folks refer to this idea in part, trivially, as “each microservice should own and control its own database and no two services should share a database.” The idea is sound: don’t share a single database across services, because then you run into conflicts like competing read/write patterns, data-model conflicts, and coordination challenges. But a single database does afford us a lot of safeties and conveniences: ACID transactions, a single place to look, well understood (kinda?), one place to manage, etc. So when building microservices, how do we reconcile these safeties with splitting up our database into multiple smaller databases?
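One widely used answer to that closing question, though not spelled out in this excerpt, is the transactional outbox pattern: each service keeps its own database, commits its state change and an outgoing event in one local ACID transaction, and a relay delivers the event to the other services. A minimal sqlite sketch (all table, column, and event names invented):

```python
import sqlite3

# Two services, two separate databases: no shared schema.
orders_db = sqlite3.connect(":memory:")
billing_db = sqlite3.connect(":memory:")

orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
orders_db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")
billing_db.execute("CREATE TABLE invoices (order_id INTEGER, total REAL)")

def place_order(order_id, total):
    # Local ACID: the order row and the event it implies commit together.
    with orders_db:
        orders_db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        orders_db.execute("INSERT INTO outbox (payload) VALUES (?)",
                          (f"order_placed:{order_id}:{total}",))

def relay_events():
    # A relay drains the outbox and delivers events to other services,
    # which update their own databases independently.
    rows = orders_db.execute("SELECT id, payload FROM outbox").fetchall()
    for row_id, payload in rows:
        _, order_id, total = payload.split(":")
        with billing_db:
            billing_db.execute("INSERT INTO invoices VALUES (?, ?)",
                               (int(order_id), float(total)))
        with orders_db:
            orders_db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

place_order(1, 9.99)
relay_events()
print(billing_db.execute("SELECT * FROM invoices").fetchall())
```

The trade-off is that cross-service consistency becomes eventual rather than immediate, which is exactly the safety-versus-autonomy tension the post describes.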

     

Resources for Running Workshops

Data Carpentry, Erin Becker


from August 08, 2016

A successful Data Carpentry workshop is the result of coordinated effort among many different types of participants, including instructors, helpers, hosts, learners, and Data Carpentry staff. Data Carpentry offers two types of workshops – self-organized and centrally-organized. These workshop types differ in terms of instructor training requirements, fee structures, and participant responsibilities – with local hosts and instructors at self-organized workshops taking on administrative responsibilities normally handled by Data Carpentry staff.

Instructors (both new and experienced) and workshop hosts often have questions about their roles in workshop logistics, especially how their responsibilities differ between self-organized and centrally-organized workshops. To help clarify the roles played by the different participants, and the differences between self- and centrally-organized workshops, we’ve put together some resources to guide participants through the workshop organizational process.

     

Building Data Pipelines with Python and Luigi

Marco Bonzanini


from October 24, 2015

As a data scientist, the emphasis of the day-to-day job is often more on the R&D side than on engineering. In the process of going from prototypes to production, though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to be re-engineered. This usually slows down innovation and, generally speaking, your project as a whole.

This post discusses some experience in building data pipelines: extraction, cleaning, integration, and pre-processing of data; in general, all the steps necessary to prepare your data for your data-driven product. In particular, the focus is on data plumbing and how a workflow manager like Luigi can come to the rescue without getting in your way. With minimal effort, the transition from prototype to production can be smoother.

     
Careers


Tenured and tenure-track faculty positions

Assistant/Associate/Full Professor of Genetics in the Institute for Biomedical Informatics
 

Perelman School of Medicine at the University of Pennsylvania
 
