NYU Data Science newsletter – March 24, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 24, 2016

GROUP CURATION: N/A

 
Data Science News



Smart Machines…and What They Can Still Learn from People

YouTube, Santa Fe Institute


from March 21, 2016

For nearly half a century, Artificial Intelligence (AI) has been more science fiction than science: exciting, possible, but just out of reach. And despite significant advances, “strong AI” in many ways remains elusive. Best-selling author and entrepreneur Gary Marcus provides a cognitive scientist’s perspective on AI. What have we learned? What are we still struggling with? Perhaps most compelling, is there anything programmers of AI can still learn from studying the science of human cognition?

Gary Marcus is professor of psychology and neural science at NYU and CEO and co-founder of the recently-formed Geometric Intelligence, Inc. His books include the New York Times best-seller “Guitar Zero”. He blogs for The New Yorker and is co-editor of the just-released collection “The Future of the Brain: Essays by the World’s Leading Neuroscientists”.

 

Searching in the Balkans

NYU Tandon School of Engineering


from March 22, 2016

Mergim Cahani (Class of ’09) might one day be thought of as the Balkans’ answer to Larry Page and Sergey Brin, the founders of Google. Co-founder and CEO of the search engine Gjirafa, he aims to bring relevant and easily accessible information to over 12 million Albanian-speaking people in the Balkans and around the world.

Gjirafa is strategically positioned in a way that Google could not hope to match: not only is the Albanian language lexically unique and complex, but much of the information that users try to find—bus schedules, for example, or business addresses—has not yet been digitized in that region. Cahani and his team are working hard to rectify the situation.

 

Innovation For Rent Is The Heart Of Google’s Cloud

The Next Platform


from March 23, 2016

… as the company’s top brass talked at the Next 2016 conference in San Francisco about how its Cloud Platform was different from key competitors Amazon Web Services, Microsoft Azure, IBM SoftLayer, and Rackspace Hosting, the discussion always drifted towards services and away from virtualized infrastructure where companies can do their own thing. This may raise the hackles of those who, like Google, think they can bring differentiation to their IT organizations at the infrastructure layer and all the way up the stack to applications. And Google is committed to offering infrastructure, platform, and software services across the board; to its credit, it has set about exposing the services it uses internally so that others can buy them with metered pricing.

If you want to go through – or have to go through – the evolutionary stages that Google itself went through to create a massively scaled, largely automated set of infrastructure that makes it relatively easy to deploy applications, you can do it.

The biggest message coming out of Next 2016 is that Google is absolutely committed to the public cloud, and that it intends to be a contender against AWS and Azure, which have a similar scale in terms of the raw infrastructure they deploy but which derive different levels of revenue from their cloud capacity and services.

 

CUNY, Cornell Tech launch women in tech initiative

Crain's New York Business


from March 21, 2016

Computer-science classes aimed at women will be getting a boost this fall as CUNY launches an introductory program in partnership with Cornell Tech and Verizon.

 

Google announces Cloud Dataflow with Python support

Google Cloud Platform, Google Cloud Big Data Blog


from March 22, 2016

Today, we’re happy to announce Alpha support for executing batch processing jobs with the Cloud Dataflow SDK for Python. This SDK is a pure Python implementation of the Apache Beam computation model (formerly known as the Dataflow model), which we recently contributed to the Apache Software Foundation as an incubating project.

This release allows new categories of users, particularly Python developers and members of the scientific community, to benefit from the Apache Beam model for large-scale distributed data processing. For example, users of the popular NumPy, SciPy, and pandas packages can now use Apache Beam to implement their calculations at larger scales.
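
As a rough illustration of the programming model described above, here is a minimal, locally runnable sketch written against the current Apache Beam Python SDK; the package name, transform labels, and the default local runner are assumptions on my part, not the exact alpha SDK announced in this post.

# Minimal sketch of the Beam/Dataflow model in Python: build a small pipeline of
# transforms over an immutable collection and let a runner execute it.
# The package name (apache_beam) and the default local runner are assumptions;
# the alpha SDK announced here shipped under a Google-specific package.
import apache_beam as beam

with beam.Pipeline() as p:  # no options supplied, so the local runner is used
    (
        p
        | "Create" >> beam.Create([1.0, 2.0, 3.0, 4.0])  # toy in-memory input
        | "Square" >> beam.Map(lambda x: x * x)           # element-wise transform
        | "Sum" >> beam.CombineGlobally(sum)              # global aggregation
        | "Print" >> beam.Map(print)                      # emit the single result
    )

The same pipeline can be handed to a distributed runner by passing pipeline options, which is how NumPy- or pandas-style computations would scale beyond a single machine.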

 

Machine Learning in the Cloud, with TensorFlow

Google Research Blog, Slaven Bilac


from March 23, 2016

At Google, researchers collaborate closely with product teams, applying the latest advances in Machine Learning to existing products and services – such as speech recognition in the Google app, search in Google Photos and the Smart Reply feature in Inbox by Gmail – in order to make them more useful. A growing number of Google products are using TensorFlow, our open source Machine Learning system, to tackle ML challenges, and we would like to enable others to do the same.

Today, at GCP NEXT 2016, we announced the alpha release of Cloud Machine Learning, a framework for building and training custom models to be used in intelligent applications.
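
As a point of reference for what "building and training custom models" looks like in TensorFlow itself, here is a toy sketch that fits a small linear model to synthetic data; it uses today's tf.keras API rather than the 2016-era graph-and-session interface, and it does not touch the Cloud Machine Learning service announced here.

# Toy TensorFlow sketch: fit a linear model y ≈ w*x + b to noisy synthetic data.
# Uses the modern tf.keras API as a stand-in; the announced Cloud ML service and
# the 2016-era TensorFlow interface are not shown.
import numpy as np
import tensorflow as tf

x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1).astype("float32")
y = 3.0 * x + 2.0 + np.random.normal(scale=0.1, size=x.shape).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="sgd", loss="mse")
model.fit(x, y, epochs=100, verbose=0)   # should recover roughly w ≈ 3, b ≈ 2
print(model.get_weights())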

 

4 Tips on Recruiting an Engineering Team in New York City

Cornell Tech, News & Views


from March 23, 2016

… For Peter Brodsky, CEO of HyperScience, a startup creating artificial intelligence for enterprise clients, the relatively small ecosystem in New York has made finding engineering talent easier. … As a deeply technical company in the city — 24 employees, all engineers — HyperScience has built a reputation with engineers in New York, Brodsky said, which has helped it recruit top talent, something that might have been more difficult in California.

 

UPMC Enterprises puts $3M into Big Data innovations

MedCity News


from March 22, 2016

UPMC Enterprises, the commercialization and investment arm of the University of Pittsburgh Medical Center, will put more than $3 million into the first six innovations created under the auspices of the Pittsburgh Health Data Alliance.

The Pittsburgh Health Data Alliance is a year-old collaboration between UPMC, the University of Pittsburgh and Carnegie Mellon University. The alliance seeks to innovate in digital health as it applies to Big Data, analytics and interoperability.

 

NCEAS Receives $5.9 Million Award to Host New NSF Arctic Data Center

UC Santa Barbara, National Center for Ecological Analysis and Synthesis


from March 23, 2016

The National Science Foundation has made a 5-year, $5.9 million award to a national partnership, led by NCEAS at the University of California Santa Barbara, to develop and curate the NSF Arctic Data Center, a new archive for Arctic scientific data as well as other related research documents. The National Oceanic and Atmospheric Administration’s National Centers for Environmental Information (NCEI) and the NSF-funded Data Observation Network for Earth (DataONE) are partners in the Arctic Data Center cooperative agreement between NSF and NCEAS.

 

Google’s Greene Hastens Cloud Expansion to Catch Amazon

Bloomberg Business


from March 22, 2016

Google’s new cloud chief Diane Greene had unsettling news for employees at an internal sales meeting this month in Las Vegas: They weren’t taking corporate customers seriously enough and needed to sell harder, be hungrier and less complacent.

That was an unusual message at Google, which typically venerates technology over sales and marketing. But it was a necessary one. Google is third in cloud computing, an increasingly popular way for companies to run their IT operations.

 

Predictions for Precision Medicine get an evidence-based reality check

HealthNewsReview.org


from March 22, 2016

… in the last month a couple of papers that address Collins’s vision have been published. The first paper in the BMJ is a systematic review and meta-analysis on the impact of communicating genetic risks of disease on risk-reducing health behavior. Collins’s personal experience notwithstanding, the conclusions of this paper were pretty unambiguously negative:

Expectations that communicating DNA based risk estimates changes behavior is not supported by existing evidence. These results do not support use of genetic testing or the search for risk-conferring gene variants for common complex diseases on the basis that they motivate risk-reducing behavior.

 
Events



NYC Data Science Seminar Series: Susan Athey, Adapting machine learning methods for causal inference



This talk will review several recent papers which aim to modify popular machine learning methods for problems of causal inference, such as evaluating the impact of a treatment using experimental or observational data.
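
For orientation only, the sketch below shows the kind of problem the talk addresses: estimating a treatment effect with off-the-shelf machine learning on simulated randomized data. The two-model approach and every name in it are my own illustration, not the estimators presented in the talk.

# Generic "two-model" sketch of treatment-effect estimation with off-the-shelf
# ML on simulated randomized data. Illustrative only; not the methods from the talk.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                      # covariates
t = rng.integers(0, 2, size=n)                   # randomized treatment assignment
y = X[:, 0] + t * (1.0 + X[:, 1]) + rng.normal(scale=0.5, size=n)  # outcome

m_treated = RandomForestRegressor(random_state=0).fit(X[t == 1], y[t == 1])
m_control = RandomForestRegressor(random_state=0).fit(X[t == 0], y[t == 0])
effects = m_treated.predict(X) - m_control.predict(X)   # per-unit effect estimates
print("estimated average treatment effect:", effects.mean())   # true value is 1.0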

Tonight — Thursday, Mar 24, at 6 p.m., Davis Auditorium, 530 W 120th Street, 4th FL

 

MARL/MIR Guest Talk on March 29 — Hélène Papadopoulos



Learning, Logic, and Probability: A Unified View for (Music) Signal Processing

Analyzing and formalizing the intricate mechanisms of music is a very challenging goal for Artificial Intelligence. The analysis and the automatic extraction of relevant content information from music audio signals are two of the most important aspects of Music Information Retrieval (MIR). Music audio signals are complex, both because of the intrinsic nature of audio and because of the information they convey.

Tuesday, March 29, at 10 a.m., 6th floor conference room, 35 West 4th Street

 

NASA Announces Dates for One of World’s Largest Hackathons



NASA’s open innovation incubator, the International Space Apps Challenge, will take place April 22-24. The global main stage for this year’s event will be in Pasadena, California, with local events taking place simultaneously in 193 locations spanning 72 countries.

On April 23 and 24, participants are asked to develop mobile applications, software, hardware, data visualizations and platform solutions that could contribute to space exploration missions and help improve life on Earth.

This year’s challenge will include a Data Bootcamp on April 22, streamed live from the global main stage. The bootcamp is open to the public and will give participants the opportunity to learn new skills with computer coding and data.

 
Deadlines



Deep Learning Summer School 2016

deadline: Monday, April 11

Deep neural networks that learn to represent data in multiple layers of increasing abstraction have dramatically improved the state of the art for speech recognition, object recognition, object detection, predicting the activity of drug molecules, and many other tasks. Deep learning discovers intricate structure in large datasets by building distributed representations, via supervised, unsupervised, or reinforcement learning.
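
For readers new to the topic, a minimal supervised example of such a multi-layer network is sketched below; the dataset, layer sizes, and framework choice are arbitrary illustrations with no connection to the school's material.

# Minimal sketch of a deep network learning a layered representation: two hidden
# layers feeding a classifier, trained with supervision on a toy dataset.
# All choices here (data, sizes, framework) are illustrative assumptions.
import tensorflow as tf
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X = X.astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(2,)),  # low-level features
    tf.keras.layers.Dense(16, activation="relu"),                    # more abstract features
    tf.keras.layers.Dense(1, activation="sigmoid"),                  # class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy] on the training data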

This summer school is aimed at graduate students, industrial engineers, and researchers who already have some basic knowledge of machine learning (and possibly, but not necessarily, of deep learning) and wish to learn more about this rapidly growing field of research. This year’s edition of the summer school is organized by Aaron Courville and Yoshua Bengio.

Deadline for applications is Monday, April 11.

 

Recruitment Details – Associate or Full Professor In Residence, UCSF Institute for Computational Health Sciences

deadline: Friday, April 15

The Institute for Computational Health Sciences (ICHS) is a critical component of a global UCSF initiative in Precision Medicine, which seeks to aggregate and integrate vast, disparate datasets to advance understanding of biological processes, determine mechanisms of disease, and inform the diagnosis, prevention, and treatment of disease. Beginning with a base of excellent computational faculty in our four top-ranked professional schools (Dentistry, Medicine, Nursing, and Pharmacy) and Graduate Division, superb research programs, and an outstanding Medical Center, ICHS will establish a central convening center, hire additional faculty, and build programs for research and education. ICHS will develop and enhance UCSF’s computational approaches and strategies in basic, translational, clinical, and population-based biomedical research. ICHS will be the campus hub for computer scientists and for researchers who employ computation as a primary tool in their research. A primary academic appointment will be provided in a department closely allied with the applicant’s scholarly or clinical expertise.

Deadline to apply is Friday, April 15.

 
Tools & Resources



Comparing ggplot2 and R Base Graphics

Nathan Yau, Flowing Data blog


from March 22, 2016

In R, the open source statistical computing language, there are a lot of ways to do the same thing. Especially with visualization.

R comes with built-in functionality for charts and graphs, typically referred to as base graphics. Then there are R packages that extend functionality. Although there are many packages, ggplot2 by Hadley Wickham is by far the most popular.

These days, people tend to either go by way of base graphics or with ggplot2. It’s one or the other. Rarely both. I use base graphics. I don’t use ggplot2.

It’s not that I think one is better than the other. It’s just that base graphics continues to get me where I want to go, and the times I tried ggplot2, it didn’t get me anywhere faster than the alternative.

 

Building a high-throughput data science machine

O'Reilly Media, Eduardo Ariño de la Rubia


from March 23, 2016

Scaling is hard. Scaling data science is extra hard. What does it take to run a sophisticated data science organization? What are some of the things that need to be on your mind as you scale to a repeatable, high-throughput data science machine? Erik Andrejko, VP of science at The Climate Corporation, has spent a number of years focused on this problem, building and growing multi-disciplinary data science teams. In this post, he covers what he thinks is critical to continue building world-class teams for his organization.

I recently sat down with Andrejko to discuss the practice of data science, the scaling of organizations, and key components and best practices of a data science project. We also talked about the must-have skills for a data scientist in 2016—and they’re probably not what you think. I encourage you to watch the full interview—we cover a wide range of topics, and it’s a fun conversation about what it takes to take data science to the next level. What follows is a series of key takeaways from our chat that I’d like to highlight.

 

Lunchbox

NPR


from November 20, 2015

Lunchbox is a suite of tools to create images for social media sharing. It includes Quotable, Factlist and Waterbug. To customize Lunchbox for your organization, follow the directions on https://github.com/nprapps/lunchbox.

 
