Data Science newsletter – August 4, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for August 4, 2017

 
 
Data Science News



Data Visualization of the Week

Syfy, Bad Astronomy blog, Phil Plait



Zuckerberg, Chan give UCSF $10 million for EHR work, health sciences, big data

MobiHealthNews, Bernie Monegain


Dr. Atul Butte, pediatrician and computer science leader, has been named the Priscilla Chan and Mark Zuckerberg Distinguished Professor at UC San Francisco, where he takes the reins at the Institute for Computational Health Sciences. The $10 million gift from the Silicon Valley philanthropists Chan and Zuckerberg will launch the institute and support Butte’s work.


The Future of Search Engines

University of Texas, Texas Advanced Computing Center


Researchers combine artificial intelligence, crowdsourcing and supercomputers to develop better, and more reasoned, information extraction and classification methods


A Holistic Approach to Farm Sensing

The Yield, John McGee


The first thing to point out is that sensing is just the first bite of the apple, in terms of what Sensing+ for Agriculture does. It is an end-to-end ecosystem, involving four key stages: sense, analyse, predict and inform.

First, we sense multiple conditions within your microclimate. We then apply data science to analyse the information, before creating 7-day on-farm predictions using artificial intelligence. These insights are sent from our secure Microsoft cloud to a mobile app, which is also used to record activities like irrigation and spraying, as well as setting notifications.

The second thing to note is the importance of having a stable, reliable measurement system underpinning your analytics. This is essential for any sensing system to be effective, or reliable as a decision support tool.


NYU Tandon Named Top Tech Feeder School for Financial Industry

NYU Tandon School of Engineering


The New York University Tandon School of Engineering sends more graduates to become technology analysts at America’s financial institutions than any school in the world, according to a newly published study by eFinancialCareers. The online site reported earlier this year that NYU ranked fourth in front-office job recruitment for investment banking, as well.

eFinancialCareers looked at more than 35,000 resumes in its database of U.S.-based technologists in banks and buy-side firms for the latest analysis. NYU Tandon ranked as the top feeder school for technology at financial services firms in the United States. Princeton University, Rutgers, Columbia University, and the University of Illinois followed.


Facebook now uses Caffe2 deep learning for the site’s 4.5 billion daily translations

VentureBeat, Khari Johnson


Facebook announced today that it has started using neural network systems to carry out more than 4.5 billion translations that occur each day on the backend of the social network. Translations carried out with recurrent neural networks (RNNs) were able to scale with the use of Caffe2, a deep learning framework open-sourced by Facebook in April.

The Caffe2 team today also announced that in part due to work done around translation, the framework is now able to work with recurrent neural networks.

“Using Caffe2, we significantly improved the efficiency and quality of machine translation systems at Facebook.”


CPS launches new training program in data science

The GW Hatchet, Leah Potter


The College of Professional Studies [at George Washington University] is introducing a six-month program to teach data science and analytical skills to working professionals.

CPS partnered with Trilogy Education Services to launch the GW Data Analytics Boot Camp because of the increasing value of data skills in the world’s digital economy, officials said in a University release Thursday. Classes for the program, which begins in September, will be offered on evenings and weekends to accommodate students with jobs.


Harnessing the power of data science with global stat labs

University of Colorado Boulder, CU Boulder Today


Collecting statistics is one thing. Communicating vast quantities of data in a way that helps researchers arrive at solutions to real-world problems is another. The Laboratory for Interdisciplinary Statistical Analysis (LISA), created by director Eric Vance, was born to meet this need—but its reach now extends far beyond the community of researchers at CU Boulder.

On July 14, data scientists and statisticians from Nigeria, Ethiopia, Brazil, Pakistan, India, Zimbabwe and the United States attended the first annual LISA 2020 Symposium at the International Statistical Institute’s World Statistics Congress in Marrakech, Morocco. The event, which included ten presentations at the World Statistics Congress throughout the following week, allowed participants to build their skillsets, share best practices, and strengthen collaborations within LISA’s supportive global network of statistical collaboration laboratories, also known as “stat labs.”


Scientists Link Cutting-edge Biodiversity Genomics With Museum Wisdom Through New Public Database

Smithsonian, Newsdesk


A new publicly available database will catalog metadata associated with biologic samples, making it easier for researchers to share and reuse genetic data for environmental and ecological analyses.

The resource, called the Genomic Observatories Metadatabase (GeOMe), was developed by researchers at the Smithsonian’s National Museum of Natural History and eight other museums and research institutions. It links publicly available genetic data to records of where and when samples were collected, providing contextual information that until now has been missing from widely shared databases. Such information is critical for comparing biodiversity in different locations worldwide and tracking it across time. But despite calls for more data sharing within the research com


The American Scientists Stepping Up to Run for Office

WIRED, Science, Mother Jones, Tim Murphy


More than a dozen Democratic candidates with science backgrounds have announced their candidacies for Congress or are expected to in the coming months. The boomlet of STEM-based candidates amounts to a minor seismic event in a community where politics and research have traditionally gone together like sodium and water. Trump has been in office just six months, but he’s already done something remarkable—he’s gotten scientists to run for office.


New imaging technique overturns longstanding textbook model of DNA folding

NIH, News Releases


How can six and a half feet of DNA be folded into the tiny nucleus of a cell? Researchers funded by the National Institutes of Health have developed a new imaging method that visualizes a very different DNA structure, featuring small folds of DNA in close proximity. The study reveals that the DNA-protein structure, known as chromatin, is a much more diverse and flexible chain than previously thought. This provides exciting new insights into how chromatin directs a nimbler interaction between different genes to regulate gene expression, and provides a mechanism for chemical modifications of DNA to be maintained as cells divide. The results will be featured in the July 28 issue of Science.


Meet SeaHawk: A nanosatellite that will offer a colorful new view of the ocean

Gordon and Betty Moore Foundation


SeaHawk, a type of nanosatellite called a CubeSat, is closing in on its debut. This small but powerful device – with cubes measuring about 10 cm on each side – will provide a unique vantage point for observing changes in ocean surface color with its sophisticated color sensor. Ocean color is one of the most useful remote-sensing missions to society, for both science and operations. The color of ocean waters relates directly to the substances and the organisms within it, which can have a profound effect on our health, our environment and our commerce. Daily, high-resolution observations of ocean color changes can be used in multiple ways – from alerting researchers to the onset and expansion of harmful algal blooms to identifying potential fishing zones.


New Robot Speeds Sampling of Ocean’s Biogeochemistry and Health

Woods Hole Oceanographic Institution


The world’s first underwater vehicle designed specifically to collect both biological and chemical samples from the ocean water column successfully completed sea trials off the coast of New England on July 9, 2017. The new autonomous underwater vehicle (AUV), named Clio, will help scientists better understand the inner workings of the ocean.

Developed in an engineering collaboration between Woods Hole Oceanographic Institution (WHOI) and the University of Texas Rio Grande Valley (UTRGV) and in scientific collaboration with the University of Michigan at Ann Arbor, Clio will improve sampling efficiency and also reduce the time and cost of broad biogeochemical surveys, which are necessary to understand patterns and cycles of the marine food web and the role that the ocean plays in shaping Earth’s climate.


After failed space flights, NASA investigation leads to Portland

OregonLive.com, Jeff Manning


Twice in the past decade, NASA launched unmanned spacecraft ferrying advanced satellites into Earth’s orbit as part of a mission that could offer researchers an unprecedented new source of data on climate change.

But the satellites failed to deploy and, within minutes, NASA’s $550 million investment and years of work vaporized in fiery balls of space junk.

NASA has been investigating ever since. Now the inquiry has led to a nondescript industrial building in Northeast Portland, where a company called Sapa Extrusions acknowledges it has been dealing in bad aluminum and bad faith for as long as two decades.

 
Events



DubHacks Fall 2017

Major League Hacking


Seattle, WA October 21-22 at the University of Washington. [free, application required]


Pythonic Testing and Infrastructure!

The New York Python Meetup Group


New York, NY Thursday, August 17, starting at 6:30 p.m. [rsvp required]

 
NYU Center for Data Science News



Inside the legislation lab

Center for Public Impact, Joel Tito


In politics, it’s fair to say that accurate predictions are easier said than done. Or are they?

While few saw Brexit and the election of Donald Trump coming, unpredictability does not cast its shadow across the entire political arena. Not any more. Take the US Congress for example. Its hallowed halls and respect for long-held tradition might seem to be an unlikely place for artificial intelligence (AI) to take root. But take root it has, thanks to the work of John Nay of Skopos Labs, who has developed an algorithm that predicts which congressional bills will make their way through both chambers and which will become law.

“Each Congress has thousands of different pieces of legislation,” he points out. “But unless you’re really plugged into what’s going on, it is really hard to know which are relevant and which might have a chance of being enacted. That’s where we come in.”

 
Tools & Resources



Introducing Vectorflow

Medium, Netflix TechBlog


At Netflix, our machine learning scientists deal with a wide variety of problems across a broad spectrum of areas: from tailoring TV and movie recommendations to your taste to optimizing encoding algorithms. A subset of our problems involve dealing with extremely sparse data; the total dimensionality of the problem at hand can easily reach tens of millions of features, even though every observation may only have a handful of non-zero entries. For these cases, we felt the need for a minimalist library that is specifically optimized for training shallow feedforward neural nets on sparse data in a single-machine, multi-core environment. We wanted something small and easy to hack, so we built Vectorflow, one of the many tools our machine learning scientists use.
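
To make the sparse-data point concrete, here is a minimal Python sketch of the general idea: each observation is stored as a short list of (index, value) pairs, so prediction and training touch only the non-zero entries, however large the nominal feature space is. This is not Vectorflow's API. The function names, the dictionary-backed weights, and the single logistic output (the simplest, zero-hidden-layer case of a shallow net) are all assumptions chosen for illustration.

```python
import math


def predict(weights, bias, features):
    """Logistic output computed from the non-zero (index, value) pairs only."""
    z = bias + sum(weights.get(i, 0.0) * v for i, v in features)
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, bias, features, label, lr=0.1):
    """One SGD update on log loss; only weights for active features change."""
    error = predict(weights, bias, features) - label
    for i, v in features:
        weights[i] = weights.get(i, 0.0) - lr * error * v
    return bias - lr * error  # updated bias


# Hypothetical observation: two non-zero entries in a space of millions.
weights, bias = {}, 0.0
observation = [(3, 1.0), (7_500_000, 2.0)]
for _ in range(200):
    bias = sgd_step(weights, bias, observation, label=1.0)
```

Each update costs time proportional to the number of non-zero entries rather than to the total dimensionality, which is what makes single-machine training on this kind of data practical.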


Gathering Human Feedback

OpenAI; Tom Brown, Dario Amodei & Paul Christiano.


RL-Teacher is an open-source implementation of our interface to train AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step towards safe AI systems, but also applies to reinforcement learning problems with rewards that are hard to specify.
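
One common way to turn occasional human comparisons into a training signal is a Bradley-Terry style preference model: a learned reward function scores two short behavior clips, and the loss penalizes it when the summed scores disagree with the human's choice. The Python sketch below illustrates that idea only. It is not RL-Teacher's code or interface, and the function names and toy reward values are assumptions.

```python
import math


def preference_prob(rewards_a, rewards_b):
    """Modeled probability that clip A is preferred over clip B.

    Each argument is a list of predicted per-step rewards for one clip;
    the clip with the higher summed reward is more likely to be preferred.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the modeled preference and the human label."""
    p = preference_prob(rewards_a, rewards_b)
    return -math.log(p if human_prefers_a else 1.0 - p)


# Toy values: the reward model currently favors clip A.
agree = preference_loss([1.0, 1.0], [0.0, 0.0], human_prefers_a=True)
disagree = preference_loss([1.0, 1.0], [0.0, 0.0], human_prefers_a=False)
```

Minimizing this loss over many labeled pairs nudges the reward model toward the human's judgments, and a standard reinforcement learning algorithm can then optimize against the learned reward.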


Managing Deep Learning Development Complexity

The Next Platform, Nicole Hemsoth


“New training samples, human insights, and operation experiences can consistently emerge even after deployment. The ability of updating a model and tracking its changes thus becomes necessary,” says a team from Imperial College London that has developed a library to manage the iterations deep learning developers make across complex projects. “Developers have to spend massive development cycles on integrating components for building neural networks, managing model lifecycles, organizing data, and adjusting system parallelism.”

To better manage development, the team developed TensorLayer, an integrated development approach via a versatile Python library where all elements (operations, model lifecycles, parallel computation, failures) are abstracted in a modular format. These modules include one for managing neural network layers, another for models and their lifecycles, yet another to manage the dataset by providing a unified representation for all training data across all systems, and finally, a workflow module that addresses fault tolerance. As the name implies, TensorFlow is the core platform for training and inference, which feeds into MongoDB for storage—a common setup for deep learning research shops.


Building an API on open data

GET PUT POST blog, Gordon Wintrob


I spoke with Ed Freyfogle, Co-Founder of OpenCage. Their API makes it easy to turn addresses into coordinates and, unlike Google and others, it’s built on top of a completely open data set. We discuss how the business got started, their stack to fan requests to other services, and scrappy marketing efforts.


Mozilla’s Send is basically the Snapchat of file sharing

The Verge, Jacob Kastrenakes


Mozilla has launched a new website that makes it really easy to send a file from one person to another. The site is called Send, and it’s basically the Snapchat of file sharing: after a file has been downloaded once, it disappears for good.


How to give your Raspberry Pi ‘state-of-the-art computer vision’ using Intel’s Neural Compute Stick

TechRepublic, Nick Heath


Movidius has now released a video and text guide demonstrating how to try out object recognition on the Raspberry Pi 3 and the NCS, showing off the system recognizing sunglasses and a computer mouse as a camera pans around a room. Getting this demo running requires downloading a few software libraries, owning a Pi camera and copying some files from a PC running Ubuntu 16.04.


Cybersecurity for the travelling scientist

Nature News & Comment, Brian Owens


Virtual private networks, tracking apps and ‘burner’ laptops: how to protect sensitive data when you take your research on the road.


Open source release of the Kepler data processing pipeline

NASA, Kepler & K2, Geert Barentsen


The source code for the official Kepler data processing pipeline has been released to the public under the NASA Open Source License, and made available for download via the GitHub repository at https://github.com/nasa/kepler-pipeline. The release is accompanied by a Source Code Roadmap (pdf) which presents an overview of the various pipeline components found in the source code directory tree.


English word vectors

Facebook, Tomas Mikolov


This page gathers several pre-trained word vectors trained using fastText.
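
For readers who want to try such vectors, the Python sketch below parses fastText's plain-text vector format, in which the first line gives the vocabulary size and dimension and every later line holds a word followed by its space-separated values. The function names and the three-word sample are made up for illustration. With a real download you would pass an open file handle instead of the in-memory sample.

```python
import io
import math


def load_vectors(lines, limit=None):
    """Parse text-format word vectors from an iterable of lines."""
    it = iter(lines)
    n_words, dim = map(int, next(it).split())  # header: "<count> <dimension>"
    vectors = {}
    for i, line in enumerate(it):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Tiny made-up sample in the same layout (not real fastText values).
SAMPLE = "3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n"
dim, vecs = load_vectors(io.StringIO(SAMPLE))
similar = cosine(vecs["king"], vecs["queen"])
dissimilar = cosine(vecs["king"], vecs["apple"])
```

The `limit` argument is there because the full files are large; reading only the first several thousand lines is usually enough for a quick experiment.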

 
Careers


Postdocs

Post-Doc opportunity



Rutgers University, Department of Computer Science; Piscataway, NJ

Tenured and tenure track faculty positions

Tenure-Track Assistant Professor, Political Methodology



George Mason University, Schar School of Policy and Government; Fairfax, VA
