Data Science newsletter – February 24, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 24, 2017


 
 
Data Science News



Data Visualization of the Week

Twitter, Present & Correct



Tweet of the Week

Twitter, NASA JPL



Deep Learning predicts hematopoietic stem cell development

Helmholtz Zentrum München, German Research Center for Environmental Health


Autonomous driving, automatic speech recognition, and the game Go: Deep Learning is generating more and more public awareness. Scientists at the Helmholtz Zentrum München and their partners at ETH Zurich and the Technical University of Munich (TUM) have now used it to determine the development of hematopoietic stem cells in advance. In Nature Methods they describe how their software predicts the future cell type based on microscopy images.


Could a subtle tweak to Metro’s map fix overcrowding on the Blue Line?

The Washington Post, Martine Powers


With Metro’s most recent SafeTrack shutdown, Northern Virginia riders are finally doing the thing that Metro has been asking them to do for years: They’re switching from the historically overcrowded Blue Line to roomier Yellow Line trains.

But after Surge 12 concludes Tuesday, will these riders stick with their new commuting routes?

Zhan Guo thinks he can help. Guo is director of the urban planning program at New York University’s Wagner School of Public Service, and he’s done research into how a simple thing like the design of a subway map can radically shape people’s commuting decisions.


China is funding Baidu to take on the United States in deep learning research

Quartz, Dave Gershgorn


While US-based companies like Alphabet, IBM, Facebook, and Microsoft typically dominate US artificial-intelligence headlines, China’s government is now accelerating the country’s own contributions to the field.

China’s National Development and Reform Commission, a government agency tasked with planning economic and social strategies, will fund search giant Baidu’s development of a national deep-learning research lab, according to a post on Baidu’s Chinese WeChat account. The amount of funding was not disclosed, but Beijing-based Baidu will work with Tsinghua and Beihang universities, as well as other Chinese research institutions.

One important caveat: The laboratory won’t be a physical structure, but instead a digital network of researchers working on problems from their respective locations.


The 500 Cities Project: New Data for Better Health

Robert Wood Johnson Foundation, Culture of Health blog, Oktawia Wojcik


For the first time ever, the CDC and CDC Foundation are providing city and neighborhood level data for 500 of the largest U.S. cities, making it possible to identify emerging health problems and effective interventions.


Big Blue’s big blunder: IBM accidentally hands over root access to its data science servers

The Register, Thomas Claburn


IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers.

This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.


Better explaining the world around us

The University of Queensland, Australia, UQ News


A new University of Queensland-led study could help scientists more accurately predict and explain patterns of diversity in nature.

Ecology Centre Director in the UQ School of Biological Sciences Associate Professor Margie Mayfield said the project had developed a mathematically simple framework for accurately assessing the outcomes of species’ interactions.

“This advancement will improve the accuracy of studies of the diversity of biological communities,” she said.


NSF and Big Data Hubs Receive Cloud Credits for Big Data Research

Computing Community Consortium, The CCC blog, Khari Douglas


The National Science Foundation (NSF) recently announced that Amazon Web Services (AWS), Google, and Microsoft will participate in its new Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA) research program.

The three major cloud providers will supply cloud credits and resources to qualifying NSF-funded projects. This collaboration aims to combine cutting-edge, industry cloud computing resources with real data sets and assorted NSF-supported projects to advance research in big data and data science.


A note on our lawsuit against Otto and Uber

Medium, Waymo


Competition in the self-driving space is a good thing; it pushes everyone to develop better, safer and more affordable technology. But we believe that competition should be fueled by innovation in the labs and on the roads, not through unlawful actions.

Recently, we uncovered evidence that Otto and Uber have taken and are using key parts of Waymo’s self-driving technology. Today, we’re taking legal action against Otto and its parent company Uber for misappropriating Waymo trade secrets and infringing our patents. We wanted to share more context on why we made this decision.


SatSummit Highlights

developmentSEED, Ian Schuler


At SatSummit, 231 global development and satellite industry experts joined us to map how we can do smarter development work with satellite data. The major satellite data providers (including NASA, ESA, DigitalGlobe and Planet) joined big data gurus (from AWS, IBM, and Orbital Insights) and dozens of global development organizations (including USAID, World Bank, Humanitarian OpenStreetMap Team, and World Resources Institute) for a truly inspiring day of discussion.


Company Data Science News

Facebook let Steven Levy in to profile its renowned applied machine learning group. [Long read]

Textbook publisher Pearson reported record losses as US university students abandon textbooks for online course material amid an overall downturn in US enrollments.

ZestFinance offers a new machine learning credit score application, ZAML, that claims to be able to score users with little or no credit history. The model was developed in China, where spending, saving, and lending patterns are meaningfully different from those in the US. Buyer beware the biases.

Forbes reports on a stunning failed partnership between IBM Watson and MD Anderson cancer center at the University of Texas. The university reportedly lost a breathtaking $62m in a twisted university-industry partnership arrangement. More bad news: an attendee at an IBM data science event discovered the company had left private keys to its Docker host environment in a publicly accessible container.

Uber had an atrocious week. The company’s human resources department was revealed to be consistently insensitive to claims of sexism by Susan J. Fowler, an articulate former member of Uber’s engineering staff. Then Uber’s self-driving truck subsidiary Otto was sued. Allegedly, departing Google employees stole LiDAR technology IP from Google’s Waymo project and used it at their new positions with Otto. Reportedly, Uber’s partnership with Carnegie Mellon researchers was not performing quickly enough for Otto/Uber, who were desperate to keep up in the competitive driverless car market. One small upside: Professor Christopher Knittel at MIT is helping the company reduce racism in the Boston Uber market (passengers with black-sounding names have been more likely to have their requests cancelled).

Body Labs, a computer vision startup in New York, develops accurate models of individual human bodies, hoping to integrate them into sizing assumptions made by fashion retailers. Is it possible that someday, clothing ordered online might actually fit upon arrival? Eddie Bell has a post on working with fashion models, that is, models to predict fashion customer preferences.

The Intercept reports that Palantir has been working with the NSA for years to surveil the whole world. Palantir is run by Silicon Valley heavyweight and Trump Economic Advisory board member Peter Thiel.


A Super Smash Bros-playing AI has taught itself how to stomp professional players

Quartz, Dave Gershgorn


In a crowded convention center in San Jose, Calif., this past January during the Genesis 4 Super Smash Bros. tournament, away from the main competitive stage, a small group of gamers gathered around a clunky, four-year-old HP laptop. Amidst the onlookers, a professional player called Gravy was battling on familiar ground against an unfamiliar opponent.

The arena was Battlefield, a flat stage with three small platforms, considered the standard for professional play. He’s played professionally as Captain Falcon for nearly five years, and is considered one of the world’s top players for the character—but he was losing to the AI playing as the same character. It had only been practicing for two weeks.


Instrument finds new earthly purpose

symmetry magazine, Amanda Solliday


Detectors long used to look at the cosmos are now part of X-ray experiments here on Earth. [longform]


Luis Bettencourt to head University of Chicago’s Mansueto Institute of Urban Innovation

Santa Fe Institute


SFI Professor Luis Bettencourt has accepted an offer to direct the Mansueto Institute for Urban Innovation at the University of Chicago. Starting July 1, Bettencourt will transition to SFI’s external faculty as he assumes his new role as Pritzker Director of the Mansueto Institute for Urban Innovation and Professor of Ecology and Evolution at the University of Chicago.


The 5G Frontier: Millimeter Wireless

IEEE Spectrum, Robert W. Lucky


Millimeter-wave wireless truly is a frontier. Today the millimeter band is largely uninhabited and inhospitable, as signals using these wavelengths run up against difficult propagation problems. Even when signals travel through free space, attenuation increases with frequency, so usable path lengths for millimeter waves are short, roughly 100 to 200 meters. Such distances could be accommodated with the smaller cell sizes envisioned in 5G, but there are numerous other impediments. Buildings and the objects in and around them, including people, block the signal. Rain and foliage further attenuate millimeter waves, and diffraction—which can bend longer wavelengths around occluding objects—is far less effective. Even surfaces that might be conveniently nicely reflective at longer wavelengths appear rougher to millimeter waves, and so diffuse the signal.

So there may be gold in that frontier, but it is going to be very difficult to mine. [longform]


An epigenetics gold rush: new controls for gene expression

Nature News & Comment, Cassandra Willyard


How rediscovered chemical tags on DNA and RNA are shaking up the field.

 
Events



SIGMOD 2017 New Location

ACM


Chicago: We are pleased to announce that the SIGMOD/PODS Conference 2017 will be held in Chicago in the week of May 14, relocating from North Carolina. [$$$]


Nipype workshop and hackweek: March 27 – 31 at MIT

NIPY


Cambridge, MA: Nipype is a Python-based scalable workflow engine with extensive support for neuroimaging. [$$$]


AAAS pact with March for Science signals new embrace of activism

STAT, AAAS


Washington, DC: Among the two dozen official partners of the upcoming March for Science, a recent addition dwarfs the rest: The American Association for the Advancement of Science, the world’s largest general scientific organization.

The partnership represents a legitimacy boost for the march, planned for April 22 in Washington, D.C., with satellite marches in various other cities.

 
Deadlines



Data Science Capacity Building Questionnaire Survey

In collaboration with the White House Data Cabinet Policy Working Group, the Big Data Hubs are seeking input on how community leaders are developing data science capacity. We would love to hear your perspective on this critical topic.

UNDG Data Visualization Contest

Sponsored by Tableau – an opportunity for data scientists to showcase their skills by analysing and visualizing data from the new UNDG Transparency Portal. Deadline for submissions is April 30.
 
Tools & Resources



Read-through: Wasserstein GAN

Alex Irpan, Sorta Insightful blog


I really, really like the Wasserstein GAN paper. I know it’s already gotten a lot of hype, but I feel like it could use more.

I also think the theory in the paper scared off a lot of people, which is a bit of a shame. This is my contribution to make the paper more accessible, while hopefully retaining the thrust of the argument.


4 steps to learn TensorFlow when you already know scikit-learn

Medium, Zelros AI


Our team at Zelros loves learning new things. TensorFlow is one of the many technologies we decided to leverage. Here is how we did it.


Data Stories Episode 91 | Visualizing Data with RAW

Enrico Bertini and Moritz Stefaner


In this episode we have on a nice trio of Italian visualizers — Michele Mauri (Density Design), Giorgio Uboldi (Calibro), and Giorgio Caviglia (Trifacta) — to talk about RAW, the data visualization tool they built to help people visualize data interactively. [audio, 32:06]


Building machine learning solutions that can withstand adversarial attacks

O'Reilly Data Show Podcast, Ben Lorica


In this episode of the Data Show, I spoke with Parvez Ahammad, who leads the data science and machine learning efforts at Instart Logic. He has applied machine learning in a variety of domains, most recently to computational neuroscience and security. [audio, 44:53]


Insights on scaling and integrating databases

O'Reilly Radar, Adam Michael Wood, Greg Meddles


Scaling databases for the enterprise is hard. You have to parallelize, avoid bottlenecks, and shard across multiple machines. You have to carefully consider tradeoffs between data integrity and constant uptime, between optimizing for reading and writing, between speed of development and speed at runtime. You have to integrate wildly disparate data sources, satisfy stakeholders with competing expectations, and find the structure hidden in unstructured data. Working with databases at the scale of global enterprise is about bringing order to chaos.

I recently had the opportunity to interview MarkLogic’s Greg Meddles on this topic.


GPUs are now available for Google Compute Engine and Cloud Machine Learning

Google Cloud Platform Blog, John Barrus


Google Cloud Platform gets a performance boost today with the much anticipated public beta of NVIDIA Tesla K80 GPUs. You can now spin up NVIDIA GPU-based VMs in three GCP regions: us-east1, asia-east1 and europe-west1, using the gcloud command-line tool. Support for creating GPU VMs using the Cloud Console appears next week.

 
Careers


Full-time positions outside academia

Research Manager, Basic Income



Y Combinator Research; Oakland, CA

Postdocs

Postdoctoral Researchers (2)



Honey Lab, Johns Hopkins University; Baltimore, MD

Full-time, non-tenured academic positions

DATA SCIENTIST



University of Pennsylvania, Perelman School of Medicine, BE-Informatics; Philadelphia, PA

Tenured and tenure track faculty positions

PROFESSOR/ASSOCIATE PROFESSOR



University of Michigan, School of Information; Ann Arbor, MI
