Data Science newsletter – April 21, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for April 21, 2017

Data Science News

The Pudding, Russell Goldenberg

from April 18, 2017

diverting reads

Bickering with your partner? There’s an app for that! Researchers in the Couple Mobile Sensing Project (what a name) at USC got 34 couples to have their speech and GPS coordinates captured while wearables measured their skin conductivity, physical activity, and body temperature. Nineteen of the thirty-four couples reported having a conflict during the one-day collection period. (I’m sensing that one member of these couples may have been more into this research project than the other, manufacturing conflicts that may not otherwise have occurred. Ah, what we do in the name of science.) The machine learning process accurately predicted the conflicts 79.3 percent of the time. The hope is that these predictions can allow apps on the phone to offer ‘helpful suggestions’ to ameliorate conflict. Adding two smart-aleck phones to a bickering couple does not immediately strike me as a good idea.

Johan van der Beek used AI to optimize the board game Monopoly, adjusting the rent and fees on some properties to ensure every player has an equal chance of winning. Van der Beek raised the payout on Water Works from 4x what’s shown on the dice to 7x. What if the player also owns the Electric Company? The fee goes from 10x to 17x.
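To put those multipliers in perspective, here is a quick sketch of the expected utility rent per landing. It assumes rent is simply the multiplier times the total of two fair six-sided dice, and that nothing else about the utilities changed; the item above does not say either way.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely totals of two fair six-sided dice.
rolls = [a + b for a, b in product(range(1, 7), repeat=2)]
mean_roll = Fraction(sum(rolls), len(rolls))  # exact average total

# Multipliers quoted in the item: (standard, adjusted).
multipliers = {
    "one utility": (4, 7),
    "both utilities": (10, 17),
}


def expected_rent(multiplier):
    """Expected rent for one landing, assuming rent = multiplier * dice total."""
    return multiplier * mean_roll


for case, (standard, adjusted) in multipliers.items():
    print(case, "standard:", expected_rent(standard), "adjusted:", expected_rent(adjusted))
```

Under these assumptions the adjustment raises the average charge by the same ratio as the multiplier itself, since the dice distribution is unchanged.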

Using a remote sensing technique and historic elephant census data, photographer Morgan Trimble and Ashley Robson predicted that 75 percent of Africa’s elephants are ‘missing’ due to poaching.

Tahany Als recorded students’ computer use in her Earth 222/Environ 232 course at the University of Michigan. She then created a slide listing all of the things they do online during class. Some activity is predictable – Facebook, NYTimes, shopping, their programming homework. Other activity is less predictable: photoshopping President Trump’s head onto muppets, breaking up with a boyfriend over chat (not recommended).

The US Census released a descriptive page-turner about the lives of young people: “The Changing Economics and Demographics of Young Adulthood From 1975 to 2016”. [Netflix also has a controversial new original show that could be said to be about the changing lives of young people: 13 Reasons Why.] A few quick highlights from the Census report:

“over half of Americans believe that marrying and having children are not very important in order to become an adult”

“In the 1970s, 8 in 10 people married by the time they turned 30. Today, not until the age of 45 have 8 in 10 people married”

“More young people today live in their parents’ home than in any other arrangement: 1 in 3 young people, or ~24 m 18- to 34-year olds, lived in their parents’ home in 2015.”

“In 1975, 25 percent of young men ages 25-34 had incomes of less than $30k per year. By 2016, that share rose to 41 percent (incomes are in 2015 dollars).”

Kevin Ho of IDEO created an interactive font map that is clustered using a convolutional neural network. Ho writes that “choosing a font is one of the most common visual decisions a designer makes” but that, without an easy-to-navigate catalogue, designers typically “fall back on fonts they’ve used before or search within categories like serif, sans-serif, or grotesque.” In this case, the application of computation could lead to more creativity, not less, if we define creativity as building ideas in areas outside one’s status quo.

Tweet of the Week

Twitter, XKCD

from April 20, 2017

Inside an Intergalactic Video Game, a Search for Real Alien Worlds

MIT Technology Review, Simon Parkin

from April 17, 2017

Showing that algorithms are often no match for human vision, players of EVE Online will sift through satellite imagery in a vast citizen science hunt for exoplanets.

Prediction of Natural Gas Leak Events in New York City from Open Data

NYU Center for Urban Science and Progress

from April 20, 2017

In New York City, there were more than 60,000 emergency calls made by the New York City Fire Department (FDNY) related to gas leaks between 2013 and 2015. While most of these calls resulted in no damage or injury, several incidents did end with fatalities. Infrastructure in Manhattan leaks three to five times more natural gas than cities with newer infrastructure (Gallagher et al., 2015) and of the 6,400 miles of gas main lines running under New York City’s streets, 53% were installed prior to 1960. In 2012, Con Edison experienced 83 leaks for every 100 miles of gas main. Furthermore, replacing a main in NYC can cost from $2.2 million to $8 million per mile, so prioritizing investment is critical.

Machine Learning Visionary, David Heckerman, Joins Human Longevity, Inc. as Chief Data Scientist

Human Longevity, Inc.

from March 21, 2017

Human Longevity, Inc., the genomics-powered health intelligence company, announced today that David Heckerman, M.D., Ph.D., a 25-year veteran of Microsoft, has joined the company as Chief Data Scientist. Dr. Heckerman, who will report to J. Craig Venter, Ph.D., HLI Co-founder, Executive Chairman and head of scientific strategy, will lead an advanced analytics team at HLI’s Mountain View, CA, office to continue to tackle some of the most important challenges in the analysis of large-scale medical and genomic data.

At Flatiron Health, Doctors And Developers Work On Cracking The Code For Better Cancer Treatment

Fast Company, David Zax

from April 19, 2017

At the Google-backed tech company, physicians and engineers work side by side to improve workflow in oncology clinics—and help fix America’s broken health care system.

The Amazon That You Don’t See (That Google And Facebook Are Watching)

Medium, Startup Grind, Mitch Joel

from April 18, 2017

Amazon has become a beast of a search engine. If someone is looking for anything to purchase, they use Amazon (over Google) for their searches.

This is profound. Amazon’s marketplace also enables vendors (not just Amazon) to build and stock their own stores. Amazon also has a lot of personal information (and who it’s connected to). Think about wishlists, gifting friends and being able to pull that data together.

Americans are more distressed than ever: study

NY Daily News, Joe Dziemianowicz

from April 17, 2017

In the U.S., misery has company.

More Americans than ever have serious psychological distress, according to a new study from NYU Langone Medical Center.

Researchers analyzed Centers for Disease Control data and concluded that 3.4% of the adult U.S. population — more than 8.3 million — suffer from serious psychological distress, known as SPD.

Finding order and structure in the atomic chaos where materials meet

North Carolina State University, NC State News

from April 20, 2017

The model developed in Patala’s research group finds irregular three-dimensional shapes within the grain boundary, classifies them and then identifies patterns of those irregular shapes.

“Advances in microscopy can help us capture images of how atoms are arranged in a grain boundary, but then we don’t really know what we’re looking at – you can connect the dots any way you want,” Patala says. “Our tool helps to discern patterns of geometric features in an atomic landscape that can appear chaotic.”

My proposal for JASA: “Journal” = review reports + editors’ recommendations + links to the original paper and updates + post-publication comments

Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog

from April 19, 2017

I think that journals should get out of the publication business and recognize that their goal is curation. My preferred model is that everything gets published on some sort of super-Arxiv, and then the role of an organization such as the Journal of the American Statistical Association is to pick papers to review and to recommend. The “journal” is then the review reports plus the editors’ recommendations plus links to the original paper and any updates plus post-publication comments.

220,000 cores and counting: MIT math professor breaks record for largest ever Compute Engine job

Google Cloud Platform Blog, Alex Barrett & Michael Basilyan

from April 20, 2017

An MIT math professor recently broke the record for the largest ever Compute Engine cluster, with 220,000 cores on Preemptible VMs, the largest known high-performance computing cluster to ever run in the public cloud.

Andrew V. Sutherland is a computational number theorist and Principal Research Scientist at MIT, and is using Compute Engine to explore generalizations of the Sato-Tate Conjecture and the conjecture of Birch and Swinnerton-Dyer to curves of higher genus. In his latest run, he explored 10^17 hyperelliptic curves of genus 3 in an effort to find curves whose L-functions can be easily computed, and which have potentially interesting Sato-Tate distributions. This yielded about 70,000 curves of interest, each of which will eventually have its own entry in the L-functions and Modular Forms Database (LMFDB).

Mapping Human Settlement Around the Earth

CityLab, Linda Poon

from April 14, 2017

As the human population grows, so does its footprint. To map these changes, researchers often turn to satellite imagery, because government-collected data can be infrequent and outdated. In particular, nighttime light images can offer a wealth of information about human activity. In fact, as CityLab’s Richard Florida has written, more than 3,000 studies since 2000 have used nighttime lights as a proxy for all sorts of economic activities.

But nighttime maps aren’t perfect. “If you need to figure out how large a city is and where the boundary of a city ends, lights will spread, and a city will look too large relative to its actual size,” says Amit Khandelwal, director of the Chazen Institute for Global Business at Columbia Business School. And there’s another problem: Satellite sensors have a saturation point that limits their ability to distinguish between different levels of light. That means an extremely bright place like Midtown Manhattan may appear to be equally bright as other parts of New York, even though it has more activity.
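The saturation problem is easy to see in a toy model. The sketch below assumes a sensor that reports the true brightness up to a fixed ceiling and nothing above it; the ceiling, the place names other than Midtown Manhattan, and all brightness values are made up purely for illustration.

```python
# Hypothetical sensor ceiling (arbitrary units), not a real instrument spec.
SATURATION = 63


def recorded_brightness(true_radiance, saturation=SATURATION):
    """Return what a saturating sensor reports: the true value, capped at the ceiling."""
    return min(true_radiance, saturation)


# Invented "true" brightness values for three places.
true_radiance = {
    "Midtown Manhattan": 400,
    "Downtown Brooklyn": 90,
    "Outer suburb": 20,
}

recorded = {place: recorded_brightness(value) for place, value in true_radiance.items()}

for place, value in recorded.items():
    print(f"{place}: true={true_radiance[place]}, recorded={value}")
```

In this toy example the two brightest places come out identical in the recorded data even though one is several times brighter, which is the loss of detail the excerpt describes.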

Facebook CTO Mike Schroepfer On The Company’s AI Progress–And What’s Next

Fast Company, Harry McCracken

from April 18, 2017

First Facebook built a world-class machine-vision team. Now it’s turning algorithms into products, with much more to come.

The science of saving endangered species

The San Diego Union-Tribune, Allison Alberts

from April 20, 2017

While zoo detractors may claim that animals do not seem “happy” under human care, this is a misdirected projection of our human values and perceptions onto other species that undoubtedly view the world vastly differently than we do. At San Diego Zoo Global, we are committed to an evidence-based approach that goes beyond personal opinions and perceptions. Our animal care specialists, veterinarians and scientists continuously monitor the health and well-being of the species in our care through scientific data collection and analysis. We track everything from nutrition and health status to stress, reproduction, and social behavior. Whenever our research suggests that adjustments to an individual’s environment are warranted, the outcome is carefully documented using the most advanced research tools available. Last year, accredited zoos across the country came together to support the development of ZooMonitor, an innovative app that allows animal care managers and zoo researchers to constantly improve the welfare of the species in our care.

Can We Improve National Security Using What We Know about Face Recognition?

Scientific American, Anna K. Bobak, Viktoria R. Mileva, Peter J.B. Hancock

from April 18, 2017

The take home message remains: unfamiliar face recognition is an extremely difficult task that only few get almost right. And this brings us back to the original question: can we improve national security with science? The answer is yes, in many ways. Firstly, we could do away with facial photo identification and use some other form of biometric. Secondly, when faces are used as the main biometric, human decision makers should be aware of their own limitations and apply the laboratory findings in their duty to keep us safe, for example working in pairs and receiving specific instructions. Finally, there is clear potential in national security agencies to use super-recognisers to keep dangerous criminals off the streets and away from our borders.

Planet enlists machine learning experts to parse a treasure trove of Amazon basin data

TechCrunch, Darrell Etherington

from April 20, 2017

Planet, the satellite imaging company that operates the largest commercial Earth imaging constellation in existence, is hosting a new data science competition on the Kaggle platform, with the specific aim of developing machine learning techniques around forestry research. Planet will open up access to thousands of image ‘chips,’ or blocks covering around 1 square kilometre, and will give away a total of $60,000 to participants who place in the top three when coming up with new methods for analyzing the data available in these images.

Planet notes that each minute, we lose a portion of forest the size of approximately 48 football fields, which is a heck of a lot of forest. The hope is that by releasing this data and hosting this competition, Planet can encourage academics and researchers worldwide to apply advances in machine learning, which have been put to great use in efforts like facial recognition and detection, to this pressing ecological problem.

The Great AI Recruitment War: Amazon Is On Top, And Apple Is Almost Nowhere To Be Seen

Forbes, Aaron Tilley

from April 18, 2017

When Matt Zeiler finished his PhD in machine learning from New York University in 2013, the tech giants came scrambling.

While he was graduating, the young artificial intelligence researcher won the ImageNet competition, which pits teams against each other to see who can achieve the best image recognition. His team beat out giants like Google. His teammate, the NYU Professor Rob Fergus, was quickly picked up by Facebook, and the tech giants were hungry for Zeiler too. Google, where Zeiler had previously interned, offered him a position, but so did Microsoft, Apple and Facebook. A bidding war started. Facebook CEO Mark Zuckerberg even met with him personally, Zeiler said.

What all those scientists on Twitter are really doing

Nature News & Comment, Sarah McQuate

from April 20, 2017

Analysis reveals that female researchers are over-represented on the social-media site and that mathematicians and life scientists are less likely to use it.

Using Data Science to Improve Traffic Safety

Microsoft Research NY

from April 19, 2017

As U.S. traffic deaths continue to rise, cities across America are increasingly focused on eliminating crash-related injuries and fatalities. Data can be a powerful resource in these efforts to make streets safer. We’re happy to support this effort, partnering with DataKind, which recently completed the Vision Zero Labs Project. This effort worked to develop valuable analytical models and tools to help the cities of New York, Seattle and New Orleans further their work to increase road safety.

In partnership with DataKind, a nonprofit that harnesses the power of data science in service of humanity, and the New York City Department of Transportation, we launched this project in August 2015, joining forces with the Seattle Department of Transportation and the City of New Orleans’ Office of Performance and Accountability in March 2016. With these cities, the Vision Zero Labs Project has become the first and largest multi-city, data-driven collaboration of its kind to drive traffic safety efforts in the U.S.

Machine Learning Storms Into Climate Research

The Next Platform, Jeffrey Burt

from April 18, 2017

The fields where machine learning and neural networks can have positive impacts seem almost limitless. From healthcare and genomics to pharmaceutical development, oil and gas exploration, retail, smart cities and autonomous vehicles, the ability to rapidly and automatically find patterns in massive amounts of data promises to help solve increasingly complex problems and speed up discoveries that will improve lives, create a healthier world and make businesses more efficient.

Climate science is one of those fields that will see significant benefits from machine learning, and scientists in the field are pushing hard to see how the technology can help them address the challenges presented by climate change. Those efforts were on display at the recent Conference on Artificial and Computational Intelligence and Its Applications to the Environmental Sciences run by the American Meteorological Society, where one of the tracks focused on machine learning and others looked at complementary technologies like big data and cloud computing.

Events

CITP Conference: Ethics of Computer Science Research

Princeton University, Center for Information Technology Policy

from May 05, 2017

Princeton, NJ Friday, May 5, starting at 9 a.m., Princeton University (Frist Campus Center, Multi-Purpose Rooms B & C) [please RSVP]

Join us for the launch of the SAP Next-Gen program in New York in partnership with Hasso Plattner Institute

SAP, Hasso Plattner Institute

from April 27, 2017

New York, NY April 27 at 10 Hudson Yards [free, waiting list only]

Deadlines

Google seeking input on next directions in CS Education Research

Feel free to share this survey with others who may be interested in sharing their insights.

Computational Creativity & Games Workshop

Atlanta, GA June 19, an ICCC’17 Workshop. Deadline for paper submissions is April 25.

NYU Center for Data Science News

Women in Data Discuss Data Science, Design-Thinking, and Healthcare

NYU Center for Data Science

from April 20, 2017

Earlier this month, CDS hosted the Women in Data (WID) event, run by Norma A. Padron and Tran Ly. The two women have recently rebooted the non-profit organization, which was founded 5 years ago. Today, WID exists to educate and foster a community of women in data-driven fields like journalism, health care, finance, and technology.

In this session, both women discussed how to bridge data science techniques and design thinking to improve the healthcare industry. Padron, a current board member of Women in Data and Associate Director at the Main Line Health Center for Population Health Research at the Lankenau Institute for Medical Research, began by explaining the importance of implementing new interdisciplinary approaches like computational design (which uses data to inform how physical spaces are constructed) to improve healthcare.

Tools & Resources

Failing to Answer These 8 Questions Can Sink a Startup Data Science Team

Inc.com, Quora, Monica Rogati

from April 17, 2017

If you’re establishing a data science team simply because it’s trendy or your investors asked, you’re setting your company up for across-the-board frustration.

The Pitbull Framework: Next-Gen Data Warehousing in the Cloud

Medium, Michael DC Bowen

from April 20, 2017

Warehousing in the Cloud

I want to begin to tell you a bunch of stories about where we are in our Big Data & Data Warehouse practice at Full360. We’ve been improving our capabilities for several years and now is the time to begin getting in front of the public and letting you know what’s new. The Pitbull framework is something we’ve been doing since 2014; it’s pretty formalized now that we’ve had some time working with it. In fact, it’s Pitbull 2.0, but the fundamentals haven’t changed.

Pitbull is a cloud-native way of managing the flow of data through a data warehouse. It consists primarily of three independent processes: the Producer, the Ingestor and the Transformer. These work on each stream of data relevant to your data warehouse.
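The excerpt names the three processes but does not say what each one does, so the following is only a guess at the general shape, not Full360's implementation. It assumes the Producer turns raw source lines into records, the Ingestor loads them into a staging area, and the Transformer aggregates what was staged; every function name, field name and sample value is hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def producer(raw_lines: Iterable[str]) -> List[Record]:
    """Hypothetical Producer: parse 'name, value' lines into records."""
    records = []
    for line in raw_lines:
        name, value = line.split(",")
        records.append({"name": name.strip(), "value": value.strip()})
    return records


def ingestor(records: List[Record], staging: List[Record]) -> int:
    """Hypothetical Ingestor: load records into a staging list (stand-in for a table)."""
    staging.extend(records)
    return len(records)


def transformer(staging: List[Record]) -> Dict[str, int]:
    """Hypothetical Transformer: total the staged values by name."""
    totals: Dict[str, int] = {}
    for row in staging:
        totals[row["name"]] = totals.get(row["name"], 0) + int(row["value"])
    return totals


def run_stream(raw_lines: Iterable[str]) -> Dict[str, int]:
    """Run the three stages in order for a single stream of data."""
    staging: List[Record] = []
    ingestor(producer(raw_lines), staging)
    return transformer(staging)


# Each stream is processed on its own, as the excerpt describes.
streams = {
    "orders": ["widgets, 3", "gears, 2", "widgets, 4"],
    "returns": ["gears, 1"],
}
results = {name: run_stream(lines) for name, lines in streams.items()}
print(results)
```

Keeping the stages as separate functions that only share data through their inputs and outputs is one simple way to mirror the "independent processes" idea; a real deployment would presumably run them as separate services.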

Teaching Statistics: A Bag of Tricks (second edition)

Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science blog

from April 20, 2017

“Hey! Deb Nolan and I finished the second edition of our book, Teaching Statistics: A Bag of Tricks.”

d3.annotation: Design & Modes

Susie Lu

from April 20, 2017

“I started this library by gathering examples of annotations that I liked. From those examples, the majority of use cases followed a pattern: a subject (the thing the annotation is annotating), a note, and a connector joining the note to the subject.”

Release ‘open’ data from their PDF prisons using tabulizer

rOpenSci, Thomas J. Leeper

from April 18, 2017

“There is no problem in science quite as frustrating as other people’s data. Whether it’s malformed spreadsheets, disorganized documents, proprietary file formats, data without metadata, or any other data scenario created by someone else, scientists have taken to Twitter to complain about it. As a political scientist who regularly encounters so-called “open data” in PDFs, this problem is particularly irritating.”

Announcing Datazar v2.0

Datazar Blog, Aman Tsegai

from April 19, 2017

“Now you can analyze any dataset using R and Python with the notebook or console interfaces right in your browser. All the computation is done on Datazar’s servers so you can literally do it using a Chromebook.”

Careers

Full-time positions outside academia

NEON Observatory Director/Chief Scientist

Battelle; Boulder, CO

Postdocs

Postdoctoral Associate | The IECA

Stony Brook University, Alan Alda Center for Communicating Science; Stony Brook, NY

Elevate 2-year postdoctoral fellowship

Mitacs; Multiple Locations in Canada

Postdoctoral Research Fellow for the “Responsible Terrorism Coverage”

University of Mannheim, Mannheim Center for European Social Research; Mannheim, Germany

Full-time, non-tenured academic positions

Program Coordinator

University of Washington, Center for Collaborative Systems for Security, Safety, and Regional Resilience; Seattle, WA
