Data Science newsletter – September 7, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 7, 2017

GROUP CURATION: N/A

 
 
Data Science News



Hurricane Harvey is a billion-dollar disaster – America’s 10th in 2017

The Guardian, Joanna Walters


from

Even before Hurricane Harvey began to form, out in the north Atlantic, federal climate experts signalled that 2017 was going to be a bad year for weather and climate catastrophes in America. According to scientists with the National Oceanic and Atmospheric Administration (Noaa), there had already been nine climate and weather-related events that caused more than $1bn of damage.

This “rapid succession of disaster events” from 1 January to 7 July 2017 was part of a years-long increase and cost a total of $16bn and 57 lives. Such events included flooding in California, Missouri and Arkansas; hailstorms in Colorado and Minnesota; and a sudden, crop-killing spring freeze across South Carolina and Georgia that destroyed an abnormally early fruit blossom.

It is possible that on top of this, economic damage from Harvey will exceed the inflation-adjusted $160bn cost of Hurricane Katrina in 2005, Noaa experts told the Guardian.

But despite Noaa’s work to track what is happening to climate, weather and the oceans, and how society is affected and might react, Donald Trump proposed earlier this year slashing the administration’s budget by almost $1bn. That would cut US and international research as well as funding for other environmental studies and protections.


United Technologies Hires IBM Vet Steve Abrams to Lead Data Science Efforts

Cetus News, WSJ



United Technologies Corp. hired Steve Abrams as its chief data scientist as the industrial conglomerate works to expand its Internet of Things efforts.

He started Aug. 28 and reports to Vince Campisi, UTC’s senior vice president for digital and chief information officer. Dr. Abrams announced the move in a LinkedIn post.


Human Evolution in Action: Natural Selection Means We Are Hitting Puberty Later, Huge Genetic Study Indicates

Newsweek, Hannah Osborne



The team led by Columbia University’s Hakhamanesh Mostafavi looked at the genomes of 210,000 people from Europe taken from the Resource for Genetic Epidemiology Research on Aging (GERA) Cohort and U.K. Biobank. To make up for the lack of older people in the dataset, the team used the age of parental death for people in the Biobank genomes as a proxy.


Dataiku raises $28m to enhance data science platform and double staff

ZDNet, Tas Bindi



Dataiku has announced raising $28 million in a Series B round led by Battery Ventures, with participation from FirstMark, Serena Capital, and Alven.

The Series B round brings the total amount raised by the New York City-headquartered data science software company to approximately $45 million.

Dataiku said the funding will be allocated across three areas: development, marketing, and recruitment.


Learning a Multi-View Stereo Machine

The Berkeley Artificial Intelligence Research Blog, Abhishek Kar



Consider looking at a photograph of a chair. We humans have the remarkable capacity to infer properties of the chair's 3D shape from this single photograph, even if we have never seen such a chair before. A more representative example of our experience, though, is being in the same physical space as the chair and accumulating information from various viewpoints around it to build up a hypothesis of its 3D shape. How do we solve this complex 2D-to-3D inference task? What kinds of cues do we use? How do we seamlessly integrate information from just a few views to build a holistic 3D model of the scene?


Putting the “Science” Back in Data Science

KDnuggets, Rubens Zimbres



Lately I’ve seen a lot of hype surrounding the Data Science field, and lots of newcomers to it. But what exactly is the SCIENCE in Data Science? In my view, the scientific method is the best way to tackle a problem and arrive at the best solution. Starting your data analysis by simply stating hypotheses and applying Machine Learning algorithms is the wrong way to go about it.

The picture below shows the steps necessary for scientific research, along with the corresponding data analysis and simulation. In fact, it is a sketch of what I did in my PhD thesis. In brief, I studied the past 27 years of Business Management literature and tried to develop an epistemologically disruptive approach to measuring and predicting service quality, mixing Business Administration with Electrical Engineering concepts. Over the course of 4 years I performed quali-quantitative longitudinal research and developed an Agent-Based Modeling simulation to try to find a five-state cellular automaton rule that could mimic human behavior. Along the way I explored Complexity concepts, self-organizing systems, emergence of order, and social networks.


IBM pitched Watson as a revolution in cancer care. It’s nowhere close

STAT, Casey Ross and Ike Swetlitz



It was an audacious undertaking, even for one of the most storied American companies: With a single machine, IBM would tackle humanity’s most vexing diseases and revolutionize medicine.

Breathlessly promoting its signature brand — Watson — IBM sought to capture the world’s imagination, and it quickly zeroed in on a high-profile target: cancer.

But three years after IBM began selling Watson to recommend the best cancer treatments to doctors around the world, a STAT investigation has found that the supercomputer isn’t living up to the lofty expectations IBM created for it. It is still struggling with the basic step of learning about different forms of cancer. Only a few dozen hospitals have adopted the system, which is a long way from IBM’s goal of establishing dominance in a multibillion-dollar market. And at foreign hospitals, physicians complained its advice is biased toward American patients and methods of care.


The secret to Germany’s scientific excellence

Nature News & Comment, Alison Abbott



Ask any German researcher why the country’s science base is blooming, and they are bound to mention Chancellor Angela Merkel. The world’s most powerful woman, they say, has not forgotten her roots as an East German physicist.

During a decade of global financial turbulence, her government has increased annual science budgets in a stable, predictable, quintessentially German way. It has spurred competition among universities and improved collaboration with the country’s unique publicly funded research institutions. Under Merkel’s watch, Germany has maintained its position as a world leader in areas such as renewable energy and climate; and with the guarantee of strong support for basic research, its impact in other sectors has grown.


Freshman’s company uses big data to improve cancer diagnoses

Cornell Chronicle



Abu Qader ’21 is just settling into his room at Mary Donlon Hall. Like any new Cornell freshman, he’s been spending lots of time joining clubs and hanging out with new friends on his floor.

But unlike most freshmen, Qader is also running a company, GliaLab, which is developing software that can improve the accuracy of breast cancer diagnoses, especially in developing countries. He’s been featured by both Google and TEDxTeen, talking about his work and his passion for combining computer science and health care.


FDA seeking digital health entrepreneur-in-residence

CNBC, Christina Farr



The U.S. Food and Drug Administration is so hip these days that it’s looking for its first entrepreneur-in-residence.

The agency today announced that its digital health unit is recruiting fellows to help it with tasks like assessing product quality and developing new ways to collect data.

Candidates are expected to spend at least three days a week on-site, according to a post on the FDA website.

 
Events



ACM Data Science Camp 2017

SF Bay ACM



San Jose, CA, October 14. Data Science Camp is SF Bay ACM’s annual event, combining sessions, a keynote, and an optional tutorial (extra fee). [$$]

 
Deadlines



Michigan Life Sciences Fellows

The Medical School, the College of Pharmacy, the College of Literature, Science, and the Arts (LSA), and the Life Sciences Institute (LSI) at the University of Michigan have partnered to offer the Michigan Life Sciences Fellows program, a highly competitive postdoctoral fellowship program designed to enhance research across the entire life sciences enterprise at Michigan. The deadline for applications is October 16.
 
NYU Center for Data Science News



NYU Center for Data Science’s Ph.D. program begins with inaugural cohort of 4 students

Medium, NYU Center for Data Science



The inaugural cohort of Ph.D. students at the NYU Center for Data Science (CDS) begins classes today.

The Center’s Ph.D. program in Data Science is the first of its kind in the country.


Using data to learn about successful online interactions

Medium, NYU Center for Data Science



What makes a successful online interaction? In a working paper that combines sociology with data science, Paul DiMaggio, a sociology professor and CDS affiliated faculty member, along with Clark Bernier (Princeton), Charles Heckscher (Rutgers), and David Mimno (Cornell), explores the rules that govern successful online conversations.

Titled “Interaction Ritual Threads: Does IRC Theory Apply Online?” the paper is “the first empirical application of [sociologist] Randall Collins’s theory of interaction ritual chains (IRC) to internal corporate interactions.”

 
Tools & Resources



The Total Newbie’s Guide to Cassandra

Insight Data, Max M McKittrick



Cassandra is a distributed NoSQL data storage system from Apache that is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure.

Cassandra’s data model is a partitioned row store with tunable consistency where each row is an instance of a column family that follows the same schema, and the first component (a component is a Cassandra data type) of a table’s primary key is the partition key.
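To make "the first component of the primary key is the partition key" concrete, here is a toy Python model, not Cassandra code. Assumptions: the names `ToyRing`, `token` and `place_rows` and the sample rows are hypothetical; tokens come from MD5 for simplicity (current Cassandra defaults to a Murmur3-based partitioner); and replication is ignored. The partition key's hash alone picks the owning node, while the remaining key component (the clustering column) only orders rows inside that partition.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def token(partition_key):
    """Map a partition key to a 64-bit token (MD5 here; a simplification, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyRing:
    """Hypothetical token ring: each node owns the token range ending at its own token."""

    def __init__(self, nodes):
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key):
        # First node whose token is >= the key's token, wrapping past the end of the ring.
        i = bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return self.ring[i][1]


def place_rows(rows, ring):
    """rows are (partition_key, clustering_key, value) tuples.

    Returns {node: {partition_key: [(clustering_key, value), ...]}}, with each
    partition's rows sorted by clustering key.
    """
    layout = defaultdict(lambda: defaultdict(list))
    for partition_key, clustering_key, value in rows:
        layout[ring.owner(partition_key)][partition_key].append((clustering_key, value))
    for partitions in layout.values():
        for entries in partitions.values():
            entries.sort()
    return layout


ring = ToyRing(["node-a", "node-b", "node-c"])
rows = [
    ("alice", "2017-09-02", "login"),
    ("bob", "2017-09-01", "login"),
    ("alice", "2017-09-01", "signup"),
]
layout = place_rows(rows, ring)
alice_node = ring.owner("alice")
print(alice_node, layout[alice_node]["alice"])
```

Both "alice" rows land on the same node and come back ordered by date. This is why a query that names the partition key can be answered by a single replica set, while one that does not may have to touch the whole cluster.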


Communication Tips for Data Scientists

Pivigo Blog, Alfie Lambert



When one thinks about the essential skills of a data scientist, the first to come to mind are coding, business acumen and mathematics. However, a crucial and often overlooked skill is good communication. As a data scientist, you will be interacting with many different people and stakeholders, so it is essential to communicate effectively without sounding robotic. Here are my top three tips for effective communication for data scientists. I hope that you find them useful.

1. Body language


Publishing with Apache Kafka at The New York Times

Confluent, Boerge Svingen



This article describes a new approach we developed to solving this problem, based on a log-based architecture powered by Apache Kafka™. We call it the Publishing Pipeline. The focus of the article will be on back-end systems. Specifically, we will cover how Kafka is used for storing all the articles ever published by The New York Times, and how Kafka and the Streams API are used to feed published content in real time to the various applications and systems that make it available to our readers. The new architecture is summarized in the diagram below, and we will deep-dive into it in the remainder of this article.
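The core idea of a log-based architecture is that every published item is appended to a durable, ordered log. Each consumer tracks its own read position (offset) and builds whatever view it needs, and a new consumer can rebuild its state by replaying from offset 0. The Python sketch below is an in-memory stand-in for that idea and is not the Kafka client API. `AppendOnlyLog`, `LatestVersionView` and the article IDs are hypothetical. A real deployment would use a Kafka client library with topics, partitions and committed consumer offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    key: str      # hypothetical article ID
    value: dict   # the published content


@dataclass
class AppendOnlyLog:
    """Toy stand-in for a single topic partition: records are only ever appended."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        record = Record(len(self.records), key, value)
        self.records.append(record)
        return record.offset

    def read_from(self, offset):
        """Return every record at or after `offset`, in log order."""
        return self.records[offset:]


class LatestVersionView:
    """A consumer that keeps the newest version of each article and tracks its own offset."""

    def __init__(self):
        self.articles = {}
        self.next_offset = 0

    def poll(self, log):
        for record in log.read_from(self.next_offset):
            self.articles[record.key] = record.value  # later versions overwrite earlier ones
            self.next_offset = record.offset + 1


log = AppendOnlyLog()
log.append("article-1", {"headline": "Draft headline"})
log.append("article-2", {"headline": "Second story"})
log.append("article-1", {"headline": "Corrected headline"})

live = LatestVersionView()
live.poll(log)                 # an existing consumer catches up
log.append("article-3", {"headline": "Breaking news"})
live.poll(log)                 # ...then reads only the new record

rebuilt = LatestVersionView()  # a brand-new consumer replays the entire log
rebuilt.poll(log)
print(rebuilt.articles == live.articles)
```

Because the log is the source of truth, adding a new downstream system amounts to starting another consumer at offset 0. No separate backfill job against a database is needed.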


Jepsen Offers System-Wide Approach to Testing Databases at Scale

The New Stack, Mark Boyd



“Jepsen is a project to analyze distributed systems under stress and to understand if concurrency problems are going to manifest, not only in applications but in databases and queues and so forth,” said creator Kyle Kingsbury. “We want to verify that the safety properties of a distributed system hold up, given concurrency, non-determinism, and partial failure.”

The software creates a database instance and then sets up a number of tests that run through example operations that might typically be performed on a distributed database at scale. Then verification logs are checked and reports created to ensure that the results of those operations have kept the data records in sync.
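Jepsen itself is written in Clojure, and its checkers (for example, its linearizability checking) are far more thorough than anything shown here. Still, the basic pattern of recording a history of operations and then checking it against what the database finally reports can be sketched in a few lines of Python. The example is loosely modeled on a "set" workload: clients add distinct values, each add is recorded as acknowledged ("ok"), definitely failed ("fail"), or of unknown outcome ("info", e.g. a timeout during a network partition), and a final read is compared against that history. `Op`, `check_set` and the sample history are hypothetical illustrations, not Jepsen's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    process: int  # which client issued the add
    value: int    # the element it tried to add
    status: str   # "ok" (acknowledged), "fail" (definitely not applied), "info" (unknown)


def check_set(history, final_read):
    """Compare acknowledged and indeterminate adds against the final read."""
    acknowledged = {op.value for op in history if op.status == "ok"}
    indeterminate = {op.value for op in history if op.status == "info"}
    observed = set(final_read)
    lost = acknowledged - observed                        # confirmed writes that vanished
    unexpected = observed - acknowledged - indeterminate  # never attempted, or reported failed
    recovered = observed & indeterminate                  # unknown writes that did land
    return {
        "valid": not lost and not unexpected,
        "lost": lost,
        "unexpected": unexpected,
        "recovered": recovered,
    }


history = [
    Op(process=0, value=1, status="ok"),
    Op(process=1, value=2, status="ok"),
    Op(process=0, value=3, status="info"),
    Op(process=1, value=4, status="fail"),
    Op(process=0, value=5, status="ok"),
]
report = check_set(history, final_read=[1, 3, 5])
print(report)
```

Here the database acknowledged adding 2 but does not return it, so the check flags a lost write. An indeterminate add (3) showing up is allowed, since the client never learned whether it succeeded.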

 
Careers


Postdocs

Postdoctoral Researcher Position in Computational HCI



Aalto University; Espoo, Finland
