NYU Data Science newsletter – April 12, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 12, 2016

Data Science News

OpenTrials: towards a collaborative open database of all available information on all clinical trials

BioMed Central

from April 08, 2016

OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine.

How Academia Is Shaping Ad Tech And Platform Business Models

AdExchanger

from April 06, 2016

Academia has a unique role to play in supporting the digital economy, as its members bring a completely different perspective to studying ad auctions.

The academic underpinning of ad auctions isn’t new – Hal Varian, Google chief economist and early AdWords designer, was also the founding dean of Berkeley’s School of Information, while another Berkeley professor, Steve Tadelis, was given unprecedented research access to eBay’s advertising data for a study that led to the retirement of the company’s search offering.

The academic, economic and societal impacts of Open Access: an evidence-based review

F1000Research

from April 11, 2016

Ongoing debates surrounding Open Access to the scholarly literature are multifaceted and complicated by disparate and often polarised viewpoints from engaged stakeholders. At the current stage, Open Access has become such a global issue that it is critical for all involved in scholarly publishing, including policymakers, publishers, research funders, governments, learned societies, librarians, and academic communities, to be well-informed on the history, benefits, and pitfalls of Open Access. In spite of this, there is a general lack of consensus regarding the advantages or disadvantages of Open Access at multiple levels. This review aims to be a resource for current knowledge on the impacts of Open Access by synthesizing important research in three major areas of impact: academic, economic and societal.

Also:

OpenTrials: towards a collaborative open database of all available information on all clinical trials (April 8, BioMed Central)

Sharing all types of clinical data and harmonizing journal standards (April 03, BMC Medicine)

Deep Learning and Understandability versus Software Engineering and Verification by Peter Norvig

YouTube, Silicon Valley Deep Learning Group

from March 18, 2016

Silicon Valley Deep Learning Group is honored to host Peter Norvig. Peter talks about Deep Learning and Understandability versus Software Engineering and Verification.

12 Inspiring Women In Data Science, Big Data

InformationWeek

from April 11, 2016

Women make up half the population, yet it’s been well documented that they don’t come close to parity in STEM fields. Could the rise of big data and data science offer women a clearer path to success in technology? Here’s a list of 12 inspiring women who work in big data and data science.

How Michigan Is Using Math to Fight Potholes

CityLab, Aarian Marshall

from April 11, 2016

A new statistical model might make the annual spring road repair cheaper for the Mitten State—and for the trucking industry.

‘Future Commerce’ Explores Innovation in Financial Technology

MIT Technology Review

from April 11, 2016

This new online MIT course provides an expert look at emerging “fintech” trends and an opportunity to shape the future of transactions and commerce.

Sharing all types of clinical data and harmonizing journal standards

BMC Medicine (Full Text)

from April 03, 2016

Despite recent efforts to enforce policies requiring the sharing of data underlying clinical findings, current policies of biomedical journals remain largely heterogeneous. As this heterogeneity does not optimally serve the cause of data sharing, a first step towards better harmonization would be the requirement of a data sharing statement for all clinical studies and not simply for randomized studies. Although the publication of a data sharing statement does not imply that all data is made readily available, such a policy would swiftly implement a cultural change in the definition of scientific outputs. Currently, a scientific output only corresponds to a study report published in a medical journal, while in the near future it might consist of all materials described in the manuscript, including all relevant raw data. When such a cultural shift has been achieved, the logical conclusion would be for biomedical journals to require authors to make all data fully available without restriction as a condition for publication.

The Future Of Big Data Is Bigger Than You Can Possibly Imagine

Fast Company

from April 11, 2016

Imagine a world without government, schools, a legal system, law enforcement, or companies. It’s a world unlike the one we currently live in—but based on the evolution of technology and how we use it—representative of what the world may become.

Imagine a computer infrastructure that could—with global knowledge and the ability to enact precise tweaks to the social and economic structure—drive the evolution of society. This is the idea behind the Universal Graph.

SAS: Understanding Zika with Zetabytes

FederalNewsRadio.com

from April 11, 2016

Today’s guest is Dr. Steve Bennett whose official title is principal product marketing consultant at SAS. … You had better strap on your thinking cap for this interview. Dr. Bennett ranges from zetabytes to the Zika virus. You will gain a deeper understanding of biosurveillance, big data, old fashioned analytics, predictive analytics, and advanced analytics. [audio, 41:07]

Thirteen Anonymous Genetic Superheroes Walk Among Us—Or Do They?

The Atlantic, Ed Yong

from April 11, 2016

Stephen Friend never thought he’d go looking for superheroes.

“The idea came from frustration and unfulfilled quests,” says Friend, a physician at the Icahn School of Medicine at Mount Sinai. For decades, he had watched geneticists trying to find the genes that underlie diseases as diverse as cystic fibrosis, Alzheimer’s disease, and schizophrenia. Such studies have been undoubtedly successful, but the growing list of culprit genes has rarely led to treatments for their respective disorders.

“I had an a-ha moment,” says Friend. “If you want to find a way of preventing disease, you shouldn’t be looking at people with the disease. You should look at people who should have been sick but aren’t.”

‘New Rembrandt’ to be unveiled in Amsterdam

The Guardian

from April 05, 2016

Portrait is not a lost work by Dutch master but a 3D printed painting made by software that distilled the features of a Rembrandt.

Events

West Big Data Innovation Hub All Hands Meeting

Join us for the first All Hands Meeting of the West Big Data Innovation Hub (WBDIH): a gathering of data enthusiasts looking to share resources, launch collaborations, and strengthen our community. The WBDIH is one of four regional hubs funded by the National Science Foundation to address societal challenges through big data innovation. We strive to build public/private partnerships with academia, industry, government, and nonprofits throughout our region.

Berkeley, CA. Thursday, May 12, starting at 8 a.m., UC Berkeley International House (2299 Piedmont Avenue)

Big Data Science Conferences – ODSC Boston

The Big Data Science conference gathers the world’s best big data specialists and data scientists to share their expertise and insights from across multiple industries.

Friday-Sunday, May 20-22, in Boston. Ticket prices start at $159.

Deadlines

JSM Data Art Show

Explore the “art” in data art with a new JSM exhibit featuring data artists. Positioned just outside the exhibit hall, this feature will explore the relationship between data and art, which promises to be both amazing and beautiful. You can stop by to admire the pieces or submit a piece of your own.

Deadline for submissions is Sunday, May 15.

Tools & Resources

Saving 13 Million Computational Minutes per Day with Flame Graphs

The Netflix Tech Blog

from April 11, 2016

… An on-going focus for the Netflix performance team is to proactively look for optimizations in our service and infrastructure tiers. It’s possible to save hundreds of thousands of dollars with just a few percentage points of improvement. Given that one of our largest workloads is primarily CPU-bound, we focused on collecting and analyzing CPU profiles in that tier. Fortunately, with the work Brendan Gregg pushed forward to preserve the frame pointer in the JVM, we can easily capture CPU sampling profiles using Linux perf_events on live production instances and visualize them with flame graphs.

The new Impactstory: Better. Freer.

Impactstory blog

from April 08, 2016

We are releasing a new version of Impactstory!

We baked what we’ve learned from hundreds of conversations with researchers into a sleeker, leaner, more useful Impactstory.

New Tools to Summarize Text

Fast Forward Labs

from April 11, 2016

We’re excited to introduce the latest report and prototype from our machine intelligence R&D group! In this iteration, we explore summarization, or neural network techniques for making unstructured text data computable.

Making language computable has been a goal of computer science research for decades. Historically, it has been a challenge to merely collect and store data. But it’s now so cheap to store data that we often have the opposite problem: once we have data, how should we analyze it to find meaning and insights?

A Billion Taxi Rides on Google’s Dataproc running Presto

Mark Litwintschik

from April 11, 2016

On February 22nd Google announced that their managed Hadoop and Spark offering “Dataproc” was out of beta and generally available. In this blog post I’ll load the metadata of 1.1 billion NYC taxi journeys into Google Cloud Storage and see how fast a Dataproc cluster of five machines can query that data using Facebook’s Presto as the execution engine.
