NYU Data Science newsletter – January 26, 2016

The NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for January 26, 2016.


 
Data Science News



Scientists Shed New Light on Workings of Genetic Regulation

NYU News


from January 25, 2016

A team of scientists has uncovered greater intricacy in protein signaling than was previously understood, shedding new light on the nature of gene expression.

Christine Vogel, an assistant professor in New York University’s Department of Biology and one of the study’s senior authors, explains that “to make a protein, we need to make a messenger RNA molecule from the gene encoded in the DNA, and then, in a second process, make proteins from these RNA molecules. Both processes are highly regulated and coupled.”

 

A new quantum approach to big data

MIT News


from January 25, 2016

From gene mapping to space exploration, humanity continues to generate ever-larger sets of data — far more information than people can actually process, manage, or understand.

Machine learning systems can help researchers deal with this ever-growing flood of information. Some of the most powerful of these analytical tools are based on a strange branch of geometry called topology, which deals with properties that stay the same even when something is bent and stretched every which way.

Such topological systems are especially useful for analyzing the connections in complex networks, such as the internal wiring of the brain, the U.S. power grid, or the global interconnections of the Internet. But even with the most powerful modern supercomputers, such problems remain daunting and impractical to solve. Now, a new approach that would use quantum computers to streamline these problems has been developed by researchers at MIT, the University of Waterloo, and the University of Southern California.
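For a feel for what these topological tools compute today on classical hardware, here is a minimal Python sketch of persistent homology on a noisy circle. The ripser library is our choice for illustration; the article names no specific tool.

```python
# A minimal sketch of classical topological data analysis (persistent
# homology), the kind of computation the article says quantum algorithms
# could accelerate. The ripser library is an assumption; the article
# does not name any specific tool.
import numpy as np
from ripser import ripser

# Sample a noisy circle: a shape whose one-dimensional "hole" should
# survive bending and stretching, which is what topology measures.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta)])
points += rng.normal(scale=0.05, size=points.shape)

# Compute persistence diagrams up to dimension 1 (connected components
# and loops). Features that persist across many scales are the "real"
# topological structure; short-lived ones are noise.
diagrams = ripser(points, maxdim=1)["dgms"]
h1 = diagrams[1]
most_persistent = h1[np.argmax(h1[:, 1] - h1[:, 0])]
print("Most persistent 1-D feature (birth, death):", most_persistent)
```

On a few hundred points this runs in well under a second; the scaling problem the researchers target appears when the underlying networks have millions of nodes.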

 

The Bits And Bytes Of The Machine’s Storage

The Next Platform


from January 25, 2016

By now, we have a pretty good sense of the basic topology of The Machine from Hewlett Packard Enterprise. It contains massive amounts of fabric memory that any node and any application can access, no matter where they are executing. Somewhere in there, though, is your file, your object, your database table. You know it is yours and that only you have the right to access it. So what in The Machine ensures that only you get to access it, and how does it still give you efficient access in the process?

Folks designing The Machine rightly speak about the need for integrated security and integrity throughout The Machine's design. So let's start by looking at one very low-level aspect of that integrated security.
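To make the access question concrete, here is a purely illustrative Python sketch of per-region access checks over a shared memory pool. It is a toy model of the problem, not HPE's actual mechanism, and every name in it is hypothetical.

```python
# A purely illustrative sketch (not HPE's actual design) of the idea the
# article raises: memory reachable by every node, with a per-region
# ownership check on every access. All names here are hypothetical.
class FabricMemory:
    def __init__(self):
        self._regions = {}  # region_id -> (owner, bytearray)

    def allocate(self, region_id, owner, size):
        self._regions[region_id] = (owner, bytearray(size))

    def read(self, region_id, requester):
        owner, data = self._regions[region_id]
        # The crux: any node can reach the memory fabric, but only the
        # rightful owner may actually use this region.
        if requester != owner:
            raise PermissionError(f"{requester} may not access {region_id}")
        return bytes(data)

pool = FabricMemory()
pool.allocate("my-table", owner="alice", size=64)
pool.read("my-table", requester="alice")      # succeeds
# pool.read("my-table", requester="mallory")  # raises PermissionError
```

The real engineering question the article gets into is how to do this check in hardware without slowing down the memory path.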

 

Data Lakes and Data Governance

Bloomberg BNA


from January 25, 2016

Imagine this situation. You represent a multinational pharmaceutical company. You have a couple hundred scientists around the world who are working on a variety of projects. They have been saving their material in different formats on a shared server. You now need to translate all of those documents to respond to a government investigation for a regulatory action. You assign the translation to a language service provider. The documents are sent back and forth between the linguists, quality control teams and your company. You now realize they are making copies of your documents during the translation process. And you also realize that law firms and legal service providers are the number one target for industrial espionage. How do you protect your client’s confidences?

The answer may lie in controlling the various repositories or “data lakes” where documents like these are increasingly being housed. Establishing a data governance strategy and building a dam will help minimize risks.

 

County Meath to Host First Facebook Data Center in Ireland

Facebook, Newsroom


from January 24, 2016

Today we are pleased to announce that Clonee, County Meath, will be the site for our newest data center. The Clonee data center will be our first in Ireland and follows Luleå, in Sweden, as our second in Europe. The facility will become part of the infrastructure that enables billions of people to connect with the people and things they care about on Facebook, Messenger, Instagram and more.

Clonee will be packed full of cutting-edge technology, making it one of the most advanced, efficient and sustainable data centers in the world. All the racks, servers, and other components have been designed and built from scratch as part of the Open Compute Project, an industry-wide coalition of companies dedicated to creating energy- and cost-efficient infrastructure solutions and sharing them as open source.

 

Data Sharing and the Journal

New England Journal of Medicine


from January 25, 2016

We want to clarify, given recent concern about our policy, that the Journal is committed to data sharing in the setting of clinical trials. As stated in the Institute of Medicine (IOM) report from the committee on which I served and in the recent editorial by the International Committee of Medical Journal Editors (ICMJE), we believe there is a moral obligation to the people who volunteer to participate in these trials to ensure that their data are widely and responsibly used. Journal policy will therefore follow that outlined in the ICMJE editorial and the IOM report: when appropriate systems are in place, we will require a commitment from authors to make available the data that underlie the reported results of their work within 6 months after we publish them.

 

Beyond the Data – Staying Ahead of the Curve: Modeling and Public Health Decision Making

YouTube, Centers for Disease Control and Prevention


from January 22, 2016

In this session of Beyond the Data, Dr. Phoebe Thorpe, Dr. Lauren Meyers, and Dr. Martin Meltzer discuss modeling and its public health applications. Tune in to hear these experts explain what modeling is, how we can use it effectively during infectious disease outbreaks, and how modelers communicate with leaders and decision makers to benefit the public.
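For a concrete sense of what such a model looks like, here is a minimal Python sketch of a classic SIR outbreak model. The parameter values are illustrative, not taken from the CDC session.

```python
# A minimal sketch of the kind of model discussed in the session: a basic
# SIR (susceptible-infected-recovered) compartmental outbreak model.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

beta, gamma = 0.3, 0.1            # transmission and recovery rates (per day)
y0 = [0.99, 0.01, 0.0]            # fractions of the population
sol = solve_ivp(sir, (0, 160), y0, args=(beta, gamma))

peak_day = sol.t[np.argmax(sol.y[1])]
print(f"Basic reproduction number R0 = {beta / gamma:.1f}")
print(f"Infections peak around day {peak_day:.0f}")
```

Outputs like the epidemic peak and R0 are exactly the quantities modelers translate into the resource and timing questions decision makers ask.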

 

Marvin Minsky, “father of artificial intelligence,” dies at 88

MIT News


from January 25, 2016

Marvin Minsky, a mathematician, computer scientist, and pioneer in the field of artificial intelligence, died at Boston’s Brigham and Women’s Hospital on Sunday, Jan. 24, of a cerebral hemorrhage. He was 88.

Minsky, a professor emeritus at the MIT Media Lab, was a pioneering thinker and the foremost expert on the theory of artificial intelligence. His 1985 book “The Society of Mind” is considered a seminal exploration of intellectual structure and function, advancing understanding of the diversity of mechanisms interacting in intelligence and thought. Minsky’s last book, “The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind,” was published in 2006.

 
Events



NYU Steinhardt Presents Discussion on Video Games as Learning Tools



NYU’s Steinhardt School of Culture, Education, and Human Development will host “Making Waves: Your Brain on Video Games,” a conversation on the promising yet controversial use of video games as tools for learning and for enhancing brain function. The event features Jan Plass, the NYU Steinhardt Paulette Goddard Chair in Digital Media and Learning Sciences and co-director of the Games for Learning Institute. Plass is a leader in improving the ways video games are designed as educational tools.

Tuesday, Feb. 9 at 6 p.m. at NYU’s Global Center for Academic and Spiritual Life (238 Thompson St., 5th Floor).

 

Untitled: Art Datathon | MoMA



The Untitled: Art Datathon is a two-day workshop in which multidisciplinary teams detect art world trends using data about art, including MoMA collection data that was released on GitHub last year. Datathons challenge participants to come up with research designs that can utilize specific data—in this case, collection data from museums—to create models, figures, maps, and other presentations of findings.

The event is free, but an application is required for participation. New York City college and university students of all levels are encouraged to apply.

Friday-Saturday, February 19-20, at the Cullman Education and Research Building, MoMA.
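For applicants who want a head start, here is a hedged Python sketch of loading the MoMA collection data mentioned above. The GitHub repository is the one released last year, but the file name, branch, and column names below are assumptions about its layout.

```python
# A hedged sketch of getting started with the MoMA collection data. The
# exact file name, branch, and columns are assumptions about the repo's
# layout, not confirmed details.
import pandas as pd

URL = ("https://raw.githubusercontent.com/"
       "MuseumofModernArt/collection/master/Artworks.csv")

artworks = pd.read_csv(URL)
print(artworks.shape)
print(artworks.columns.tolist()[:10])

# One possible datathon-style question: acquisitions per year.
if "DateAcquired" in artworks.columns:
    years = pd.to_datetime(artworks["DateAcquired"], errors="coerce").dt.year
    print(years.value_counts().sort_index().tail())
```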

 
Deadlines



Pacific Humanitarian Challenge: How Robots Are #RethinkingResponse

deadline: Sunday, February 7, 2016

The Pacific Humanitarian Challenge is seeking innovative ways to prepare for and respond to natural disasters in the Pacific. As the application deadline approaches, Patrick Meier, author of Digital Humanitarians and a leading thinker in humanitarian technology and innovation, shares his thoughts about innovation in this space.

The application deadline for the Pacific Humanitarian Challenge is Sunday, February 7, 2016.

 

NYPL Labs Remix Residency

deadline: Friday, February 19, 2016

NYPL Labs is pleased to announce our first Remix Residency, designed to spur transformative, interesting, beautiful new uses of our digital collections.

The deadline to fill out the submission form is Friday, February 19.

 
CDS News



[2016 Spring] POL-GA 1251, Quantitative Political Analysis II with Prof. Cyrus Samii

NYU Center for Data Science, NYU Politics Department


from January 26, 2016

This course provides a current perspective on estimating causal effects in social science research. The approach is rooted in non-parametric and semi-parametric frequentist methods, which have at their core techniques based on least squares regression. We will emphasize research design, causal identification, and robust estimation and inference. Other methods of estimation will be introduced as well and related to least squares methods.

The course requires that students have working knowledge of probability theory, matrix algebra, and calculus at the level of POL-GA 1250, “Quant I.” The course is intended to provide foundational methodological training to PhD students in politics as part of their required sequence of courses toward their degree. Only PhD students in Politics will be allowed to take the course for a grade.
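As a flavor of the least-squares core the syllabus describes, here is a minimal Python sketch of estimating a treatment effect by regression with robust standard errors. The data are simulated, and the course itself may use other software.

```python
# A minimal sketch of least-squares causal effect estimation: regress an
# outcome on a treatment indicator and a covariate, with
# heteroskedasticity-robust (HC2) standard errors. Simulated data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)                    # pre-treatment covariate
treat = rng.binomial(1, 0.5, size=n)      # randomized treatment assignment
y = 1.0 + 2.0 * treat + 0.5 * x + rng.normal(size=n)

X = sm.add_constant(np.column_stack([treat, x]))
fit = sm.OLS(y, X).fit(cov_type="HC2")    # robust standard errors
print(f"Estimated treatment effect: {fit.params[1]:.3f} "
      f"(SE {fit.bse[1]:.3f})")
```

Because treatment is randomized here, the regression coefficient identifies the causal effect; much of the course concerns what to do when it is not.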

 
Tools & Resources



Microsoft moves its CNTK deep learning toolkit from CodePlex to GitHub

VentureBeat, Jordan Novet


from January 25, 2016

Microsoft is announcing today that it’s moving the repository for its Computational Network Toolkit (CNTK) open-source deep learning software from Microsoft’s CodePlex source code repository hosting site to GitHub, a popular site for hosting open-source projects.

In addition to migrating the project, Microsoft is dropping the Microsoft Research License, which limits the software to non-commercial uses and has applied since the toolkit became available on CodePlex in April 2015, in favor of the more permissive MIT license.

 

NYU Shortcourse — “Data Science and Social Science” materials

GitHub, pablobarbera/data-science-workshop


from January 22, 2016

This is a three-day short course covering key topics at the intersection of Data Science and Social Science. Each day is structured as a series of modules that combine instruction on data science methods with implementation using real data in R. The course covers an introduction to the R statistical programming language, modeling and visualization, automated textual analysis, social network analysis, and web scraping & APIs.

 

YFCC100M: The New Data in Multimedia Research

Communications of the ACM


from January 25, 2016

… Today, a large number of shared digital media objects have been uploaded to services like Flickr and Instagram, which, along with their metadata and social ecosystem, form a vibrant environment for finding solutions to many research questions at scale. Photos and videos provide a wealth of information covering entertainment, travel, personal records, and various other aspects of life as they were when taken. Viewed collectively, they represent knowledge beyond what is captured in any individual snapshot and provide information on trends, evidence of phenomena or events, social context, and societal dynamics. Consequently, media collections are useful for qualitative and quantitative empirical research in many domains. However, scientific endeavors in fields like social computing and computer vision have generally relied on independently collected multimedia datasets, complicating research and synergy. Needed is a more substantial dataset for researchers, engineers, and scientists around the globe.

To address the call for scale, openness, and diversity in academic datasets, we take the opportunity in this article to present a new multimedia dataset, developed over the past two years, that contains 100 million media objects, and to explain the rationale behind its creation.
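The dataset's metadata ships as large compressed tab-separated files, so any analysis at this scale starts with streaming. Here is a hedged Python sketch; the file name and column positions are assumptions about the release layout, not details from the article.

```python
# A hedged sketch of exploring YFCC100M metadata, distributed (to our
# knowledge) as bzip2-compressed tab-separated files. The shard file name
# and the column positions below are assumptions, not confirmed details.
import pandas as pd

COLUMNS = {0: "photo_id", 6: "title", 8: "user_tags",
           10: "longitude", 11: "latitude"}  # assumed field positions

chunks = pd.read_csv(
    "yfcc100m_dataset-0.bz2",   # hypothetical local shard of the dump
    sep="\t", header=None, usecols=list(COLUMNS), chunksize=100_000,
)

# Count geotagged records shard by shard; at 100 million rows, streaming
# in chunks matters more than any single analysis step.
geotagged = 0
for chunk in chunks:
    chunk = chunk.rename(columns=COLUMNS)
    geotagged += chunk["latitude"].notna().sum()
print("Geotagged records in this shard:", geotagged)
```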

 
