NYU Data Science newsletter – March 1, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 1, 2016


 
Data Science News



How Big Data Is Changing Chemical Manufacturing

Environmental Leader


from February 23, 2016

In a new report, Big Data and Analytics in Chemicals: From Cheminformatics and LIMS to Launch, Lux Research says novel sensors, materials and information technologies are making research and development more effective, integrating lab data with chemical databases and academic literature, and quickening the pace of innovation cycles.

 

President Weighs In on Data From Genes

The New York Times


from February 25, 2016

President Obama on Thursday waded into the complex and high-stakes debate over whether patients own their genetic information, saying that he believes that his tissues and any discoveries that stem from his DNA belong to him.

“I would like to think that if somebody does a test on me or my genes, that that’s mine, but that’s not always how we define these issues,” Mr. Obama said during a White House forum on a major biomedical research initiative he began last year.

 

Content ID and the Rise of the Machines

Electronic Frontier Foundation


from February 26, 2016

In 2007, Google built Content ID, a technology that lets rightsholders submit large databases of video and audio fingerprints and have YouTube continually scan new uploads for potential matches to those fingerprints. Since then, a handful of other user-generated content platforms have implemented copyright bots of their own that scan uploads for potential matches.

Platforms have no obligation to seek out and block infringing content, and such an obligation would entrench existing providers by stifling new platforms before they could achieve popularity. But if a large platform decides it’s in its interests to evaluate every item of user material for potential infringement, the process probably must be automated—at least in part. The problem comes when humans fall out of the picture. Machines are good at many things—making the final determination on your rights isn’t one of them.

 

How LIGO and ‘Physical Review Letters’ worked together to publish the paper of a lifetime

Inside Higher Ed


from February 24, 2016

On Sept. 9, 2015, scientists decided where they would publish evidence for gravitational waves. Five days later, they made the discovery.

 

Driverless Cars Brave Mean City Streets

Inside Science


from February 23, 2016

Cars without drivers face two different worlds: well-spaced highway or suburban roads, and narrow city streets.

A research project in the United Kingdom is tackling the challenges of the city. The Gateway Project, an $11.6 million partnership between the British government and industry, plans to deploy autonomous vehicles in the congested, narrow streets of London. The project will test its first vehicles in Greenwich, along the banks of the Thames, in the spring.

 

Biden’s cancer moonshot effort looks at Utah database

STAT


from February 26, 2016

Joe Biden talks a lot about the potential for “big data” to accelerate the fight against cancer. On Friday, he spent time learning about a model of that approach that’s already being put to work in Utah.

The star of the show in Salt Lake City, where the vice president led a panel discussion for his cancer “moonshot” effort at the University of Utah’s Huntsman Cancer Institute, was the Utah Population Database. The database houses genealogical and historical health records for upwards of 8 million people, a one-of-a-kind medical resource.

 

Vanderbilt, Google’s Verily to Launch Precision Medicine Initiative Cohort

GEN News Highlights


from February 25, 2016

The NIH has selected Vanderbilt University and Verily (formerly Google Life Sciences) to launch the first “pilot” phase of the Precision Medicine Initiative (PMI)’s planned research cohort of 1 million or more Americans from which to glean health and wellness data.

The Vanderbilt-Verily partnership was announced today by President Obama’s administration among actions intended to advance its $200-million-plus PMI, on the occasion of a White House summit promoting the effort, which is expected to attract 170 professionals.

 

Why Use a Paintbrush When You Can Make Mind-Bending Art With Code?

WIRED, Design


from February 15, 2016

Computer code underpins many aspects of our lives. Usually we know exactly what we want that code to do, but what if we didn’t? This is the question posed by Los Angeles software artist Casey Reas, who employs code to form abstract, bewildering, and literally unexpected creations.

Reas received his MS from the MIT Media Arts and Sciences program, where he focused on “Behavior kinetic sculpture.” Before graduate school, he worked in visual design. Since he graduated in 2001, he’s worked as an academic and media artist. Each of his pieces begins as a logic-based system. He describes that system in code that translates it into a visual, digital form. Then he fine-tunes the results; he doesn’t know exactly what he’ll get the first go-around, which he says is the beauty of it.

“That’s what drew me to this field,” says Reas. “Working with a slight amount of random calculations, I’m able to have some really unexpected things happen. I see things happening and make modifications to control and construct and tune to it to be what I want it to be.”

 

#AAASmtg Twitter NodeXL SNA Map and Report for Wednesday, 17 February

NodeXL Graph Gallery


from February 17, 2016


The graph represents a network of 5,716 Twitter users whose recent tweets contained “#AAASmtg”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 18,000 tweets. The network was obtained from Twitter on Wednesday, 17 February 2016 at 02:54 UTC.
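As a rough illustration of how a network like this can be assembled, the sketch below builds a set of users and directed edges from tweets that contain a hashtag, adding anyone who is replied to or mentioned. It is not NodeXL's implementation; the tweet records, field names (`author`, `text`, `reply_to`), and the `build_network` helper are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Matches @username mentions in tweet text.
MENTION = re.compile(r"@(\w+)")

Tweet = Dict[str, Optional[str]]


def build_network(tweets: List[Tweet], hashtag: str) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (users, edges) for tweets containing `hashtag`.

    An edge (a, b) means user a replied to or mentioned user b.
    """
    users: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for tweet in tweets:
        text = tweet["text"] or ""
        if hashtag.lower() not in text.lower():
            continue  # only tweets containing the hashtag are included
        author = (tweet["author"] or "").lower()
        users.add(author)
        targets = {name.lower() for name in MENTION.findall(text)}
        if tweet.get("reply_to"):
            targets.add(tweet["reply_to"].lower())
        for target in targets:
            users.add(target)
            edges.add((author, target))
    return users, edges


# Hypothetical sample data, invented for this example.
sample_tweets: List[Tweet] = [
    {"author": "alice", "text": "Great keynote at #AAASmtg with @bob", "reply_to": None},
    {"author": "carol", "text": "@alice agreed! #AAASmtg", "reply_to": "alice"},
    {"author": "dave", "text": "Unrelated tweet mentioning @alice", "reply_to": None},
]

users, edges = build_network(sample_tweets, "#AAASmtg")
print(sorted(users))
print(sorted(edges))
```

In this toy data, the third tweet is skipped because it lacks the hashtag, so its author never enters the network.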

 

Toward Ubiquitous Computational Life Sciences

Bio-IT World


from February 23, 2016

In April 1956, a refitted oil tanker carried fifty-eight shipping containers from Newark to Houston and changed global trade.

Today’s emerging software application containers and their growing importance for computational science and engineering are poised to launch a similar revolution. We have never been so close to ubiquitous computational life science (CLS) for every researcher and engineer, accessible everywhere.

High-performance computing (HPC) and life sciences software tools continue to progress, but the next big step toward ubiquitous CLS will come from novel software container technology which will dramatically facilitate software packaging and porting, ease of access and use, and drastically simplify software maintenance and support. Finally, CLS will be in the hands of every engineer.

 

Obama Is Using The Bully Pulpit To Set Patient Data Free

Forbes, Matthew Herper


from February 25, 2016

There’s something wondrous, or at least mildly cool, happening today in Washington. No, I’m not kidding.

There’s a meeting at the White House around President Obama’s Precision Medicine Initiative, and, as a result, we’re hearing about a whole host of initiatives to make anonymized patient data available to researchers, and to pool information that’s come from different studies. That’s something that really could speed up medical research.

“It’s more than collecting large amounts of data from people and following them,” says Eric Topol, director of the Scripps Translational Science Institute. “The other theme is to really push the open data world, to give people back their data, to share data. That’s really being pushed hard. This initiative is really an initiative for that.”

 

Using Apache Spark to predict attack vectors among billions of users and trillions of events

O'Reilly Radar, Ben Lorica


from February 25, 2016

In this episode of the O’Reilly Data Show, I spoke with Fang Yu, co-founder and CTO of DataVisor. We discussed her days as a researcher at Microsoft, the application of data science and distributed computing to security, and hiring and training data scientists and engineers for the security domain.

DataVisor is a startup that uses data science and big data to detect fraud and malicious users across many different application domains in the U.S. and China. [audio, 33:40]

 
Events



#StartupColumbia Festival Tickets — Friday, Apr 29



The #StartupColumbia Festival is more than just a celebration in April; it’s a semester-long event beginning with the opening of the $250,000 Columbia Venture Competition (CVC) in January, followed by the CVC finalist workshops in March, the live pitches to determine the CVC winners on April 28th, and of course the half-day conference on April 29th.

Friday, April 29, Columbia University, starting at 9:45 a.m.

 
Tools & Resources



Visual Genome API

Ranjay Krishna et al.


from February 25, 2016

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked “What vehicle is the person riding?”, computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) in order to answer correctly that “the person is riding a horse-drawn carriage.”

In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 100K images where each image has an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and question answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answers.
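To make the carriage example concrete, here is a minimal sketch of how relationship annotations of the form predicate(subject, object) could be stored and queried. It is not the Visual Genome API or data format; the `Relationship` type and the `describe_vehicle` helper are hypothetical names, and the two annotations are taken from the example in the abstract.

```python
from typing import List, NamedTuple, Optional


class Relationship(NamedTuple):
    """One pairwise relationship between two objects in an image."""
    predicate: str
    subject: str
    obj: str


# Annotations for the example image: riding(man, carriage), pulling(horse, carriage).
relationships: List[Relationship] = [
    Relationship("riding", "man", "carriage"),
    Relationship("pulling", "horse", "carriage"),
]


def describe_vehicle(rider: str, rels: List[Relationship]) -> Optional[str]:
    """Describe what `rider` is riding, or return None if nothing is found."""
    for rel in rels:
        if rel.predicate == "riding" and rel.subject == rider:
            vehicle = rel.obj
            # Refine the answer if something is pulling the vehicle.
            pullers = [r.subject for r in rels
                       if r.predicate == "pulling" and r.obj == vehicle]
            if pullers:
                return f"{pullers[0]}-drawn {vehicle}"
            return vehicle
    return None


answer = describe_vehicle("man", relationships)
print(answer)  # horse-drawn carriage
```

The answer depends on combining both relationships: object recognition alone would yield "carriage", while the pulling relationship supplies "horse-drawn".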

 
