Data Science newsletter – November 7, 2016

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 7, 2016

Data Science News



Georgia Tech Launches New Research on the Security of Machine-Learning Systems

Georgia Tech College of Computing


from October 31, 2016

A $1.5 million gift from Intel Corporation has established a new research center at the Georgia Institute of Technology dedicated to the emerging field of machine-learning (ML) cybersecurity with a focus on strengthening the analytics behind malware detection and threat analysis.

At the Intel Science & Technology Center for Adversary-Resilient Security Analytics (ISTC-ARSA), housed at Georgia Tech’s Institute for Information Security & Privacy (IISP), researchers will study the vulnerabilities of ML algorithms and develop new security approaches to improve the resilience of ML applications including security analytics, search engines, customized news feeds, facial and voice recognition, fraud detection, and more. Work at the ISTC-ARSA will complement additional ML research conducted by the Machine Learning at Georgia Tech (ML@GT) research center, established in July in the College of Computing.


France makes its bid to be recognized as a global AI hub

VentureBeat, Chris O'Brien


from November 06, 2016


[1611.01332] Exploring the Long-Term Evolution of GRS 1915+105

arXiv, Astrophysics > High Energy Astrophysical Phenomena; D. Huppenkothen, L.M. Heil, D. W. Hogg, A. Müller


from November 04, 2016

Among the population of known galactic black hole X-ray binaries, GRS 1915+105 stands out in multiple ways. It has been in continuous outburst since 1992, and has shown a wide range of different states that can be distinguished by their timing and spectral properties. These states, also observed in IGR J17091-3624, have in the past been linked to accretion dynamics. Here, we present the first comprehensive study into the long-term evolution of GRS 1915+105, using the entire data set observed with RXTE over its sixteen-year lifetime. We develop a set of descriptive features allowing for automatic separation of states, and show that supervised machine learning in the form of logistic regression and random forests can be used to efficiently classify the entire data set. For the first time, we explore the duty cycle and time evolution of states over the entire sixteen-year time span, and find that the temporal distribution of states has significantly changed over the span of the observations. We connect the machine classification with physical interpretations of the phenomenology in terms of chaotic and stochastic processes.


The Risk to Civil Liberties of Fighting Crime With Big Data

The New York Times


from November 06, 2016

The American Civil Liberties Union, citing reports that the Chicago Police Department used a computer analysis to create a “heat list” that unfairly associated innocent people with criminal behavior, has warned about the dangers of the police using big data. Even companies that make money doing this sort of work warn that it comes with civil rights risks.

“We’re heading to a world where every trash can has an identifier. Even I get shocked at the comprehensiveness of what data providers sell,” said Courtney Bowman, who leads the privacy and civil liberties practice at Palantir Technologies, a company in Palo Alto, Calif., that sells data analysis tools. He has lectured on the hazards of predictive policing and the need to prove in court that predictive models follow understandable logic and do not reinforce stereotypes.


In scathing ruling, Federal Court says CSIS bulk data collection illegal

The Globe and Mail


from November 03, 2016

The Federal Court of Canada has faulted Canada’s domestic spy agency for unlawfully retaining data and for not being truthful with judges who authorize its intelligence programs. Separately, the court also revealed that the spy agency no longer needs warrants to collect Canadians’ tax records.

All this has been exposed in a rare ruling about the growing scope of Canadian intelligence collection disclosed by the court on Thursday. At issue is how the federal domestic spy service has been pushing past its legal boundaries in the name of collecting data, in hopes of rounding out the holdings of a little-known Canadian intelligence facility dubbed the “operational data analysis centre.”


Midwest universities form regional innovation alliance with $3.5M award

University of Michigan News


from November 03, 2016

A new four-university alliance will help researchers across the Midwest turn their inventions into marketable products to benefit society.

Supported by a $3.5 million grant from the National Science Foundation’s Innovation Corps program, the new alliance establishes the Midwest I-Corps Node. It is led by the University of Michigan Engineering’s Center for Entrepreneurship and includes the University of Illinois, Purdue University and the University of Toledo.

Together, these institutions have an annual research budget of $2.6 billion. The universities have sent more than 100 teams to the NSF National I-Corps program and trained more than 400 teams and 1,200 participants throughout the region. The node is designed to form the backbone of a network that educates, supports and connects academic researchers to the entrepreneurial ecosystem across the region.


[1611.01056] Immigrant community integration in world cities

arXiv, Physics > Physics and Society; Fabio Lamanna, Maxime Lenormand, María Henar Salas-Olmedo, Gustavo Romanillos, Bruno Gonçalves, José J. Ramasco


from November 03, 2016

Migrant and hosting communities face long-term challenges in the integration process. Immigrants must adapt to new laws and ways of life, while hosts need to adjust to multicultural societies. Integration impacts many facets of life such as access to jobs, real estate and public services and can be well approximated by the extent of spatial segregation of minority group residence. Here we conduct an extensive study of immigrant integration in 53 world cities by using Twitter language detection and by introducing metrics of spatial segregation. In this way, we quantify the Power of Integration of cities (their capacity to integrate diverse cultures), and characterize the relations between cultures when they act in the role of hosts and immigrants.


#MooreInvention in academia

Gordon and Betty Moore Foundation


from November 04, 2016

What conditions are needed to cultivate innovation in an academic setting? Providing aspiring inventors with the funds and time needed to explore promising research pathways is vital. What other factors can help academic scientists become successful inventors?

Earlier this week at The Tech Museum of Innovation in San Jose, CA, four champions of invention—Carla Shatz of Stanford University, Venkatesh Narayanamurti of Harvard University, Ilan Gur of Lawrence Berkeley National Laboratory, and Anne DeGheest of HealthTech Capital—engaged in a passionate conversation about the conditions needed to cultivate successful inventors in academia.

 
Events



Python Office Hours



Online. I’m [David Beazley] pleased to announce a new format for learning and having your Python-related programming questions answered. The Python office hour is a no-nonsense live session, professionally hosted by an expert, and limited to just 20 participants. [free, registration required]
 
Deadlines



J-PAL Behavioral Science & Crime Funding Opportunity

J-PAL North America is providing multiple grants of up to $150,000 to projects that (1) employ a strategy related to behavioral science or changing community norms and (2) relate to crime and violence prevention as a primary or secondary policy goal. Letters of interest must be submitted by Friday, November 18, 2016.
 
Tools & Resources



New Expansion of the R Course Finder!

R Exercises


from November 04, 2016

“With so many R courses available online, we thought it was a good idea to offer a tool that helps people to compare these courses, before they decide where to spend their valuable time and (sometimes) money.”


How to legally re-use your own figures

Sara Hänzi


from November 05, 2016

When working on the figures for a recent paper I realised that I was using schemes of the animal I work with that come from a copyright-protected book. I decided that I will get rid of those schemes and instead produce my own. However, there was still a potential copyright issue: depending on where the paper would be published, the rights for the figures might well end up with the journal rather than with me. The solution that allows you and others to re-use your own figures is to publish everything on a platform such as figshare under a creative commons license before you publish it in a paper, and then cite yourself on figshare in your paper. That’s what I did!


NYC General Bike Stats

Thatcher Clay


from November 06, 2016

I stumbled on a pretty awesome dump of NYC bike stats recently through the NYC OpenData project. Apparently, every year on a particular day in the fall, teams go out to specific locations throughout the city and collect data on a bunch of bike-related measures – such as how many riders, the split between male and female riders, whether people are wearing helmets, etc. They have been doing this consistently since 2005 in 10 locations throughout the city. With all of those counts, it makes me wonder if I have ever been counted as part of this program! Here are some interesting things that I found.


Apache Spark: A Unified Engine for Big Data Processing

Communications of the ACM; Ion Stoica et al.


from November 01, 2016

“The very nature of ‘big data’ is that it is diverse and messy; a typical pipeline will need MapReduce-like code for data loading, SQL-like queries, and iterative machine learning. Specialized engines can thus create both complexity and inefficiency; users must stitch together disparate systems, and some applications simply cannot be expressed efficiently in any engine.” [video, 4:00]


Comparison of Linguistic APIs – Named Entity Recognition – Persons, Locations, Organizations

LinkedIn, Yuri Kitin


from October 04, 2016

“We carried out testing and measurements using three specially set up text corpora (gathered from various articles available on the web, including Wikipedia), each of which contained 50 proper nouns (person), geographical names (location), and organization/company names (organization), all in all 150 entities.”


Intelligence Platform Stack

Medium, Pavel Surmenok


from November 06, 2016

There is no stable intelligence platform stack yet; it is open to interpretation and change. The most common model of the stack includes 8 layers, from hardware at the bottom to an application layer at the top. Layers from the bottom up:

1. Hardware to accelerate computation.


Scaling Spark in the real world: performance and usability

Adrian Colyer, the morning paper blog


from November 04, 2016

A short and easy paper from the Databricks team to end the week. Given the pace of development in the Apache Spark world, a paper published in 2015 about enhancements to Spark will of course be a little dated. But this paper nicely captures some of the considerations in the transition from research project to commercial software – we see two years of that journey.

 
Careers


Tenured and tenure track faculty positions

Assistant Professor in Large-Scale Data Analytics



University of Michigan, School of Information; Ann Arbor, MI

Full-time positions outside academia

Communications Lead, Office of Innovation



UNICEF; New York, NY
