NYU Data Science newsletter – April 25, 2016

The NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 25, 2016.

GROUP CURATION: N/A

 
Data Science News



Welcome to Google Summer of Code Students

Software Carpentry


from April 23, 2016

Thanks to a lot of hard work from members of our community and of other open source scientific projects, we are happy to welcome the students and mentors who will participate in Google Summer of Code this year under the NumFOCUS umbrella.

 

Case Study: The Ethics of Using Hacked Data: Patreon’s Data Hack and Academic Data Standards

Council for Big Data, Ethics, and Society; Nathaniel Poor & Roei Davidson


from April 04, 2016

Should researchers utilize hacked datasets that have been released in public forums? This case study discusses the ethical arguments for and against utilizing hacked crowdfunding data for academic research.

 

Andrew Lo Study Says Twitter Can Help You Trade Fed Meetings

Bloomberg Technology


from April 21, 2016

In the social media cacophony, some of the noise rises to the level of stock market signal.

That’s the finding of a working paper overseen by Massachusetts Institute of Technology’s Andrew Lo, which says a trading strategy based on views posted on Twitter prior to Federal Reserve policy meetings regularly turned a profit. A one-standard-deviation increase in tweet sentiment can be exploited to boost Fed-day equity returns by 0.62 percent, it found.

Quant investors have been obsessed for years with pulling signals out of social media, where roughly 1.8 billion active users opine in real time on everything from tech valuations to celebrity breakups. The MIT study, which examined 3.9 million tweets over seven years, joins an expanding pool of research examining topics such as the role of tweets in market volatility and the accuracy of crowd-sourced earnings estimates.
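The headline number can be made concrete with a small sketch. This is not the MIT paper's actual methodology; the sentiment scores below are made up, and `expected_fed_day_boost` simply applies the paper's reported estimate of 0.62 percent per one-standard-deviation move in tweet sentiment.

```python
from statistics import mean, stdev

def standardized_sentiment(scores):
    """Z-score the latest pre-meeting sentiment against its own history."""
    mu, sigma = mean(scores), stdev(scores)
    return (scores[-1] - mu) / sigma

def expected_fed_day_boost(z, beta=0.0062):
    """beta = 0.62% extra Fed-day return per one-standard-deviation move."""
    return beta * z

# Hypothetical daily sentiment scores leading into an FOMC meeting:
history = [0.10, 0.05, -0.02, 0.08, 0.30]
z = standardized_sentiment(history)
print(f"sentiment z-score: {z:.2f}, expected extra return: {expected_fed_day_boost(z):.4%}")
```

The point of the sketch is only the arithmetic of the claim: standardize the signal, then scale by the estimated effect size.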

 

In first 72 Hours of Science, SFI postdocs test the limits of transdisciplinary science

Santa Fe Institute


from April 20, 2016

In a unique experiment testing the limits of scientific collaboration, 15 Santa Fe Institute postdocs recently holed up in a home in the foothills above Tesuque, New Mexico, for three days and three nights of intense scientific research.

Their goal: starting virtually from scratch, produce a novel, transdisciplinary scientific research paper in just 72 hours. They called the event “72 Hours of Science,” 72h(S) in shorthand.

 

Establishing a framework for Open Geographic Information science

International Journal of Geographical Information Science


from February 05, 2016

When conducting research within a framework of Geographic Information Science (GISc), the scientific validity of the work arguably depends on the extent to which the methods employed are reproducible; in the strictest sense, this can only be fully achieved by implementing transparent workflows that utilize both open source software and openly available data. After considering the scientific implications of non-reproducible methods, we provide a review of both open source Geographic Information Systems (GIS) and openly available data, before describing an integrated model for Open GISc. We conclude with a critical review of this embryonic paradigm, with directions for future development in supporting spatial data infrastructure.

 

Google believes its superior AI will be the key to its future | The Verge

The Verge


from April 21, 2016

Google is beginning to look beyond search to tap into some of the most lucrative and promising businesses in the tech industry: artificial intelligence and cloud computing. The company, the largest and most significant part of Alphabet Inc., has grown to mammoth proportions off the back of its search-based advertising division. But those revenues are starting to slow. The cloud allows companies to manage and sell server space, and software such as AI that lives inside their data centers, to other large companies. That type of service-based business is fast becoming the new way to reap profits in the tech industry.

“We’ve always been doing cloud, it’s just that we’ve been consuming it all internally at Google,” CEO Sundar Pichai said on an earnings call with investors today. “But as we’ve grown — really matured in how we handle our data center investments and how we can do this at scale — we’ve definitely crossed over to the other side to where we can thoughtfully serve external customers.”

 

Bayes Hack 2016 · GitHub

GitHub – bayeshack2016


from April 24, 2016

Bayes Impact, a nonprofit backed by the Bill & Melinda Gates Foundation and Y Combinator, is hosting its second data science hackathon to bring together Silicon Valley and the federal government. Projects are listed on GitHub.

 

Facebook Aims to Drive Down Tech Prices to Expand Its Reach

The New York Times


from April 24, 2016

Mark Zuckerberg, Facebook’s co-founder and chief executive, is clear about his vision for his company: He wants to triple the size of his social network, which now has 1.6 billion members.

But to reach that new audience, he has to find a way to change telecommunications networks to make connecting to the Internet more affordable, since many of those would-be Facebook users live in developing countries.

 

American Academy of Arts and Sciences Elects Five NYU Faculty as 2016 Fellows

NYU News


from April 20, 2016

The American Academy of Arts and Sciences (AAAS) has elected five New York University faculty as fellows: Leslie Greengard, a professor at the Courant Institute of Mathematical Sciences; Yusef Komunyakaa, a faculty member in the Creative Writing Program and the Department of English; Trevor Morrison, dean of NYU’s School of Law; Debraj Ray, a Silver Professor in the Department of Economics; and Christopher Wood, a professor in the Department of German.

Other AAAS fellows selected this year include the following: novelist Colm Tóibín; La Opinión Publisher and CEO Monica Lozano; former Botswanan President Festus Mogae; and autism author and spokesperson Temple Grandin. The list of the new members may be found here.

 
Events



2016 New York Scientific Data Summit (NYSDS): Data-Driven Discovery



Led by the Computational Science Initiative at Brookhaven National Laboratory (BNL), the New York Scientific Data Summit (NYSDS) aims to accelerate data-driven discovery and innovation by bringing together researchers, developers, and end-users from academia, industry, utilities, and state and federal governments. The summit is jointly organized by BNL, Stony Brook University (SBU), and New York University (NYU). The theme of this year’s conference is “Data-Driven Discovery.”

With keynote speakers from industry and international big-science projects, the 2-1/2 day conference is organized into five sessions.

Sunday-Wednesday, August 14-17, in New York City.

 
Deadlines



12th International Workshop on Mining and Learning with Graphs


There is a great deal of interest in analyzing data that is best represented as a graph. Examples include the WWW, social networks, biological networks, communication networks, transportation networks, energy grids, and many others. These graphs are typically multi-modal, multi-relational and dynamic. In the era of big data, the importance of being able to effectively mine and learn from such data is growing, as more and more structured and semi-structured data is becoming available. The workshop serves as a forum for researchers from a variety of fields working on mining and learning from graphs to share and discuss their latest findings.
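One classic learning-with-graphs task the workshop's scope covers is link prediction. Below is a minimal sketch in plain Python (no graph library) of the common-neighbors heuristic on a made-up toy social network; the node names and edges are invented for illustration.

```python
from itertools import combinations

# Toy undirected "social network": each tuple is a friendship edge.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]

# Build an adjacency map (node -> set of neighbors).
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(u, v):
    """Score a candidate edge by how many neighbors the endpoints share."""
    return len(adj[u] & adj[v])

# Rank all non-edges by the heuristic: higher score = more likely future link.
non_edges = [(u, v) for u, v in combinations(sorted(adj), 2)
             if v not in adj[u]]
ranked = sorted(non_edges, key=lambda p: -common_neighbors(*p))
print(ranked[0])  # ('ann', 'dan') — they share two friends, bob and cat
```

Real workshop papers tackle far richer settings (multi-modal, multi-relational, dynamic graphs), but the mine-structure-from-adjacency pattern is the same.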

Sunday, August 14, in San Francisco, held in conjunction with KDD 2016.

Deadline to submit papers is Friday, May 27.

 

Open Data Research Symposium 2016


The Open Data Research Symposium program committee is pleased to announce that the second Open Data Research Symposium (#ODRS16) will be held on October 5, 2016 prior to the International Open Data Conference 2016 in Madrid, Spain.

Open data has become firmly cemented in the policy mainstream. However, there remains a pressing need to dig deeper into the dynamics of how open data operates in practice, as well as to nurture theories that explain these dynamics. This symposium offers open data researchers an opportunity to reflect critically on the findings of their completed research and to create cohesion within the research community on what the future challenges are likely to be in unlocking the potential of open data to promote social progress.

Madrid, Spain. Deadline for abstract submissions is Monday, May 30.

 
CDS News



Faculty Interview: Dustin Duncan

NYU Center for Data Science


from April 22, 2016

How did you become interested in social epidemiology?

I became interested in studying the social epidemiology of neighborhoods during my master’s program, when I was a research assistant at Dana-Farber Cancer Institute’s Center for Community-Based Research (DFCI). At DFCI, I co-authored a paper demonstrating that perceiving one’s neighborhood as unsafe is associated with reduced walking among urban predominantly racial/ethnic minority low-income adults. This was my first published peer-reviewed paper, and was published in PLoS Medicine.

In spatial epidemiology, are you using data science to look at the ways in which health outcomes are affected on a micro-level (such as disease mapping and clustering), or is it more of a macro view (which areas are most likely to be affected)?

My research embraces a social ecological perspective, interrogating the impact of different “levels” on people’s health. However, the majority of our work is at the community level, where we investigate how neighborhood characteristics can influence population health and health disparities in vulnerable populations predominantly in urban environments.

 
Tools & Resources



Spreadsheet Thinking vs. Database Thinking

Robert Kosara, eagereyes blog


from April 24, 2016

The shape of a dataset is hugely important to how well it can be handled by different software. The shape defines how it is laid out: wide as in a spreadsheet, or long as in a database table. Each has its use, but it’s important to understand their differences and when each is the right choice.
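The wide-vs-long distinction can be shown in a few lines. This sketch uses plain Python dictionaries rather than a spreadsheet or database (pandas users would reach for `DataFrame.melt`); the city/population figures are made-up illustration data.

```python
# "Spreadsheet thinking" (wide): one row per city, one column per year.
wide = [
    {"city": "NYC", "2014": 8.4,  "2015": 8.5},
    {"city": "SF",  "2014": 0.85, "2015": 0.86},
]

def melt(rows, id_col, var_name, value_name):
    """Unpivot wide rows into long form: one observation per row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col],
                                  var_name: key, value_name: value})
    return long_rows

# "Database thinking" (long): one row per (city, year) observation.
long = melt(wide, id_col="city", var_name="year", value_name="population")
print(long[0])  # {'city': 'NYC', 'year': '2014', 'population': 8.4}
```

The wide layout is easy to read and edit by hand; the long layout is what grouping, filtering, and most plotting tools expect.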

 

Leaf – Machine Learning for Hackers

Autumn Machine Learning Platform


from April 21, 2016

This short book teaches you how you can build machine learning applications (with Leaf).

Leaf is a Machine Intelligence Framework engineered by hackers, not scientists. It has a very simple API consisting of Layers and Solvers, with which you can build classical machine learning as well as deep learning and other machine intelligence applications. Although Leaf is just a few months old, thanks to Rust and Collenchyma it is already one of the fastest machine intelligence frameworks available.

 
Careers



Review and Edit, Writing Workshops Planned for JSM
 

Amstat News
 

Detectica – Jobs
 

Detectica
 
