NYU Data Science newsletter – September 7, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for September 7, 2016


 
Data Science News



Facebook taps deep learning for customized feeds

InfoWorld


from September 02, 2016

Serving more than a billion people a day, Facebook has its work cut out for it when providing customized news feeds. That is where the social network giant takes advantage of deep learning to serve up the most relevant news to its vast user base.

Facebook is challenged with finding the best personalized content, Andrew Tulloch, Facebook software engineer, said at the company’s recent @scale conference in Silicon Valley. “Over the past year, more and more, we’ve been applying deep learning techniques to a bunch of these underlying machine learning models that power what stories you see.”

 

‘Deep learning’ quest drives autonomous startup

Automotive News


from September 05, 2016

Imagine a driverless vehicle capable of using a variety of emojis, honks and signs to communicate its intentions to nearby drivers and pedestrians. Drive.ai, a new entrant in the autonomous vehicle race, has begun testing a fleet of such vehicles near its home base in Mountain View, Calif.

The company, staffed by researchers from Stanford University’s artificial intelligence laboratory, is getting an assist from Steve Girsky, a former General Motors executive who has been named to the Drive.ai board.

 

Intel buys Movidius to boost machine vision, deep learning

SiliconANGLE


from September 06, 2016

Intel Corp. aims to boost its efforts to make all manner of devices from drones to virtual reality headsets able to see better with the acquisition of machine vision processor chip maker Movidius Ltd.

The deal, whose value Intel didn’t reveal but one report estimated at $400 million, marks another big step for the giant chipmaker into deep learning neural networks, the branch of machine learning that roughly emulates some of the activity of neurons in the brain to enable computers to learn to recognize patterns in masses of data representing images, speech and text.

 

NSF sustaining their support for open data

Data Dryad blog


from September 06, 2016

We are pleased to have received a Sustaining Award from the U.S. National Science Foundation. Sustaining Awards are an innovative proposal track, developed within NSF’s Advances in Bioinformatics program, that provides “limited support for the cost of ongoing operations and maintenance of existing cyberinfrastructure that is critical for the continued advance of priority biological research.”

The award is to the University of North Carolina at Chapel Hill with Dryad as a subawardee. The grant provides approximately $762K in funding over three years (starting 1-Sep-2016).

 

Why Data Skills Are Key to Social Media Success

Knowledge@Wharton


from August 25, 2016

There’s hardly a company in the world today that doesn’t have a social media presence, whether it’s a Facebook account, a Twitter feed or a website with an interactive blog. But mere existence in the social media space doesn’t guarantee a boost in customers, sales or brand recognition for a business, says Lynn Wu, a Wharton professor of operations, information and decisions who specializes in the study of information technology and its relationship to productivity.

In her latest paper, “Data Analytics Skills and the Corporate Value of Social Media,” Wu analyzed a large sample of businesses to determine how they derived value from social media. [video, 10:52]

 

The good, the bad, and the ugly: uncovering novel research opportunities in social media mining

SpringerLink, International Journal of Data Science and Analytics


from September 02, 2016

Big data is ubiquitous and can only become bigger, which challenges traditional data mining and machine learning methods. Social media is a new source of data that is significantly different from conventional ones. Social media data are mostly user-generated, and are big, linked, and heterogeneous. We present the good, the bad and the ugly associated with the multi-faceted social media data and exemplify the importance of some original problems with real-world examples. We discuss bias in social media data, evaluation dilemma, data reduction, inferring invisible information, and big-data paradox. We illuminate new opportunities of developing novel algorithms and tools for data science. In our endeavor of employing the good to tame the bad with the help of the ugly, we deepen the understanding of ever growing and continuously evolving data and create innovative solutions with interdisciplinary and collaborative research of data science. [full text]

 

[1608.09010] Statistical physics of vaccination

arXiv, Physics > Physics and Society; Zhen Wang et al.


from August 31, 2016

Historically, infectious diseases caused considerable damage to human societies, and they continue to do so today. To help reduce their impact, mathematical models of disease transmission have been studied to help understand disease dynamics and inform prevention strategies. Vaccination – one of the most important preventive measures of modern times – is of great interest both theoretically and empirically. And in contrast to traditional approaches, recent research increasingly explores the pivotal implications of individual behavior and heterogeneous contact patterns in populations. Our report reviews the developmental arc of theoretical epidemiology with emphasis on vaccination, as it led from classical models assuming homogeneously mixing (mean-field) populations and ignoring human behavior, to recent models that account for behavioral feedback and/or population spatial/social structure. Many of the methods used originated in statistical physics, such as lattice and network models, and their associated analytical frameworks. Similarly, the feedback loop between vaccinating behavior and disease propagation forms a coupled nonlinear system with analogs in physics. We also review the new paradigm of digital epidemiology, wherein sources of digital data such as online social media are mined for high-resolution information on epidemiologically relevant individual behavior. Armed with the tools and concepts of statistical physics, and further assisted by new sources of digital data, models that capture nonlinear interactions between behavior and disease dynamics offer a novel way of modeling real-world phenomena, and can help improve health outcomes. We conclude the review by discussing open problems in the field and promising directions for future research.

 

Surveillance By Default

Medium, Chris Diehl


from August 29, 2016

We find ourselves living through a profound time in history. One where it is possible to observe both online and offline social behavior at a global scale. From a scientific perspective, data collection at scale is undeniably attractive. From an economic perspective, it is undeniably lucrative.

We’ve rushed in to explore and expand this vast technological landscape. In many respects, one could argue we remain at the very beginning of that exploration. Yet some patterns are starting to become clear. There are risks to the rights of individuals and groups within our global society that need to be addressed. It’s time for the technology sector to reflect more deeply on those risks and take steps to mitigate them going forward.

 

Why It’s So Hard To Find The Next Earth, Even If You’re Looking Right At It

FiveThirtyEight, Maggie Koerth-Baker


from September 06, 2016

In 1989, scientists found the first evidence of planets outside our own familiar, little solar neighborhood. Today, the list of exoplanets — as measured by NASA’s Exoplanet Archive — has grown to 3,375. That includes the recently discovered Proxima Centauri b, a planet just four light-years away that could be habitable.

But potentially Earth-esque planets remain rare. The Planetary Habitability Laboratory database at the University of Puerto Rico’s Arecibo telescope facility counts just 10 exoplanets that are likely to be habitable. Ten, out of 3,375. That statistic creates an image of a universe filled with gas giants, flaming hot lava spheres and frozen snowballs — with nary a cozy, just-right second Earth to be found.

Appearances, however, can be deceiving. The list of habitable exoplanets is so short not because there aren’t many, scientists told me, but because the tools we use to find exoplanets are biased.

 

Rethinking Justice: Can Data-Driven Approaches Remove Politics and Bias from Policing?

Emergency Management


from September 06, 2016

Crime is a societal problem as old as sin itself, but America’s treatment of the issue has proven so careless that some leaders are now willing to try different approaches. Police departments are using data platforms like Socrata for Public Safety to hold officers accountable and keep the public informed of operations. And through programs like the Justice Reinvestment Initiative, states are using data to help case workers, judges and lawmakers make decisions that reduce crime instead of building more prisons. Modern technology is generating data that allows criminal justice to add shades of gray to what has long been viewed as a black and white issue.

 

Perspectives on AI

Medium, Astro Teller


from September 01, 2016

In 1997, when I was a graduate student studying computer science, I published my first novel, a parable about the dangers of approaching technology in general — and artificial intelligence (AI) in particular — with fear and hostility. Having done a Ph.D. in AI in the 90’s, I’ve been working for over twenty years to help people understand AI and to calm dystopian hysteria that has wormed its way into discussions about the future of AI and robotics. Only by clearing away panicky rhetoric can we have a productive conversation about how those fields can be explored to the maximum benefit of humanity.

Over the ensuing decades, we have seen the rapid evolution of AI (e.g. self-driving cars, Google Brain), but the public rhetoric of fear hasn’t changed. To make a more sanguine look at the situation more fun to consume, I painted myself blue, and my wife, Danielle Teller, and I gave a talk at the first annual Silicon Valley Comic Con titled “AI vs. SuperBabies”. It’s a satirical look at the doomsday scenarios for both artificial intelligence and genetic engineering that get in the way of logical conversation.

 

Science AMA Series: I’m Dr. Elad Yom-Tov, a Principal Researcher at Microsoft Research. I use Internet data to learn about health and medicine. AMA!

reddit.com/r/science


from September 05, 2016

I’m Elad Yom-Tov, a Principal Researcher at Microsoft Research. I am a Machine Learning and Information Retrieval researcher, and for the past few years my work has focused on using Internet data to study our health. Internet data are all those things that we create while browsing the web: posts on Facebook and Twitter, queries on Google and Bing, blogs, and other content. These data can teach us about aspects of medicine that are hard to learn about in other ways. A few examples include measuring the effect of mainstream media on the development of eating disorders, estimating the effectiveness of flu vaccines, detecting new side effects of medical drugs, and discovering how visiting a dating site can lead to catching an STD.

 

Boston officials use Big Data to find deceptive rental ads, unsafe units

The Boston Globe, Matt Rocheleau


from September 02, 2016

Boston officials are testing a new method that draws on Big Data techniques to find deceptive apartment advertisements — and to identify potentially unsafe units.

The city recently began running computer programs that scan online apartment listings, primarily on Craigslist, and compare the information against city assessing records to make sure the ads don’t oversell the unit — exaggerating the number of bedrooms or bathrooms, for example.

The programs also scan for certain words and phrases and spit out a “risk score,” indicating whether there is a high or low probability that a listing is advertising a property that might violate regulations.
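
For readers curious how such a check might look in code, here is a minimal sketch. It is not the city's actual program: the phrase list, the record format, and the scoring rule (the share of flagged phrases found in the ad) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical phrases. The article does not say which words the city scans for.
RISK_PHRASES = ("basement", "attic", "no lease", "cash only")


@dataclass
class Listing:
    address: str
    bedrooms: int  # bedroom count claimed in the ad
    text: str      # free-text body of the ad


def check_listing(listing, assessed_bedrooms):
    """Return (oversold, risk_score) for one listing.

    assessed_bedrooms maps an address to the bedroom count on record
    (an assumed, simplified stand-in for city assessing records).
    """
    on_record = assessed_bedrooms.get(listing.address)
    # The ad "oversells" if it claims more bedrooms than the record shows.
    oversold = on_record is not None and listing.bedrooms > on_record

    # Toy risk score: fraction of the flagged phrases that appear in the ad.
    text = listing.text.lower()
    hits = sum(phrase in text for phrase in RISK_PHRASES)
    risk_score = hits / len(RISK_PHRASES)
    return oversold, risk_score
```

Calling `check_listing(Listing("12 Elm St", 3, "Sunny 3BR, basement bedroom, cash only"), {"12 Elm St": 2})` flags the ad as oversold, since it claims three bedrooms against two on record, and matches two of the four assumed phrases.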

 

Stressed? We’re teaching computers to help handle our feelings

New Scientist, Technology News


from September 05, 2016

Some humans just know when and how to offer a word of emotional support. Now computers are learning too, with the creation of a new algorithm that aims to deliver the right words at the right time.

“There’s a lot of need for emotional support at the moment,” says Judith Masthoff at the University of Aberdeen, UK, who is designing the system. “We have increased rates of mental health issues, and this has led to increased rates of informal care.” Trained professionals are only available for the most extreme cases, so Masthoff suggests that people could get instant support from apps instead.

 

2016 Data Science Salary Survey

O'Reilly Media


from September 06, 2016

In this fourth edition of O’Reilly’s Data Science Salary Survey, 983 respondents working across a variety of industries answered questions about the tools they use, the tasks they engage in, and the salaries they make. This year’s survey includes data scientists, engineers, and others in the data space from 45 countries and 45 US states.

Participate in the 2017 survey here.

 
Events



IBM DataFirst Launch Event



New York, NY Hudson Mercantile Building, Tuesday, 27 September 2016. [free]
 

Ethics of Artificial Intelligence



New York, NY Friday-Saturday, 14-15 October 2016, at NYU. [free]
 
Deadlines



Call for Papers – NIPS 2016 Workshop on Machine Learning for Health

deadline: Conference

Barcelona, Spain In this workshop we will bring together clinicians, health data experts, and machine learning researchers working on healthcare solutions. — Friday, December 9, part of NIPS 2016.

Deadline for submissions is Friday, October 28.

Other NIPS 2016 workshops with paper submission deadlines (in parentheses):

  • Approximate Inference (Tuesday, November 1)
  • Interpretable ML for Complex Systems (Thursday, October 20)
  • Neural Abstract Machines & Program Induction (Friday, October 14)

    Approximate Inference – Call for Papers

    deadline: Conference

    Barcelona, Spain We invite researchers in machine learning and statistics to participate in the NIPS 2016 Workshop on Advances in Approximate Bayesian Inference. — Friday-Saturday, December 9-10, part of NIPS 2016

    Deadline for paper submission is Tuesday, November 1.

     
    Tools & Resources



    Tutorials and resources for machine learning and data analysis enthusiasts

    Laura Harding Graesser, Learning Machine Learning


    from September 02, 2016

    “My previous post walked through the code for implementing a general feedforward neural network program in Python. This post shows the program in action, applied to two classification problems.”

     

    Ask better questions

    Xerox, Simplify Work blog, Mike Kuniavsky


    from September 02, 2016

    I’ve done a lot of usability testing over many years; so much that I wrote a book about it. And usability testing is an absolutely essential part of evaluation. Assuming you know what people want and have iteratively developed a product with these things in mind, usability testing makes sure your assumptions are correct. It’s the final check.

    But what if you want to do the exact opposite and evaluate a product before you make it? That’s what we do with our probe method.

     

    Computational Ants: Agent-based Visualization with OD Matrices

    Medium, MIT MEDIA LAB, Ira Winder


    from September 01, 2016

    So you want to make a model of a system, but you don’t think you have all of the data you need? This is actually a very common situation in modeling. Before going on a quest to search for more data, I often remind researchers that a good model might not need as much data as they think. Personally, I think elegant math models operate with minimal data. The less data you need, the easier it is to deploy, calibrate, and scale the use of your model in the Real World, after all.

    In this post we will review a data-light technique for modeling flows of people using a simple and common data set called an origin-destination matrix (OD matrix, for short).
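
As a rough illustration of the idea (not the code from the post), an OD matrix can drive a simple agent simulation by sampling each agent's trip in proportion to the recorded flow between zones. The function name, the zone labels, and the matrix values below are hypothetical.

```python
import random


def sample_trips(od_matrix, zones, n_agents, seed=0):
    """Draw (origin, destination) trips with probability proportional to flow.

    od_matrix[i][j] is the observed flow from zones[i] to zones[j].
    """
    rng = random.Random(seed)
    n = len(zones)
    # Flatten the matrix into candidate pairs and their flow weights.
    pairs = [(o, d) for o in range(n) for d in range(n)]
    weights = [od_matrix[o][d] for o, d in pairs]
    chosen = rng.choices(pairs, weights=weights, k=n_agents)
    return [(zones[o], zones[d]) for o, d in chosen]


# Toy example: all recorded flow runs between zones A and B.
zones = ["A", "B", "C"]
od = [
    [0, 10, 0],
    [5, 0, 0],
    [0, 0, 0],
]
trips = sample_trips(od, zones, n_agents=100)
```

Each sampled trip could then be handed to an agent (an "ant") to animate. Pairs with zero recorded flow are never drawn, so the only input the sketch needs is the matrix itself.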

     

    Learn Node.js – A Free Course for Node Beginners

    HyperDev


    from September 05, 2016

    “Through 13 exercises, follow The Node Beginner Book, with live-running code examples on HyperDev. By the end, you’ll know how to create a complete web app that allows users to view web pages and upload files.”

     

    Announcing Quill 1.0

    Quill, Jason Chen


    from September 06, 2016

    “Quill is designed as an easy to use editor, to support content creation across the web. It is built on top of consistent and predictable constructs, exposed through a powerful API. With coverage across both ends of the complexity spectrum, Quill aims to be the defacto rich text editor for the web.”

     
    Careers


    Tenured and tenure track faculty positions

    Assistant Professor – Human-Computer Interaction (HCI) and social good



    Information School of the University of Washington; Seattle, WA
     

    Open Rank Professor – Data Science



    New York University – Shanghai; Shanghai, China
     
    Internships and other temporary positions

    Google Brain Resident, 2017 Start (Fixed Term Employee)



    Google; Mountain View, CA
     
