NYU Data Science newsletter – March 30, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for March 30, 2016

 
Data Science News



Collaboration and Tribalism in Science

Undark, Veronique Greenwood


from March 28, 2016

“Physics is not conceptually super-interesting anymore, not as interesting as biology and evolution and all things social — at least for me,” says Luis Bettencourt, a physicist at the Santa Fe Institute who once studied the origins of the universe and now studies the growth of cities.

In many cases, these new collaborations have been fueled by an explosion of data pouring in from DNA sequencing, cellphone records and other sources, filled with latent patterns that could reveal more about the systems that created them. “It’s an opportunity for people that are fluent with dealing with data, and modeling data” — in other words, certain kinds of physicists — “to come in and say something,” Bettencourt says.

 

To SQL or NoSQL? That’s the database question

Ars Technica


from March 30, 2016

Poke around the infrastructure of any startup website or mobile app these days, and you’re bound to find something other than a relational database doing much of the heavy lifting. Take, for example, the Boston-based startup Wanderu. This bus- and train-focused travel deal site launched about three years ago. And fed by a Web-generated glut of unstructured data (bus schedules on PDFs, anyone?), Wanderu is powered by MongoDB, a “NoSQL” database—not by Structured Query Language (SQL) calls against traditional tables and rows.

Is the equation really as simple as “Web-focused business = choose NoSQL?” Why do companies like Wanderu choose a NoSQL database? (In this case, it was MongoDB.) Under what circumstances would a SQL database have been a better choice?
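
To make the contrast concrete, here is a minimal sketch in Python using only the standard library. The `trips` table, its columns, and the sample records are hypothetical and are not Wanderu's actual schema. Plain dictionaries stand in for MongoDB documents, so no MongoDB API is shown.

```python
import sqlite3

# Relational style: the schema is declared up front (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "id INTEGER PRIMARY KEY, origin TEXT, destination TEXT, price_usd REAL)"
)
conn.executemany(
    "INSERT INTO trips (origin, destination, price_usd) VALUES (?, ?, ?)",
    [("Boston", "New York", 25.0), ("Boston", "Providence", 12.5)],
)
cheap_sql = [
    row[0]
    for row in conn.execute(
        "SELECT destination FROM trips WHERE price_usd < ? ORDER BY destination",
        (20.0,),
    )
]

# Document style: each record describes itself, so fields may vary per record.
# These dicts are an assumption standing in for documents in a document store.
documents = [
    {
        "origin": "Boston",
        "destination": "New York",
        "price_usd": 25.0,
        "stops": ["Hartford"],          # nested list, present on this record only
        "amenities": {"wifi": True},    # nested object, present on this record only
    },
    {"origin": "Boston", "destination": "Providence", "price_usd": 12.5},
]
cheap_docs = sorted(
    doc["destination"] for doc in documents if doc["price_usd"] < 20.0
)

print(cheap_sql, cheap_docs)
```

The table needs its columns declared before the first insert, while the documents can carry extra nested fields on some records and not others. The cost of that flexibility is that any code reading the documents has to tolerate missing fields.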

 

Future Proofing Data-intensive Research

UW eScience Institute, Ariel Rokem


from March 24, 2016

Ariel Rokem’s presentation at University of Washington TechConnect, March 24.

The eScience Institute: data-science at the UW — Future proofing:

  • Catalyzing collaborations
  • Building and maintaining the tools
  • Sustaining career paths in data-intensive research
  • Training data-savvy researchers

    IBM to slash time needed to train AI with new resistive processing tech

    Computer Business Review


    from March 29, 2016

    Tech giant IBM has developed a new technology that can speed up the training of deep neural networks (DNNs).

    Though DNNs can be taught to perform almost any task, training them is time-consuming and complex. Training artificial intelligence (AI) systems involves running supercomputers or data centres for many days.

    In a research paper titled ‘Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices’, authors Tayfun Gokmen and Yurii Vlasov said: “In recent years, DNNs have demonstrated significant business impact in large scale analysis and classification tasks such as speech recognition, visual object detection, pattern extraction, etc.”

     

    When open data is a Trojan Horse: The weaponization of transparency in science and governance

    Big Data & Society journal; Karen EC Levy and David Merritt Johns


    from March 23, 2016

    Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts.

     

    Crowd Control? Baidu Has an Algorithm for That

    Wall Street Journal, China Real Time Report blog


    from March 23, 2016

    A unit of Chinese internet giant Baidu Inc. has developed an algorithm that can predict crowd formation, which it says could be used to help warn authorities and individuals of unusually large crowds that could lead to public-safety threats.

    On Tuesday, Baidu’s Big Data Lab published a study showing that aggregated data from Baidu Map route searches, when correlated with the crowd density of the places people searched for, can predict future crowd formations at a certain place and time.

     

    Is AlphaGo Really Such a Big Deal?

    Quanta Magazine, Michael Nielsen


    from March 29, 2016

    … Will the technical advances that led to AlphaGo’s success have broader implications? To answer this question, we must first understand the ways in which the advances that led to AlphaGo are qualitatively different and more important than those that led to Deep Blue.

     

    Automated Search for new Quantum Experiments

    Physical Review Letters; Mario Krenn, Mehul Malik, Robert Fickler, Radek Lapkiewicz, and Anton Zeilinger


    from March 04, 2016

    Quantum mechanics predicts a number of, at first sight, counterintuitive phenomena. It therefore remains a question whether our intuition is the best way to find new experiments. Here, we report the development of the computer algorithm Melvin which is able to find new experimental implementations for the creation and manipulation of complex quantum states. Indeed, the discovered experiments extensively use unfamiliar and asymmetric techniques which are challenging to understand intuitively. The results range from the first implementation of a high-dimensional Greenberger-Horne-Zeilinger state, to a vast variety of experiments for asymmetrically entangled quantum states—a feature that can only exist when both the number of involved parties and dimensions is larger than 2.

     

    Why UW president Ana Mari Cauce is so hopeful: Students melding entrepreneurship with social good

    GeekWire


    from March 29, 2016

    Ana Mari Cauce — who was appointed president of the University of Washington last fall — is optimistic about the future.

    The Cuban-born psychology professor sees a positive trend emerging on campus. Students are mixing entrepreneurship with social good in new and creative ways.

     

    How AI Is Feeding China’s Internet Dragon

    MIT Technology Review


    from March 28, 2016

    Shortly after walking through the front doors of Baidu in Beijing last November, I was surprised to notice that my face had transformed into that of a cheerful-looking little dog. As I chatted with one of Baidu’s AI researchers, the version of me shown on his smartphone had sprouted a very realistic-looking wet snout, fluffy ears, and a big pink tongue.

    The trick was performed on an app called Face You, released by Baidu last Halloween, which lets you add all sorts of spooky effects or animal characteristics to a digital image of your face. Face You makes use of an AI technique called deep learning to automatically identify key points on a person’s face, so that software can then position and stretch a virtual mask with amazing accuracy.

    Deep learning is driving a lot more than just goofy apps at Baidu, though. It is making existing products smarter and helping the company’s engineers dream up many entirely new ideas.

     

    Man and Machine

    MIT Technology Review


    from March 29, 2016

    Engineers at Pinterest constantly create new artificial-intelligence algorithms to help its users find what they’re looking for among billions of pictures of food, products, houses, and other items. Matching search queries with relevant images is crucial to keep users coming back. But until last year, it could take days to test the effectiveness of each new algorithm.

    To fine-tune its machine learning and provide better search results faster, Pinterest turned to an unexpected source: human intelligence. It hired crowdsourcing companies such as CrowdFlower to marshal people to quickly do “micro-tasks” such as labeling photos and assessing the quality of search results. In an hour, the workers collectively could test hundreds of search terms to see if results matched well enough.

    For all the recent advances in AI, human beings remain more adept than machines at distinguishing, say, a tile mosaic from a similar pattern on a blanket. “It will be a long way out before machines will be able to do this,” says Pinterest data scientist Mohammad Shahangian.

     

    Despite machines taking over the world, humans still prove useful

    Stitch Fix Technology – Multithreaded blog


    from March 29, 2016

    Human Computation is a new field that’s based on this realization, and researchers within it typically work to harness the strengths of both “systems” by combining them to produce an overall better algorithm. The work in this domain combines traditional machine learning techniques with crowdsourcing, human computer interaction, and cognitive science to invent innovative ways to mesh the two.

    Human Computation is at the core of our business. Our machine algorithms select items that a client might want, then pass them to a human stylist who selects five of those items to send to that client. It’s true that modern recommendation systems are capable of selecting items all on their own and, in fact, so are humans. We choose this workflow because it allows us to optimize our overall algorithm by making use of what both “systems” are naturally good at. Computers are great at crunching numbers and finding patterns, but often struggle with tasks that require an understanding of aesthetics and emotion.
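
The two-stage workflow described above can be sketched roughly as follows. This is an illustration only, not Stitch Fix's actual system: the function names, the score dictionary, and the shortlist size are all assumptions, and the "stylist" here is a stand-in function where the real workflow has a person making the choice.

```python
from typing import Callable, Dict, List


def shortlist(scores: Dict[str, float], k: int) -> List[str]:
    """Machine step: keep the k highest-scoring items (ties broken by name)."""
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


def curate(
    scores: Dict[str, float],
    stylist: Callable[[List[str]], List[str]],
    k: int = 10,
    n_final: int = 5,
) -> List[str]:
    """Human step: the stylist picks n_final items from the machine's shortlist."""
    candidates = shortlist(scores, k)
    chosen = stylist(candidates)
    # Guard against picks that are the wrong size or were never shortlisted.
    if len(chosen) != n_final or not set(chosen) <= set(candidates):
        raise ValueError("stylist must pick exactly n_final shortlisted items")
    return chosen


# Hypothetical model scores for twelve items: item_0 -> 0.0 ... item_11 -> 1.1
scores = {f"item_{i}": i / 10 for i in range(12)}

# Stand-in for human judgment: take every other shortlisted item.
picks = curate(scores, stylist=lambda candidates: candidates[::2])
print(picks)
```

Keeping the human choice behind a plain callable means the machine stage can be tested on its own, and the final decision stays with whoever supplies that function.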

     
    Events



    New York Blockchain Workshop



    The Blockchain Workshops investigate the upcoming challenges and opportunities provided by blockchain technologies, and their impact on the current social, economic and political order.

    Monday-Tuesday, April 4-5, at NYU Stern School of Business. Admission is $750, $75 for students.

     
    CDS News



    Kyunghyun Cho Talks Image Caption Generation

    NYU Center for Data Science


    from March 28, 2016

    Kyunghyun Cho is an Assistant Professor at NYU’s Center for Data Science, and conducts research in the field of natural language processing. His recent paper, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention,” proposes using an attention-based model for image description.

    Can you give us a bit of background on why you chose to look into the subject of image description?

    One big question in the field of machine learning and artificial intelligence research is whether there exists a single, generic learning mechanism that can work with any type of data and task. Can we build an artificial neural network that works both on text and images? Can the deep convolutional neural network—which is widely used in object recognition—also work well with natural language text? These questions motivate much of my research.

     
    Tools & Resources



    Using R packages and education to scale Data Science at Airbnb

    Medium, Airbnb Engineering & Data Science


    from March 29, 2016

    One of my favorite things about being a data scientist at Airbnb is collaborating with a diverse team to solve important real-world problems. We are diverse not only in terms of gender, but also in educational backgrounds and work experiences. Our team includes graduates from Mathematics and Statistics programs, PhDs in fields from Education to Computational Genomics, veterans of the tech and finance worlds, as well as former professional poker players and military veterans. This diversity of training and experience is a tremendous asset to our team’s ability to think creatively and to understand our users, but it presents challenges to collaboration and knowledge sharing. New team members arrive at Airbnb proficient in different programming languages, including R, Python, Matlab, Stata, SAS, and SPSS. To scale collaboration and unify our data science brand, we rely on tooling, education, and infrastructure. In this post, we focus on the lessons we have learned building R tools and teaching R at Airbnb. Most of these lessons also generalize to Python.

    Our approach has two main pillars: package building and education.

     

    Important API Announcement — The Echo Nest Developer Center

    Spotify, Echo Nest


    from March 29, 2016

    As part of our migration of many of The Echo Nest API features over to the Spotify Web API, we’re announcing three new APIs today.

     
