Data Science newsletter – February 8, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for February 8, 2017


 
 
Data Science News



Whale choir and cosmic bubbles on Dare Art Prize shortlist

BBC News



A whale choir, the exploration of cosmic bubbles and a slime study are among five proposals shortlisted for the Dare Art Prize.

The annual £15,000 prize, in its first year, is aimed at getting artists and scientists to work together creatively.

The other entries are chamber music scored beneath the limits of human hearing plus artworks which classify emotions using scientific rigour.


Setting the Talmud Free

Sefaria, The Sefaria Blog, Daniel Septimus



More than fifty years ago, Rabbi Adin Steinsaltz Even-Israel took it upon himself to make the Talmud, the central text of Jewish life, available to all. In 1965, he began translating the 37 tractates of the Talmud from ancient Aramaic into Modern Hebrew, with an English translation published in the Koren Talmud Bavli Noé Edition. Ninety percent of the world’s Jewish population speaks English or Hebrew as a first language, so making the Talmud intelligible in these two languages is a colossal achievement, but until now, this precious content was only available to those with access to a physical volume.

Today, Sefaria is excited and humbled to announce the release of The William Davidson Talmud, a free digital edition of the Babylonian Talmud with parallel translations, interlinked to major commentaries, biblical citations, Midrash, Kabbalah, Halakhah, and an ever-growing library of Jewish texts.


AI Is About to Learn More Like Humans—with a Little Uncertainty

WIRED, Business, Cade Metz



“Deep learning has really received a lot of attention, and deservedly,” says Tuomas Sandholm, the Carnegie Mellon computer science professor who helped build Libratus, the AI that recently topped the best humans at poker—without help from neural networks. “But deep learning doesn’t give you guarantees.”

It’s true. But because of these apparent weaknesses in neural networks, some of the world’s biggest tech companies are now broadening the way they think about AI, judging from recent hires, acquisitions, and research, and many startups are moving in the same direction. You can think of this as the rise of the Bayesians, researchers that approach AI through the scientific method—beginning with a hypothesis and then updating this hypothesis based on the data—rather than just relying on the data to drive the conclusions, as neural networks do.
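The "hypothesis, then update" loop described above can be illustrated with a tiny conjugate example. This is only a sketch under stated assumptions, not anything taken from the article: the Beta-Binomial model, the function names, the prior, and the coin-flip counts are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: return the posterior (alpha, beta)."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)

# Hypothetical prior hypothesis: a uniform Beta(1, 1) belief about a coin's heads rate.
prior = (1, 1)

# Hypothetical observed data: 7 heads and 3 tails.
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:", float(beta_mean(*prior)))
print("posterior:", posterior, "mean:", float(beta_mean(*posterior)))
```

The prior encodes the starting hypothesis and the data only shift it, so with few observations the estimate stays close to the prior, and the width of the posterior gives a rough measure of remaining uncertainty.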


[1702.01119] Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions

arXiv, Computer Science > Social and Information Networks; Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, Jure Leskovec



In online communities, antisocial behavior such as trolling disrupts constructive discussion. While prior work suggests that trolling behavior is confined to a vocal and antisocial minority, we demonstrate that ordinary people can engage in such behavior as well. We propose two primary trigger mechanisms: the individual’s mood, and the surrounding context of a discussion (e.g., exposure to prior trolling behavior). Through an experiment simulating an online discussion, we find that both negative mood and seeing troll posts by others significantly increase the probability of a user trolling, and together double this probability. To support and extend these results, we study how these same mechanisms play out in the wild via a data-driven, longitudinal analysis of a large online news discussion community. This analysis reveals temporal mood effects, and explores long-range patterns of repeated exposure to trolling. A predictive model of trolling behavior shows that mood and discussion context together can explain trolling behavior better than an individual’s history of trolling. These results combine to suggest that ordinary people can, under the right circumstances, behave like trolls.


Crafting an Artificial Soul

Medium, Snips Blog, Steve Tam



For the past 10 years, palm sized touch screens have been our portal to the future. Our interactions have taken the form of infinite scrolls, and right-swipes. But with the imminent rise of AI assistants, conversational interfaces are quickly moving from fiction into daily routine. It’s time for companies at the forefront of assistant development to think about how we’ll design their personalities — and ethics. While today’s AI assistants feel transactional, empty, and even slightly annoying, any uptake in their soft skills could be revolutionary.


How the Social & Behavioral Sciences Get Funded in the US: an Infographic

SAGE Connection – Insight, Social Science Space



Donald Trump’s White House victory, along with Republicans holding onto control of both houses of Congress, has left many attempting to imagine a post-Barack Obama government. The former president was a staunch supporter of social sciences, and made it apparent that he would invest in such programs for researchers, scientists, and students. President Trump’s rhetoric and actions on science have left the science community wondering what is in store from the federal government. One thing to watch as Trump begins his term as the 45th president is the budget he will submit to Congress early this year, including potential funding of the sciences. Take a look at the following graphic to get an idea of how this process plays out for the social and behavioral sciences in particular.


Google and Ivyrevel make ‘Data Dress’

FashionUnited



Google is teaming up with H&M-backed digital fashion house Ivyrevel’s fashion tech lab to bring couture into the digital age with its ‘Data Dress’, which is personalised entirely based on the user’s activities.

The Coded Couture ‘Data Dress’ will be personalised through an Android application, currently under development, that uses the Awareness API, which Google made available to all developers through Google Play last year. The idea is that the API will, with the user’s permission, passively monitor their daily activity and lifestyle to create a personalised, custom-made dress that’s ordered through the app.


As Goldman Embraces Automation, Even the Masters of the Universe Are Threatened

MIT Technology Review, Nanette Byrnes



At its height back in 2000, the U.S. cash equities trading desk at Goldman Sachs’s New York headquarters employed 600 traders, buying and selling stock on the orders of the investment bank’s large clients. Today there are just two equity traders left.

Automated trading programs have taken over the rest of the work, supported by 200 computer engineers. Marty Chavez, the company’s deputy chief financial officer and former chief information officer, explained all this to attendees at a symposium on computing’s impact on economic activity held by Harvard’s Institute for Applied Computational Science last month.

The experience of its New York traders is just one early example of a transformation of Goldman Sachs, and increasingly other Wall Street firms, that began with the rise in computerized trading, but has accelerated over the past five years, moving into more fields of finance that humans once dominated.


What Programming Languages Are Used Most on Weekends?

Stack Overflow Blog, Julia Silge



For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do in my day job here at Stack Overflow, but are not exactly the same. Many developers tinker with side projects for learning or career development (or just for fun!) and at Stack Overflow, we support all types of technologies, from professional to hobbyist. Whenever people are working, we’re available to answer their questions. But what languages tend to be asked about on weekends, as opposed to weekdays?

Let’s use our public StackLite dataset on Kaggle to explore differences between questions that are posted on weekdays and weekends. This dataset is available for anyone to analyze; you can use a Kaggle Kernel to get started on your own question about Stack Overflow questions and tags.
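A weekday-versus-weekend comparison of this kind can be sketched in a few lines. The rows below are made-up toy data with placeholder tag names, not the real StackLite files or schema, and `weekend_share` is a hypothetical helper; the sketch only shows the shape of the calculation.

```python
from collections import Counter
from datetime import date

# Hypothetical toy rows of (creation date, tag); NOT the real StackLite schema.
questions = [
    (date(2017, 2, 4), "tag-a"),  # Saturday
    (date(2017, 2, 5), "tag-a"),  # Sunday
    (date(2017, 2, 6), "tag-a"),  # Monday
    (date(2017, 2, 5), "tag-b"),  # Sunday
    (date(2017, 2, 6), "tag-b"),  # Monday
    (date(2017, 2, 7), "tag-b"),  # Tuesday
]

def weekend_share(rows):
    """Return, per tag, the fraction of its questions posted on Sat/Sun."""
    total = Counter()
    weekend = Counter()
    for day, tag in rows:
        total[tag] += 1
        # date.weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
        if day.weekday() >= 5:
            weekend[tag] += 1
    return {tag: weekend[tag] / total[tag] for tag in total}

shares = weekend_share(questions)
for tag, share in sorted(shares.items()):
    print(f"{tag}: {share:.2f} of questions posted on a weekend")
```

Comparing shares rather than raw counts matters here, because overall question volume is usually lower on weekends for every tag.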


Your Private Browsing History Alone Can Give Away Your Identity

The Atlantic, Kaveh Waddell



Companies that compile user profiles generally do so pseudonymously: They may know a lot of demographic details about you, but they don’t usually connect your behavior to your individual identity. But a group of researchers at Stanford and Princeton developed a system that can connect your profile to your name and identity, just by examining your browsing history.


The Major Blind Spots in Macroeconomics

The New York Times Magazine, John Lanchester



This isn’t the first time [Andy] Haldane has been critical of the current state of macroeconomics — that’s the big-picture, whole-economy side of the profession. Last fall he gave an important, broad-ranging speech with the elegant title “The Dappled World,” in which he argued that the unexpected global downturn that began in late 2007 has left behind “a crisis in the economics and finance profession.” Economic forecasts on the eve of the credit crunch and the Great Recession were, he says, “not just wrong but spectacularly so.” The overall trajectory of precrisis forecasts was upward; the reality was a brutally deep capital V.

The reason this poses a deep intellectual crisis for macroeconomics is that the entire point of the field, as it has developed since the work of John Maynard Keynes in the 1930s, is to prevent just this sort of severe downturn. Keynes once spoke of a future in which economists would be “humble, competent people on a level with dentists,” while the brilliant up-and-coming French economist Esther Duflo recently gave an admired I.M.F. lecture called “The Economist as Plumber.” It seems to me, though, that what macroeconomists do is really most like bomb disposal. Uniquely in the social sciences and humanities, macroeconomics was developed with a specific, real-world purpose, and a negative purpose to boot: to stop anything like the Great Depression from ever happening again. Given this goal — to avert systemic crises and downturns — the credit crunch and the Great Recession were, for macroeconomics, an intellectual disaster.


New algorithms by U of T researchers may revolutionize drug discoveries

University of Toronto, U of T News



A new set of machine learning algorithms developed by U of T researchers that can generate 3D structures of tiny protein molecules may revolutionize the development of drug therapies for a range of diseases from Alzheimer’s to cancer.

“Designing successful drugs is like solving a puzzle,” says U of T PhD student Ali Punjani, who helped develop the algorithms. “Without knowing the three-dimensional shape of a protein, it would be like trying to solve that puzzle with a blindfold on.”


TytoCare data suggests its technology is as accurate as in-person exams

MedCity News, Neil Versel



Telehealth company TytoCare has produced peer-reviewed data showing that its remote examination technology can produce outcomes essentially as good as in-person exams.

In a study conducted at a pediatric emergency department, a review of exams of the ears, heart, lungs and throat in patients aged 2-18 found an average rating of 4.4 on a scale of 5 for remote encounters. That compares to a 4.5 rating for in-person exams using standard otoscopes and stethoscopes, a statistically insignificant difference, according to TytoCare.


Data-Driven Mattress Brand Wants To Rethink How We Sleep

PSFK, Ido Lechner



Like a fine wine, the Eight Smart Mattress gets better with time: more features and sleep data for a more rejuvenated morning wakeup.

 
Events



TensorFlow Dev Summit | Agenda



Mountain View, CA, and Online. February 15. [In-person Waiting List available]

Monthly Music Hackathon NYC — Music Visualization Hackathon



New York, NY. Saturday, February 18, starting at 12 noon, 45 W 18th St [free, rsvp required]

Stupid Shit No One Needs & Terrible Ideas Hackathon



New York, NY. A one-day event where participants conceptualize and create projects that have no value whatsoever. Hosted and sponsored by NYU ITP. [free, registration required]

Leonardo Art/Science Evening Rendezvous



San Francisco, CA. Speakers: Maria Makela (California College of the Arts), Theresa Wong (Composer), Gregorij Kurillo (UC-Berkeley), Luciano Chessa (Composer). 7 p.m., University of San Francisco, Fromm Hall (2130 Fulton Street) [free]

Register now for F8 2017!



San Jose, CA. F8 is Facebook’s annual global developer conference. April 18-19. [$$$]
 
NYU Center for Data Science News



How Can We Improve Online Search?

NYU Center for Data Science



Juliana Freire, along with a group of research associates from NYU’s Tandon School of Engineering, has set out to streamline web searching by creating the Domain Discovery Tool (DDT), which allows for interactive domain discovery.


Tandon Researchers Develop Database to Reduce Crime and Incarceration

NYU, Washington Square News



The website and social network, developed by the Governance Lab, the Laura and John Arnold Foundation and the Justice Management Institute, launched on Jan. 30.


Dashboard Tracks Health Data at City Level

Healthcare Informatics Magazine, David Raths



The New York University School of Medicine’s Department of Population Health has developed a City Health Dashboard to help cities understand, compare and take action to improve health status and health risks in their municipalities.

Developed with NYU’s Robert F. Wagner School of Public Service, in partnership with the National Resource Network, the Dashboard includes 26 measures related to health across five domains: Health Outcomes, Health Behaviors, Clinical Care, Social and Economic Factors, and the Physical Environment.

Many of the measures reported in the Dashboard have not previously been readily accessible at the city level. By bringing together data on health status, behavioral risks, and social determinants of health, calculated to the city (and in some cases neighborhood or census tract) level, the Dashboard has the potential to be a new resource for cities seeking to improve population health.

 
Tools & Resources



Harnessing Visual Data Using Google Cloud

Gigaom, Chris Mohritz



What if we could extract and process that useful information through computers — what kind of insights and opportunities would that open up for your business?

Well, recent advancements in artificial intelligence have made computer vision possible. And giving computers the power of sight is already having a profound effect on our lives, work, and society as a whole.


Using Apache Spark for large-scale language model training

Facebook, Engineering Blog, Tejas Patil and Jing Zheng



In this post, we will describe how we used the imperative side of Spark to redesign a large-scale, complex (100+ stage) pipeline that was originally written in HQL over Hive. In particular, we will describe how to control data distribution, avoid data skew, and implement application-specific optimizations to build performant and reliable data pipelines. This new Spark-based pipeline is modular, readable, and more maintainable compared with the previous set of HQL queries. In addition to the qualitative improvements, we also observed reductions in both resource usage and data landing time.
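The excerpt does not say how the authors avoided data skew. One common, generic way to spread a hot key across partitions is key salting, sketched here in plain Python rather than Spark. The helper names, the `#` separator, the partition and salt counts, and the toy records are all hypothetical, and this is not a description of Facebook's pipeline.

```python
import zlib
from collections import Counter

def partition_of(key: str, num_partitions: int) -> int:
    """Assign a key to a partition with a deterministic hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def salted_key(key: str, record_index: int, num_salts: int) -> str:
    """Append a rotating salt so one hot key becomes several distinct keys."""
    return f"{key}#{record_index % num_salts}"

# Hypothetical skewed input: one key dominates the data set.
records = ["hot"] * 1000 + ["cold"] * 10

# Without salting, every "hot" record lands in the same partition.
plain = Counter(partition_of(key, 8) for key in records)

# With salting, "hot" is split into 16 sub-keys that hash independently.
salted = Counter(
    partition_of(salted_key(key, i, 16), 8) for i, key in enumerate(records)
)

print("largest partition without salting:", max(plain.values()))
print("largest partition with salting:", max(salted.values()))
```

Salting trades one skewed stage for two balanced ones: any aggregation on the salted key needs a second pass that strips the salt and merges the partial results.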


Announcing TensorFlow Fold: Deep Learning With Dynamic Computation Graphs

Google Research Blog; Moshe Looks, Marcello Herreshoff and DeLesley Hutchins



“TensorFlow Fold makes it easy to implement deep-learning models that operate over data of varying size and structure. Furthermore, TensorFlow Fold brings the benefits of batching to such models, resulting in a speedup of more than 10x on CPU, and more than 100x on GPU, over alternative implementations.”


Stanford 2D-3D-Semantics Dataset (2D-3D-S)

Stanford University, Building Parser, Large Scale Parsing



“The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. It covers over 6,000 m² and contains over 100,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360° equirectangular images) as well as camera information.”


Met Museum Makes 375,000 Images Available for Free

The New York Times, Joshua Barone



“All images of public-domain artworks in the Metropolitan Museum of Art’s collection — about 375,000 — are now free for anyone to use however they may please.”


Next-generation 3D Graphics on the Web

Apple, WebKit



We’re sharing an API proposal, and a prototype of that API for the WebKit Open Source project. We hope this is a useful starting point, and look forward to seeing the API evolve as discussions proceed in the Community Group.

 
Careers


Full-time positions outside academia

Statistician I – Biostat



Mayo Clinic; Rochester, MN
