NYU Data Science newsletter – April 5, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for April 5, 2016


Data Science News

[1604.00289] Building Machines That Learn and Think Like People

arXiv, Computer Science > Artificial Intelligence; Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman

from April 01, 2016

Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

China’s Companies Poised to Take Leap in Developing a Driverless Car – The New York Times

The New York Times

from April 03, 2016

Gansha Wu was a veteran engineering manager at Intel Corporation and director of Intel Labs China when two events upended his world last year.

First, he listened to the veteran technology writer Michael Malone tell an audience of Intel employees that if they were too cautious they would fail. Then he attended a leadership training session for Intel executives. The trainer told them that “to be a leader is to design a future that is unpredictable and which nobody bets on.”

He couldn’t sleep at night, thinking about his well-ordered, 16-year career at Intel. So he decided to take a risk. With four colleagues, he made the decision to take the uncertain path, which today is becoming more common in China than even in Silicon Valley: He quit his job to begin a start-up that specializes in autonomous, or self-driving, cars.

Toyota partners with Microsoft on new data science company, Toyota Connected

TechRepublic, Conner Forrest

from April 04, 2016

To advance its work in data analytics and connected cars, Toyota recently partnered with Microsoft on a new company based in Plano, Texas called Toyota Connected.

Community-level data science and its spheres of influence: beyond novelty squared

UW eScience Institute, Brittany Fiore-Gartland and Anissa Tanweer

from April 04, 2016

Data science has many characterizations, but in academia it is often talked about as pushing the limits of both methodological and domain science, what Josh Bloom, a Professor of Astronomy at U.C. Berkeley, has referred to as “novelty squared”. Bloom sees this as the “great challenge of modern interdisciplinary scientific collaboration”. The idealized characterization of data science in academia is also represented in the idea of shifting from the traditional T-shaped scientists, who have deep expertise in a single domain, to (Pi)-shaped scientists with deep expertise in both a domain and methodological science (as coined by Alex Szalay). As (Pi)-shaped data scientists, they are primed to innovate in multiple disciplinary trajectories. Bloom and others have argued that these characterizations of novelty squared and (Pi)-shaped scientists represent the “unicorn” of data science.

Report from LSST/AstroPy Summit, 3/26-3/27/2016 – Data Management – community.lsst.org

Large Synoptic Survey Telescope Community

from March 31, 2016

Following the Python in Astronomy and Science Pipelines Working Group Meetings at UW last week, we held a weekend meeting between various DM participants and a few of the more active AstroPy developers.

This report is a high-level summary, focused on action items; more complete notes can be found on the GitHub wiki for the in-progress SPIE Paper on LSST/AstroPy integration.

The Heat Is On at InnoVention

NYU Tandon School of Engineering

from April 04, 2016

The InnoVention Competition allows students from across NYU to prototype and pitch commercially viable solutions to pressing societal problems, and this year’s semi-finalists are out to improve the world in their own ways.

Congratulations to each of them for making it this far, and we’ll be rooting them on as they try to make the finals on April 7.

Recap of Winners from DataFest 2016

Five College ASA DataFest

from April 04, 2016

This is the last recap post from Five College ASA DataFest 2016! There are recaps of Day 1, Day 2, and Day 3, as well as a post full of group photos.

But of course, the main event on Day 3 is the Winners Showcase. Our illustrious judges presented six teams with awards. Awards included “Best Business Recommendation”, “Best Methods”, and “Best Insight,” as well as three “Best in Group” prizes from the three parallel sessions and of course the “Best in Show”!

Spring 2016 hackNY Student Hackathon Recap

hackNY

from April 04, 2016

Last weekend we celebrated the 13th hackNY Student Hackathon with a record number of student hackers from all over the East Coast. 260 students hacked on projects over 24 hours, building webapps, games, and hardware hacks. They represented 50 different universities, and many joined with students from several other schools to form their teams. We were extremely impressed by the quality of the hacks built – nearly half our hackers were attending their first hackathon!

Employees from NYC startups came to present their APIs to the students and mentor them throughout the weekend, including Buzzfeed, MongoDB, Foursquare, Clarifai and Giphy. Hacks included both useful and funny webapps, simulations and educational material using civic data, a hilarious but politically relevant game, and innovative hardware hacks using drones, Oculus Rifts and Arduinos. When they needed a break from hacking, hackers attended a workshop on APIs for new hackers, coding competitions with MLH, and our traditional Ladies Storm Hackathons meetup. At the LSH meetup, 30 women hackers enjoyed sharing their experiences and finding common interests while decorating cupcakes.

Can we trust robots to make moral decisions?

Quartz, Olivia Goldhill

from April 03, 2016

Last week, Microsoft inadvertently revealed the difficulty of creating moral robots. Chatbot Tay, designed to speak like a teenage girl, turned into a Nazi-loving racist after less than 24 hours on Twitter. “Repeat after me, Hitler did nothing wrong,” she said, after interacting with various trolls. “Bush did 9/11 and Hitler would have done a better job than the monkey we have got now.”

Of course, Tay wasn’t designed to be explicitly moral. But plenty of other machines are involved in work that has clear ethical implications.

Wendell Wallach, a scholar at Yale’s Interdisciplinary Center for Bioethics and author of “A Dangerous Master: How to keep technology from slipping beyond our control,” points out that in hospitals, APACHE medical systems help determine the best treatments for patients in intensive care units—often those who are at the edge of death. Wallach points out that, though the doctor may seem to have autonomy, it could be very difficult in certain situations to go against the machine—particularly in a litigious society. “Is the doctor really free to make an independent decision?,” he says. “You might have a situation where the machine is the de facto decision-maker.”

The Organizational Spectroscope — Medium

Medium, Duncan Watts

from April 02, 2016

For several decades sociologists have speculated that the performance of firms and other organizations depends as much on the networks of information flow between employees as on the formal structure of the organization [1, 2].

This argument makes intuitive sense, but until recently it has been extremely difficult to test using data. Historically, employee data has been collected mostly in the form of surveys, which are still the gold standard for assessing opinions, but reveal little about behavior such as who talks to whom. Surveys are also expensive and time consuming to conduct, hence they are unsuitable for frequent and comprehensive snapshots of the state of a large organization.

Thanks to the growing ubiquity of productivity software, however, this picture is beginning to change. Email logs, web-based calendars, and co-authorship of online documents all generate digital traces that can be used as proxies for social networks and their associated information flows. In turn, these network and activity data have the potential to shed new light on old questions about the performance of teams, divisions, and even entire organizations.

Sampling Bias in CS Education, and Where’s the Cyber Strategy?

Communications of the ACM blog, Mark Guzdial and John Arquilla

from April 01, 2016

… The Inverse Lake Wobegon Effect is a term I am coining for a fallacy that I see sometimes in computer science (CS) education: we sample from a clearly biased source and assume the sample describes the overall population. We know we are observing a superior sample, but act like we are getting a randomly distributed sample. This is a form of sampling bias (http://bit.ly/1R358iK).

I introduce the term in a book I just published with Morgan & Claypool, Learner-Centered Design of Computing Education: Research on Computing for Everyone. One example of the Inverse Lake Wobegon Effect in CS education is assuming a successful undergraduate introductory curriculum will be similarly successful in high school. Students in undergraduate education are elite. In the U.S., undergraduates are screened in an application process and are in the top half of most scales (such as intellectual achievement and wealth). Elite students can learn under conditions in which average students might not succeed, which educators call aptitude-treatment interactions (http://bit.ly/1PiaGB6).
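The sampling fallacy described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the score distribution, the population size, and the "top half" cutoff are hypothetical choices made for illustration and do not come from the article.

```python
import random
import statistics

# Hypothetical population of achievement scores (assumed normal, mean 50, sd 10).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# "Elite" subgroup: everyone at or above the population median.
cutoff = statistics.median(population)
top_half = [score for score in population if score >= cutoff]

# One sample drawn from the whole population, one drawn only from the top half.
random_sample = rng.sample(population, 500)
biased_sample = rng.sample(top_half, 500)

pop_mean = statistics.fmean(population)
random_mean = statistics.fmean(random_sample)
biased_mean = statistics.fmean(biased_sample)

print(f"population mean:    {pop_mean:.1f}")
print(f"random-sample mean: {random_mean:.1f}")
print(f"top-half mean:      {biased_mean:.1f}")
```

Treating the top-half sample as if it were a random draw overstates the population mean, which is the error the excerpt attributes to generalizing from undergraduates to high school students.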

Criminology’s Wonderland: Why (Almost) Everything You Know About Crime is Wrong

Quillette, Brian Boutwell and Kevin Beaver

from March 31, 2016

… crime is heritable. And yet, there is no crime gene. The fact that we have to write such a qualification speaks to a woeful ignorance of genetics that pervades much of the public and academia. Mercifully, we can sidestep long discussions of molecular biology and skip right to the “law” that takes the possibility of a crime gene off the table. It is in fact, the newly dubbed “4th Law of Behavior Genetics” and it’s quite simple. For complex traits, there are, for the most part, likely hundreds or thousands of genes involved, most of which generally contribute only very small effects to any given outcome. Not only are there many genes involved, but the complexity of how these genes operate is amazing. There are genes that influence other genes, genes that assemble neurons and run them, and genes that perform any number of other banal processes in the body. The link between a gene and some behavioral outcome, moreover, doesn’t have to be linear, straightforward, or deterministic. Most importantly for our discussion, though, is that many of these genes underpin the workings of our central nervous system, an interface with the world that ultimately comes to bear on our behavior, including our behaviors that happen to break the law.

Events

Intro to Machine Learning in H2O | D-Lab

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it’s easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster).

Wednesday, April 6, from 10 a.m. to 12 noon, at D-Lab Convening Room (356 Barrows Hall)

Deadlines

2016 Workshop on Human Interpretability in Machine Learning


The 2016 Workshop on Human Interpretability in Machine Learning (WHI 2016), held in conjunction with ICML 2016, will bring together researchers who study the interpretability of predictive models, develop interpretable machine learning algorithms, and develop methodology to interpret black-box machine learning models (e.g., post-hoc interpretations).

New York, NY. Deadline for submissions is Sunday, May 1.

CDS News

CDS Hosts its First Data Science Showcase

NYU Center for Data Science

from April 04, 2016

On March 9th, NYU’s Center for Data Science hosted its first Data Science Showcase—in conjunction with the Stern School of Business—to highlight the interdisciplinary work in data science taking place across a wide range of departments at the University.

Professor Zaid Harchaoui, a Visiting Assistant Professor at CDS, kicked off the showcase with a presentation titled “The History of AI Research and its Public Perception.”

When Social Movements Reach Their Tipping Point, and Why Scientific Collaboration is Important

Annenberg School for Communication, Bruno Goncalves

from April 05, 2016

Unlike top-down actions like a corporate announcement or a governmental decree that then have rippling effects on the public, events of collective effervescence start small — a tight circle of activists, a maverick trendsetter — and grow organically until they suddenly seem to explode. Protesters pour into the streets. Rumors become national headlines. Americans under 25 suddenly all wear high-waisted jeans.

Finding a way to identify that tipping point where a movement explodes was one of the aims of the study. [video, 1:21]

Using Data Science to Map the Universe

NYU Center for Data Science

from April 01, 2016

Last Wednesday, the Center for Data Science welcomed one of its Affiliated Faculty members, astrophysicist Michael Blanton, to give a luncheon lecture on his work with the Sloan Digital Sky Survey (SDSS). Blanton is the director of the SDSS, where he leads a large team of astrophysicists using data science to map the Universe.

Tools & Resources

Mailtrain

Mailtrain.org

from April 04, 2016

Self hosted newsletter app built on top of Nodemailer

tflearn/dask.py

GitHub – tflearn

from April 01, 2016

Example of how to use Dask with TFLearn. Dask is a simple task scheduling system that uses directed acyclic graphs (DAGs) of tasks to break up large computations into many small ones. It can handle large datasets that do not fit entirely in RAM. Note that this example just gives a quick compatibility demonstration.
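The idea of breaking a large computation into a DAG of small tasks can be sketched with the Python standard library alone. This is a toy illustration, not Dask's API and not the code from the linked example: `run_graph`, the task names, and the chunk size are hypothetical, and the tasks run sequentially, whereas a real scheduler could run independent tasks in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(graph, funcs):
    """Run every task after the tasks it depends on.

    graph: task name -> set of task names it depends on
    funcs: task name -> callable that receives its dependencies' results
           as positional arguments (in sorted dependency-name order)
    """
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps = sorted(graph[name])
        results[name] = funcs[name](*(results[dep] for dep in deps))
    return results


# Hypothetical workload: sum the integers 0..999_999 in four independent chunks.
CHUNK = 250_000
graph = {f"chunk{i}": set() for i in range(4)}
graph["total"] = set(graph)  # "total" depends on the four chunk tasks

funcs = {
    f"chunk{i}": (lambda i=i: sum(range(i * CHUNK, (i + 1) * CHUNK)))
    for i in range(4)
}
funcs["total"] = lambda *parts: sum(parts)

results = run_graph(graph, funcs)
print(results["total"])
```

Each chunk task only touches its own slice of the range, so no single task needs the whole input in memory at once; the final task combines the small partial results.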

Deep learning library featuring a higher-level API for TensorFlow.

GitHub – tflearn/tflearn

from April 01, 2016

TFLearn is a modular and transparent deep learning library built on top of TensorFlow. It was designed to provide a higher-level API to TensorFlow in order to facilitate and speed up experimentation, while remaining fully transparent and compatible with it.

Careers

With ‘Gigs’ Instead of Jobs, Workers Bear New Burdens

The New York Times, The Upshot blog, Neil Irwin

Short Term Research Assistant — The Governance Lab (GovLab) at NYU

NYU, The Governance Lab

Seeking Full-stack Engineer / Data Scientist, Open Syllabus Project

The Open Syllabus Project

