NYU Data Science newsletter – January 15, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for January 15, 2016


 
Data Science News



AI Polygraph studies your face and voice for signs you’re being deceitful | Daily Mail Online

Daily Mail Online


from January 12, 2016

Looking for tiny facial tics from opponents has been used by poker players for decades as a way to spot a bluff, but now Chinese scientists are using the same technique to develop a lie detector.

They have created an online polygraph that uses facial recognition to study minute changes in facial expression, skin colour, temperature and heart rate – all of which could reveal a lie.

 

The Future of Big Data and Analytics in K-12 Education – Education Week

Education Week


from January 11, 2016

Imagine classrooms outfitted with cameras that run constantly, capturing each child’s every facial expression, fidget, and social interaction, every day, all year long.

Then imagine on the ceilings of those rooms infrared cameras, documenting the objects that every student touches throughout the day, and microphones, recording every word that each person utters.

Picture now the children themselves wearing Fitbit-like devices that track everything from their heart rates to their time between meals. For about a quarter of the day, the students use Chromebooks and learning software that track their every click and keystroke.

What you’re seeing is the future of K-12 education through the eyes of Max Ventilla, the CEO of AltSchool, a Bay Area startup that represents the most aggressive, far-reaching foray into the world of big data and analytics that the K-12 education sector has seen to date.

 

Data analytics startup Looker raises $48 million, led by Kleiner Perkins

VentureBeat, Jordan Novet


from January 14, 2016

Looker, a startup selling data analytics software, is announcing today a $48 million funding round led by Kleiner Perkins Caufield & Byers.

Looker can visualize data from many sources, including Amazon Web Services (AWS) Redshift, Cloudera Impala, HP Vertica, MySQL, PostgreSQL, and Snowflake. It’s available as on-premises software or as a cloud service.

 

2016 Google Tracker: Everything Google is working on for the new year

Ars Technica


from January 08, 2016

It’s that time of the year again—welcome to the Google Tracker! This is a bi-annual series where we recap every ongoing project (that we know about, at least) inside of Google’s sprawling empire.

Though from now on, perhaps we should say, “Every ongoing project inside of Alphabet’s sprawling empire.” “Google” is now a mere company inside of “Alphabet,” the newly formed umbrella company created by Google’s founders. Most of the Google side projects we’ve been tracking in the past have been spun off into companies inside of Alphabet, but rest assured we’re still keeping track of everything.

As always, the Tracker is a big roundup of previous announcements, rumors, and a bit of speculation.

 

Data science specialist Michael Franklin to lead computer science at UChicago | UChicago News

UChicago News


from January 11, 2016

As part of a plan to greatly increase the scale, scope and impact of computer science research and education across the University community, the University of Chicago has appointed prominent data science scholar Michael Franklin to chair its Department of Computer Science and to serve as senior advisor to the provost on computation and data science.

 

New analysis offers more evidence against student evaluations of teaching

Inside Higher Ed


from January 11, 2016

There’s mounting evidence suggesting that student evaluations of teaching are unreliable. But are these evaluations, commonly referred to as SET, so bad that they’re actually better at gauging students’ gender bias and grade expectations than they are at measuring teaching effectiveness? A new paper argues that’s the case, and that evaluations are biased against female instructors in particular in so many ways that adjusting them for that bias is impossible.

 

The Unreasonable Reputation of Neural Networks | [ thinking machines ]

Luke Hewitt, thinking machines blog


from January 12, 2016

It is hard not to be enamoured by deep learning nowadays, watching neural networks show off their endless accumulation of new tricks. There are, as I see it, at least two good reasons to be impressed:

  • Neural networks can learn to model many natural functions well, from weak priors.
  • Neural networks can learn surprisingly useful representations

    The Future of Real-Time SLAM and “Deep Learning vs SLAM”

    Tomasz Malisiewicz, Tombone's Computer Vision Blog


    from January 13, 2016

    Last month’s International Conference of Computer Vision (ICCV) was full of Deep Learning techniques, but before we declare an all-out ConvNet victory, let’s see how the other “non-learning” geometric side of computer vision is doing. Simultaneous Localization and Mapping, or SLAM, is arguably one of the most important algorithms in Robotics, with pioneering work done by both computer vision and robotics research communities. Today I’ll be summarizing my key points from ICCV’s Future of Real-Time SLAM Workshop, which was held on the last day of the conference (December 18th, 2015).

    Today’s post contains a brief introduction to SLAM, a detailed description of what happened at the workshop (with summaries of all 7 talks), and some take-home messages from the Deep Learning-focused panel discussion at the end of the session.

     

    Google Owns Up to Its Self-Driving Cars’ Near-Misses – IEEE Spectrum

    IEEE Spectrum


    from January 13, 2016

    Google has given some details on the near-miss experiences that its self-driving cars have had. Most, it turns out, were noted by the car’s own software, but a few required that the driver jump in without warning.

    Chris Urmson, head of the driverless car program of Google’s parent company, Alphabet, said yesterday in a statement that there had been 341 incidents over 14 months. In 272 incidents, the self-driving system knew that it had failed and immediately alerted the driver; the humans took an average of less than a second to take the wheel.

     

    An Algorithm Could Know You Have A Genetic Disease Before You Do | FiveThirtyEight

    FiveThirtyEight


    from January 13, 2016

    As a biomedical informatics researcher, Nigam Shah spends his days using math to try to make sense of giant, unwieldy data sets. He’s used data mining to identify off-label drug use and to create widgets to add to electronic health records that may help doctors learn about best practices for the treatment of a disease in the absence of a randomized controlled trial. In one of his current projects, he’s using machine learning, a field of study that has helped develop self-driving cars and speech recognition software, to try to identify people with an under-diagnosed and potentially deadly genetic disorder.

     
Events



    Global Game Jam



    Join 280,000 other game makers from 58 countries around the world to make a game January 29-31! The Global Game Jam is a forty-eight-hour game jam that invites anyone and everyone to come make a game.

    The NYU Jam site is annually one of the biggest in the world, and brings together game makers from all over the northeast to experiment with new ideas. People of all skill levels and experiences are welcome!

    Friday-Sunday, January 29-31, at 2 Metrotech Center, 8th Floor

     

    Version Control with Git and Github – LibCal – NYU Libraries



    Modern research involving data analysis increasingly uses programming to increase efficiency and allow for more effective use of data. As code becomes a more and more essential part of research activities, we need to treat it with the same care that we treat other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

    This workshop introduces the basic concepts of Git version control. Whether you’re new to version control or just need an explanation of Git and GitHub, this two-hour tutorial will help you understand the concepts of distributed version control.

    Tuesday, January 26, at 4 p.m., 619 Bobst Library

     
Deadlines



    MassMutual Data Science Development Program

    The Data Science Development Program (DSDP) trains promising, recent graduates to become well-rounded data scientists for MassMutual. The DSDP is an intense, three-year program that combines academic coursework with the practical experience of working on data science projects for MassMutual. At the end of three years, this would be equivalent to earning a Master’s degree in data science with deep practical work experience. By the end of three years, members should be capable of leading their own data science projects for the company.

    Deadline to Apply is Tuesday, March 1

     

    Application to Attend the 33rd Annual Summer Meeting of the Society for Political Methodology at Rice University

    Rice University is proud to host the 33rd annual meeting of the Society for Political Methodology. The conference will take place from July 21-23, 2016 at Rice University in the Bioscience Research Collaborative building.

    The conference will include panel presentations of a single project and poster sessions for faculty members and graduate students that will provide ample opportunity for rich interaction and scholarly interchange.

    Deadline to Apply is Monday, March 28

     
CDS News



    Law and Ethics for Data Managers (Spring 2016; DS-GA 3001.003)

    NYU Center for Data Science, Laura Noren


    from January 13, 2016

    Following the code fraud committed by Volkswagen and in the midst of unregulated malware in advertising technology and ethically dubious use of algorithmic trading, the legal and ethical dimensions of data management and programming are fundamentally important components of comprehensive data scientific education.

    This course begins by examining existing legal structures – copyright law, intellectual property, university technology office policies, patent law, patient protection (HIPAA) – in the US as they apply to data. International legal practices like France’s ‘right to be forgotten’ and internet censorship practices in countries including China and Turkey will also be investigated.

     
Tools & Resources



    Getting Started with Data Science: Storytelling with Data

    Murtaza Haider, eKonometrics


    from January 13, 2016

    Earlier this month, IBM Press and Pearson published my book, Getting Started with Data Science: Making Sense of Data with Analytics. You can download sample pages, including a complete chapter. There are 104 pages in the sample. You can also watch a brief interview about the book recorded earlier at the IBM Insight2015 Conference.

    The very purpose of authoring this book was to rethink the way we have been teaching statistics and analytics to students and practitioners. It is no secret that most students required to take the mandatory stats course dislike it. I believe this has more to do with the way we have been teaching the subject than with the aptitude of our students.

     

    A Software Release Checklist

    Berkeley Institute for Data Science, Katy Huff


    from January 11, 2016

    One step in the development process for all software is the release. Some projects continually release, while others demarcate certain states of the codebase in a more formal, numbered way. Scientists leading the development of research software may be interested in how I manage my own software releases and encapsulate some reproducibility best practices along the way. A quick Google search turns up a plethora of “software release checklists.” This blog post summarizes mine.

     

    A Large Data Workflow with Pandas – Data Analysis of 8.2 Million Rows with Python and SQLite

    Plot.ly


    from January 13, 2016

    This notebook explores a 3.9 GB CSV file containing NYC’s 311 complaints since 2003. It’s the most popular data set in NYC’s open data portal.

     

    Yahoo Releases the Largest-ever Machine Learning Dataset for Researchers

    Yahoo Labs


    from January 14, 2016

    Today, we are proud to announce the public release of the largest-ever machine learning dataset to the research community. The dataset stands at a massive ~110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015.

     
