Data Science newsletter – October 27, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 27, 2017

GROUP CURATION: N/A

 
 
Data Science News



Yale holds inaugural data science workshop

Yale Daily News, Amy Xiong

Multiple Yale departments on Friday hosted the Data Science Workshop on Computational Social Science, the first in a series of annual conferences that will focus on different fields within data science.

Eight speakers — three from Yale and five from other institutions — presented at the event, discussing topics in computational social science, a field within data science that uses mathematics and programming to research social science. More than 100 people attended the daylong conference held at Luce Hall.

“One of the most exciting things about this conference was to see people working with rich, complex human data in different fields with different modeling techniques,” said Brian Scassellati, a computer science, cognitive science and mechanical engineering professor and one of the speakers at the event.


A crowdsourcing platform opens up research on a global scale

Stanford University, School of Engineering

Millions of students worldwide find themselves in Sumanth’s position, with academic ambitions that might exceed the options open to them by virtue of where they go to school. That same predicament once described Rajan Vaish, who oversaw the Crowd Research Initiative as a postdoctoral scholar with Michael Bernstein, a Stanford computer scientist whose research focuses on using the internet as a new medium to organize human activity.

“To find a position as an engineering or science researcher you need letters of recommendation,” said Vaish. “We realized that social media platforms and crowdsourcing models provided a way to expand access to mentorship, while also producing valuable research in the process.”


When Classroom Technology Impedes Student Learning

Education Week, Rick Hess Straight Up blog

New research in the latest issue of Education Next does an elegant job of capturing the perils of ed tech. Researchers Susan Payne Carter, Kyle Greenberg, and Michael Walker report intriguing but disquieting findings from a randomized controlled classroom experiment conducted at West Point (for the in-the-weeds version of their study, check out the February 2017 Economics of Education Review).

Payne Carter and her colleagues examined the performance of West Point sophomores in a core economics course. During spring and fall 2015, the researchers assigned participating class sections to one of three groups: technology-free (no use of laptops or tablets during class), technology-at-will (students could use what they liked, as they liked), and tech-limited (tablet-only with restrictions that made it tough for students to text, shop, or update social media). The study wound up encompassing 726 students in 50 classrooms over the two terms.

What did the researchers find? On the three-and-a-half-hour final exam—which included multiple-choice, short-answer, and essay questions—students in the technology-free group fared best.


Graduate survey: A love–hurt relationship

Nature, Naturejobs Blog, Chris Woolston

Nature’s 2017 PhD survey reveals that, despite many problems with doctoral programmes, PhD students are as committed as ever to pursuing research careers.


Walmart store robot program expands

Business Insider, Dennis Green

The robots autonomously roam up and down the aisles of the store, picking out the errors they see.


Machine intelligence for content distribution, logistics, smarter cities, and more

O'Reilly Radar, O’Reilly Data Show Podcast, Ben Lorica

In this episode of the Data Show, I spoke with Rhea Liu, analyst at China Tech Insights, a new research firm that is part of Tencent’s Online Media Group. If there’s one place where AI and machine learning are discussed even more than the San Francisco Bay Area, that would be China. Each time I go to China, there are new applications that weren’t widely available just the year before. This year, it was impossible to miss bike sharing, mobile payments seemed to be accepted everywhere, and people kept pointing out nascent applications of computer vision (facial recognition) to identity management and retail (unmanned stores). [audio, 36:10]


Announcing OpenFermion: The Open Source Chemistry Package for Quantum Computers

Google Research Blog, Ryan Babbush and Jarrod McClean

“we announce the release of OpenFermion, the first open source platform for translating problems in chemistry and materials science into quantum circuits that can be executed on existing platforms. OpenFermion is a library for simulating the systems of interacting electrons (fermions) which give rise to the properties of matter. Prior to OpenFermion, quantum algorithm developers would need to learn a significant amount of chemistry and write a large amount of code hacking apart other codes to put together even the most basic quantum simulations. While the project began at Google, collaborators at ETH Zurich, Lawrence Berkeley National Labs, University of Michigan, Harvard University, Oxford University, Dartmouth College, Rigetti Computing and NASA all contributed to alpha releases. You can learn more details about this release in our paper, OpenFermion: The Electronic Structure Package for Quantum Computers.”


UC Berkeley Uses Optical Scanning to Recover Indigenous Voices from Wax Cylinders

Hyperallergic, Allison Meier

A project underway at UC Berkeley is using innovative optical scan technology to transfer and digitally restore these recordings, even from cylinders that are broken. Supported by the National Science Foundation (NSF) and the National Endowment for the Humanities (NEH) through an NEH-NSF partnership called “Documenting Endangered Languages,” the initiative aims to preserve about 100 hours of audio. The collaborative restoration project involves linguist Andrew Garrett, digital librarian Erik Mitchell, and anthropologist Ira Jacknis, all at UC Berkeley, and uses a non-invasive scanning technique developed by Lawrence Berkeley National Laboratory physicists Carl Haber and Earl Cornell.


India Warily Eyes AI

MIT Technology Review, Samanth Subramanian

Technology outsourcing has been India’s only reliable job creator in the past 30 years. Now artificial intelligence threatens to wipe out those gains.


Making Sense Of Insurance Cards Using Deep Learning

ZocDoc

Understanding insurance coverage and using it to find a relevant doctor is an important but complex task for patients. In the U.S., we have hundreds of insurance carriers and many of these have hundreds of plans available. Patients have a lot to sort through in order to identify the right information they need to find a doctor and book an appointment. To help patients have an easier time deciphering their insurance, we decided to build the Zocdoc Insurance Checker to extract key pieces of information from a photograph of a patient’s insurance card.


University Data Science News

Harvard’s math department has a pronounced gender imbalance, according to a new exposé by Hannah Natanson. The department’s senior faculty are all men; over the past several years, three women offered senior positions turned Harvard down. The department did not have a woman on its senior faculty until 2009, and she left for Princeton shortly after her promotion. Undergraduate women report behavior and comments that they believe stem from casual, unconscious sexism of the sort sociologists might call ‘toxic masculinity’. Former chair Peter B. Kronheimer recently noted, “When criticism is offered, or difficulties become apparent, the department must act and must seek to address the problems wherever it can. It has done so before, and will continue to do so, now and in the future.” It’s hard to believe the department actually knows how to fix its culture if, after working to address it, it still has no senior women on the faculty and its students still report widespread gender discrimination.

Paul Vines, a former PhD student at UW-Seattle’s Paul G. Allen School of Computer Science and Engineering, is lead author on a new paper showing that standard online advertising practices can be used to track individuals for only about $1,000. The team found that by serving ads to specific phones, they could pinpoint a user’s location to within 8 meters. They also noted that “the individual being tracked does not need to actually click on an ad in order for ADINT to work, because purchasers can see where the ad is being served regardless of whether the target interacts with it.” While these services are designed for advertisers, they could easily be co-opted by stalkers, jealous spouses, or even more nefarious agents.



Independently, Kirsten Ostherr argues that Americans should also be concerned about how health apps collect, store, and share data. Only a tiny fraction of these apps are regulated by the FDA. “If your app carefully sidesteps claiming any kind of medical intervention, then it’s a health and wellness app and not a medical device — and it is not regulated,” Ostherr said. The apps often make suggestions about health and wellness activities, but because those suggestions are not considered medical advice, the apps are not regulated as medical devices.



Cornell University has a new Technology and Humanity program in which students select any major from the College of Arts and Sciences and receive computer science training alongside the coursework in their major. In a unique twist, they spend the academic year in Ithaca and then relocate to the Cornell Tech New York City campus to “immerse themselves in the innovation economy”. I can see that combining traditional Arts and Sciences degrees with computer science is likely to infuse technical training with the breadth of the liberal arts curriculum, hence living up to the Technology and Humanity combo. What is less clear to me is how the innovation immersion will maintain the joint focus, but it is certainly possible.



Most citizens are unaware of the technological and ethical implications of the next-generation genetic sequencing now being taken up across the life sciences. In an interesting move, Jonathan Sobel, Henry Luc, and colleagues working in Switzerland are publishing results from Beer DeCoded, a project revealing the genetics of the living organisms used to make beer. The project, which surfaced on Kickstarter this past summer, certainly wins on public appeal, and I’m curious to see whether it can provide the teachable moment its authors intend.

Jelani Nelson and Kasper Green Larsen have a new paper proving that the dimensionality reduction guaranteed by the Johnson-Lindenstrauss (JL) lemma is optimal: in the worst case, no method can map a data set into fewer dimensions while approximately preserving all pairwise distances. Nelson noted that there may still be better dimensionality reduction methods for specific data sets.
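For readers who have not met it, the JL lemma says that n points can be mapped into a number of dimensions that grows only like log(n)/ε², independent of the original dimension, while every pairwise squared distance stays within a factor of 1 ± ε. A scaled random Gaussian matrix is one standard way to build such a map; it succeeds with positive probability, so one can redraw until the check passes. The sketch below is a minimal illustration, assuming the Dasgupta-Gupta form of the bound for the target dimension k and synthetic Gaussian data; the function names are illustrative, and it demonstrates the lemma itself, not Larsen and Nelson's optimality proof.

```python
import math
import random


def jl_dimension(n_points: int, eps: float) -> int:
    """Target dimension k from the Dasgupta-Gupta form of the JL bound (assumed here)."""
    return math.ceil(4 * math.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3))


def random_projection(points, k, seed=0):
    """Map each d-dimensional point to k dimensions with a scaled Gaussian matrix.

    Entries are drawn i.i.d. from N(0, 1) and the result is scaled by 1/sqrt(k),
    so squared lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion_ratios(original, projected):
    """Squared-distance ratio (after / before) for every pair of points."""
    n = len(original)
    return [
        squared_distance(projected[i], projected[j]) / squared_distance(original[i], original[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    n, d, eps = 20, 500, 0.5  # illustrative values, not taken from the paper
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    k = jl_dimension(n, eps)
    ratios = distortion_ratios(data, random_projection(data, k))
    within = sum(1 - eps <= r <= 1 + eps for r in ratios)
    print(f"d={d} -> k={k}; {within}/{len(ratios)} pairs within 1 ± {eps}")
```

Note that k depends only on n and ε, never on d, which is why the technique pays off for very high-dimensional data; the Larsen-Nelson result says that, in the worst case, this logarithmic dependence cannot be beaten.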



The University of Houston has just announced that it will form an Institute for Data Science headed by Andrea Prosperetti. Prosperetti is a Distinguished Professor of Mechanical Engineering and already directs yet another institute, the UH Center for Advanced Computing and Data Systems. A mechanical engineering background sets Prosperetti somewhat apart from the math, statistics, and computer science fields that are frequently tapped for leadership roles in data science.

BioMed Central has launched a new publication type called Data Notes. A data note is a “data descriptor – without analysis or interpretation – that aims to increase data visibility and support the reuse of valuable research data. They are short (1,000 words) and therefore quick and easy to write, undergo peer review by an editorial board member and are fully citable through a DOI.” The goal is to speed up data sharing and to make sharing worthwhile by letting authors accrue citations for these short notes.



The Allen Institute for Brain Science received three separate five-year grants from the NIH, totaling about $100 million, to study cell types in mouse and human brains. It will work on a whole-brain atlas for the mouse and a multimodal atlas of human brain cell types, and will build a web portal for sharing single-cell brain data, tools, and knowledge.



Researchers at the University of Colorado Boulder have found yet more evidence that professors’ scientific output does not decline with age. Academia may not be perfect, but we are definitely doing something right if people with tenure continue to be productive.



UC Berkeley received two new NSF grants to support data science. One of them “will deepen the theoretical foundations of data science in a new transdisciplinary institute” and the other will focus on strengthening educational strategies through national workshops. Cathryn Carson will head the educational grant, building on her multi-year coalition effort to create a robust, interdisciplinary undergraduate data science curriculum. Elsewhere at Berkeley EECS, there’s a new technical report outlining open problems in AI.

Speaking of teaching data science-y stuff, Lisa Dierker at Wesleyan University has revamped the way statistics is taught there. The program, called Passion-Driven Statistics, “offers a multidisciplinary project-based approach that is both supportive and engaging for students at all levels of statistical mastery and those coming from diverse educational backgrounds”. One of the great things about the rising importance of data science is the increased attention paid to statistics, though I am not claiming Dierker was motivated by data science hype. Most statisticians aren’t.

New York University Abu Dhabi will build a data center in the UAE for archiving and processing datasets from space missions.



Caltech opened its new Center for Autonomous Systems and Technologies (CAST), where scientists “improve the ability of drones and robots to think and react independently”.

Yale University hosted its first Data Science workshop, focusing on Computational Social Science.

A new Nature survey shows that despite the “many problems” with graduate programs, PhD students are as committed as ever to pursuing research careers. I happen to be about to submit a paper on graduate degree seekers and holders in data science. In my data, grad students were as committed as ever, but postdocs were more likely to be disaffected.



University of California San Diego researchers developed technology that analyzed Twitter data to identify entities illegally selling prescription opioids online. They “detected 1,778 posts that were marketing the sale of controlled substances, 90 percent included hyperlinks to online sites for purchase.” Score one for data science for good.


Can we train machines without examples? Interview with Ian Goodfellow

RE•WORK, Blog, Yazmin How

Ian currently leads a small group of researchers studying adversarial techniques in machine learning; his work at Google Brain explores how it can be possible ‘to make algorithms that work well even when an adversary intentionally tries to make the algorithm fail.’ He explained that they work on real-world scenarios, for instance a spammer trying to send an email that will get through a filter; in other cases, they study imaginary adversaries invented to give the machine learning algorithm more exercise and force it to get better. GANs, for example, ‘learn to generate realistic images by playing a game where a generator network has to make images that fool an object recognition network into thinking the fake images are real.’ … Speaking with RE•WORK, Ian answered four key questions about his current work and the impact of GANs on deep learning research.


Facebook’s AI chief says the public doesn’t know how dumb AI really is

The Verge, James Vincent

LeCun has been working in AI for decades, and is one of the co-creators of convolutional neural networks — a type of program that’s proved particularly adept at analyzing visual data, and powers everything from self-driving cars to facial recognition. Now, as head of Facebook’s AI research facility FAIR, he helps AI make the journey from the lab to the real world. His team’s software automatically captions photos for blind users and performs 4.5 billion AI-powered translations a day.

“We had a bigger impact on products than Mark Zuckerberg expected,” LeCun told The Verge over Skype recently. But, as he explained during the interview, it’s clear to him that AI still has a long, long way to go before it approaches anything near the intelligence of a baby, or even an animal. Oh, and if you don’t mind, he’d really like it if we all stopped using Terminator pictures on AI articles.

 
NYU Center for Data Science News



Mysterious stars: physicists tackle anomalies in star data

Medium, NYU Center for Data Science

Did you know that stars, like humans, typically couple up too? Astrophysicists estimate that up to 85% of all the stars in our universe actually orbit each other around a common center of mass.

These are called binary stars, and astrophysicists study them to answer questions like: What are the physical properties (masses or sizes) of the stars? How far away are the stars from Earth? What are their chemical compositions?

These are the kinds of questions that David W. Hogg, a Professor at NYU’s Department of Physics and the Center for Data Science, is trying to answer.
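The “common center of mass” that a binary pair orbits is its barycenter, the mass-weighted average of the two stars’ positions; each star circles it at a distance inversely proportional to its own mass. A minimal sketch with hypothetical masses and positions in arbitrary units (not data from Hogg’s work, and the function names are made up for illustration):

```python
def barycenter(m1, pos1, m2, pos2):
    """Center of mass of two bodies: the mass-weighted average of their positions."""
    total = m1 + m2
    return tuple((m1 * a + m2 * b) / total for a, b in zip(pos1, pos2))


def orbital_radii(m1, m2, separation):
    """Each body's distance from the barycenter for a given separation.

    The heavier body sits closer: r1 = separation * m2 / (m1 + m2).
    """
    total = m1 + m2
    return separation * m2 / total, separation * m1 / total


# Hypothetical pair: masses 2 and 1 (arbitrary units), 3 units apart on the x-axis.
print(barycenter(2.0, (0.0, 0.0), 1.0, (3.0, 0.0)))  # (1.0, 0.0)
print(orbital_radii(2.0, 1.0, 3.0))                    # (1.0, 2.0)
```

The twice-as-massive star sits one unit from the barycenter and its companion two units away, so measuring how far each star wobbles is one route to their mass ratio.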


Enabling & Evaluating Real-Time Collaboration in Online Learning Platforms

Medium, NYU Center for Data Science

Peter Halpin & Yoav Bergner, with a seed grant from the Moore-Sloan Data Science Environment, will prototype a new tool for bringing collaborative problem solving to online learning at scale.

 
Tools & Resources



How to unit test machine learning code

Medium, Chase Roberts

“Over the past year, I’ve spent most of my working time doing deep learning research and internships. And a lot of that year was making very big mistakes that helped me learn a lot about not just about ML, but about how to engineer these systems correctly and soundly. One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.”

“However, there doesn’t seem to be a solid tutorial online on how to actually write unit tests for neural network code.”
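As a hedged, framework-free illustration of what such tests can look like, the sketch below assumes a hypothetical toy logistic-regression model and checks three properties after a single training step: trainable parameters change, frozen parameters stay put, and the loss is finite and decreases. The class, `make_batch`, and the test names are all invented for this example; they are not taken from Roberts’s post or from any testing library.

```python
import math
import random


class TinyLogisticModel:
    """Hypothetical toy model: logistic regression with an optionally frozen bias."""

    def __init__(self, n_features, seed=0, freeze_bias=False):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.bias = 0.0
        self.freeze_bias = freeze_bias

    def predict(self, x):
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def loss(self, xs, ys):
        """Mean binary cross-entropy, clamping probabilities away from 0 and 1."""
        total = 0.0
        for x, y in zip(xs, ys):
            p = min(max(self.predict(x), 1e-12), 1.0 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total / len(xs)

    def train_step(self, xs, ys, lr=0.1):
        """One full-batch gradient-descent step on the cross-entropy loss."""
        n = len(xs)
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = self.predict(x) - y  # dLoss/dz for sigmoid + cross-entropy
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        self.weights = [w - lr * g for w, g in zip(self.weights, grad_w)]
        if not self.freeze_bias:
            self.bias -= lr * grad_b


def make_batch(seed=1, n=16, n_features=3):
    """Synthetic batch whose label depends only on the first feature."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    ys = [1 if x[0] > 0 else 0 for x in xs]
    return xs, ys


def test_trainable_variables_change():
    xs, ys = make_batch()
    model = TinyLogisticModel(3)
    weights_before, bias_before = list(model.weights), model.bias
    model.train_step(xs, ys)
    assert all(a != b for a, b in zip(weights_before, model.weights))
    assert model.bias != bias_before


def test_frozen_variables_do_not_change():
    xs, ys = make_batch()
    model = TinyLogisticModel(3, freeze_bias=True)
    model.train_step(xs, ys)
    assert model.bias == 0.0


def test_loss_is_finite_and_decreases():
    xs, ys = make_batch()
    model = TinyLogisticModel(3)
    before = model.loss(xs, ys)
    model.train_step(xs, ys)
    after = model.loss(xs, ys)
    assert math.isfinite(before) and math.isfinite(after)
    assert after < before
```

The test functions follow pytest’s `test_` naming convention, so pytest would collect them, but they also run as plain function calls. Checks like these are cheap to run on a tiny batch, which is what makes them useful before committing to a long training run.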


A Call to Tweets (& Blog Posts)!

Bob Rudis, rud.is

I blog today to suggest there is a better way to work with Twitter data from R, especially if your central use-case is searching Twitter and mining tweet data. This new way is rtweet by Michael Kearney. It popped up on the scene back in August of 2016 and receives quite a bit of ♥ from the developer, especially on GitHub.

This post is short and mostly designed to convince you to (a) try out the package and (b) blog and tweet about the package (if you do agree that it’s the best modern way to work with Twitter from R) to raise awareness about it. Because of that focus, I won’t be delving into all of rtweet’s seekrits, but you can explore them yourself on its spiffy pkgdown site.


Allen Institute shares first open database of live human brain cells

Allen Institute for Brain Science

“The Allen Institute for Brain Science has added the first data from human nerve cells to the Allen Cell Types Database: a publicly available tool for researchers to explore and understand the building blocks of the human brain.”


Announcing the Release of Anaconda Distribution 5.0

Anaconda

“We’re thrilled to announce the release of Anaconda Distribution 5.0!”


Learning a Hierarchy

OpenAI; Kevin Frans, Jonathan Ho, Peter Chen, Pieter Abbeel & John Schulman.

“We’ve developed a hierarchical reinforcement learning algorithm that learns high-level actions useful for solving a range of tasks, allowing fast solving of tasks requiring thousands of timesteps. Our algorithm, when applied to a set of navigation problems, discovers a set of high-level actions for walking and crawling in different directions, which enables the agent to master new navigation tasks quickly.”


New DataCite Metadata Updates Support Software Citation

DataCite

“The DataCite Metadata Schema 4.1 has been released today. If the 4.1 release has a theme, it is support for software citation.”

 
Careers


Internships and other temporary positions

CRUX Development Associate



Northeast Big Data Hub; remote: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont

CRUX Web Developer



Northeast Big Data Hub; remote: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont
