Newsletter features journalism, research papers, events, tools/software, and jobs for June 20, 2017
Data Science News
University Data Science News
Publisher PLOS ONE is serious about reproducibility, wrote a little ditty about checklists for reporting guidelines, and has a new partnership with protocols.io to “enable a direct link of published work with the detailed laboratory protocols used to obtain the results”. Does your journal use reporting checklists? It’s still not all that common.
Columbia University launched the Collaboratory as its attempt to teach students and faculty how to employ data science. The Collaboratory offers 14 data science courses for students. Researchers can make appointments with data scientists at the Collaboratory to improve their computational research in situ, as it were.
University of California – San Diego will offer a new undergraduate major and minor in data science. In what appears to me to be a display of academic politics, “[t]he minor will be open to all students excluding Jacobs School of Engineering majors and Math-Computer Science majors.”
We are all special snowflakes! It has been confirmed by science! Computational biologists at the Technical University of Munich have shown through simulations that our proteomes contain more variation than previously thought. “None of these effects might be detected in everyday life,” says Professor Burkhard Rost, but under some conditions these variations “could become significant, for example when we are given a certain drug or are exposed to a certain influence for the first time.” I think that is a sound explanation of how to apply the ‘special snowflake’ identity: not every day, only on the rare occasion when it sets an individual apart.
Is it possible that there is a unified model of deep learning already?? A team at Google published “One Model To Learn Them All,” describing a single model trained jointly on image classification, speech recognition, translation, image captioning, and English parsing that yields good results on all of them. In one model. Probably want to at least skim the abstract.
Susan Athey, with co-authors Christian Catalini and Catherine Tucker, has a new paper out showing that students will give away their friends’ email addresses for free pizza. This is what I’d call a failure of the (hungry) human in the privacy protection loop.
Quid has hairball network graphs and stacked histograms that attempt to display which areas of machine learning are generating the most publications. Spoiler: text analysis and natural language processing are in the top spot.
University of North Carolina and Washington University researchers have found a way to predict which infants will develop autism…but since there currently is no treatment available for infants who might develop autism, it’s unclear how and for whom to deploy this technology. Ethical dilemmas abound.
Kenny Jones and Derrick Bonafilia, undergrads at Williams College, used a Generative Adversarial Network (GAN) to generate art. They have both landed jobs at Facebook starting this fall.
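For readers new to GANs, here is the original two-player objective from Goodfellow et al. (2014): a generator G maps random noise z to images, and a discriminator D tries to tell real images x from generated ones. The students’ exact architecture and loss aren’t described here, so treat this as the textbook formulation rather than their method.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to push V up (score real images near 1, fakes near 0) while G is trained to push it down by producing images D can’t distinguish from the training art.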
Klara Nahrstedt of the University of Illinois has built an amazing piece of data infrastructure for science. Awkwardly named (oh, academia, we love you), 4CeeD “connects microscopes and other scientific instruments to a cloud infrastructure…with easy drag-and-drop uploading…advanced data management, annotation, and analytics capabilities…[and] semantic understanding.” You don’t have to take my word for how excellent this project is. It won Best Paper at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
One of the coolest areas of data science research that I hardly ever mention (note to self: do better) is precision agriculture. This week researchers in Georgia collected 36 million data points from a peanut field to create a 4D map. What for? To make decisions about “irrigation, pest control, harvesting, crop rotations,” and those are just the obvious applications. In the Indus Valley there are more rice paddies than peanut fields, but there is a similar desire to better utilize irrigation resources. New satellite mapping, not quite as advanced as the 4D peanut field, has enabled farmers to ensure scarce groundwater is used efficiently.
You shall be rewarded for reading this far by the hardware breakthrough of the week. Nicholas Harris and Yichen Shen of MIT have a functioning transistor-free silicon chip that uses beams of light in a way that mimics neurons. Science writer Matthew Hutson explains, “Optical computation is efficient because once light rays are generated, they travel and interact on their own. You can guide them—without energy—using regular glass lenses, whereas transistors require electricity to operate.” This is a huge step towards cheap, fast, efficient implementation of AI everywhere. Imagine what integration with IoT could do. Ethicists, please help us all by asking what could go wrong in that future world.
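“Mimics neurons” is vague, so here is the core linear-algebra trick as I understand the MIT team’s published design (Shen et al., Nature Photonics, 2017): a layer’s weight matrix is factored by singular value decomposition into two unitary transforms, which interferometer meshes can apply passively, plus a diagonal per-channel scaling. The NumPy sketch only checks that this three-stage factoring reproduces an ordinary matrix-vector product. The matrix, input, and the `staged_layer` name are hypothetical placeholders, and none of the actual optics is modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for one 4x4 fully connected layer (random, for illustration).
W = rng.standard_normal((4, 4))

# SVD: W = U @ diag(s) @ Vh, with U and Vh orthogonal (unitary) and s >= 0.
U, s, Vh = np.linalg.svd(W)

def staged_layer(x):
    """Apply W to x in the three stages of an SVD-based photonic layout."""
    x = Vh @ x    # first unitary stage (an interferometer mesh in hardware)
    x = s * x     # diagonal stage: per-channel scaling by the singular values
    return U @ x  # second unitary stage

x = rng.standard_normal(4)
print(np.allclose(staged_layer(x), W @ x))  # True
```

The unitary stages preserve the vector’s norm, which is why light can carry them out “without energy”; only the diagonal stage and the nonlinearity between layers need active components.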
Want a better way to search for AI and data science tools? I know you do. I have long wanted to build a searchable database of all the tools, tutorials, and packages we feature in the newsletter. Last week the Allen Institute for Artificial Intelligence (AI2) announced it would join Microsoft, Google, Baidu, and the Gates Foundation on Open Academic Search. The consortium will help researchers “by having the companies, institutes and nonprofits involved make their AI and analysis tools open-source, or freely available to other groups to use and tweak.” Presumably with Google and Microsoft involved, there’s also going to be a good search feature.
If future artificial intelligence systems are to interact with us effectively, Mark Riedl believes we need to teach them “common sense.” I interviewed Riedl to discuss how AIs can use stories and creativity to understand and exhibit culture and ethics, while also gaining “common sense reasoning.” We also discuss the “big red button” problem with AI safety, the process of teaching rationalization to AIs, and computational creativity. Riedl is an associate professor at the Georgia Tech School of Interactive Computing, where his recent work focuses on human-AI interaction and how humans and AI systems can understand each other.
“There is a transformation in people’s mindsets that changes the way they operate,” said Adam Cannon, Senior Lecturer in Machine Learning at Columbia, at a recent Collaboratory debrief.
Cannon was referring to the impact of innovative curricula that weave data and computational science into courses taught in all disciplines across Columbia.
The Collaboratory, jointly founded by Columbia Entrepreneurship and the Data Science Institute, now has 14 new courses that teach data and computational science in the context of disciplines ranging from urban planning and advertising to humanities and storytelling.
Every human being has a unique DNA “fingerprint”. In other words, the genetic material of any two individuals can be clearly distinguished. Computational biologists at the Technical University of Munich (TUM) have now determined that the impact of these variations has been greatly underestimated. The new insights could have important implications for personalized medicine.
arXiv, Computer Science > Learning; Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
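The “sparsely-gated layers” in the abstract are, as far as I can tell, mixture-of-experts layers of the kind Shazeer et al. (2017) described: a small gating network picks a few “expert” sub-networks per input, so capacity grows without every example paying for every expert. The NumPy sketch below shows only the routing idea with plain top-k softmax gating and linear experts. It omits the noise and load-balancing terms of the published layer, all names and sizes are hypothetical, and it is not the MultiModel code.

```python
import numpy as np

def top_k_gating(x, W_g, k):
    """Sparse softmax gate: per example, only the k largest expert logits survive."""
    logits = x @ W_g                                  # (batch, n_experts)
    top_idx = np.argsort(logits, axis=1)[:, -k:]      # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(masked, top_idx,
                      np.take_along_axis(logits, top_idx, axis=1), axis=1)
    masked -= masked.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(masked)                          # exp(-inf) == 0 for dropped experts
    return weights / weights.sum(axis=1, keepdims=True)

def moe_layer(x, W_g, experts, k=2):
    """Weighted sum of expert outputs; each example only runs its k chosen experts."""
    gates = top_k_gating(x, W_g, k)
    out = np.zeros((x.shape[0], experts[0].shape[1]))
    for e, W_e in enumerate(experts):
        routed = gates[:, e] > 0                      # examples that selected expert e
        if routed.any():
            out[routed] += gates[routed, e:e + 1] * (x[routed] @ W_e)
    return out

# Hypothetical sizes: batch of 3, 8 input features, 4 linear experts, 5 outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8))
W_g = rng.standard_normal((8, 4))
experts = [rng.standard_normal((8, 5)) for _ in range(4)]
y = moe_layer(x, W_g, experts, k=2)
print(y.shape)  # (3, 5)
```

Because the gate zeroes out all but k experts, the skipped experts are never evaluated, which is what keeps a very large layer affordable.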
Government Data Science News
Are you a Republican? Your data – including your home address and birth date – may be out there in the scummy wilds of the internet. One of the GOP’s campaign marketing firms, Deep Root, had an oops-leak resulting from a failure to lock the front door with a password. Even if you are merely a potential Republican, whatever that means, you may have been included in the 198 million records leaked.
There’s a Kaggle contest to improve the Department of Homeland Security’s threat recognition ability.
Microsoft is rolling out an update to its AI-powered photo editing app, Microsoft Pix, that aims to give Prisma and others like it some new competition. While the app was originally designed to enhance your iPhone photos by tweaking things like color, exposure and other variables, the newly updated Microsoft Pix will now let you have a little more fun with your photos, too – this time, by turning them into art.
Similar to Prisma, the new app introduces a feature called Pix Styles, which allows you to transform your photos into works of art, and use other effects. For example, one effect will make the picture look like it’s on fire. These are not photo filters, to be clear – the styles actually transfer texture, pattern and tones to the photo, explains Microsoft.
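Microsoft doesn’t say here how Pix Styles works, but a well-known way to “transfer texture, pattern and tones” is neural style transfer (Gatys et al., 2015), which compares images by the Gram matrices of their convolutional feature maps. A Gram matrix records which feature channels fire together while discarding where they fire, which is why it captures texture rather than layout. In the sketch below, random arrays stand in for real CNN activations (an assumption), and `gram_matrix` and `style_loss` are illustrative names, not any app’s API.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) feature map."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def style_loss(generated, style):
    """Mean squared difference between Gram matrices; where features occur is ignored."""
    return np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2)

# Random arrays stand in for CNN activations (assumption: a real system would
# use features from a pretrained network).
rng = np.random.default_rng(1)
style = rng.standard_normal((16, 8, 8))

# Shuffling pixel positions keeps the "texture statistics" but destroys the layout.
perm = rng.permutation(64)
shuffled = style.reshape(16, 64)[:, perm].reshape(16, 8, 8)
other = rng.standard_normal((16, 8, 8))

print(np.isclose(style_loss(shuffled, style), 0.0))             # True
print(style_loss(other, style) > style_loss(shuffled, style))   # True
```

A full style-transfer system would minimize this loss, plus a content loss, with respect to the output image’s pixels.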
Low-level, task-based AI gets commoditized quickly, and more general AI is decades off. In the meantime, will new AI startups succeed, or will the value accrue to Google, Facebook, and Amazon?
While most of the machine learning talent works in big tech companies, massive and timely problems are lurking in every major industry outside tech.
University of California-San Diego, THE TRITON, Mo Elew
The Computer Science and Engineering (CSE) department held a town hall in Ledden Auditorium last Wednesday to discuss the newly established Data Science undergraduate program. The town hall aimed to answer questions about the program’s requirements and philosophy.
The Data Science program, open to all students starting September 25, 2017, is a new interdisciplinary major and minor administered by the CSE department and co-sponsored by the Cognitive Science and Mathematics departments. The new major, along with the minor, is expected to add 200 students per year and then grow to around 1,000 students as it rolls out over the next four years. The minor will be open to all students excluding Jacobs School of Engineering majors and Math-Computer Science majors.
Researchers from the Massachusetts Institute of Technology and Stanford University have found that people say they want privacy but make choices suggesting the opposite, and can be easily manipulated through interface design, reassuring statements, and pizza.
In “Digital Privacy Paradox: Small Money, Small Costs, Small Talk,” a paper published Monday through the National Bureau of Economic Research, authors Susan Athey, Christian Catalini, and Catherine Tucker explore a phenomenon that has been widely observed: The disconnect between what people say about privacy and what they do.
It’s a discrepancy that calls into question the validity of notice and consent, the foundation of privacy rules.
Apple is working on a secret plan to turn the iPhone into a personal hub for all your medical information, CNBC reported last week.
But it isn’t doing it alone. The company’s health team has been working with a tiny start-up called Health Gorilla, according to two people familiar with the initiative.
Sources said that Health Gorilla is specifically working with Apple to add diagnostic data to the iPhone, including blood work, by integrating with hospitals, lab-testing companies such as Quest and LabCorp and imaging centers.
Auto supplier Robert Bosch GmbH will build a 1 billion-euro ($1.1 billion) semiconductor plant, the biggest single investment in its history, as the maker of brakes and engines prepares for a surge in demand for components used in self-driving vehicles.
The factory in Dresden, Germany will start producing chips needed for autonomous vehicles, smart homes and Internet-linked city infrastructure in 2021, the world’s biggest car-parts supplier said Monday in a statement. It’ll employ as many as 700 workers once construction is complete in 2019.
The data constantly collected and reported by smartphones can find numerous applications. An SNSF-funded project devoted to crowdsensing has found ways to improve privacy and localisation accuracy as well as reduce the impact on hardware.
Connecting data from the world’s smartphones could put a global supercomputer into all of our pockets. Tapping into that processing power would improve the real-time collection and analysis of data, but technical hurdles and privacy concerns linger. Scientists from SwissSenseSynergy, a project funded by the Swiss National Science Foundation (SNSF), have addressed several of these issues and proposed new ways to collect and use such information.
UCHealth on Tuesday launched a virtual lab in which it will combine the expertise of physicians with the know-how of information analysts to determine ways for doctors to deliver the best care to patients.
The new Applied Decision Science Lab, part of the Aurora-based health system’s CARE Innovation Center, will focus on using big data, artificial intelligence and established best practices to create solutions to existing challenges.
The body of work produced by this interdisciplinary effort can be broadly divided into three main topics that are often intertwined: i) Description: discovering patterns and general statistical laws in human mobility data, as well as identifying relationships between human mobility, human activity and socio-economic interactions; ii) Modelling: developing generative models of individual and collective human mobility to explain the emergent patterns; iii) Prediction: forecasting and nowcasting human mobility and related phenomena. The deadline for submissions is December 31, 2017.
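As a taste of the “Description” topic, one of the standard statistics in human mobility research is the radius of gyration, the root-mean-square distance of a person’s visited locations from their centre of mass (used, for example, in González, Hidalgo and Barabási’s 2008 study of mobile-phone traces). The sketch assumes points already projected to planar coordinates; raw latitude/longitude would need a projection or great-circle distances first. The trajectory is made up.

```python
import numpy as np

def radius_of_gyration(points):
    """Root-mean-square distance of visited locations from their centre of mass.

    points: sequence of (x, y) pairs in planar coordinates (e.g. km on a local projection).
    """
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    return np.sqrt(np.mean(np.sum((pts - centre) ** 2, axis=1)))

# Hypothetical trajectory: home at (0, 0) and work at (4, 3), visited alternately.
trajectory = [(0, 0), (4, 3), (0, 0), (4, 3)]
print(radius_of_gyration(trajectory))  # 2.5
```

Here the centre of mass sits halfway between home and work, 2.5 km from each, so the radius of gyration is 2.5 km; someone who never leaves home would score 0.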
With the growing number of experiences built using WebGL, and all the improvements made in the WebVR/AR space, it is critical to have efficient debugging tools. Whether you are just starting out or are already an experienced developer of 3D applications with WebGL, you likely know how important tools can be for productivity. Looking for such tools, you probably came across Patrick Cozzi’s blog post highlighting the most common ones. Unfortunately, many of these tools are not compatible with newer projects because they lack support for WebGL2 features or extensions, such as draw buffers, 3D textures, and so on.
Creating a data-driven organization can require big changes in culture and process. Some nonprofits, like Crisis Text Line and Watsi, started off boldly by building their own data science teams. But for the many other organizations wondering how to best use data to advance their mission, we’ve found that one ingredient works better than all the software and tech that you can throw at a problem: collaboration.
As a nonprofit dedicated to applying data science for social good, DataKind has run more than 200 projects in collaboration with other nonprofits worldwide by connecting them to teams of volunteer data scientists. What do the most successful ones have in common? Strong collaborations on three levels: with data science experts, within the organization itself, and across the nonprofit sector as a whole.
On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.