Data Science newsletter – June 20, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 20, 2017

Data Science News

University Data Science News

Publisher PLOS ONE is serious about reproducibility, wrote a little ditty about checklists for reporting guidelines, and has a new partnership with protocols.io to “enable a direct link of published work with the detailed laboratory protocols used to obtain the results”. Does your journal use reporting checklists? It’s still not all that common.

Columbia University launched the Collaboratory as its attempt to teach students and faculty how to employ data science. The Collaboratory offers 14 data science courses for students. Researchers can make appointments with data scientists at the Collaboratory to improve their computational research in situ, as it were.

University of California – San Diego will offer a new undergraduate major and minor in data science. In what appears to me to be a display of academic politics, “[t]he minor will be open to all students excluding Jacobs School of Engineering majors and Math-Computer Science majors.”

Harvard is starting a new MS in Data Science this fall as part of the new Harvard Data Science Initiative.

We are all special snowflakes! It has been confirmed by science! Computational biologists at the Technical University of Munich have shown through simulations that our proteomes contain more variation than previously thought. According to Professor Burkhard Rost, “None of these effects might be detected in everyday life,” but under some conditions these variations “could become significant, for example when we are given a certain drug or are exposed to a certain influence for the first time.” I think that is a sound explanation of how to apply the ‘special snowflake’ identity: not every day, only on the rare occasion when it sets an individual apart.

Is it possible that there is a unified model of data science already? A team at Google Mountain View and Google Research published “One Model To Learn Them All”, claiming that a single model yields good results on image classification, speech recognition, translation, image captioning, and parsing. In one model. You will probably want to at least skim the abstract.

Susan Athey has a new paper out in which she discovered that students will give away their friends’ email addresses for free pizza. This is what I’d call a failure of the (hungry) human in the privacy protection loop.

Stanford is anchoring itself in the data science for good space with the Open Policing Project and by fostering partnerships with non-profits too small to house their own data science teams.

Quid has hairball network graphs and stacked histograms that attempt to display which areas of machine learning are generating the most publications. Spoiler: text analysis and natural language processing are in the top spot.

University of North Carolina and Washington University researchers have found a way to predict which infants will develop autism…but since there currently is no treatment available for infants who might develop autism, it’s unclear how and for whom to deploy this technology. Ethical dilemmas abound.

Kenny Jones and Derrick Bonafilia, undergrads at Williams College, used a Generative Adversarial Network (GAN) to generate art. They have both landed jobs at Facebook starting this fall.

Klara Nahrstedt of the University of Illinois has built an amazing piece of data infrastructure for science. Awkwardly named (oh, academia, we love you), 4CeeD “connects microscopes and other scientific instruments to a cloud infrastructure…with easy drag-and-drop uploading…advanced data management, annotation, and analytics capabilities…[and] semantic understanding.” You don’t have to take my word for how excellent this project is. It won Best Paper at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

One of the coolest areas of data science research that I hardly ever mention (note to self: do better) is precision agriculture. This week researchers in Georgia collected 36 million data points from a peanut field to create a 4D map. What for? To make decisions about “irrigation, pest control, harvesting, crop rotations” and those are just the obvious applications. In the Indus Valley there are more rice paddies than peanut fields, but there is a similar desire to better utilize irrigation resources. New satellite mapping, not quite as advanced as the 4D peanut field, has enabled farmers to ensure scarce groundwater is used efficiently.

You shall be rewarded for reading this far by the hardware breakthrough of the week. Nicholas Harris and Yichen Shen of MIT have a functioning transistor-free silicon chip that uses beams of light in a way that mimics neurons. Science writer Matthew Hutson explains, “Optical computation is efficient because once light rays are generated, they travel and interact on their own. You can guide them—without energy—using regular glass lenses, whereas transistors require electricity to operate.” This is a huge step towards cheap, fast, efficient implementation of AI everywhere. Imagine what integration with IoT could do. Ethicists, please help us all by asking what could go wrong in that future world.

Want a better way to search for AI and data science tools? I know you do. I have long wanted to build a searchable database of all the tools, tutorials, and packages we feature in the newsletter. Last week Allen AI announced they would join Microsoft, Google, Baidu, and the Gates Foundation on Open Academic Search. Open Academic Search helps “researchers by having the companies, institutes and nonprofits involved make their AI and analysis tools open-source, or freely available to other groups to use and tweak.” Presumably with Google and Microsoft involved, there’s also going to be a good search feature.

Ethics And Creativity In Artificial Intelligence: An Interview With Mark Riedl

Huffington Post, Ariel Conn

from June 01, 2017

If future artificial intelligence systems are to interact with us effectively, Mark Riedl believes we need to teach them “common sense.” I interviewed Riedl to discuss how AIs can use stories and creativity to understand and exhibit culture and ethics, while also gaining “common sense reasoning.” We also discuss the “big red button” problem with AI safety, the process of teaching rationalization to AIs, and computational creativity. Riedl is an associate professor at the Georgia Tech School of Interactive Computing, where his recent work focuses on human-AI interaction and how humans and AI systems can understand each other.

The Collaboratory Creates a Platform for Pedagogical Innovation Across Columbia

Columbia University, Columbia Entrepreneurship

from June 02, 2017

“There is a transformation in peoples’ mindsets that changes the way they operate,” said Adam Cannon, Senior Lecturer in Machine Learning at Columbia, at a recent Collaboratory debrief.

Cannon was referring to the impact of innovative curricula that weave data and computational science into courses taught in all disciplines across Columbia.

The Collaboratory, jointly founded by Columbia Entrepreneurship and the Data Science Institute, now has 14 new courses that teach data and computational science in the context of disciplines ranging from urban planning and advertising to humanities and storytelling.

We are much more unique than assumed

Technical University of Munich

from June 19, 2017

Every human being has a unique DNA “fingerprint”. In other words, the genetic material of any two individuals can be clearly distinguished. Computational biologists at the Technical University of Munich (TUM) have now determined that the impact of these variations has been greatly underestimated. The new insights could importantly impact advances in personalized medicine.

[1706.05137] One Model To Learn Them All

arXiv, Computer Science > Learning; Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

from June 16, 2017

Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.

Government Data Science News

Are you a Republican? Your data – including your home address and birth date – may be out there in the scummy wilds of the internet. One of the GOP’s campaign marketing firms, Deep Root, had an oops-leak resulting from a failure to lock the front door with a password. Even if you are merely a potential Republican, whatever that means, you may have been included in the 198 million records leaked.

There’s a Kaggle contest to improve the Department of Homeland Security’s threat recognition ability.

Fields Medalist Cedric Villani has been elected to the French National Assembly. Villani wears a silk ascot and a large spider brooch, and has a TED Talk on the sexiness of math, which may be the single most unique campaign message of our time.

Kepler’s latest catalog lists 219 new planet candidates, ten of which are near-Earth-size and in their stars’ habitable zones. Also, NASA is running low on orbiters to relay deep space exploration data.

NIH is proud to announce its Ten Advances in Data Science at National Library of Medicine.

Reporting from Reuters found rampant lead poisoning in California water supplies. The larger fear is that government might not be able to collect enough data to document the scope of the problem across the U.S.

Microsoft Pix can now turn your iPhone photos into art, thanks to artificial intelligence

TechCrunch, Sarah Perez

from June 15, 2017

Microsoft is rolling out an update to its AI-powered photo editing app, Microsoft Pix, that aims to give Prisma and others like it some new competition. While the app was originally designed to enhance your iPhone photos by tweaking things like color, exposure and other variables, the newly updated Microsoft Pix will now let you have a little more fun with your photos, too – this time, by turning them into art.

Similar to Prisma, the new app introduces a feature called Pix Styles, which allows you to transform your photos into works of art, and use other effects. For example, one effect will make the picture look like it’s on fire. These are not photo filters, to be clear – the styles actually transfer texture, pattern and tones to the photo, explains Microsoft.

Vertical AI Startups: Solving Industry-specific Problems by Combining AI and Subject Matter Expertise

Bradford Cross

from June 14, 2017

Low level task-based AI gets commoditized quickly and more general AI is decades off. In the meanwhile, will new AI startups succeed or will the value accrue to Google, Facebook, and Amazon?

While most of the machine learning talent works in big tech companies, massive and timely problems are lurking in every major industry outside tech.

UCSD to Offer Data Science Major and Minor Fall 2017

University of California-San Diego, THE TRITON, Mo Elew

from June 07, 2017

The Computer Science and Engineering (CSE) department held a town hall in Ledden Auditorium last Wednesday to discuss the newly established Data Science undergraduate program. The town hall aimed to answer questions about the program’s requirements and philosophy.

The Data Science program, open to all students starting September 25, 2017, is a new interdisciplinary major and minor administered by the CSE department and co-sponsored by the Cognitive Science and Mathematics departments. The new major, along with the minor, is expected to add 200 students per year and then grow to around 1000 students as it rolls out over the next four years. The minor will be open to all students excluding Jacobs School of Engineering majors and Math-Computer Science majors.

Pizza proffer punctures privacy protection, prompts pals’ perfidy

The Register, Thomas Claburn

from June 13, 2017

Researchers from the Massachusetts Institute of Technology and Stanford University have found that people say they want privacy but make choices suggesting the opposite, and can be easily manipulated through interface design, reassuring statements, and pizza.

In “Digital Privacy Paradox: Small Money, Small Costs, Small Talk,” a paper published Monday through the National Bureau of Economic Research, authors Susan Athey, Christian Catalini, and Catherine Tucker explore a phenomenon that has been widely observed: The disconnect between what people say about privacy and what they do.

It’s a discrepancy that calls into question the validity of notice and consent, the foundation of privacy rules.

Apple working with start-up Health Gorilla on iPhone EHR plan

CNBC, Christina Farr

from June 19, 2017

Apple is working on a secret plan to turn the iPhone into a personal hub for all your medical information, CNBC reported last week.

But it isn’t doing it alone. The company’s health team has been working with a tiny start-up called Health Gorilla, according to two people familiar with the initiative.

Sources said that Health Gorilla is specifically working with Apple to add diagnostic data to the iPhone, including blood work, by integrating with hospitals, lab-testing companies such as Quest and LabCorp and imaging centers.

Bosch to Build $1.1 Billion Chip Plant for Self-Driving Cars

Bloomberg Technology, Elisabeth Behrmann

from June 19, 2017

Auto supplier Robert Bosch GmbH will build a 1 billion-euro ($1.1 billion) semiconductor plant, the biggest single investment in its history, as the maker of brakes and engines prepares for a surge in demand for components used in self-driving vehicles.

The factory in Dresden, Germany will start producing chips needed for autonomous vehicles, smart homes and Internet-linked city infrastructure in 2021, the world’s biggest car-parts supplier said Monday in a statement. It’ll employ as many as 700 workers once construction is complete in 2019.

Smarter use of mobile data

Swiss National Science Foundation

from June 12, 2017

The data constantly collected and reported by smartphones can find numerous applications. An SNSF-funded project devoted to crowdsensing has found ways to improve privacy and localisation accuracy as well as reduce the impact on hardware.

Connecting data from the world’s smartphones could put a global supercomputer into all of our pockets. Tapping into that processing power would improve the real-time collection and analysis of data, but technical hurdles and privacy concerns linger. Scientists from SwissSenseSynergy, a project funded by the Swiss National Science Foundation (SNSF), have addressed issues and proposed new ways to collect and use such information.

UCHealth using artificial intelligence to improve physician decision-making

Denver Business Journal, Ed Sealover

from June 14, 2017

UCHealth on Tuesday launched a virtual lab in which it will combine the expertise of physicians with the know-how of information analysts to determine ways for doctors to deliver the best care to patients.

The new Applied Decision Science Lab, part of the Aurora-based health system’s CARE Innovation Center, will focus on using big data, artificial intelligence and established best practices to create solutions to existing challenges.

Events

netsci2018

Network Science Society

from June 11, 2018

Paris, France June 11-15, 2018 [save the date]

Deadlines

EPJ Data Science | Call for papers – Individual and Collective Human Mobility: Description, Modelling, Prediction

The body of work produced by this interdisciplinary effort can be broadly divided into three main topics that are often intertwined: i) Description: discovering patterns and general statistical laws in human mobility data, as well as identifying relationships between human mobility, human activity and socio-economic interactions; ii) Modelling: developing generative models of individual and collective human mobility to explain the emergent patterns; iii) Prediction: forecasting and nowcasting human mobility and related phenomena. Deadline for submissions is December 31, 2017.

Tools & Resources

Real-Time Rendering · Debugging WebGL with SpectorJS

Real-Time Rendering, Sebastien Vandenberghe

from June 14, 2017

With the growing number of experiences built using WebGL, and all the improvements made in the WebVR/AR space, it is critical to have efficient debugging tools. Whether you are just starting out or are already an experienced developer of 3D applications with WebGL, you likely know how important tools can be for productivity. Looking for such tools, you probably came across Patrick Cozzi’s blog post highlighting the most common ones. Unfortunately, many of these tools are no longer compatible with your project, due to missing WebGL2 features or extensions, such as draw buffers, 3D textures, and so on.

Using Collaboration to Harness Big Data for Social Good

Stanford Social Innovation Review, Jake Porway

from June 14, 2017

Creating a data-driven organization can require big changes in culture and process. Some nonprofits, like Crisis Text Line and Watsi, started off boldly by building their own data science teams. But for the many other organizations wondering how to best use data to advance their mission, we’ve found that one ingredient works better than all the software and tech that you can throw at a problem: collaboration.

As a nonprofit dedicated to applying data science for social good, DataKind has run more than 200 projects in collaboration with other nonprofits worldwide by connecting them to teams of volunteer data scientists. What do the most successful ones have in common? Strong collaborations on three levels: with data science experts, within the organization itself, and across the nonprofit sector as a whole.

The Stanford Open Policing Project

Sharad Goel and Cheryl Phillips

from June 19, 2017

On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.

Careers

Postdocs

Postdoctoral Researcher

University of Washington, Labs of B. Brunton and R. Rao; Seattle, WA

Full-time positions outside academia

Partner Marketing Manager

Factual; Los Angeles, CA

Senior Computational Linguist

United Nations; New York, NY
