Stack Overflow Considered Helpful!
Deep Learning Security Nudges Towards Stronger Cryptography
Proceedings of the 28th USENIX Security Symposium 2019
Stack Overflow is the most popular discussion platform for software developers. However, recent research identified a large amount of insecure encryption code in production systems that has been inspired by examples given on Stack Overflow. By copying and pasting functional code, developers introduced exploitable software vulnerabilities into security-sensitive high-profile applications installed by millions of users every day. Proposed mitigations of this problem suffer from usability flaws and push developers back to shopping for code examples on Stack Overflow. This motivates us to fight the proliferation of insecure code directly at the root, before it even reaches the clipboard. By viewing Stack Overflow as a market, implementation of cryptography becomes a decision-making problem. In this context, our goal is to simplify the selection of helpful and secure examples. More specifically, we focus on supporting software developers in making better decisions on Stack Overflow by applying nudges, a concept borrowed from behavioral economics and psychology. This approach is motivated by one of our key findings: for 99.37% of insecure code examples on Stack Overflow, similar alternatives are available that serve the same use case and provide strong cryptography. Our system design, which modifies Stack Overflow, is based on several nudges that are controlled by a deep neural network. The network learns a representation of cryptographic API usage patterns and a classification of their security, achieving an average AUC-ROC of 0.992. With a user study, we demonstrate that nudge-based security advice significantly helps developers tackle the most popular and error-prone cryptographic use cases in Android.
Overview
Our system design for security advice on Stack Overflow is based on the learn-to-nudge loop, which captures how community behavior, code classification models, and the proposed security nudges interact with and influence one another on Stack Overflow. We learn similarity-preserving code representations from large training data sets extracted from GitHub. By applying transfer learning, we retrain the representations for use case and security classification with very short training time. The resulting models allow us to design a nudge-based choice architecture: for each insecure code snippet on Stack Overflow, we recommend a similar but secure alternative.
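The recommendation step can be illustrated with a minimal sketch. It assumes that each snippet already carries a learned embedding, a predicted use case, and a predicted security label. The `Snippet` type, the field names, the toy vectors, and the choice of cosine similarity are all hypothetical and chosen for illustration only; they are not taken from the actual system.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snippet:
    snippet_id: str         # hypothetical identifier
    use_case: str           # predicted use case label (assumed to exist)
    secure: bool            # predicted security label (assumed to exist)
    embedding: List[float]  # learned code representation (toy values below)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_secure_alternative(query: Snippet,
                                 corpus: List[Snippet]) -> Optional[Snippet]:
    """Return the most similar secure snippet that serves the same use case."""
    candidates = [
        s for s in corpus
        if s.secure
        and s.use_case == query.use_case
        and s.snippet_id != query.snippet_id
    ]
    if not candidates:
        return None  # no secure alternative known for this use case
    return max(candidates,
               key=lambda s: cosine_similarity(query.embedding, s.embedding))


# Toy corpus with made-up embeddings.
corpus = [
    Snippet("a1", "cipher", False, [0.9, 0.1, 0.0]),
    Snippet("a2", "cipher", True, [0.8, 0.2, 0.1]),
    Snippet("a3", "cipher", True, [0.1, 0.9, 0.2]),
    Snippet("a4", "tls", True, [0.9, 0.1, 0.0]),
]

recommendation = recommend_secure_alternative(corpus[0], corpus)
```

In this toy data, `a4` has the same embedding as the insecure query but is excluded because it serves a different use case, so the closest secure snippet within the same use case is chosen instead.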
Security Nudges
The goal of our system design is not to stop developers from reusing code from Stack Overflow in general. Instead, we intend to nudge them away from copying and pasting insecure examples and towards snippets that provide strong security. We apply popular nudges such as warnings, recommendations, reminders, and secure defaults, and translate them into security advice for software developers.
Deep Learning Code Patterns
Our deep learning framework learns usage patterns of cryptographic APIs by embedding them into a vector space. These patterns are moved closer together or farther apart in the space, depending on their properties and the classification task to be solved, e.g., similarity, use case, or security.
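One common way to express "closer together or farther apart" as a training objective is a contrastive loss over pairs of embeddings. The sketch below is an illustration under that assumption and is not necessarily the objective used in our framework. The function names, the `margin` parameter, and the example vectors are hypothetical.

```python
import math
from typing import List


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: List[float], b: List[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Pairwise contrastive loss (assumed objective, for illustration).

    Pairs with the same label are penalized by their squared distance,
    which pulls them together. Pairs with different labels are penalized
    only while they are closer than `margin`, which pushes them apart.
    """
    distance = euclidean_distance(a, b)
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Two patterns with the same label that lie far apart incur a large loss.
loss_similar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True)

# Two patterns with different labels beyond the margin incur no loss.
loss_dissimilar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False)

# Two patterns with different labels inside the margin are pushed apart.
loss_dissimilar_near = contrastive_loss([0.0, 0.0], [0.6, 0.8],
                                        same_class=False, margin=2.0)
```

Minimizing such a loss with respect to the embedding parameters moves same-label patterns together and different-label patterns apart, where the label is whatever the task at hand defines (similarity, use case, or security).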
User Study
We performed a user study in which participants had to implement the two most popular and error-prone cryptographic use cases in Android. Nudged participants significantly outperformed the control group in submitting secure implementations. We were able to show that our approach does not interfere with the primary goals of Android developers or with the usability of Stack Overflow. The different treatments had no significant impact on the functionality of the solutions, which was equally high in both groups.