Our paper "A Preliminary Study on Using Text- and Image-Based Machine Learning to Predict Software Maintainability" received the Best Research Paper Award at the Software Quality Days in Vienna. It describes our novel idea of using screenshots of source code to predict its understandability, readability, and complexity. Furthermore, it investigates the use of classification models that were originally developed to analyze natural-language text. This research was conducted by Markus Schnappinger (TUM), Simon Zachau (formerly TUM), Arnaud Fietzke (itestra GmbH), and Alexander Pretschner (TUM).
Machine learning has emerged as a useful tool to aid software quality control. It can support identifying problematic code snippets or predicting maintenance efforts. The majority of these frameworks rely on code metrics as input.
However, evidence suggests great potential for text- and image-based approaches to predict code quality as well. Using a manually labeled dataset, this preliminary study examines the use of five text- and two image-based algorithms to predict the readability, understandability, and complexity of source code. While the overall performance can still be improved, we find Support Vector Machines (SVM) outperform sophisticated text transformer models and image-based neural networks. Furthermore, text-based SVMs tend to perform well on predicting readability and understandability of code, while image-based SVMs can predict code complexity more accurately.
Our study both shows the potential of text- and image-based algorithms for software quality prediction and outlines their weaknesses as a starting point for further research.
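To give a rough feel for what a text-based approach means in practice, the following is a minimal sketch and not the pipeline used in the paper. It treats source code as plain text, counts tokens, and trains a small linear SVM by subgradient descent on the hinge loss. The tokenizer, the hyperparameters, the six toy snippets, and their labels are all assumptions made up for illustration; the study itself used a manually labeled dataset and different tooling.

```python
import re
from collections import Counter


def tokenize(code):
    # Split source text into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def featurize(code, vocab):
    # Bag-of-tokens vector: how often each vocabulary token occurs in the snippet.
    counts = Counter(tokenize(code))
    return [float(counts[token]) for token in vocab]


def train_linear_svm(X, y, epochs=1000, lr=0.01, lam=0.01):
    # Subgradient descent on the L2-regularized hinge loss; labels must be +1 or -1.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample is inside the margin: move the hyperplane towards classifying it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Sample is fine: only apply the regularization shrinkage.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: +1 = judged readable, -1 = judged hard to read.
snippets = [
    "total = price + tax",
    "for item in items: print(item)",
    "name = user.name",
    "x=(a if b else(c if d else(e if f else g)))",
    "r=[[i*j for i in q if i%2]for j in p if j%3]",
    "z=((a&b)|(c^d))<<((e>>f)&g)",
]
labels = [1, 1, 1, -1, -1, -1]

vocab = sorted({token for s in snippets for token in tokenize(s)})
X = [featurize(s, vocab) for s in snippets]
w, b = train_linear_svm(X, labels)

for s in snippets:
    print(predict(w, b, featurize(s, vocab)), s)
```

With six hand-picked snippets the classifier merely memorizes its training data, so the output says nothing about real code quality. The point is only the shape of the approach: the code is never parsed or measured with metrics, it is handed to a text classifier as a sequence of tokens.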
Find the publication here: https://dx.doi.org/10.1007/978-3-031-04115-0_4.