Code Vulnerability Detection with Large Language Models Based on Knowledge Graphs
Bachelor & Master Thesis
Despite the impressive capabilities of large language models (LLMs) in software engineering tasks, their application in vulnerability detection remains limited by a critical issue: hallucination. LLMs often generate plausible-sounding but factually incorrect predictions, especially when reasoning about complex program semantics or subtle security flaws. This poses significant risks in safety-critical systems, where even minor misjudgments can lead to severe consequences. To address this limitation, we propose integrating knowledge graphs into the vulnerability detection pipeline. Knowledge graphs offer a structured and explicit representation of code entities, data flows, and known vulnerability patterns, providing LLMs with an external source of factual grounding. By aligning LLM predictions with the relationships and constraints encoded in the knowledge graph, we aim to correct hallucinated outputs and enhance the reliability of LLM-based vulnerability analysis. This hybrid approach not only mitigates hallucinations but also enables explainable and context-aware detection, marking a promising step toward trustworthy AI-assisted software security.
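To make the idea of factual grounding concrete, the sketch below shows one possible (assumed, not prescribed) encoding: code entities and data flows are stored as (subject, relation, object) triples, and a known vulnerability pattern, a tainted source reaching a dangerous sink, is checked by traversing the graph. All entity names (`user_input`, `execute_sql`, `sql_sink`, etc.) are hypothetical examples.

```python
# Illustrative sketch only: one possible triple-based knowledge-graph
# encoding of code entities, data flows, and a vulnerability pattern.
from collections import defaultdict

# Hypothetical triples extracted from a code snippet: user input flows
# into a query string, which flows into a SQL execution sink.
triples = [
    ("user_input", "flows_to", "query"),
    ("query", "flows_to", "execute_sql"),
    ("execute_sql", "is_a", "sql_sink"),
    ("user_input", "is_a", "taint_source"),
]

def build_graph(triples):
    """Group triples into an adjacency map keyed by relation type."""
    graph = defaultdict(lambda: defaultdict(set))
    for s, r, o in triples:
        graph[r][s].add(o)
    return graph

def tainted_sinks(graph):
    """Return sinks reachable from any taint source via flows_to edges."""
    sources = {n for n, kinds in graph["is_a"].items() if "taint_source" in kinds}
    sinks = {n for n, kinds in graph["is_a"].items() if "sql_sink" in kinds}
    hits = set()
    for src in sources:
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in sinks:
                hits.add(node)
            stack.extend(graph["flows_to"].get(node, ()))
    return hits

print(tainted_sinks(build_graph(triples)))  # {'execute_sql'}
```

A check like this gives an LLM's prediction an explicit, verifiable counterpart: a claimed SQL-injection flaw can be cross-checked against whether the graph actually contains a source-to-sink path.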
In this project, students will be required to investigate and summarize existing literature on using large language models for code vulnerability detection. They will also learn about cutting-edge technologies and apply these techniques to code vulnerability detection.
Required knowledge:
- Strong programming background, especially proficiency in Python.
- Familiarity with static analysis techniques.
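As a small taste of the kind of static analysis involved, the following sketch (purely illustrative, not a project deliverable) uses Python's standard `ast` module to flag direct calls to functions commonly treated as dangerous sinks; the choice of `eval` and `exec` as the watchlist is an assumption for the example.

```python
import ast

# Names treated as dangerous sinks in this toy example (an assumption).
DANGEROUS_CALLS = {"eval", "exec"}

def find_dangerous_calls(source: str):
    """Return (line_number, function_name) pairs for risky direct calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS_CALLS):
            findings.append((node.lineno, node.func.id))
    return findings

snippet = "x = input()\nresult = eval(x)\n"
print(find_dangerous_calls(snippet))  # [(2, 'eval')]
```

Real static analyzers go far beyond such pattern matching (control- and data-flow analysis, interprocedural reasoning), but AST traversal is the natural starting point.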