Query Refinement for Addressing Hallucination Problems in Code Generation with Large Language Models
Bachelor & Master Thesis
While large language models currently possess powerful code generation capabilities, the hallucination problem continues to trouble researchers and hinders further research and application. The hallucination problem refers to the situation where model-generated content appears reasonable but is actually fabricated and does not meet user requirements. Existing research has demonstrated that hallucinations are widespread in code generation by large language models [1]. These hallucinations may arise from poorly formulated user queries. One approach to addressing the problem is therefore to reconstruct user queries so that large language models can understand user needs correctly.
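The following is a minimal sketch of this query-refinement idea. It assumes a generic `generate(prompt)` helper that wraps whichever large language model the student chooses; the helper, the prompt template, and all names here are illustrative assumptions, not part of any specific library or of a prescribed project design.

```python
# Sketch: refine a vague user query before generating code from it.
# `generate` is a hypothetical callable (str -> str) wrapping an LLM of choice.

REFINE_TEMPLATE = (
    "Rewrite the following programming request so that it is unambiguous.\n"
    "State the function name, input types, output type, and edge cases explicitly.\n"
    "Request: {query}\n"
    "Refined request:"
)

def refine_query(query: str, generate) -> str:
    """Ask the LLM to restate a vague user query in a standardized form."""
    return generate(REFINE_TEMPLATE.format(query=query)).strip()

def generate_code(query: str, generate) -> str:
    """Generate code from the refined query instead of the raw one."""
    refined = refine_query(query, generate)
    return generate(f"Write Python code for the following request:\n{refined}")
```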
In this project, students will explore the hallucination problems in code generation with existing large language models and investigate effective methods for standardizing user inputs to make the generated code more robust. Through this process, students will gain a comprehensive understanding of the current state of the art and cutting-edge techniques in large language models.
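One simple way to flag hallucinated generations is to execute the generated code against user-provided unit tests and treat failures as a warning signal. The sketch below illustrates this under the assumption that the tests are ordinary Python assertions; the function name and the subprocess-based setup are illustrative choices, not a prescribed evaluation protocol.

```python
# Sketch: run generated code together with its tests in a subprocess.
# A failing or crashing run suggests the generation may be hallucinated,
# i.e. plausible-looking but not meeting the user's requirement.

import subprocess
import tempfile

def passes_tests(generated_code: str, test_code: str, timeout: int = 10) -> bool:
    program = generated_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
```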
Required knowledge:
- Strong programming background, especially proficiency in Python.
- Experience training deep learning models with PyTorch.
Reference:
[1] Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817.