Security of Large Language Models
Bachelor & Master Thesis
Large language models (LLMs) are typically transformer-based machine learning models with up to billions of parameters and a well-trained tokenizer for understanding and generating natural language text. The security of LLMs is vital to whether a model correctly understands users' input and produces desired, appropriate output in natural language. Unfortunately, LLMs have been found vulnerable to many forms of attack and manipulation, threatening their further adoption in industry and in people's daily lives. Through this thesis, students are offered an opportunity to immerse themselves in the world of LLMs (e.g., BERT, ChatGPT) or, more broadly, foundation models, and to explore a specific security-related area in depth.
This is an umbrella topic. Interested students are welcome to discuss more detailed and specific topics or directions for their thesis project with the team.
Topics:
This thesis project allows students to explore a wide range of security, trustworthiness, and privacy topics for LLMs or foundation models (models for speech, image, code, etc.). The topics include, but are not limited to:
* Jailbreaking
* Backdoor attacks
* Poisoning attacks
* Glitch tokens (see the short sketch below)
* Privacy leakage
* Dataset leakage
You may also refer to an online repository for a broad (though not exhaustive) list of relevant literature: github.com/corca-ai/awesome-llm-security
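To give a concrete taste of one of these topics, the sketch below implements a naive repetition-based probe for candidate glitch tokens: the model is asked to echo each token, and tokens it fails to reproduce are flagged for manual inspection. This is only a minimal illustration of the idea, not the GlitchProber technique from [1]; the model name ("gpt2"), the vocabulary slice, and the prompt wording are placeholder assumptions for the example.

```python
# Minimal, illustrative repetition-based probe for candidate glitch tokens.
# Assumptions: Hugging Face transformers is installed; "gpt2" is a stand-in
# model chosen only so the sketch runs on modest hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def fails_repetition(token_text: str) -> bool:
    """Ask the model to repeat a token; return True if the echo is missing."""
    prompt = f'Please repeat the string "{token_text}" exactly once: '
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated continuation, not the prompt itself.
    continuation = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return token_text not in continuation

# Scan a small slice of the vocabulary; a real probe would cover all of it
# and apply further filtering to reduce false positives.
candidates = []
for token_id in range(1000, 1100):
    text = tokenizer.decode([token_id]).strip()
    if text and fails_repetition(text):
        candidates.append((token_id, text))

print("Candidate glitch tokens:", candidates)
```

Note that a base model such as GPT-2 will not follow the repetition instruction reliably, so this naive probe produces many false positives; in practice such tests target instruction-tuned chat models and add filtering and verification steps, which is exactly where a thesis project could start.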
Notes:
This is, by default, a research-focused thesis. Enrolled students, particularly master students, should comprehensively investigate state-of-the-art techniques in a specific area and aim to propose a novel approach that addresses specific research question(s), for example, how to effectively detect and fix glitch tokens [1], or how to effectively test LLMs for hallucinations [2].
Undergraduate applicants can also consider empirical-study and evaluation/benchmarking options. The former requires enrolled students to reproduce state-of-the-art techniques (e.g., jailbreaking, poisoning attacks) and carry out an in-depth discussion in their thesis report. The latter allows students either to conduct a large-scale comparative evaluation of existing techniques (e.g., a survey of attacks on LLMs [3]) or to create a dataset tailored to a specific objective to facilitate future research (e.g., a dataset for effective hate speech detection [4]).
References:
[1] Zhang, Zhibo, Wuxia Bai, Yuxi Li, Mark Huasong Meng, Kailong Wang, Ling Shi, Li Li, Jun Wang, and Haoyu Wang. "GlitchProber: Advancing effective detection and mitigation of glitch tokens in large language models." In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 643-655. 2024.
[2] Li, Ningke, Yuekang Li, Yi Liu, Ling Shi, Kailong Wang, and Haoyu Wang. "Drowzee: Metamorphic testing for fact-conflicting hallucination detection in large language models." Proceedings of the ACM on Programming Languages 8, no. OOPSLA2 (2024): 1843-1872.
[3] Chowdhury, Arijit Ghosh, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vinija Jain, and Aman Chadha. "Breaking down the defenses: A comparative survey of attacks on large language models." arXiv preprint arXiv:2403.04786 (2024).
[4] Guo, Keyan, Alexander Hu, Jaden Mu, Ziheng Shi, Ziming Zhao, Nishant Vishwamitra, and Hongxin Hu. "An investigation of large language models for real-world hate speech detection." In 2023 International Conference on Machine Learning and Applications (ICMLA), pp. 1568-1573. IEEE, 2023.