Privacy Preservation & Compliance of Large Language Models
Bachelor & Master Thesis
Nowadays, people rely heavily on LLMs to perform diverse tasks ranging from writing improvement to code generation. One of the biggest challenges hindering industry adoption is the uncertainty about whether LLMs protect users' private and confidential data. For example, when users ask an LLM to improve a cover letter for a job application, it is almost inevitable that sensitive personal information ends up in the prompt, or in documents used for retrieval purposes such as Retrieval-Augmented Generation (RAG). Will this sensitive information be stored or even uploaded to the LLM's backend? Will it be used for future model training or improvement? How can users' concerns be addressed? Is there a way to keep private or confidential data from being accessed or exploited by LLMs without compromising performance?
By selecting this thesis, students are expected to comprehensively (1) investigate the privacy mechanisms of mainstream LLMs and (2) assess their privacy preservation at scale. After that, students can choose either (3a) a research-focused option to propose a novel privacy-preserving framework or mitigation technique for mainstream LLMs (e.g., a tailored homomorphic encryption scheme), or (3b) an industry-oriented option to design and implement a toolkit or plugin that maximizes privacy assurance in conventional LLM usage scenarios (e.g., automatically obfuscating or replacing sensitive data before it is passed to the prompt, or applying off-the-shelf encryption techniques to the user input).
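To give a flavor of option (3b), the sketch below shows a minimal, illustrative client-side preprocessor: it replaces detected personal data with placeholders before a prompt leaves the user's machine and restores the originals in the model's reply. The regex patterns and function names are assumptions chosen for the example; a real toolkit would combine such rules with NER-based detectors and cover far more PII categories.

```python
import re

# Illustrative patterns only; a production plugin would add NER-based detection
# (e.g., for person names and addresses) and many more categories.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def obfuscate(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with numbered placeholders before the prompt is
    sent to the LLM; return the sanitized prompt plus a local mapping so the
    placeholders can be restored in the model's response."""
    mapping: dict[str, str] = {}
    counter = 0

    def make_sub(kind: str):
        def _sub(match: re.Match) -> str:
            nonlocal counter
            counter += 1
            token = f"<{kind}_{counter}>"
            mapping[token] = match.group(0)  # keep the original value locally
            return token
        return _sub

    for kind, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(make_sub(kind), prompt)
    return prompt, mapping

def restore(response: str, mapping: dict[str, str]) -> str:
    """Re-insert the original values into the model's answer on the client side."""
    for token, original in mapping.items():
        response = response.replace(token, original)
    return response

if __name__ == "__main__":
    raw = ("Please improve my cover letter. You can reach me at "
           "jane.doe@example.com or +49 170 1234567.")
    safe, table = obfuscate(raw)
    print(safe)  # PII replaced by <EMAIL_1> and <PHONE_2> before it reaches the LLM
    # ... send `safe` to the LLM, then map the placeholders back in its reply:
    print(restore("Applicants can contact you at <EMAIL_1>.", table))
```

Whether such placeholder substitution preserves enough context for the LLM to remain useful, and how it compares to encryption-based approaches, is exactly the kind of trade-off the thesis should evaluate.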