Investigating the Reasoning Process of Large Language Models
Bachelor & Master Thesis
The introduction of OpenAI's powerful o1 model kicked off the era of "reasoning models". Large language models (LLMs) in this stream perform inference through step-by-step self-reasoning, sparing users the effort of handcrafting prompt templates and designing chain-of-thought (CoT) instructions. The recently announced DeepSeek R1 marks a milestone in the history of LLMs as the first open-sourced reasoning model, and it therefore allows us to investigate the reasoning process directly. Notably, as many users have reported, reasoning models exhibit a "self-reflection capability", producing markers such as "Aha", "Wait", and "However" when they solve logical and mathematical problems. These phenomena raise the question of how we can carry out a more comprehensive assessment and evaluation of LLMs, especially when their inference results in a wrong answer. Simply put, we want to study "how reasoning models reason." Going one step further, an even deeper question awaits: can human beings learn from LLMs' self-reflection when solving mathematical and logical problems that are widely considered hard or even unsolvable?
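To make the phenomenon concrete, below is a minimal sketch of how such self-reflection markers might be located in a model's reasoning trace. The marker list, function name, and sample trace are purely illustrative assumptions, not part of any specific model's API or an endorsed methodology; a real study would need a far more careful, model-specific analysis.

```python
import re

# Illustrative self-reflection markers mentioned above (assumed list).
REFLECTION_MARKERS = ["aha", "wait", "however", "on second thought"]

def find_reflection_points(trace: str) -> list[tuple[int, str]]:
    """Return (sentence index, sentence) pairs where a marker occurs."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", trace)
    hits = []
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(marker in lowered for marker in REFLECTION_MARKERS):
            hits.append((i, sentence.strip()))
    return hits

# Toy reasoning trace, fabricated for illustration only.
trace = ("The answer should be 42. Wait, I misread the question. "
         "However, the constraint rules out 42, so the answer is 41.")
for index, sentence in find_reflection_points(trace):
    print(f"Reflection at sentence {index}: {sentence}")
```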
In this project, students will conduct a comprehensive, large-scale assessment of the reasoning process of LLMs and investigate the quality, effectiveness, and robustness of their reasoning.
This project can be research-intensive and is designed to be publication-oriented. Students with strong motivation to perform academic research are preferred.
If you are interested, please follow the instructions at https://www.cs.cit.tum.de/en/seai/open-theses/ and submit your application by email to application(at)seai.cit.tum.de. If you have any questions or ideas, please feel free to contact Dr. Mark Meng at mark.meng(at)tum.de.