Multimodal Reasoning and Human-Computer Interaction for UI/Code Generation
Bachelor & Master Thesis
The emergence of large multimodal models (LMMs) has unlocked exciting opportunities in intelligent UI and code generation. However, challenges remain in enabling LMMs to understand complex development contexts, maintain code consistency, and support real-world design scenarios. Current methods often lack robustness in handling multimodal inputs or adapting to human workflows.
This project aims to advance human-AI collaboration in software development by combining multimodal reasoning with interaction design to meet real-world development needs. We will explore how AI agents can generate UI and code from multimodal inputs (e.g., sketches, screenshots, natural language), improve the interpretability of their decisions, and adapt to user feedback in iterative workflows.
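To make the kind of pipeline we have in mind concrete, below is a minimal sketch of prompting a vision-capable LMM with a UI screenshot and a natural-language instruction to produce frontend code. It assumes the OpenAI Python SDK; the model name, prompt, and file path are illustrative placeholders rather than part of the project setup.

```python
# Minimal sketch: ask a vision-capable LMM to turn a UI screenshot plus a
# natural-language request into frontend code. Model, prompt, and file path
# are placeholders for illustration only.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_ui_code(screenshot_path: str, instruction: str) -> str:
    # Encode the screenshot so it can be passed inline as a data URL.
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Generate HTML/CSS for this UI. {instruction}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


print(generate_ui_code("sketch.png", "Use a responsive two-column layout."))
```

Thesis work would go well beyond such a single-shot call, e.g., grounding the output in design guidelines, checking code consistency, and incorporating user feedback across iterations.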
Required knowledge:
- Proficiency in Python development
- Experience in reading and reproducing LLM-related projects
- Strong interest in UI/code generation and multimodal systems
- A self-motivated and collaborative research mindset
Possible topics include:
- Multimodal approaches to UI and code generation
- Explainable AI systems with hierarchical knowledge structures
- Human-computer interaction paradigms for development workflows
- Evaluation frameworks and optimization methods for AI-generated workflows
References:
- DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models https://arxiv.org/pdf/2411.01606
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? https://arxiv.org/abs/2410.03859