Multimodal Reasoning and Human-Computer Interaction for UI/Code Generation
Bachelor & Master Thesis
The emergence of large multimodal models (LMMs) has unlocked exciting opportunities in intelligent UI and code generation. However, challenges remain in enabling LMMs to understand complex development contexts, maintain code consistency, and support real-world design scenarios. Current methods often lack robustness in handling multimodal inputs or adapting to human workflows.
This project aims to advance human-AI collaboration in software development by combining multimodal reasoning with interaction design to meet real-world development needs. We will explore how AI agents can generate UI and code from multimodal inputs (e.g., sketches, screenshots, natural language), improve the interpretability of their decisions, and adapt to user feedback in iterative workflows.
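To make the kind of pipeline we have in mind concrete, below is a minimal sketch of prompting a vision-capable LMM with a UI screenshot and a natural-language instruction to produce frontend code. It assumes the OpenAI Python SDK; the model name, prompt, and file path are illustrative placeholders rather than part of the project setup.

```python
# Minimal sketch: ask a vision-capable LMM to turn a UI screenshot plus a
# natural-language request into frontend code. Model, prompt, and file path
# are placeholders for illustration only.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_ui_code(screenshot_path: str, instruction: str) -> str:
    # Encode the screenshot so it can be passed inline as a data URL.
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Generate HTML/CSS for this UI. {instruction}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


print(generate_ui_code("sketch.png", "Use a responsive two-column layout."))
```

Thesis work would go well beyond such a single-shot call, e.g., grounding the output in design guidelines, checking code consistency, and incorporating user feedback across iterations.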
Required knowledge:
- Proficiency in Python development
- Experience in reading and reproducing LLM-related projects
- Strong interest in UI/code generation and multimodal systems
- A self-motivated and collaborative research mindset
Possible topics include:
- Multimodal approaches to UI and code generation
- Explainable AI systems with hierarchical knowledge structures
- Human-computer interaction paradigms for development workflows
- Evaluation frameworks and optimization methods for AI-generated workflows
References:
- DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models https://arxiv.org/pdf/2411.01606
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? https://arxiv.org/abs/2410.03859