We’re thrilled to announce that ten of our papers have been accepted to CVPR 2025! Congrats to all the authors!
- MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High Intensity Surgical Environments
Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab
MM-OR is a realistic, large-scale multimodal spatiotemporal operating room dataset, designed to enhance surgical assistance and patient safety by capturing comprehensive scenes with RGB-D data, audio, robotic logs, and more, complete with detailed annotations such as scene graphs and panoptic segmentations. Paired with MM2SG, the first multimodal vision-language model for scene graph generation, our work establishes a new benchmark for holistic OR understanding and opens the path towards multimodal scene analysis in complex, high-stakes environments.
- GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter Yu, Nassir Navab, Benjamin Busam
We propose a Global Context Enhancement (GCE) approach that integrates global context with both geometric and semantic cues for category-level object pose estimation. The introduced Semantic Shape Reconstruction module addresses partially observed input by reconstructing both object geometry and semantics, deforming categorical prototypes with a deep linear shape model.
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Yousef Yeganeh, Ioannis Charisiadis, Marta Hasny, Martin Hartenberger, Björn Ommer, Nassir Navab, Azade Farshad, Ehsan Adeli
We propose Latent Drifting, together with a generalized reformulation of diffusion models, to adapt general-purpose diffusion models for counterfactual image generation in medical imaging.
- ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari, Mahdi Saleh
- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag
- One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu*, Gu Wang*, Ruida Zhang, Chenyangguang Zhang, Federico Tombari, and Xiangyang Ji
- Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segu, Pier Luigi Dovesi, Jussi Karlgren, Daniel Cremers, Federico Tombari, Matteo Poggi
- Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Ace Kulshrestha, Federico Tombari
- RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi, Narunas Vaskevicius, Pedro Hermosilla, Federico Tombari, Timo Ropinski