Our group will present 9 papers at ICLR 2025. Congratulations to the entire team and all co-authors. More details on the papers will follow soon.
- Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
(Yan Scholten, Stephan Günnemann)
Conformal prediction is a powerful method for providing model-agnostic and distribution-free uncertainty quantification by guaranteeing that prediction sets include the correct answer with high confidence. But what if the data itself is compromised? Poisoning attacks, where adversaries tamper with training and calibration data, can disrupt these guarantees, making predictions unreliable. In response, we introduce Reliable Prediction Sets (RPS) -- a novel method designed to withstand such attacks. By introducing smoothed score functions and majority-voting prediction sets calibrated on distinct data subsets, RPS ensures reliability even under malicious poisoning manipulations. In experiments on image classification tasks, RPS delivers trustworthy predictions. With RPS, uncertainty quantification takes a critical step forward, becoming more reliable in the face of adversarial threats.
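A minimal sketch of the core mechanism as we read it from the abstract: calibrate one conformal prediction set per disjoint calibration split, and keep a label only if a majority of splits includes it, so a few poisoned calibration points cannot sway the result. All names and the score convention are our own illustration, not the authors' implementation.

```python
import numpy as np

def majority_vote_sets(cal_scores, cal_labels, test_scores, alpha=0.1, k=5, seed=0):
    """cal_scores/test_scores: (n, C) nonconformity scores (lower = more conforming)."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(cal_labels)), k)
    votes = np.zeros_like(test_scores, dtype=int)
    for s in splits:
        # Conformal quantile of the true-label scores within this split only.
        true_scores = cal_scores[s, cal_labels[s]]
        level = min(np.ceil((len(s) + 1) * (1 - alpha)) / len(s), 1.0)
        q = np.quantile(true_scores, level, method="higher")
        votes += test_scores <= q            # this split's set membership per label
    return votes > k // 2                    # strict majority vote over the k splits

cal_scores = np.random.rand(150, 3)          # toy nonconformity scores, 3 classes
cal_labels = np.random.randint(0, 3, 150)
print(majority_vote_sets(cal_scores, cal_labels, np.random.rand(4, 3)))
```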
- Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space
(Mohamed Amine Ketata*, Nicholas Gao*, Johanna Sommer*, Tom Wollschläger, Stephan Günnemann)
Recent diffusion models for 2D molecule generation represent molecules as graphs and apply discrete diffusion to generate atoms and chemical bonds. However, these methods have struggled to match the success seen in other fields, such as image generation, due to the immense size of the molecular space and the challenges of discrete optimization. In this paper, we overcome these limitations by introducing Synthetic Coordinate Embedding (SyCo), a novel framework that maps 2D molecular graphs to continuous 3D point clouds and learns a continuous diffusion model on this latent space. By leveraging EDM, a popular 3D molecule diffusion model, our approach outperforms the best discrete diffusion models by over 26% in unconditional generation and achieves up to 3.9x improvement in conditional generation on the ZINC250K benchmark.
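To make the idea concrete, here is a heavily simplified, hypothetical sketch of such a latent pipeline (our illustration; module names, shapes, and the noise schedule are assumptions, not the paper's code): atoms are embedded as a continuous 3D point cloud with features, and an ordinary continuous denoising objective replaces discrete graph diffusion.

```python
import torch
import torch.nn as nn

class AtomEncoder(nn.Module):
    """Maps per-atom input features to 3D coordinates plus invariant latents."""
    def __init__(self, atom_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(atom_dim, hidden), nn.SiLU(),
                                 nn.Linear(hidden, 3 + hidden))

    def forward(self, atom_feats):
        out = self.net(atom_feats)
        return out[..., :3], out[..., 3:]    # latent coordinates, features

def denoising_loss(eps_model, coords, t):
    """Standard continuous denoising objective on the latent point cloud."""
    noise = torch.randn_like(coords)
    noisy = (1 - t).sqrt() * coords + t.sqrt() * noise
    return ((eps_model(noisy, t) - noise) ** 2).mean()

encoder = AtomEncoder(atom_dim=8)
coords, feats = encoder(torch.randn(16, 8))  # one toy 16-atom molecule
eps_model = lambda x, t: torch.tanh(x)       # placeholder denoiser
print(denoising_loss(eps_model, coords, torch.tensor(0.3)).item())
```

In the full framework, a decoder maps the generated point cloud back to atoms and bonds, which is what lets a 3D model like EDM drive 2D molecular graph generation.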
- MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
(Leon Hetzel*, Johanna Sommer*, Bastian Rieck, Fabian Theis, Stephan Günnemann)
Current molecular generative models rely on predefined motif vocabularies, limiting their ability to explore novel structural compositions. We introduce MAGNet, a hierarchical generative model that separates structural scaffolds from atomic features, enabling motifs to be learned freely rather than fixed in advance. By factorising the molecular data distribution, MAGNet constructs a scaffold vocabulary independent of motif constraints and learns atom and bond assignments dynamically. This significantly improves structural diversity while maintaining generative performance. Our evaluation of existing motif-based models reveals their inherent limitations in reconstructing and generating molecules with complex, unseen structures, emphasising the need for a more flexible generative approach. MAGNet's approach challenges the traditional reliance on motif priors and opens the door to a broader exploration of chemical space, beyond what predefined fragment sets allow.
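Schematically, such a hierarchical factorisation can be pictured as sampling coarse structure before detail (notation ours, not the paper's):

```latex
p_\theta(G) \;=\; \underbrace{p_\theta(S)}_{\text{scaffold}}\;
\underbrace{p_\theta(A \mid S)}_{\text{atom assignments}}\;
\underbrace{p_\theta(B \mid A, S)}_{\text{bond assignments}}
```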
- Learning Equivariant Non-Local Electron Density Functionals
(Nicholas Gao*, Eike Eberhard*, Stephan Günnemann)
Contemporary quantum chemistry is built upon density functional theory (DFT), but DFT's accuracy hinges on the approximation of non-local contributions to the exchange-correlation (XC) functional. To date, machine-learned and human-designed approximations suffer from insufficient accuracy, limited scalability, or dependence on costly reference data. To address these issues, we introduce Equivariant Graph Exchange Correlation (EG-XC), a novel non-local XC functional based on equivariant graph neural networks. In our empirical evaluation, we find EG-XC to accurately reconstruct 'gold-standard' CCSD(T) energies on MD17. On out-of-distribution conformations, EG-XC reduces errors by up to half. Remarkably, EG-XC excels in data efficiency and molecular size extrapolation, matching force fields trained on 5 times more and larger molecules. On identical training sets, EG-XC halves the error.
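One way to picture a learned non-local functional of this kind (schematic, in our own notation): the XC energy integrates a learned energy density that may depend on features g[ρ](r) aggregated non-locally by an equivariant GNN, whereas (semi-)local functionals only see the density and its derivatives at each point r.

```latex
E_{\mathrm{xc}}[\rho] \;=\; \int \rho(\mathbf{r})\,
  \varepsilon_\theta\big(\rho(\mathbf{r}),\, \nabla\rho(\mathbf{r}),\, g[\rho](\mathbf{r})\big)\,
  \mathrm{d}\mathbf{r}
```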
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
(Yan Scholten, Stephan Günnemann, Leo Schwinn)
How well do we really understand the capabilities of Large Language Models (LLMs)? For years, evaluations have relied on rigid, deterministic methods -- essentially judging a model based on a single, fixed response. But this narrow approach fails to capture the full range of what these models can do, especially in critical areas like unlearning sensitive data or ensuring ethical behavior. Furthermore, current evaluation practices often diverge from practical LLM applications, such as conversational assistants. In these scenarios, probabilistic sampling results in non-deterministic outputs, meaning the same prompt can yield diverse responses. We therefore introduce a novel probabilistic evaluation framework that considers the entire output distribution of a model, offering a far more accurate picture of its capabilities. Our findings are striking: deterministic methods often give a false sense of success, especially in unlearning, where sensitive data may still lurk beneath the surface. Alongside this new evaluation method, we provide a novel loss for LLMs that significantly improves results in probabilistic settings. This shift from point estimates to probabilistic evaluations is a major step forward in ensuring LLMs are safer, smarter, and more reliable.
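The gap between the two evaluation regimes is easy to demonstrate. The sketch below is our own construction, not the paper's framework: `sample_response` stands in for any stochastic decoder, and we estimate the probability that sampled outputs still reveal supposedly unlearned content even when the single most likely answer looks clean.

```python
import random

def leakage_probability(sample_response, prompt, secret, n_samples=500):
    """Monte Carlo estimate of P(secret appears in a sampled response)."""
    hits = sum(secret in sample_response(prompt) for _ in range(n_samples))
    return hits / n_samples

# Toy stand-in: greedy decoding would always return the clean refusal,
# yet sampling still leaks the secret about 5% of the time.
def sample_response(prompt):
    return "the code is 1234" if random.random() < 0.05 else "I cannot help with that."

print(leakage_probability(sample_response, "What is the code?", "1234"))
```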
- Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance
(Dominik Fuchsgruber, Tim Postuvan, Stephan Günnemann, Simon Geisler)
Many applications require modelling signals on the edges of a graph. These signals can have an inherent direction, like the water flow in a pipe network, or be undirected, for example, the pipe's diameter. Previous methods model only one of these modalities and cannot represent the direction of an edge itself irrespective of associated signals. We establish a formal framework for direction-aware GNNs that can handle both signal types and develop a new GNN: EIGN. It provably satisfies the resulting orientation equivariance and invariance criteria by composing novel edge-level graph shift operators. Our evaluation shows that EIGN outperforms prior work on edge-level problems, for example, improving RMSE on flow simulation tasks by up to 43.5%.
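Stated schematically in our own notation: flipping the arbitrary reference orientation of an edge negates a directed input signal x, and the model F should either track or ignore that flip depending on whether its output is directed or undirected.

```latex
\text{orientation equivariance:}\quad F(-x) = -F(x)
\qquad\qquad
\text{orientation invariance:}\quad F(-x) = F(x)
```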
- Unlocking Point Processes through Point Set Diffusion
(David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh, Stephan Günnemann)
Point processes model the distribution of random point sets in mathematical spaces, such as spatial and temporal domains. Existing methods are predominantly constrained by reliance on intensity functions, creating an efficiency-flexibility trade-off. We introduce Point Set Diffusion, a diffusion-based latent variable model that can represent arbitrary point processes on general metric spaces without relying on the intensity function. By directly learning to stochastically interpolate between noise and data point sets, our approach enables efficient, parallel sampling and flexible generation for complex conditional tasks.
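As a toy illustration of stochastic interpolation between a data point set and noise (our reading of the idea, not the paper's exact process): at time t, each data point survives with probability 1 - t while noise points arrive at a rate that grows with t, so t = 0 yields the data and t = 1 pure noise.

```python
import numpy as np

def interpolate_point_set(points, t, noise_rate=50.0, seed=0):
    """points: (n, d) array with coordinates in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    kept = points[rng.random(len(points)) > t]        # thin the data points
    n_noise = rng.poisson(noise_rate * t)             # add noise points
    noise = rng.random((n_noise, points.shape[1]))
    return np.concatenate([kept, noise], axis=0)

data = np.random.rand(40, 2)                          # a toy spatial point set
for t in (0.0, 0.5, 1.0):
    print(t, interpolate_point_set(data, t).shape)
```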
- Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
(Marcel Kollovieh, Marten Lienen, David Lüdke, Leo Schwinn, Stephan Günnemann)
Recent advances in diffusion models have opened new directions for time series modeling, achieving state-of-the-art performance in both forecasting and synthesis. However, the mismatch between data distributions and simple, fixed priors remains a key challenge. We introduce TSFlow, a generative model that leverages conditional flow matching (CFM) alongside Gaussian processes, optimal transport paths, and data-dependent priors to better align the prior with the temporal structure of the data. This design enables high-quality unconditional generation and strong forecasting across diverse real-world datasets. Experimental results show that TSFlow outperforms competing methods on various benchmarks, providing a framework for synthesizing and forecasting time series.
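A compact sketch of the training objective as we understand it (our simplification; the kernel, grid, and network are placeholders, not the paper's code): the prior sample x0 comes from a Gaussian process with an RBF kernel instead of white noise, the path is the optimal-transport interpolation, and the network regresses the constant velocity x1 - x0.

```python
import torch
import torch.nn as nn

def gp_prior_sample(n_series, length, lengthscale=10.0):
    """Samples from a zero-mean GP with an RBF kernel on a regular 1D grid."""
    grid = torch.arange(length, dtype=torch.float32)
    K = torch.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / lengthscale**2)
    L = torch.linalg.cholesky(K + 1e-4 * torch.eye(length))
    return torch.randn(n_series, length) @ L.T

def cfm_loss(v_net, x1):
    """Flow-matching loss on OT paths x_t = (1 - t) * x0 + t * x1."""
    x0 = gp_prior_sample(*x1.shape)                   # GP prior instead of N(0, I)
    t = torch.rand(x1.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0                                  # constant OT velocity
    return ((v_net(torch.cat([xt, t], dim=-1)) - target) ** 2).mean()

series = torch.randn(8, 64)                           # toy batch of time series
v_net = nn.Sequential(nn.Linear(65, 128), nn.SiLU(), nn.Linear(128, 64))
print(cfm_loss(v_net, series).item())
```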
- Exact Certification of (Graph) Neural Networks Against Label Poisoning
(Mahalakshmi Sabanayagam*, Lukas Gosch*, Stephan Günnemann, Debarghya Ghoshdastidar)
Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Deriving robustness certificates is therefore important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks, enabling us to reformulate the bilevel optimization problem underlying label flipping as a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures on node classification tasks. Concerning worst-case robustness to label flipping, we establish hierarchies of GNNs on different benchmark graphs; quantify the effect of architectural choices such as activations, depth, and skip-connections; and, surprisingly, uncover a novel phenomenon of robustness plateauing for intermediate perturbation budgets across all investigated datasets and architectures. While we focus on GNNs, our certificates apply to sufficiently wide NNs in general through their NTK. Our work thus presents the first exact certificate against a poisoning attack ever derived for neural networks, which may be of independent interest.
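Schematically, and in our own notation rather than the paper's: the certificate must solve the bilevel problem below over admissible label flips y'; in the NTK regime the inner training problem collapses into an explicit kernel map, so the test prediction becomes linear in y' and the search over discrete flips can be written as a MILP.

```latex
\min_{\,y' \;:\; \|y' - y\|_0 \,\le\, b}\; f_{\theta^*(y')}(x_{\mathrm{test}})
\qquad \text{s.t.} \qquad
\theta^*(y') \;=\; \arg\min_\theta \mathcal{L}(\theta; X, y'),
\qquad
f_{\theta^*(y')}(x_{\mathrm{test}}) \;\approx\; K_{\mathrm{test}}\, K^{-1} y'
```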
Furthermore, we have one paper in Knowledge and Information Systems:
- Evaluating the transferability of adversarial robustness to target domains
(Anna-Kathrin Kopetzki, Aleksandar Bojchevski, Stephan Günnemann)
Knowledge and Information Systems (KAIS), 2025.
Knowledge transfer is an effective method for learning, particularly useful when labeled data is limited or when training a model from scratch is too expensive. Most research on transfer learning focuses on achieving accurate models, overlooking the crucial aspect of adversarial robustness. However, ensuring robustness is vital, especially when applying transfer learning in safety-critical domains. We compare the robustness of models obtained by different training and retraining procedures, including normal, adversarial, contrastive, and Lipschitz-constrained training variants. We employ adversarial attacks targeting two different transfer learning model outputs and analyze robustness with respect to both (i) the latent representations and (ii) the predictions. Our results show that preserving adversarial robustness during transfer requires techniques that ensure robustness independent of the training data.
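The two attack targets can share one loop. Below is a generic PGD sketch (our illustration, not the paper's code; the toy encoder and classifier are placeholders for a transferred model): the same projected gradient ascent maximizes whichever objective is passed in, either distance in the latent representation or the classification loss on the prediction.

```python
import torch

def pgd_attack(x, objective, eps=8 / 255, step=2 / 255, n_steps=10):
    """Maximize objective(x_adv) within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad, = torch.autograd.grad(objective(x_adv), x_adv)
        with torch.no_grad():
            x_adv += step * grad.sign()                              # ascent step
            x_adv.copy_(torch.max(torch.min(x_adv, x + eps), x - eps))  # project
    return x_adv.detach()

encoder = torch.nn.Sequential(torch.nn.Linear(12, 4), torch.nn.ReLU())
classifier = torch.nn.Linear(4, 3)                    # toy transferred model
x, y = torch.rand(2, 12), torch.tensor([0, 1])
z_clean = encoder(x).detach()
x_latent = pgd_attack(x, lambda a: ((encoder(a) - z_clean) ** 2).mean())   # (i)
x_pred = pgd_attack(
    x, lambda a: torch.nn.functional.cross_entropy(classifier(encoder(a)), y))  # (ii)
```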