This is an overview of the most prominent scientific AI projects currently being carried out within the framework of BayernKI:
Boosting Abstract Screening: Combining LLMs and Neural Re-Rankers

Project Manager: Christian Jaumann
Principal Investigator: Prof. Dr. Annemarie Friedrich
Affiliation: University of Augsburg
HPC Platform Used: Alex GPU cluster (NHR@FAU)
The scientific literature grows rapidly every day, making it hard to keep track of the state of the art. Systematic literature reviews (SLRs) aim to identify and evaluate all relevant literature on a topic. After a set of candidate papers has been retrieved, the abstract screening phase determines their initial relevance according to a set of criteria. Large Language Models (LLMs) offer strong text-interpretation capabilities but are not naturally suited to producing ranked lists over large document sets due to scalability issues. We combine LLM-based relevance judgments with neural dense re-rankers for abstract screening, outperforming existing ranking systems on this task by a large margin.
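One way to picture the combination described above is a simple score-fusion step: a dense re-ranker scores every candidate abstract, an LLM judges a subset, and the two signals are merged into one ranking. The function names, the fusion weight `alpha`, and the fallback behavior are illustrative assumptions, not the project's actual method.

```python
# Sketch of fusing LLM relevance judgments with dense re-ranker scores
# for abstract screening. Hypothetical interface, not the project's code.

def fuse_scores(dense_scores, llm_judgments, alpha=0.5):
    """Combine min-max-normalized dense scores with LLM relevance
    judgments (0 = irrelevant, 1 = relevant) into one ranked list.
    Documents the LLM was not asked about keep only their dense score."""
    lo, hi = min(dense_scores.values()), max(dense_scores.values())
    span = (hi - lo) or 1.0
    fused = {}
    for doc_id, s in dense_scores.items():
        norm = (s - lo) / span                      # normalize to [0, 1]
        if doc_id in llm_judgments:                 # LLM judged this doc
            fused[doc_id] = alpha * norm + (1 - alpha) * llm_judgments[doc_id]
        else:
            fused[doc_id] = alpha * norm            # dense signal only
    return sorted(fused, key=fused.get, reverse=True)

dense = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
llm = {"d3": 1.0}                                   # LLM marked d3 relevant
ranking = fuse_scores(dense, llm)                   # d3 is promoted to the top
```

Because the LLM judges only a subset of documents, this kind of fusion keeps cost bounded while letting the stronger signal reorder the most uncertain candidates.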
Efficient Architectures for Meta-Reinforcement Learning

Principal Investigator: Prof. Dr. Josif Grabocka
Affiliation: University of Technology Nuremberg
HPC Platform Used: Alex GPU cluster (NHR@FAU)
FMBoost: Boosting Latent Diffusion with Flow Matching

Project Manager: Johannes Schusterbauer-Fischer
Principal Investigator: Prof. Dr. Björn Ommer
Affiliation: Ludwig Maximilian University Munich (LMU Munich)
HPC Platform Used: Alex GPU cluster (NHR@FAU)
Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which is only partially alleviated by latent diffusion. Flow matching is an appealing alternative due to its complementary characteristics: faster training and inference, but less diverse synthesis. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping from the small to a high-dimensional latent space. These latents are then projected to high-resolution images by the subsequent convolutional decoder of the latent diffusion approach. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, we achieve state-of-the-art high-resolution image synthesis at 1K and 2K resolution with minimal computational cost. Importantly, our approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easy to integrate into various diffusion model frameworks.
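The flow matching objective underlying this approach can be sketched in a few lines: sample a point on the straight path between a source latent x0 and a target latent x1, and regress a velocity model onto the constant target velocity x1 - x0. The toy data and the stand-in "model" below are illustrative assumptions; the project operates on real diffusion and decoder latents.

```python
import numpy as np

# Minimal sketch of the (rectified) flow matching objective on toy data.
# x0 plays the role of a source latent, x1 of a target high-resolution
# latent. Purely illustrative, not the project's training code.

rng = np.random.default_rng(0)

def flow_matching_loss(model_fn, x0, x1, t):
    """L2 regression of the predicted velocity onto the straight-line
    target velocity (x1 - x0) at interpolation time t in [0, 1]."""
    x_t = (1.0 - t) * x0 + t * x1      # point on the straight path
    v_target = x1 - x0                 # constant target velocity
    v_pred = model_fn(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))

x0 = rng.standard_normal((4, 8))       # batch of source latents
x1 = rng.standard_normal((4, 8))       # batch of target latents
t = rng.uniform(size=(4, 1))           # one interpolation time per element

# An oracle returning the true velocity drives the loss to zero.
oracle = lambda x_t, t: x1 - x0
loss = flow_matching_loss(oracle, x0, x1, t)
```

At inference, integrating the learned velocity field from t = 0 to t = 1 transports the small model's latent toward the high-dimensional one, which the convolutional decoder then renders as a high-resolution image.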
Interpretable Neural Networks for Tabular Data

Principal Investigator: Prof. Dr. Josif Grabocka
Affiliation: University of Technology Nuremberg
HPC Platform Used: Alex GPU cluster (NHR@FAU)
Although neural networks have long been deployed in applications involving tabular data, existing neural architectures are still not explainable by design. In this project, we developed a new class of interpretable neural networks for tabular data that are both deep and linear at the same time (i.e., mesomorphic). We optimize deep hypernetworks to generate explainable linear models on a per-instance basis. As a result, our models retain the accuracy of black-box deep networks while offering free-lunch explainability for tabular data by design. Through extensive experiments, we demonstrate that our explainable deep networks perform comparably to state-of-the-art classifiers on tabular data and outperform existing methods that are explainable by design.
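The mesomorphic idea can be sketched as a hypernetwork that maps each instance to the weights of a linear model, which is then applied to that same instance: the network is deep overall, but its prediction for any single instance is exactly linear, so the generated weights are a faithful local explanation. The untrained two-layer hypernetwork below is an illustrative assumption, standing in for the project's optimized architecture.

```python
import numpy as np

# Sketch of a "mesomorphic" network: a hypernetwork emits the parameters
# (w, b) of a per-instance linear model. Random, untrained parameters,
# purely to illustrate the data flow.

rng = np.random.default_rng(0)
d, h = 5, 16                                  # input dim, hidden dim
W1 = rng.standard_normal((d, h)) * 0.1
W2 = rng.standard_normal((h, d + 1)) * 0.1    # emits d weights + 1 bias

def hypernet(x):
    """Map an instance to the parameters of its own linear model."""
    z = np.tanh(x @ W1)                       # deep, nonlinear part
    params = z @ W2
    return params[:d], params[d]              # per-instance weights, bias

def predict(x):
    w, b = hypernet(x)
    return w @ x + b                          # linear in x, given w and b

x = rng.standard_normal(d)
w, b = hypernet(x)
# w is a faithful local explanation: it exactly reproduces the prediction.
explained = w @ x + b
```

The explanation comes "for free" because no post-hoc approximation (as in surrogate-based methods) is needed: the linear coefficients are the model's actual computation for that instance.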
LLäMmlein 1B & 120M

Project Manager: Jan Pfister, Julia Wunderle
Principal Investigator: Prof. Dr. Andreas Hotho
Affiliation: Julius-Maximilians University of Würzburg (JMU)
HPC Platform Used: Alex GPU cluster (NHR@FAU)
We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. The model training involved several key steps, including extensive data preprocessing, the creation of a custom German tokenizer, the training itself, as well as the evaluation of the final models on various benchmarks. Throughout the training process, multiple checkpoints were saved and analyzed using the SuperGLEBer benchmark to monitor the models’ learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter sizes. The results show that the models’ quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable insights into resource allocation for future model development.
nxt AIM

Project Managers: Katharina Winter & Erik Schütz
Principal Investigator: Prof. Dr. Fabian Flohr
Affiliation: University of Applied Sciences Munich (HM)
HPC Platform Used: Alex GPU cluster (NHR@FAU)
Generative models are increasingly used in Autonomous Driving applications to improve decision-making, safety, and generalization to unseen scenarios and corner cases. At the Intelligent Vehicles Lab, we focus on Encoder-Decoder architectures and Large Language Models to enhance prediction and planning in urban driving environments and simulations. Powered by NHR, our research develops solutions that help autonomous systems navigate complex, real-world situations while making Autonomous Driving more explainable, ensuring trustworthiness and interpretability. As part of the nxtAIM project, a nationwide initiative focusing on Generative AI for Autonomous Driving, we contribute to shaping the future of AI in Germany.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Project Manager: Tanveer Hannan
Principal Investigator: Dr. Thomas Seidl
Affiliation: Ludwig Maximilian University of Munich (LMU Munich)
HPC Platform Used: Alex GPU cluster (NHR@FAU)
Large Language Models (LLMs) are continuously improving at handling text, but Vision-Language Models (VLMs) struggle with long videos, particularly with temporal reasoning. Existing VLMs downsample frames and lose temporal detail, limiting their comprehension of temporal events. We introduce ReVisionLLM, a recursive vision-language model inspired by human search strategies. Our model identifies broad segments of interest and progressively refines its focus to pinpoint precise temporal boundaries. This approach enables seamless handling of video durations ranging from minutes to hours. Additionally, our hierarchical training strategy starts with short clips to capture distinct events before scaling to longer videos. ReVisionLLM is the first VLM designed for temporal event localization in hour-long videos.
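The recursive coarse-to-fine search described above can be sketched as follows: score a handful of coarse segments, descend into the most promising one, and stop once the window is short enough. The scoring function here is a toy stand-in (assumed, not the model's interface) for the VLM's relevance judgment of a segment against the query.

```python
# Sketch of recursive coarse-to-fine temporal localization. The score
# function is a hypothetical stand-in for a VLM's segment relevance.

def localize(score_fn, start, end, min_len=10.0, n_splits=4):
    """Return a (start, end) window by recursively zooming into the
    highest-scoring of n_splits equal sub-segments."""
    if end - start <= min_len:
        return (start, end)                       # fine enough: stop
    step = (end - start) / n_splits
    segments = [(start + i * step, start + (i + 1) * step)
                for i in range(n_splits)]
    best = max(segments, key=lambda seg: score_fn(*seg))
    return localize(score_fn, *best, min_len=min_len, n_splits=n_splits)

# Toy relevance: the event of interest sits around t = 3130 s in an
# hour-long video; segments whose center is nearer to it score higher.
event = 3130.0
score = lambda s, e: -abs((s + e) / 2 - event)
window = localize(score, 0.0, 3600.0)
```

Each level of recursion only scores a constant number of segments, so the number of VLM calls grows logarithmically with video length rather than linearly with frame count.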
SuperGLEBer – The first comprehensive German-language benchmark for LLMs

Project Manager: Jan Pfister
Principal Investigator: Prof. Dr. Andreas Hotho
Affiliation: Julius-Maximilians University of Würzburg (JMU)
HPC Platform Used: Alex GPU cluster (NHR@FAU)
Large Language Models (LLMs) are continuously being developed and improved, and there is no shortage of benchmarks that quantify how well they work; LLM benchmarking is indeed a long-standing practice, especially in the NLP research community. However, the majority of these benchmarks are not designed for German-language LLMs. We assembled a broad Natural Language Understanding benchmark suite for the German language and evaluated a wide array of existing German-capable models. This allows us to comprehensively chart the landscape of German LLMs.
- TBA