Bioinformatics

Our modern bioinformatic methods help to precisely identify genetic changes, uncover disease-relevant signalling pathways and develop personalized therapeutic approaches. Multi-omics analysis, the combined analysis of genome, transcriptome, proteome and metabolome data, opens up new opportunities to reach a diagnosis more quickly and in a more targeted manner. This is particularly valuable for rare diseases, where conventional diagnostics often reach their limits, and it allows us to offer affected children and their families support at an early stage.

Our main research focus is the analysis of sequencing data from patients with rare diseases such as inflammatory bowel disease or severe congenital neutropenia. We have developed an accurate and reliable state-of-the-art pipeline to process the data generated by our high-throughput sequencing facility.

Before determining genomic variation in patient sequencing data, we perform a quality control procedure that verifies not only the quality of the underlying sequencing but also the consistency of the information recorded for each patient. We use the sequencing data to compute the patient’s relationship to other individuals in our cohort and to predict the patient’s blood group and ethnic origin. Any contradiction with the recorded information allows us to detect and resolve sample mix-ups at an early stage, before they can corrupt downstream analyses.
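The relatedness check described above can be illustrated with a minimal Python sketch. It is not our production pipeline: the function names, the simple identity-by-state (IBS) measure and the single threshold are assumptions chosen for illustration, and real analyses would typically rely on a dedicated kinship estimator and thresholds calibrated to the marker set.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean identity-by-state sharing (0..1) between two genotype vectors.

    Genotypes are coded as alternate-allele counts (0, 1 or 2);
    None marks a missing call and is skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this site
        compared += 2
    if compared == 0:
        raise ValueError("no overlapping genotyped sites")
    return shared / compared


def flag_mixups(genotypes, expected_related, threshold=0.8):
    """Return sample pairs whose genotype similarity contradicts the records.

    genotypes:        dict mapping sample name -> genotype vector
    expected_related: set of frozensets, each naming two samples that the
                      records list as close relatives
    threshold:        illustrative IBS cut-off (an assumption, not a
                      calibrated value)
    """
    flags = []
    for a, b in combinations(sorted(genotypes), 2):
        sim = ibs_similarity(genotypes[a], genotypes[b])
        recorded = frozenset((a, b)) in expected_related
        if recorded and sim < threshold:
            flags.append((a, b, sim, "recorded as related, genotypes dissimilar"))
        elif not recorded and sim >= threshold:
            flags.append((a, b, sim, "recorded as unrelated, genotypes similar"))
    return flags
```

A flagged pair is only a hint that a sample label may be wrong; resolving it still requires checking the remaining predictions, such as blood group, against the patient record.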

To detect disease-causing genomic variants, we developed a high-quality workflow that goes beyond standard analysis. Instead of analyzing single patients sequentially, we increased the accuracy of the variant detection step by performing joint analyses of all collected data. We developed a frequency filtering approach that removes non-pathogenic population-specific variants before candidate lists are generated. We accelerated the evaluation of candidate lists by applying our tool SmartPhase to efficiently reduce the number of potential compound heterozygous variant pairs. We extended the focus beyond small variation by screening for deletions of whole exons and are developing routines to identify more complex structural variations. We expanded the search space for pathogenic variants by also analyzing variants in 5’ untranslated regions that create upstream open reading frames or disrupt the stop codons of existing upstream open reading frames. We are investigating synergistic effects of multiple variants in the same patient by integrating multi-nucleotide variants into our analysis and by developing a machine learning approach to identify complex disease-causing variant patterns.
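Two of the steps above, frequency filtering and the reduction of compound heterozygous candidate pairs, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the algorithm implemented in SmartPhase: the `Variant` fields, the `max_af` cut-off and the trio-only phasing rule are all hypothetical, and read-based phasing is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Variant:
    gene: str
    pos: int
    pop_af: float   # allele frequency in a matched reference population (assumed annotation)
    child_gt: int   # alternate-allele count in the patient (0, 1 or 2)
    mother_gt: int
    father_gt: int


def parental_origin(v):
    """Infer which parent transmitted the allele, or None if ambiguous."""
    if v.mother_gt > 0 and v.father_gt == 0:
        return "maternal"
    if v.father_gt > 0 and v.mother_gt == 0:
        return "paternal"
    return None  # both or neither parent carries the allele


def compound_het_candidates(variants, max_af=0.01):
    """List rare heterozygous variant pairs per gene that may lie in trans."""
    by_gene = defaultdict(list)
    for v in variants:
        # Frequency filter: keep only rare, heterozygous patient variants.
        if v.child_gt == 1 and v.pop_af <= max_af:
            by_gene[v.gene].append(v)

    pairs = []
    for gene, gene_variants in by_gene.items():
        ordered = sorted(gene_variants, key=lambda v: v.pos)
        for a, b in combinations(ordered, 2):
            origin_a, origin_b = parental_origin(a), parental_origin(b)
            if origin_a is not None and origin_a == origin_b:
                continue  # both inherited from the same parent: in cis, discard
            status = "trans" if origin_a and origin_b else "unresolved"
            pairs.append((gene, a.pos, b.pos, status))
    return pairs
```

Pairs marked "unresolved" are the ones where parental genotypes alone cannot decide the phase, which is where additional evidence such as sequencing reads spanning both variants becomes useful.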

We operate a dedicated Linux-based network with two Proxmox clusters at the KUBUS facility, hosting numerous virtual machines and containers. Our high-performance servers, equipped with Nvidia A6000 GPUs, provide the necessary computing power for advanced bioinformatics workflows. With 1 PB of storage, based on ZFS-powered TrueNAS SCALE and Proxmox, and network connections of up to 40 Gbit/s, this infrastructure is specifically designed to enable time-efficient exome and whole-genome analyses, supporting large-scale computational research.

Our computational environment is optimized for bioinformatics pipelines, using Snakemake, Singularity, Docker and graph-based workflows to streamline and automate analyses. The infrastructure is designed for scalability and flexibility, allowing researchers to deploy customized environments efficiently.

In Großhadern, we maintain a large-scale Proxmox cluster with high-performance servers and an additional 2 PB of ZFS-based storage. To ensure data integrity and disaster recovery, we run a structured backup system with regular offsite backups to the LRZ. The combination of high-speed networking, redundant storage and a modular software stack provides reliable and scalable computing power for diverse research projects.

KNIME4NGS

Description: We present a toolbox to enable the analysis of NGS data using the graphical user interface of the KNIME workflow management platform.

Hastreiter, M., Jeske, T., Hoser, J., Kluge, M., Ahomaa, K., Friedl, M. S., ... & Küffner, R. (2017). KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis. Bioinformatics, 33(10), 1565-1567.

Link to article: https://academic.oup.com/bioinformatics/article/33/10/1565/2871246

Link to website: http://ibisngs.github.io/knime4ngs/index.html

DEUS

Description: We describe the implementation and results of an R package tailored for the analysis of small non-coding RNA sequencing data, which offers several advantages over mapping-based analysis approaches.

Jeske, T., Huypens, P., Stirm, L., Höckele, S., Wurmser, C. M., Böhm, A., ... & Hastreiter, M. (2019). DEUS: an R package for accurate small RNA profiling based on differential expression of unique sequences. Bioinformatics.

Link to article: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz495/5522007

Link to website: http://ibis.helmholtz-muenchen.de/deus

SmartPhase

Description: We present a modular Java-based phasing tool that enables targeted phasing of heterozygous variants and is tailored for use in precision medicine.

Hager, P., Mewes, H.-W., Rohlfs, M., Klein, C. & Jeske, T. (2019). SmartPhase: accurate and fast phasing of potentially compound heterozygous variant pairs for genetic diagnosis of rare diseases. Submitted.

Link to website: http://ibis.helmholtz-muenchen.de/smartphase/

Daniel Weiß

Rare disease genetics

CCRC Hauner - Comprehensive Childhood Research Center