Bioinformatics
Rare disease genetics
Our main research focus is the analysis of sequencing data from patients suffering from rare diseases such as inflammatory bowel disease or severe congenital neutropenia. We have developed an accurate, reliable, state-of-the-art pipeline to process the data generated by our high-throughput sequencing facility.
Before calling genomic variants in patient sequencing data, we perform a unique quality control procedure that ensures not only the high quality of the underlying sequencing data but also consistent information about the patient. We use the sequencing data to compute each patient's relationship to other individuals in our cohort and to predict their blood group and ethnic origin. Any contradiction we identify lets us detect and resolve sample mix-ups at an early stage, before they can corrupt downstream analyses.
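The page does not say how relationships are computed from the sequencing data. As a minimal sketch, assuming genotypes have already been called at a shared set of biallelic SNVs and coded as alternate-allele counts (0, 1, 2), the KING-robust kinship estimator can be used to compare observed with recorded relationships. The function names, the `expected` pedigree dictionary and the use of KING's degree cut-offs are illustrative assumptions, not necessarily the procedure used in our pipeline.

```python
from __future__ import annotations

from itertools import combinations
from typing import Optional, Sequence

Genotypes = Sequence[Optional[int]]  # alt-allele counts 0/1/2, None = missing call


def king_robust_kinship(g1: Genotypes, g2: Genotypes) -> float:
    """KING-robust kinship coefficient between two samples (about 0.5 for duplicates)."""
    het_het = opp_hom = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only use sites called in both samples
        if a == 1:
            het1 += 1
        if b == 1:
            het2 += 1
        if a == 1 and b == 1:
            het_het += 1
        elif {a, b} == {0, 2}:
            opp_hom += 1  # opposite homozygotes share no allele (IBS0)
    if het1 + het2 == 0:
        raise ValueError("no heterozygous sites to compare")
    return (het_het - 2 * opp_hom) / (het1 + het2)


def relationship_degree(phi: float) -> str:
    """Map a kinship coefficient to a relationship degree using KING's cut-offs."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "first-degree"
    if phi > 0.0884:
        return "second-degree"
    if phi > 0.0442:
        return "third-degree"
    return "unrelated"


def find_contradictions(genotypes: dict[str, Genotypes],
                        expected: dict[frozenset, str]) -> list[tuple]:
    """Report sample pairs whose inferred relationship differs from the recorded one.

    Pairs missing from `expected` (hypothetical pedigree records) count as unrelated.
    """
    issues = []
    for s1, s2 in combinations(sorted(genotypes), 2):
        phi = king_robust_kinship(genotypes[s1], genotypes[s2])
        observed = relationship_degree(phi)
        recorded = expected.get(frozenset((s1, s2)), "unrelated")
        if observed != recorded:
            issues.append((s1, s2, recorded, observed, round(phi, 3)))
    return issues


# Hypothetical toy data: two sample IDs recorded as unrelated but with identical genotypes.
genotypes = {"P001": [0, 1, 2, 1, 0, 1], "P002": [0, 1, 2, 1, 0, 1]}
print(find_contradictions(genotypes, expected={}))
# [('P001', 'P002', 'unrelated', 'duplicate/MZ twin', 0.5)]
```

A flagged pair like this one points to a duplicated or swapped sample. In practice many thousands of common SNVs are needed for stable estimates; on a handful of sites the estimator can fall well outside the range 0 to 0.5.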
To detect disease-causing genomic variants, we developed a high-quality workflow that goes beyond standard analysis. Instead of analyzing patients one at a time, we increased the accuracy of the variant detection step by jointly analyzing all collected data. We developed a sophisticated frequency filtering approach that removes non-pathogenic, population-specific variants before candidate lists are generated. We accelerate the evaluation of candidate lists by applying our tool SmartPhase, which efficiently reduces the number of potential compound heterozygous variant pairs. Beyond small variants, we screen for deletions of whole exons and are developing routines to identify more complex structural variants. We also extend the search space for pathogenic variants to 5′ untranslated regions, analyzing variants that create upstream open reading frames (uORFs) or disrupt the stop codons of existing uORFs. Finally, we are investigating synergistic effects of multiple variants in the same patient by integrating multi-nucleotide variants into our analysis and by developing a machine learning approach to identify complex disease-causing variant patterns.
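The page does not describe how uORF-related variants are annotated. The sketch below is a minimal illustration under stated assumptions: 0-based transcript coordinates, a known 5′UTR length, and single-nucleotide substitutions only (indels would shift positions). It scans for upstream ATGs and their first in-frame stop codon before and after a variant and reports uORFs that are created or whose stop codon is lost. The function names are hypothetical, and Kozak context and non-ATG start codons are ignored.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, utr_len: int) -> dict[int, int | None]:
    """Map each ATG starting inside the 5'UTR to the position of its first
    in-frame stop codon (None if the reading frame runs off the sequence)."""
    seq = mrna.upper()
    uorfs = {}
    for start in range(utr_len - 2):  # the whole ATG must lie in the 5'UTR
        if seq[start:start + 3] != "ATG":
            continue
        stop = None
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                stop = pos
                break
        uorfs[start] = stop
    return uorfs


def uorf_effects_of_snv(mrna: str, utr_len: int, pos: int,
                        ref: str, alt: str) -> dict[str, list[int]]:
    """Compare uORFs before and after a single-nucleotide substitution."""
    if mrna[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at position {pos}")
    mutated = mrna[:pos] + alt + mrna[pos + 1:]
    ref_orfs = find_uorfs(mrna, utr_len)
    alt_orfs = find_uorfs(mutated, utr_len)
    # uORFs whose start codon only exists in the mutated transcript
    created = sorted(s for s in alt_orfs if s not in ref_orfs)
    # existing uORFs whose stop codon is no longer a stop codon
    stop_lost = sorted(
        s for s, stop in ref_orfs.items()
        if stop is not None and s in alt_orfs
        and mutated[stop:stop + 3].upper() not in STOP_CODONS
    )
    return {"created": created, "stop_lost": stop_lost}


utr = "CCATGAAATAGCC"  # hypothetical 5'UTR with one uORF: ATG AAA TAG
cds = "ATGGCCTAA"      # hypothetical main coding sequence
print(uorf_effects_of_snv(utr + cds, len(utr), 9, "A", "C"))  # TAG -> TCG
# {'created': [], 'stop_lost': [2]}
print(uorf_effects_of_snv("CCACGAAATAGCC" + cds, 13, 3, "C", "T"))  # ACG -> ATG
# {'created': [2], 'stop_lost': []}
```

A uORF whose stop codon is lost can read through into the main coding sequence, which is why both outcomes are worth flagging for manual review.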
We use and maintain an internal Linux network consisting of three workstation PCs, two powerful servers and four cluster computers to enable time-efficient analyses. In addition, a storage system provides more than 200 terabytes of space for our data.
KNIME4NGS
Description: We present a toolbox to enable the analysis of NGS data using the graphical user interface of the KNIME workflow management platform.
Hastreiter, M., Jeske, T., Hoser, J., Kluge, M., Ahomaa, K., Friedl, M. S., ... & Küffner, R. (2017). KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis. Bioinformatics, 33(10), 1565-1567.
Link to article: https://academic.oup.com/bioinformatics/article/33/10/1565/2871246
Link to website: http://ibisngs.github.io/knime4ngs/index.html
DEUS
Description: We describe the implementation and results of an R package tailored to the analysis of small non-coding RNA sequencing data, which offers several advantages over mapping-based analysis approaches.
Jeske, T., Huypens, P., Stirm, L., Höckele, S., Wurmser, C. M., Böhm, A., ... & Hastreiter, M. (2019). DEUS: an R package for accurate small RNA profiling based on differential expression of unique sequences. Bioinformatics.
Link to article: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz495/5522007
Link to website: http://ibis.helmholtz-muenchen.de/deus
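DEUS itself is an R package, and the page only states that it profiles small RNAs by differential expression of unique sequences rather than by mapping. The Python sketch below is not DEUS code or its API; it merely illustrates the counting step that idea implies: collapse identical reads into unique sequences and build a per-sample count table that a differential expression method could then test. It assumes the reads are already adapter-trimmed, and the function names and `min_total` parameter are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of an uncompressed FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def unique_sequence_counts(samples: dict[str, Iterable[str]], min_total: int = 1
                           ) -> tuple[list[str], dict[str, list[int]]]:
    """Count identical read sequences per sample without any reference mapping.

    Returns the sample order and a table {sequence: [count per sample]},
    keeping only sequences seen at least `min_total` times across all samples.
    """
    names = list(samples)
    per_sample = {name: Counter(seq.upper() for seq in samples[name]) for name in names}
    totals = Counter()
    for counts in per_sample.values():
        totals.update(counts)
    table = {
        seq: [per_sample[name][seq] for name in names]  # Counter returns 0 if absent
        for seq in sorted(totals)
        if totals[seq] >= min_total
    }
    return names, table


# Hypothetical, already adapter-trimmed reads:
samples = {"control": ["ACGT", "ACGT", "TTTT"], "treated": ["acgt", "GGGG"]}
names, table = unique_sequence_counts(samples)
print(names, table)
# ['control', 'treated'] {'ACGT': [2, 1], 'GGGG': [0, 1], 'TTTT': [1, 0]}
```

For real data, `fastq_sequences` can be fed an open file handle, for example `list(fastq_sequences(open("sample.fastq")))` with a hypothetical file name.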
SmartPhase
Description: We present a modular Java-based phasing tool that enables targeted phasing of heterozygous variants and is tailored for use in precision medicine.
Hager, P., Mewes, H.-W., Rohlfs, M., Klein, C. & Jeske, T. (2019). SmartPhase: accurate and fast phasing of potentially compound heterozygous variant pairs for genetic diagnosis of rare diseases. Submitted.
Link to website: http://ibis.helmholtz-muenchen.de/smartphase/
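The page does not describe how SmartPhase phases variants, so the following is not SmartPhase. It is a minimal sketch of the underlying idea of reducing candidate compound heterozygous pairs, assuming a parent-offspring trio with genotypes coded as alternate-allele counts (0, 1, 2): pairs whose alternate alleles were both inherited from the same parent lie in cis and are dropped, while pairs that trio genotypes alone cannot resolve are kept as unresolved. The tuple layout and function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations
from typing import Optional


def parent_of_origin(child: int, mother: Optional[int],
                     father: Optional[int]) -> Optional[str]:
    """Parent of origin of the alt allele in a heterozygous child.

    Returns None when trio genotypes cannot resolve the origin
    (missing parent, both parents carry it, or neither does, e.g. de novo).
    """
    if child != 1:
        raise ValueError("child genotype must be heterozygous (1)")
    if mother is None or father is None:
        return None
    if mother > 0 and father == 0:
        return "maternal"
    if father > 0 and mother == 0:
        return "paternal"
    return None


def candidate_pairs(variants):
    """variants: (variant_id, gene, child_gt, mother_gt, father_gt) tuples.

    Returns (gene, v1, v2, status) for heterozygous pairs in the same gene,
    skipping pairs whose alleles both came from the same parent (cis).
    """
    by_gene = defaultdict(list)
    for vid, gene, child, mother, father in variants:
        if child == 1:
            by_gene[gene].append((vid, parent_of_origin(child, mother, father)))
    kept = []
    for gene, hets in by_gene.items():
        for (v1, o1), (v2, o2) in combinations(hets, 2):
            if o1 is not None and o1 == o2:
                continue  # same parent: in cis, cannot be compound heterozygous
            status = "trans" if o1 is not None and o2 is not None else "unresolved"
            kept.append((gene, v1, v2, status))
    return kept


variants = [  # hypothetical (id, gene, child, mother, father)
    ("v1", "GENE_A", 1, 1, 0),
    ("v2", "GENE_A", 1, 0, 1),
    ("v3", "GENE_A", 1, 1, 0),
    ("v4", "GENE_B", 1, 1, 1),
]
for pair in candidate_pairs(variants):
    print(pair)
# ('GENE_A', 'v1', 'v2', 'trans')
# ('GENE_A', 'v2', 'v3', 'trans')
```

Here the maternal pair v1/v3 is removed, and the single variant in GENE_B yields no pair. Unresolved pairs would need additional evidence, such as phase information from sequencing reads, before they can be confirmed or discarded.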