Computational Modelling for System Genetics

Module: CIT4230001

Credit: 6 ECTS

Room (lecture and exercise): Seminar room Taurus 1, Galileo (8120.EG.002)

Lecturers: Matthias Heinig, Julien Gagneur, Gabi Kastenmueller, Michael Menden, Paolo Casale, Eleftheria Zeggini

Lecture: Thursdays, 14:00 - 15:00, Seminar room Taurus 1, Galileo (8120.EG.002)

Exercise: Thursdays, 15:00 - 17:00, Seminar room Taurus 1, Galileo (8120.EG.002)

Lecture Language: English

Prerequisites:

  • A course in probability and statistics. To check your level, work through our primer on statistics, probability, and the multivariate Gaussian before the first lecture, and make sure you can do the exercises available here: https://drive.google.com/drive/u/0/folders/1EuqIQCT0e407aGycjZpUaPHxW1Q3Wwol
  • Basics of biology and genetics (refresh your high-school knowledge)
  • Recommended: an introductory course in data science, e.g. Data analysis and visualization in R (IN2339)

Intended Learning Outcomes: At the end of the module, students understand and are able to practically implement:

  • Challenges of complex trait genetics
  • Statistical models for QTL mapping and GWAS
  • Methods for adjustment for multiple testing
  • Regularized linear models and their applications in genetics
  • Linear mixed models to deal with population structure
  • Experimental techniques to measure gene expression
  • Efficient algorithms for expression QTL analysis
  • Statistical concepts for causal inference such as Mendelian randomization
  • Network inference methods such as Graphical Gaussian models and application to omics data (metabolome, transcriptome)
  • Application of some of the above-mentioned techniques to an actual problem from systems genetics, including evaluating model performance and calibration and providing a biological interpretation of the results on real data

Content:

This is a two-part module: (1) Seven lectures introduce the basics of systems genetics and the statistical models employed. The lectures are supported by tutorials in R or Python. In addition, we will provide a script covering the statistical concepts required for the lectures, which we expect students to be familiar with before the lectures start.

This is followed by (2) an eight-week hands-on project, which will be proposed and supervised by members of the lecturers' labs. This gives you the opportunity to work on ongoing research projects in the respective lab.

During the lectures, the following topics will be covered:

  • Introduction to human genetics and genome-wide association studies (GWAS)
  • Population structure
  • Polygenic risk scores
  • Gene-mapping and variant fine-mapping
  • Gene expression QTLs (eQTLs)
  • Causal inference with omics data (metabolomics, transcriptomics, etc.)
  • Omics approaches for rare diseases

Over the course of these lectures, various machine learning methods are introduced, including:

  • Linear regression and hypothesis testing
  • Linear mixed models
  • Regularized linear models
  • Multiple testing correction
  • Graphical Gaussian models