Machine Learning for Regulatory Genomics
Module: IN2393
Credit: 6 ECTS
Room (lecture and exercise): 00.08.038
Lecturers: Julien Gagneur, Matthias Heinig, Maria Colomé-Tatché, Annalisa Marsico
Lecture: Tuesdays, 14:00 - 15:30, starting on 26th April 2022
Exercise: Tuesdays, 15:30 - 17:00, starting on 26th April 2022
Lecture Language: English
Prerequisites (recommended):
- One introductory lecture on machine learning (e.g., IN2064, MA4802)
- Strong interest in biological and biomedical research questions
- Basics of Python programming
Who can attend:
The module is geared toward students from bioinformatics and computer science, as well as students with other quantitative training (physics, applied maths) who are interested in molecular biology. Students from biology or medicine are welcome provided they have some background in machine learning (see above) and are comfortable with basic programming.
The module is an elective module in the catalogue of:
- MSc Bioinformatics
- MSc Informatics
- MSc Information Systems
- MSc Informatics: Games Engineering
- MSc Data Engineering and Analytics
- MSc Physics
Intended Learning Outcomes:
At the end of the module students are able to:
- Describe major steps of gene expression from accessing DNA to determining protein abundance.
- Describe genome-wide assays employed to assess various steps of gene expression.
- Describe the concept of massively parallel reporter assays.
- Describe and apply deep learning methods to perform sequence-based predictions.
- Describe and apply the concept of model interpretation.
- Describe and apply the concept of convolutional neural networks.
- Describe and apply the concept of transformers.
- Apply deep learning for sequence-based modeling of a genome-wide assay. Evaluate model performance and provide biological interpretation of its application to real data.
Content:
Gene expression refers to how cells read the information encoded in genomes. This module introduces biological and computational concepts to study gene expression. It consists of two parts:
(1) Six lectures introduce biological mechanisms, experimental assays, and computational models for regulatory genomics. The lectures are supported by modeling exercises in Python.
(2) A 7- to 8-week hands-on project.
The lectures are organized around steps of gene expression:
- Introduction to gene regulation and sequence-based computational models of gene regulation
- Transcriptional regulation
- Chromatin-mediated regulation
- RNA splicing
- RNA modification and degradation
- Translation
Over these lectures, computational methods are introduced including:
- Fitting procedures for deep neural networks
- Convolutional Neural Networks
- LSTMs and transformers
- Embeddings for sequence data
- Multi-task learning and transfer learning
- End-to-end learning
- Analytical and visualisation techniques for model interpretation