Business Analytics and Machine Learning (IN2028), WS 24/25
Prof. Dr. Martin Bichler
Julius Durmann · Markus Ewert · Yutong Chao
Description
This is an introductory course in data analysis and machine learning with a focus on methods relevant to management and economics. Participants will learn widely used methods for regression, classification, clustering, and dimensionality reduction. Although we occasionally discuss example applications to motivate specific methods, applications are not our primary focus!
The course comprises weekly lectures, exercise sheets, homework sheets, and tutorials in smaller groups. The exercises consist of theoretical considerations, applications, and programming exercises in Python.
During the semester, we will offer a midterm exam in which students apply their knowledge to a dataset: they prepare the data, analyze it, and train a prediction model on it.
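To give a rough sense of what such a prepare, analyze, and train workflow can look like, here is a minimal sketch in plain Python. The synthetic data, the helper names (`prepare`, `fit_simple_ols`, `mean_squared_error`), and the choice of simple linear regression are illustrative assumptions and do not reflect the actual midterm task.

```python
import random
import statistics


def prepare(rows):
    """Drop records with a missing feature or target (a very simple cleaning step)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_simple_ols(data):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def mean_squared_error(data, intercept, slope):
    """Average squared prediction error on the given (x, y) pairs."""
    return statistics.fmean([(y - (intercept + slope * x)) ** 2 for x, y in data])


# Synthetic data (assumption): y = 2 + 3x + Gaussian noise, with some missing x values.
random.seed(0)
raw = []
for i in range(200):
    x = random.uniform(0, 10)
    y = 2 + 3 * x + random.gauss(0, 1)
    raw.append((None if i % 25 == 0 else x, y))

# 1) Prepare: remove incomplete records.
clean = prepare(raw)

# 2) Analyze: a quick look at the cleaned data.
print(f"{len(clean)} usable records, mean x = {statistics.fmean([x for x, _ in clean]):.2f}")

# 3) Train and evaluate: fit on 80% of the data, report the error on the held-out 20%.
random.shuffle(clean)
split = int(0.8 * len(clean))
train, test = clean[:split], clean[split:]
intercept, slope = fit_simple_ols(train)
test_mse = mean_squared_error(test, intercept, slope)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, test MSE = {test_mse:.2f}")
```

Evaluating on held-out records rather than on the training data gives a more honest estimate of how well the model predicts unseen observations.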
The course is an elective for students in the BSc Mathematics program. Students from IN, GE, and DE&A can choose only one of the following classes:
- Data Mining, IN2023, 2V, WS, Prof. Runkler
- Business Analytics, IN2028, 2V+2Ü, WS, Prof. Bichler
- Data Analysis and Visualization in R, IN2339, 2V+4Ü, WS, Prof. Gagneur
You can usually find more information on mutually exclusive courses on your study program's website.
Prerequisites
This course targets Bachelor students. We invite Master students to join if they have not yet taken a machine learning or data mining course.
Students who enroll in this course should meet the following prerequisites:
- For the initial classes, we expect students to know basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
- For later classes, you will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).
We will provide a brief recap but cannot revisit complete statistics, linear algebra, and calculus courses. If you are uncomfortable with the above mathematical basics, we recommend taking the respective courses first.
Please note that examples and exercises in this course will use Python. The exam may also include tasks that ask you to interpret short Python snippets. If you are new to the language, we will provide a short introduction for self-study (!) at the beginning of the course. Nonetheless, prior knowledge of Python (or another programming language) is beneficial.
Learning Outcome
After completing this course, students can apply and analyze widely used machine learning methods for numerical prediction, classification, clustering, and dimensionality reduction. They understand the methods' assumptions and can reproduce how the methods work. Successful participants know the algorithms' mathematical foundations and can explain the fundamentals of neural networks and reinforcement learning. Moreover, participants can implement solutions for analyzing data sets with Python and interpret the results.
Organization
Introduction: Please attend our first lecture on October 14, 2024 for organizational details.
Important links (registration and information):
Lecture | Tutorials | Moodle |
Lecture: Monday, 2 pm - 4 pm, Lecture hall in the Galileo building (Garching)
Tutorials: Room 01.10.011, MI building, Garching.
Group | Day | Time | Mode |
Group 1 | Tuesday | 10 am - 12 pm | online |
Group 2 | Tuesday | 2 pm - 4 pm | online |
Group 3 | Wednesday | 10 am - 12 pm | onsite |
Group 4 | Wednesday | 12 pm - 2 pm | onsite |
Group 5 | Thursday | 10 am - 12 pm | onsite |
Group 6 | Thursday | 2 pm - 4 pm | onsite |
Group 7 | Thursday | 4 pm - 6 pm | onsite |
Group 8 | Friday | 10 am - 12 pm | onsite |
Group 9 | Friday | 12 pm - 2 pm | onsite |
Group 10 | Friday | 2 pm - 4 pm | onsite |
Note: For the exact dates of lectures and tutorials, please check the schedule in TUMonline. Some dates might be subject to change due to holidays or university events.
Exam: There will be two exam opportunities (endterm and retake) in early 2025. Both exams are planned as on-site exams. There will be no online exam option.
Syllabus*
- Regression Analysis
  Statistical estimation, Test theory, Linear regression (Ordinary Least Squares)
- Regression Diagnostics
  Gauss-Markov theorem, GM assumptions, Omitted variable bias, Panel data analysis
- Logistic and Poisson Regression
  Generalized linear models, Logit / Probit / Poisson regression
- Naïve Bayes and Bayes Nets
  Bayes rule, Bayesian Networks, d-separation
- Decision Tree Classifiers
  Decision trees, Entropy, Information gain, C4.5, CART, Tree pruning
- Data Preparation and Causal Inference
  CRISP-DM, Practical data preparation
  Causal inference, Internal validity, Differences in differences, Propensity Score Matching, Multiple imputation
- Model Selection and Evaluation
  Bias-variance tradeoff, (Cross-) Validation, Gain / Lift / ROC curves
- Ensemble Methods and Clustering
  Bagging, Random forests, Boosting, Stacking
  Hierarchical clustering, K-means, Expectation maximization
- Dimensionality Reduction
  PCA, SVD, PCA regression, PLS regression, Regularization, Ridge regression, LASSO
- Convex Optimization
  Gradient descent, Momentum, Newton's method
- Neural Networks
  Feed-forward networks, Backpropagation, Gradient descent
- Reinforcement Learning
  Markov Decision Processes, Policies, Value functions, Value Iteration
  Q-Learning, REINFORCE
- Summary, Q&A
*may be subject to change
Literature
The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures can be found in chapters from the following textbooks:
- Trevor Hastie, Jerome Friedman, Robert Tibshirani: Elements of Statistical Learning, Springer, 2016.
- Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann, 2016.
- James H. Stock and Mark W. Watson: Introduction to Econometrics, Pearson Education.
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014.
- Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction, 2nd ed., The MIT Press, 2018.
Contacts
Please use the Moodle forum for general questions!
The contact address below is meant only for personal questions.
Mail: ba(at)dss.cit.tum.de
Julius Durmann, M.Sc. | Markus Ewert, M.Sc. | Yutong Chao, M.Sc. |