Business Analytics and Machine Learning (IN2028), WS 24/25

Prof. Dr. Martin Bichler

Julius Durmann · Markus Ewert · Yutong Chao

Description

This is an introductory course in data analysis and machine learning with a focus on methods relevant to management and economics. Participants will learn widely used methods for regression, classification, clustering, and dimensionality reduction. Although we occasionally discuss exemplary applications to motivate specific methods, applications are not our primary focus!

The course comprises weekly lectures, exercise sheets, homework sheets, and tutorials in smaller groups. The exercises consist of theoretical considerations, applications, and programming exercises in Python.

The course is an elective for students in the BSc Mathematics program. Students from IN, GE, and DE&A can choose only one of the following classes:

  • Data Mining, IN2023, 2V, WS, Prof. Runkler
  • Business Analytics, IN2028, 2V+2Ü, WS, Prof. Bichler
  • Data Analysis and Visualization in R, IN2339, 2V+4Ü, WS, Prof. Gagneur

You can usually find more information on mutually incompatible courses on the website of your study program.

Prerequisites

This course targets Bachelor students. We invite Master students to join if they have not yet taken a machine learning or data mining course.

Students who enroll in this course should have the following prerequisites.

  • For the initial classes, we expect students to know basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
  • For later classes, you will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).

We will provide some recapitulation but cannot revisit entire statistics, linear algebra, and calculus courses. If you are uncomfortable with the mathematical basics above, we recommend taking the respective courses first.

Please note that examples and exercises in this course will use Python. Also, there might be tasks in the exam that ask you to interpret short Python code snippets. If you are new to the language, we will provide a short introduction for self-study (!) at the beginning of the course. Nonetheless, prior knowledge of Python (or other programming languages) is beneficial.
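
To give a concrete idea of what such an interpretation task could look like, here is a purely hypothetical example (not taken from an actual exam); you might be asked what the following snippet prints:

    # Hypothetical interpretation task: what does this program print?
    values = [2, 4, 6, 8]

    total = 0
    for v in values:
        if v % 4 == 0:   # keep only the multiples of 4
            total += v

    print(total)         # prints 12, because 4 + 8 = 12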

Learning Outcome

After completing this course, students can apply and analyze widely used machine learning methods for numerical prediction, classification, clustering, and dimensionality reduction. They understand the methods' assumptions and can reproduce their functionality. Successful participants know the algorithms' mathematical foundations. Students can explain the fundamentals of neural networks and reinforcement learning. Moreover, participants can implement solutions for analyzing data sets in Python and interpret the results.

Organization

Introduction: Please attend our first lecture on October 21, 2024, for organizational details.

Lecture: Monday, 2 pm - 4 pm, lecture hall in the Galileo building (Garching)

Tutorials: Room 01.10.011, MI building, Garching (for the onsite groups).

Group 1:  Tuesday, 10 am - 12 pm (online)
Group 2:  Tuesday, 2 pm - 4 pm (online)
Group 3:  Wednesday, 10 am - 12 pm (onsite)
Group 4:  Wednesday, 12 pm - 2 pm (onsite)
Group 5:  Thursday, 10 am - 12 pm (onsite)
Group 6:  Thursday, 2 pm - 4 pm (onsite)
Group 7:  Thursday, 4 pm - 6 pm (onsite)
Group 8:  Friday, 10 am - 12 pm (onsite)
Group 9:  Friday, 12 pm - 2 pm (onsite)
Group 10: Friday, 2 pm - 4 pm (onsite)

Note: For the exact dates of the lecture and tutorials, please check the schedule in TUMOnline. Some dates may change due to holidays or university events.

Exam: There will be two exam opportunities (endterm and retake) in early 2025. Both exams are planned as on-site exams. There will be no online exam option.

Syllabus*

  1. Regression Analysis
    Statistical estimation, Test theory, Linear regression (Ordinary Least Squares)
  2. Regression Diagnostics
    Gauss-Markov theorem, Gauss-Markov assumptions, Omitted variable bias, Panel data analysis
  3. Logistic and Poisson Regression
    Generalized linear models, Logit / Probit / Poisson regression
  4. Naïve Bayes and Bayes Nets
    Bayes rule, Bayesian Networks, d-separation
  5. Decision Tree Classifiers
    Decision trees, Entropy, Information gain, C4.5, CART, Tree pruning
  6. Data Preparation and Causal Inference
    CRISP-DM, Practical data preparation, Causal inference, Internal validity,
    Differences-in-differences, Propensity score matching, Multiple imputation
  7. Model Selection and Evaluation
    Bias-variance tradeoff, (Cross-) Validation, Gain / Lift / ROC curves
  8. Ensemble Methods and Clustering
    Bagging, Random forests, Boosting, Stacking
    Hierarchical clustering, K-means, Expectation maximization
  9. Dimensionality Reduction
    PCA, SVD, PCA regression, PLS regression, Regularization, Ridge regression, LASSO
  10. Convex Optimization
    Gradient descent, Momentum, Newton's method
  11. Neural Networks
    Feed-forward networks, Backpropagation, Gradient descent
  12. Reinforcement Learning
    Markov Decision Processes, Policies, Value functions, Value Iteration
    Q-Learning, REINFORCE
  13. Summary, Q&A

*may be subject to change
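
To illustrate the style of the Python programming exercises, here is a minimal, hypothetical sketch (our own example, not an actual exercise from the course) of an ordinary least squares fit, one of the first syllabus topics, using NumPy:

    # Hypothetical sketch of an OLS fit with NumPy (not an actual course exercise)
    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate data from y = 1 + 2*x + noise
    x = rng.uniform(0, 10, size=100)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=100)

    # Design matrix with an intercept column
    X = np.column_stack([np.ones_like(x), x])

    # Least squares estimate of (intercept, slope)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    print("Estimated intercept and slope:", beta_hat)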

Literature

The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures can be found in chapters from the following textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning, Springer, 2016.
  • Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann, 2016.
  • James H. Stock, Mark W. Watson: Introduction to Econometrics, Pearson Education.
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014.
  • Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction, 2nd ed., The MIT Press, 2018.

Contacts

Please use the Moodle forum for general questions!
The contact email below is only meant for personal questions.

Mail: ba@dss.cit.tum.de

Julius Durmann, M.Sc.
Room 01.10.054 (Garching)

Markus Ewert, M.Sc.
Room 01.10.055 (Garching)

Yutong Chao, M.Sc.
Room 01.10.036 (Garching)