Business Analytics and Machine Learning (IN2028), WS 24/25

Prof. Dr. Martin Bichler

Julius Durmann · Markus Ewert · Yutong Chao

Description

This is an introductory course in data analysis and machine learning with a focus on methods relevant to management and economics. Participants will learn widely used methods for regression, classification, clustering, and dimensionality reduction. Although we occasionally discuss exemplary applications to motivate specific methods, applications are not our primary focus!

The course comprises weekly lectures, exercise sheets, homework sheets, and tutorials in smaller groups. The exercises combine theoretical considerations, applications, and programming tasks in Python.
During the semester, we will offer a midterm exam in which students apply their knowledge to a dataset: they will prepare the data, analyze it, and train a prediction model on it.
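
For orientation, the sketch below shows what such a prepare/analyze/train workflow can look like in Python. It is only an illustration: the dataset, the column names, and the choice of a scikit-learn linear regression model are placeholders, not the actual midterm setup.

    # Illustrative sketch of the prepare -> analyze -> train workflow.
    # The data and the model choice are placeholders, not the midterm setup.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Hypothetical dataset: predict sales from advertising spend and store size.
    df = pd.DataFrame({
        "ad_spend":   [10, 20, 15, 30, 25, 40, 35, 50],
        "store_size": [100, 150, 120, 200, 180, 250, 220, 300],
        "sales":      [12, 22, 17, 33, 27, 44, 38, 55],
    })

    # Prepare: drop incomplete rows and separate features from the target.
    df = df.dropna()
    X, y = df[["ad_spend", "store_size"]], df["sales"]

    # Analyze: look at summary statistics and correlations.
    print(df.describe())
    print(df.corr())

    # Train and evaluate a simple prediction model.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))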

The course is an elective for students in the BSc Mathematics. Students from IN, GE, and DE&A can choose only one of the following classes:

  • Data Mining, IN2023, 2V, WS, Prof. Runkler
  • Business Analytics, IN2028, 2V+2Ü, WS, Prof. Bichler
  • Data Analysis and Visualization in R, IN2339, 2V+4Ü, WS, Prof. Gagneur

You can usually find more information on mutually incompatible courses on the website of your study program.

Prerequisites

This course targets Bachelor students. We invite Master students to join if they have not yet taken a machine learning or data mining course.

Students who enroll in this course should meet the following prerequisites.

  • For the initial classes, we expect students to know basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
  • For later classes, you will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).

We will provide some recapitulation but cannot revisit complete statistics, linear algebra, and calculus courses. If you are uncomfortable with the above mathematical basics, we recommend taking the respective courses first.

Please note that examples and exercises in this course will use Python, and the exam may include tasks that ask you to interpret short Python code snippets. If you are new to the language, we will provide a short introduction for self-study (!) at the beginning of the course. Nonetheless, prior knowledge of Python (or another programming language) is beneficial.
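
As a rough impression of what "interpreting short Python code" means, a code-reading task could involve a snippet like the following. It is purely illustrative and not an actual exam question; NumPy is assumed only for the example.

    # Purely illustrative snippet of the kind you should be able to read and explain.
    import numpy as np

    values = np.array([2.0, 4.0, 6.0, 8.0])
    weights = np.array([0.1, 0.2, 0.3, 0.4])

    # Weighted mean: sum of value_i * weight_i, divided by the sum of the weights.
    weighted_mean = np.sum(values * weights) / np.sum(weights)
    print(weighted_mean)  # prints 6.0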

Learning Outcome

After completing this course, students can apply and analyze widely used machine learning methods for numerical prediction, classification, clustering, and dimensionality reduction. They understand the methods' assumptions and can reproduce how they work. Successful participants know the algorithms' mathematical foundations and can explain the fundamentals of neural networks and reinforcement learning. Moreover, participants can implement solutions for analyzing data sets with Python and interpret the results.

Organization

Introduction: Please attend our first lecture on October 14, 2024, for organizational details.

Lecture: Monday, 2 pm - 4 pm, lecture hall in the Galileo building (Garching)

Tutorials: Room 01.10.011, MI building, Garching (onsite groups).

Group 1: Tuesday, 10 am - 12 pm, online
Group 2: Tuesday, 2 pm - 4 pm, online
Group 3: Wednesday, 10 am - 12 pm, onsite
Group 4: Wednesday, 12 pm - 2 pm, onsite
Group 5: Thursday, 10 am - 12 pm, onsite
Group 6: Thursday, 2 pm - 4 pm, onsite
Group 7: Thursday, 4 pm - 6 pm, onsite
Group 8: Friday, 10 am - 12 pm, onsite
Group 9: Friday, 12 pm - 2 pm, onsite
Group 10: Friday, 2 pm - 4 pm, onsite

Note: For the exact dates of lectures and tutorials, please check the schedule in TUMOnline. Some dates may change due to holidays or university events.

Exam: There will be two exam opportunities (endterm and retake) in early 2025. Both are planned as on-site exams; there will be no online option.

Syllabus*

  1. Regression Analysis
    Statistical estimation, Test theory, Linear regression (Ordinary Least Squares)
  2. Regression Diagnostics
    Gauss-Markov theorem, GM assumptions, Omitted variable bias, Panel data analysis
  3. Logistic and Poisson Regression
    Generalized linear models, Logit / Probit / Poisson regression
  4. Naïve Bayes and Bayes Nets
    Bayes rule, Bayesian Networks, d-separation
  5. Decision Tree Classifiers
    Decision trees, Entropy, Information gain, C4.5, CART, Tree pruning
  6. Data Preparation and Causal Inference
    CRISP-DM, Practical data preparation, Causal inference, Internal validity,
    Difference-in-differences, Propensity score matching, Multiple imputation
  7. Model Selection and Evaluation
    Bias-variance tradeoff, (Cross-) Validation, Gain / Lift / ROC curves
  8. Ensemble Methods and Clustering
    Bagging, Random forests, Boosting, Stacking
    Hierarchical clustering, K-means, Expectation maximization
  9. Dimensionality Reduction
    PCA, SVD, PCA regression, PLS regression, Regularization, Ridge regression, LASSO
  10. Convex Optimization
    Gradient descent, Momentum, Newton's method
  11. Neural Networks
    Feed-forward networks, Backpropagation, Gradient descent
  12. Reinforcement Learning
    Markov Decision Processes, Policies, Value functions, Value Iteration
    Q-Learning, REINFORCE
  13. Summary, Q&A

*may be subject to change

Literature

The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures are covered in chapters of the following textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning, 2nd ed., Springer, 2016.
  • Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann, 2016.
  • James H. Stock, Mark W. Watson: Introduction to Econometrics, Pearson Education.
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014.
  • Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction, 2nd ed., The MIT Press, 2018.

Contacts

Please use the Moodle forum for general questions!
The contact email address below is only meant for personal questions.

Mail: ba@dss.cit.tum.de

Julius Durmann, M.Sc.
Room 01.10.054 (Garching)

Markus Ewert, M.Sc.
Room 01.10.055 (Garching)

Yutong Chao, M.Sc.
Room 01.10.036 (Garching)