Regression · Trees · Boosting · SVM · Clustering · Evaluation
Assumptions: linear relationship, homoscedasticity, no multicollinearity, normally distributed residuals. Check with residual plots.
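A minimal pure-Python sketch of the residual check (the toy data here is hypothetical): fit simple least squares in closed form, then inspect the residuals.

```python
# Residual check for a simple least-squares fit (toy, hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
# Closed-form slope and intercept for simple linear regression.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
# OLS residuals always sum to ~0; what matters is their *pattern*:
# plot residuals vs fitted values and look for curvature (nonlinearity)
# or a funnel shape (heteroscedasticity).
print(sum(residuals))
```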
Output is a probability. Threshold at 0.5 by default, but tune it for imbalanced data. Pick the operating threshold from the ROC curve (ROC-AUC itself is threshold-independent).
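One common way to pick a threshold from the ROC curve is Youden's J statistic, J = TPR - FPR. A pure-Python sketch with hypothetical scores and labels:

```python
# Pick the threshold maximizing Youden's J = TPR - FPR (toy data).
scores = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,    0,   1,   1]

def rates(th):
    # True-positive and false-positive rates at threshold th.
    tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
    p = sum(labels)
    n = len(labels) - p
    return tp / p, fp / n

# Evaluate J at each observed score; take the argmax.
best = max(scores, key=lambda th: rates(th)[0] - rates(th)[1])
print(best)
```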
Pick split that maximizes information gain (entropy reduction) or minimizes weighted Gini.
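Both criteria can be computed directly for a candidate split; a pure-Python sketch with hypothetical labels:

```python
import math

# Information gain and weighted Gini for one candidate split (toy labels).
def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = [1, 1, 1, 0, 0, 0, 1, 0]
left, right = [1, 1, 1, 0], [0, 0, 1, 0]   # the two children of the split

w = len(left) / len(parent)
info_gain = entropy(parent) - (w * entropy(left) + (1 - w) * entropy(right))
weighted_gini = w * gini(left) + (1 - w) * gini(right)
print(info_gain, weighted_gini)
```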
Random forest: n_estimators=200+, max_features='sqrt', min_samples_leaf=5.
Boosting: each tree corrects the errors of the previous ensemble.
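The random-forest settings above look like this in practice; a sketch assuming scikit-learn is available, on a synthetic toy dataset:

```python
# Random forest with the settings above (sketch; synthetic toy data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = RandomForestClassifier(
    n_estimators=200,       # 200+ trees: more rarely hurts, only costs time
    max_features="sqrt",    # decorrelate trees by subsampling features
    min_samples_leaf=5,     # regularize individual trees
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))
```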
early_stopping_rounds=50. Tune the learning rate last: a lower learning rate plus more trees usually wins.
SVM: find the hyperplane that separates classes with maximum margin. Only the support vectors (points on the margin) matter.
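The "only support vectors matter" point is easy to verify; a sketch assuming scikit-learn, with a toy blobs dataset:

```python
# Linear SVM: only the support vectors determine the decision boundary.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
# Far fewer support vectors than training points: the rest could be
# removed without changing the learned hyperplane.
print(len(clf.support_vectors_), len(X))
```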
Split into k folds, train on k-1, validate on 1. Repeat k times. Standard choice: k=5.
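The fold mechanics, sketched with sklearn's KFold: every sample lands in exactly one validation fold.

```python
# 5-fold CV indices; each sample is validated exactly once.
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
seen = []
for train_idx, val_idx in kf.split(range(20)):
    # Each iteration: 16 train samples, 4 validation samples.
    seen.extend(val_idx)
print(sorted(seen))
```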
Preserve class distribution in each fold. Always use for classification with imbalanced classes.
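Stratification in action; a sketch assuming scikit-learn, with hypothetical 80/20 labels:

```python
# Stratified folds preserve the class ratio in every validation fold.
from sklearn.model_selection import StratifiedKFold

y = [0] * 80 + [1] * 20                     # 80/20 class imbalance (toy)
skf = StratifiedKFold(n_splits=5)
fold_stats = []
for _, val_idx in skf.split(range(100), y):
    pos = sum(y[i] for i in val_idx)
    fold_stats.append((pos, len(val_idx)))  # positives per fold, fold size
print(fold_stats)                           # every fold: 4 of 20 positive
```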
Never shuffle! Train on past, validate on future. Use TimeSeriesSplit from sklearn.
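With TimeSeriesSplit the no-leakage property holds by construction: every training index precedes every validation index.

```python
# TimeSeriesSplit: training indices always precede validation indices.
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=4)
splits = list(tscv.split(range(10)))
for train_idx, val_idx in splits:
    # No leakage: the newest training point is older than every validation point.
    print(max(train_idx), min(val_idx))
```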