ML Key concepts

Supervised learning

Supervised learning develops predictive models from both input data and output data. The training set contains pairs $(x_{i}, y_{i})$ : $x_{i}$ is the feature vector and $y_{i}$ is the target. The target determines the task type.

Classification: $y_{i}$ is a category, e.g. spam/not spam, image class, disease class, “purchase yes/no”.
Regression: $y_{i}$ is a continuous value, e.g. price, weight, temperature, flux, risk score, or a measured physical quantity.

The cheat sheet’s central split is exactly this: classification learns from examples with known labels and predicts labels for new inputs; regression learns from examples with known values and predicts continuous values for new inputs.

Classification

Classification maps an input to a categorical label or to probabilities over labels. It is useful when the desired output is discrete and interpretable, such as “Class A vs Class B” or “yes vs no”. Many classifiers output a score first, then a decision rule converts the score into a label.
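The score-then-decision-rule pattern can be sketched as follows. This is a minimal illustration, assuming a logistic (sigmoid) link and a hypothetical list of raw scores; the function names and the thresholds are not from the cheat sheet.

```python
import math

def sigmoid(score: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def decide(score: float, threshold: float = 0.5) -> str:
    """Decision rule: threshold the probability to obtain a label."""
    return "Class A" if sigmoid(score) >= threshold else "Class B"

# Hypothetical raw scores from some already-trained classifier.
scores = [-2.0, -0.1, 0.3, 1.5]

# Default rule: probability >= 0.5 means "Class A".
labels = [decide(s) for s in scores]

# A stricter rule changes the labels without changing the scores.
strict_labels = [decide(s, threshold=0.8) for s in scores]

print(labels)
print(strict_labels)
```

Keeping the score and the decision rule separate means the threshold can be tuned later (for example to favour recall on a rare class) without retraining the model.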

Advantages from the cheat sheet:

Results can be easy to interpret because each input receives a clear class.
It matches categorical-output problems such as spam detection, purchase prediction, image recognition, and medical diagnosis.

Disadvantages / traps:

Some simple classifiers assume the data are linearly separable; real data often are not.
Complex classifiers can overfit and generalise badly if validation is weak.
Accuracy alone can hide poor performance on rare classes.
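The last trap is easy to reproduce. The sketch below uses made-up labels (95 negatives, 5 positives) and a hypothetical "always predict the majority class" model; the helper names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually found."""
    predictions_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_on_positives:
        return 0.0
    return sum(p == positive for p in predictions_on_positives) / len(predictions_on_positives)

# Made-up imbalanced data: the rare class (1) is 5% of the examples.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)   # high, because most examples are class 0
rec = recall(y_true, y_pred)     # zero, because no rare-class example is found

print(f"accuracy={acc:.2f}, recall on rare class={rec:.2f}")
```

Here the accuracy is 0.95 while the recall on the rare class is 0.0, which is why per-class metrics are worth reporting alongside accuracy.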

Regression

Regression maps an input to a continuous value. The output is not a class but a number. The cheat sheet emphasises that regression can model a wide variety of relationships, not only straight lines, and is suitable whenever the target is continuous.

Advantages:

Natural for quantities: price, size, weight, time, distance, physical fields, brightness, or risk.
Can be linear, polynomial, non-parametric, tree-based, kernel-based, or probabilistic.

Disadvantages / traps:

Outliers can strongly affect some fitted lines and predictions.
A linearity assumption can be too restrictive.
Patterns in the residuals (for example a systematic curve) often reveal structure the model has missed.
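Two of these traps can be seen with an ordinary least-squares line on toy data. The sketch below is illustrative only: the data are invented, and `fit_line` and `residuals` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    """Observed value minus fitted value at each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]

# 1) A single outlier drags the fitted line.
clean = [0.0, 2.0, 4.0, 6.0, 8.0]            # exactly y = 2x
with_outlier = [0.0, 2.0, 4.0, 6.0, 28.0]    # last point shifted by +20
slope_clean, intercept_clean = fit_line(xs, clean)
slope_out, intercept_out = fit_line(xs, with_outlier)

# 2) A straight line fitted to curved data leaves patterned residuals.
curved = [x ** 2 for x in xs]
slope_c, intercept_c = fit_line(xs, curved)
curved_residuals = residuals(xs, curved, slope_c, intercept_c)

print(slope_clean, slope_out)
print(curved_residuals)
```

On this toy data the slope changes from 2 to 6 because of one point, and the residuals of the line fitted to the quadratic are positive at both ends and negative in the middle, a U-shape that signals a missing curved term.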

Model families in the cheat sheet

Support vector machines: find a separating or fitting hyperplane, with the classification version maximising margin.
K-nearest neighbours: predict from the nearest training examples; classification votes, regression averages.
Decision trees: recursively split feature space; leaves hold class labels or average target values.
Random forests: combine many trees trained on randomised data/features; classification votes, regression averages.
Gradient boosting: add weak models sequentially so each new model corrects previous errors/residuals.
Lasso: adds an $L^{1}$ penalty, often shrinking unhelpful coefficients to exactly zero.
Ridge: adds an $L^{2}$ penalty, shrinking coefficients smoothly toward zero.
Logistic regression: binary or multiclass classification that models class probabilities with a logistic (softmax) function of a linear score.
Linear discriminant analysis: finds linear feature combinations that separate classes.
Naive Bayes: uses Bayes’ theorem with simplifying conditional-independence assumptions.
Linear regression: fits the best straight-line (linear) relationship between features and target, typically by least squares.
Polynomial regression: fits curved relationships by using powers of the input as features.
Gaussian process regression: probabilistic regression that gives predictions with uncertainty intervals.
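As one concrete instance from the list, the "classification votes, regression averages" behaviour of k-nearest neighbours can be sketched in a few lines. This is a minimal illustration assuming one-dimensional inputs and absolute distance; the data and function names are invented for the example.

```python
from collections import Counter

def k_nearest(train_x, query, k):
    """Indices of the k training inputs closest to the query (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return order[:k]

def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_x, train_y, query, k=3):
    """Regression: average of the k nearest target values."""
    nearest = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in nearest) / len(nearest)

# Invented training data: same inputs, one categorical and one continuous target.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["A", "A", "A", "B", "B", "B"]
values = [1.1, 1.9, 3.2, 6.1, 6.8, 8.3]

label_low = knn_classify(train_x, labels, 2.5)    # neighbours are all "A"
label_high = knn_classify(train_x, labels, 7.0)   # neighbours are all "B"
value_high = knn_regress(train_x, values, 7.0)    # mean of the three nearest targets

print(label_low, label_high, round(value_high, 3))
```

The same neighbour search serves both tasks; only the final aggregation step (vote versus mean) differs.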

Generalisation

Generalisation means doing well on unseen data. Overfitting means training error is low but validation/test error is high. Underfitting means the model is too simple to capture the structure even on training data. The practical loop is: choose a baseline, split the data honestly, train, tune on validation data, inspect errors, then report final performance on a held-out test set.
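The practical loop can be sketched end to end on synthetic data. The example below is an assumption-laden illustration: it uses a small k-nearest-neighbours regressor as the model, the number of neighbours `k` as the only hyperparameter, and a 60/20/20 split; none of these choices come from the cheat sheet.

```python
import random

def split(data, n_train, n_val, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, eval_set, k):
    """Mean squared error of the k-NN model on eval_set."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in eval_set) / len(eval_set)

# Synthetic data: a quadratic trend plus Gaussian noise.
rng = random.Random(42)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 0.5)) for i in range(100)]

train, val, test = split(data, n_train=60, n_val=20)

# Tune k on the validation set only.
candidates = [1, 3, 5, 9, 15]
val_errors = {k: mse(train, val, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# k = 1 memorises the training set: zero training error, non-zero validation error.
train_error_k1 = mse(train, train, 1)

# The test set is touched once, after all tuning is finished.
test_error = mse(train, test, best_k)

print(best_k, train_error_k1, val_errors[1], test_error)
```

With `k = 1` the training error is exactly zero because every training point is its own nearest neighbour, which is the overfitting signature described above: low training error that says nothing about error on unseen data.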

Regularisation

Regularisation adds a preference for simpler models. Lasso and Ridge are the canonical cheat-sheet examples: Lasso penalises absolute coefficient size and can do feature selection; Ridge penalises squared coefficient size and reduces sensitivity to noise.
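The difference between the two penalties is easiest to see with a single feature and no intercept, where both estimators have closed forms. The sketch below assumes the objectives ½·Σ(y − w·x)² + (λ/2)·w² for Ridge and ½·Σ(y − w·x)² + λ·|w| for Lasso; libraries scale these terms differently, so the value of `lam` here is not comparable to any particular library's parameter.

```python
def ridge_weight(xs, ys, lam):
    """Minimiser of 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, and clip to exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_weight(xs, ys, lam):
    """Minimiser of 0.5 * sum((y - w*x)^2) + lam * |w|."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return soft_threshold(sxy, lam) / sxx

xs = [1.0, 2.0, 3.0]
strong = [2.0, 4.0, 6.0]   # target strongly related to x (y = 2x)
weak = [0.2, 0.1, 0.3]     # target barely related to x

lam = 2.0
for name, ys in [("strong", strong), ("weak", weak)]:
    print(name, ridge_weight(xs, ys, lam), lasso_weight(xs, ys, lam))
```

For the weakly related target, Ridge returns a small non-zero weight while Lasso returns exactly zero, which is the feature-selection behaviour mentioned above; for the strongly related target both shrink the unpenalised weight of 2 only slightly.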

Probabilistic view

Many common losses are negative log-likelihoods (see Stats Equations and definitions). Squared error corresponds to Gaussian noise with fixed variance; cross-entropy corresponds to a categorical likelihood; Naive Bayes explicitly uses Bayes’ theorem; Gaussian process regression directly models uncertainty.
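The first two correspondences can be written out as a short sketch. Assume (this model is not spelled out in the note) that $y_{i} = f(x_{i}) + \varepsilon_{i}$ with independent noise $\varepsilon_{i} \sim \mathcal{N}(0, \sigma^{2})$ and a fixed $\sigma$. The negative log-likelihood of the training set is then

$$
-\log \prod_{i=1}^{n} p(y_{i} \mid x_{i}) = \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_{i} - f(x_{i})\bigr)^{2} + \frac{n}{2} \log\bigl(2\pi\sigma^{2}\bigr),
$$

so minimising it over $f$ is the same as minimising the sum of squared errors, since the second term does not depend on $f$. For classification, if the model assigns probability $p_{i,k}$ to class $k$ for input $x_{i}$ and $y_{i}$ is the observed class, then

$$
-\log \prod_{i=1}^{n} p_{i, y_{i}} = -\sum_{i=1}^{n} \log p_{i, y_{i}},
$$

which is the cross-entropy loss with one-hot targets.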
