Machine learning types and training
How machines learn
- Machine learning comes in three paradigms.
- Each suits a different kind of problem.
- Neural networks are trained by backpropagation.
The three paradigms
- Supervised learning — the data has labels (images tagged "cat"/"dog"); the model learns input → label. Used for classification and regression.
- Unsupervised learning — no labels; the model finds structure, e.g. clusters of similar customers.
- Reinforcement learning — learning by reward (below).
Supervised learning uses:
Supervised learning trains on labelled examples (e.g. images tagged cat/dog).
Unsupervised learning is used to:
With no labels, unsupervised learning discovers patterns/clusters in the data.
Match each ML paradigm to its data.
Supervised = labels; unsupervised = no labels; reinforcement = reward feedback.
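The contrast between the first two paradigms can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the one-dimensional data values are made up, and the two techniques chosen (a 1-nearest-neighbour classifier for the supervised case, k-means with k = 2 for the unsupervised case) are simple stand-ins, not the only options. The names `predict_nearest` and `two_means` are hypothetical helpers.

```python
# Toy 1-D data (made-up measurements): three small values, three large ones.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# --- Supervised: labels are given, so the model learns input -> label. ---
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

def predict_nearest(x, train_x, train_y):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# --- Unsupervised: no labels, so the model looks for structure (two clusters). ---
def two_means(xs, steps=10):
    """k-means with k=2: alternate between assigning points and moving centres."""
    c0, c1 = min(xs), max(xs)  # simple deterministic starting centres
    for _ in range(steps):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]  # closer to c0
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]   # closer to c1
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

prediction = predict_nearest(4.5, points, labels)  # uses the labels
centres = two_means(points)                        # never sees the labels
```

The supervised function needs `labels` to answer at all, while `two_means` only receives the raw points and still separates the small values from the large ones.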
Reinforcement learning
- An agent acts in an environment; each action changes the state and returns a reward.
- The agent learns a policy (a strategy) that maximises total reward over time — by trial and error, with no labels up front.
- Used for sequential-decision problems: games, robot control, self-driving cars.
In reinforcement learning, the agent learns by:
The agent tries actions, receives rewards, and learns a policy that maximises long-term reward.
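The agent, environment, reward and policy described above can be sketched with tabular Q-learning, one common reinforcement-learning method (chosen here for illustration; the text does not name a specific algorithm). Everything in the example is an assumption: a five-cell corridor where the agent starts in cell 0 and receives a reward of 1 for reaching cell 4, plus hypothetical values for the step size `alpha`, discount `gamma` and exploration rate `epsilon`.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]    # move left or move right

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = train()
# The learned policy: in each non-goal cell, pick the action with the higher value.
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
```

No cell is ever labelled with the "correct" move. The preference for moving right emerges only from the reward at the goal being propagated back through the value estimates.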
Training by backpropagation
- Training adjusts the weights so the network's outputs move closer to the targets, using backpropagation with gradient descent:
- forward pass (input → output) → compute the error with a loss function → backward pass (propagate the error back to find each weight's gradient) → update each weight by a small step against its gradient (the learning rate sets the step size).
- Repeat over many examples and passes (epochs) until the error stops shrinking. After training, a prediction is just one forward pass.
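The four steps above can be traced on the smallest network that still needs the chain rule: one hidden unit and one output, with two weights. This is a minimal sketch, not a general implementation. The data values, the tanh activation, the squared-error loss, the starting weights and the learning rate are all assumptions chosen for illustration.

```python
import math

# Toy (input, target) pairs, made up for illustration.
data = [(-1.0, -0.8), (-0.5, -0.5), (0.5, 0.5), (1.0, 0.8)]

def forward(w1, w2, x):
    """Forward pass: input -> hidden activation -> output."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def mean_loss(w1, w2):
    """Average squared-error loss 0.5 * (y - t)**2 over the data."""
    return sum(0.5 * (forward(w1, w2, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(w1=0.5, w2=0.5, learning_rate=0.1, epochs=500):
    for _ in range(epochs):                  # one epoch = one pass over the data
        for x, t in data:
            h, y = forward(w1, w2, x)        # 1. forward pass
            error = y - t                    # 2. dLoss/dy for the squared-error loss
            grad_w2 = error * h              # 3. backward pass via the chain rule
            grad_w1 = error * w2 * (1 - h * h) * x
            w1 -= learning_rate * grad_w1    # 4. small step against each gradient
            w2 -= learning_rate * grad_w2
    return w1, w2

initial = mean_loss(0.5, 0.5)
w1, w2 = train()
final = mean_loss(w1, w2)                    # lower than `initial` after training

# After training, a prediction is a single forward pass.
_, prediction = forward(w1, w2, 0.5)
```

The gradient for `w1` is the product of three local derivatives (loss to output, output to hidden, hidden to weight), which is the chain rule that backpropagation applies layer by layer in larger networks.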
Backpropagation trains a network by:
The error is propagated from the output back through the network using the chain rule, which gives the gradient for every weight; each weight then takes a step downhill, against its gradient.
In gradient descent, the learning rate controls:
The learning rate sets the step size when nudging weights to reduce the error.
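The effect of the step size is easiest to see on a one-weight problem. The sketch below assumes a made-up error function f(w) = w², whose minimum is at w = 0, and three hypothetical learning rates; it is an illustration, not a recipe for choosing a rate in practice.

```python
def descend(learning_rate, w=1.0, steps=20):
    """Minimise f(w) = w**2 by gradient descent and return the final w."""
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w -= learning_rate * gradient  # step downhill, scaled by the learning rate
    return w

too_small = descend(0.01)  # moves toward 0, but only slowly
moderate = descend(0.1)    # ends close to 0
too_large = descend(1.1)   # overshoots each time and grows in magnitude
```

With the same number of steps, the moderate rate ends nearest the minimum, the small rate has barely moved, and the large rate has diverged because each step jumps past the minimum by more than it started.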
You've got it
- supervised = labelled data (classification/regression); unsupervised = find structure (clustering)
- reinforcement = an agent learns a policy by maximising reward through trial and error
- backpropagation: forward pass → loss → propagate error back → update weights (gradient descent)
- training takes many epochs; inference is one forward pass