Machine learning types and training

Vocabulary

English	Chinese	Pinyin
backpropagation	反向传播	fǎn xiàng chuán bō
supervised learning	监督学习	jiān dū xué xí
classification	分类	fēn lèi
unsupervised learning	无监督学习	wú jiān dū xué xí
clusters	聚类	jù lèi
reinforcement learning	强化学习	qiáng huà xué xí
reward	奖励	jiǎng lì
agent	智能体	zhì néng tǐ
policy	策略	cè lüè
gradient descent	梯度下降	tī dù xià jiàng
loss function	损失函数	sǔn shī hán shù
epochs	训练轮次	xùn liàn lún cì

How machines learn

Machine learning comes in three paradigms.
Each suits a different kind of problem.
Neural networks are trained by backpropagation 反向传播.

Explore

AI learning type lab

Classify AI examples by the type of learning or concern involved.

The three paradigms

Supervised learning 监督学习 — the data has labels (images tagged "cat"/"dog"); the model learns input → label. Used for classification 分类 and regression.
Unsupervised learning 无监督学习 — no labels; the model finds structure, e.g. clusters 聚类 of similar customers.
Reinforcement learning 强化学习 — learning by reward 奖励 (below).
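
The difference between the first two paradigms can be sketched in a few lines of Python. This is a minimal illustration, not a standard recipe: the animal measurements are made up, and the helper names (fit_centroids, predict, k_means) are hypothetical. The supervised part uses a nearest-centroid classifier and the unsupervised part uses a basic k-means loop, both chosen only because they are short.

```python
# Made-up toy data: (height_cm, weight_kg) with a label for each example.
labelled = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]


def mean_point(points):
    """Average a list of equal-length tuples, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Supervised: the labels are given, so learn one centroid per label.
def fit_centroids(examples):
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: mean_point(pts) for label, pts in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new input."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Unsupervised: no labels; k-means groups the points into k clusters.
def k_means(points, k, iterations=10):
    centroids = list(points[:k])  # naive initialisation: the first k points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(centroids[i], p))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [mean_point(g) if g else centroids[i] for i, g in enumerate(groups)]
    return groups


centroids = fit_centroids(labelled)
print(predict(centroids, (24.0, 4.2)))  # classify a new, unseen animal

unlabelled = [features for features, _ in labelled]  # same points, labels thrown away
print(k_means(unlabelled, k=2))
```

The classifier needs the "cat"/"dog" tags to learn anything, while k_means only ever sees the raw points and still separates the small animals from the large ones.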

Deep learning is part of machine learning, which is part of AI

Supervised learning: a model is trained on labelled data, then recognises new data

Practice

Supervised learning uses:

labelled data, learning to map input → label
unlabelled data, finding hidden structure
rewards from an environment
no data at all

Practice

Unsupervised learning is used to:

find structure in unlabelled data, such as clusters
map inputs to known labels
maximise a reward signal
compile code

Practice

Match each ML paradigm to its data.

labelled examples (input → label)

unlabelled data (find structure)

rewards from acting in an environment

Supervised
Unsupervised
Reinforcement

Reinforcement learning

An agent 智能体 acts in an environment; each action changes the state and returns a reward.
The agent learns a policy 策略 (a strategy) that maximises total reward over time — by trial and error, with no labels up front.
Used for sequential-decision problems: games, robot control, self-driving cars.
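
The agent, environment, reward and policy loop can be sketched with tabular Q-learning, one common reinforcement learning method (the lesson does not name a specific algorithm, so this choice is an assumption). The five-cell corridor, the reward of 1 at the goal, and all parameter values are invented for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action 0 = step left, action 1 = step right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values q[state][action_index] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])  # exploit: best known action, ties broken randomly
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = train()
# The policy picks, in each non-goal cell, the action with the higher value.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

No cell is ever labelled with the correct move. The agent only receives a reward when it reaches the goal, and the learned policy should end up stepping right in every cell.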

A humanoid robot uses AI to see, listen and respond like a person

Practice

In reinforcement learning, the agent learns by:

trial and error, maximising the total reward over time
reading labelled examples
clustering the data
compiling a program

Training by backpropagation

Training adjusts the weights so outputs match the targets, using backpropagation with gradient descent 梯度下降:
forward pass (input → output) → compute the error with a loss function 损失函数 → backward pass (propagate the error back to find each weight's gradient) → update the weights by a small step (the learning rate).
Repeat over many examples and passes (epochs 训练轮次) until the error stops shrinking. After training, a prediction is just one forward pass.
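
The four steps above can be traced by hand in a very small network. The sketch below assumes one input, one tanh hidden unit and one linear output, a squared-error loss, and made-up data for the target y = 2x. None of these choices come from the lesson; they only keep the chain rule short enough to read.

```python
import math

# Made-up training data for the target y = 2x.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]


def forward(params, x):
    """Forward pass: input -> hidden activation -> output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def mean_loss(params):
    """Loss function: mean of 0.5 * (output - target)^2 over the data."""
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def train(params, learning_rate=0.05, epochs=500):
    w1, b1, w2, b2 = params
    for _ in range(epochs):          # one epoch = one pass over all examples
        for x, t in data:
            h, y = forward((w1, b1, w2, b2), x)   # 1. forward pass
            d_y = y - t                           # 2. error: dLoss/dy
            d_w2, d_b2 = d_y * h, d_y             # 3. backward pass, output layer
            d_pre = d_y * w2 * (1.0 - h * h)      #    chain rule through tanh
            d_w1, d_b1 = d_pre * x, d_pre         #    backward pass, hidden layer
            w1 -= learning_rate * d_w1            # 4. small step against each gradient
            b1 -= learning_rate * d_b1
            w2 -= learning_rate * d_w2
            b2 -= learning_rate * d_b2
    return (w1, b1, w2, b2)


initial = (0.5, 0.0, 0.5, 0.0)
trained = train(initial)
print("loss before:", mean_loss(initial))
print("loss after: ", mean_loss(trained))
print("prediction for x = 0.8:", forward(trained, 0.8)[1])  # inference: one forward pass
```

Real networks have many weights per layer and use a library to compute the gradients, but the order of operations is the same as in this loop.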

Industrial robot arms on a production line: reinforcement learning can teach a robot to control its movements

Practice

Backpropagation trains a network by:

propagating the output error backwards to find each weight's gradient, then updating the weights
randomly guessing the weights
copying the input to the output
deleting hidden layers

Practice

In gradient descent the learning rate sets how big a step each weight update takes — too large overshoots the minimum, too small makes training slow.
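
The effect of the learning rate is easy to see on a one-weight example. The sketch below assumes the loss is simply w squared, so the minimum is at w = 0 and the gradient is 2w. The three learning rates are arbitrary values picked to show each behaviour.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on loss(w) = w**2 and return the final weight."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient
    return w


for rate in (0.001, 0.1, 1.1):
    print(rate, descend(rate))
```

With a rate of 0.1 the weight moves close to 0 within 20 steps. With 0.001 it has barely left its starting value of 1. With 1.1 every step jumps past the minimum and lands further away than before, so the weight grows instead of shrinking.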

You've got it

Key idea

supervised = labelled data (classification/regression); unsupervised = find structure (clustering)
reinforcement = an agent learns a policy by maximising reward through trial and error
backpropagation: forward pass → loss → propagate error back → update weights (gradient descent)
training takes many epochs; inference is one forward pass
