
Why Supervised Learning Works?
In essence, "Low Training Error + Sufficient Data Relative to Model Complexity → Low Test Error".
Supervised learning must work on the SAME distribution.
Key Concepts
* Training Error (\(\text{Train}_S(f)\)): The error of a model \(f\) evaluated on the training dataset \(S\).
* Test Error (\(\text{Test}_D(f)