- split the dataset into a training set and a test set
- the test set is held out and used only to evaluate the trained model
1. linear regression
compute test error
$$J_{test}(\vec w, b) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}} \left[ \left(f(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2 \right]$$
2. classification (logistic regression)
compute test error
$$J_{test}(\vec w, b) = -\frac{1}{m_{test}}\sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)}\log\left(f(x_{test}^{(i)})\right) + \left(1 - y_{test}^{(i)}\right)\log\left(1 - f(x_{test}^{(i)})\right) \right]$$
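The two test errors can be computed directly from a model's outputs on the test set. Below is a minimal sketch in plain Python; the function names `linear_test_cost` and `logistic_test_cost` and the clipping constant `eps` are illustrative assumptions, not part of the notes.

```python
import math

def linear_test_cost(predictions, targets):
    """Squared-error test cost: (1 / (2 * m_test)) * sum((f(x) - y)^2)."""
    m_test = len(targets)
    total = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    return total / (2 * m_test)

def logistic_test_cost(probabilities, labels, eps=1e-15):
    """Cross-entropy test cost for binary labels in {0, 1}.

    probabilities are the model outputs f(x); eps is an assumed
    clipping constant that keeps log() away from 0.
    """
    m_test = len(labels)
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m_test

# Tiny hand-made example: two test examples for each model type.
print(linear_test_cost([2.0, 4.0], [1.0, 6.0]))
print(logistic_test_cost([0.9, 0.2], [1, 0]))
```

Neither function includes a regularization term: the test error measures only how well the predictions match the held-out labels.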
- split the dataset into a training set, a cross-validation set, and a test set
- the cross-validation set is used to choose among candidate models; the test set is used to evaluate the model that was chosen
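The three-way split can be sketched as follows, assuming NumPy is available. The synthetic quadratic data, the 60/20/20 split, and the choice of polynomial degree as the thing being selected are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): a quadratic trend plus noise.
x = rng.uniform(-3, 3, size=300)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.5, size=300)

# 60% training, 20% cross-validation, 20% test.
idx = rng.permutation(len(x))
train, cv, test = idx[:180], idx[180:240], idx[240:]

def cost(coeffs, xs, ys):
    """Squared-error cost with the 1/(2m) convention."""
    residuals = np.polyval(coeffs, xs) - ys
    return float(np.mean(residuals**2) / 2)

# Fit each candidate on the training set only, then score it on the CV set.
candidates = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x[train], y[train], degree)
    candidates[degree] = (coeffs, cost(coeffs, x[cv], y[cv]))

# Choose the degree with the lowest cross-validation cost.
best_degree = min(candidates, key=lambda d: candidates[d][1])
best_coeffs = candidates[best_degree][0]

# The test set is touched once, after the choice has been made.
j_test = cost(best_coeffs, x[test], y[test])
print(f"chosen degree: {best_degree}, J_test: {j_test:.3f}")
```

Because the degree is picked using the cross-validation set, the cross-validation cost of the winner is an optimistic estimate; the test cost is the fairer measure of how the chosen model generalizes.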
- high bias: $J_{train}$ and $J_{cv}$ are both high
- high variance: $J_{train}$ is low, but $J_{cv}$ is high
- if high bias: getting more training data is unlikely to help
- if high variance: getting more training data is likely to help
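These rules of thumb can be written as a small check on the two errors. This is only a sketch: the function name `diagnose`, the `baseline` error level, and the `tolerance` threshold are hypothetical choices, since what counts as "high" depends on the problem.

```python
def diagnose(j_train, j_cv, baseline=0.0, tolerance=0.05):
    """Classify a model from its training and cross-validation errors.

    baseline and tolerance are illustrative assumptions: baseline is the
    error level considered acceptable for the task, and tolerance is how
    large a gap has to be before it counts as "high".
    """
    high_bias = (j_train - baseline) > tolerance
    high_variance = (j_cv - j_train) > tolerance
    return {
        "high_bias": high_bias,
        "high_variance": high_variance,
        # More data mainly narrows the gap between J_cv and J_train,
        # so it is flagged as useful only when variance is high.
        "more_data_likely_helps": high_variance,
    }

print(diagnose(j_train=0.30, j_cv=0.32))  # both errors high, small gap
print(diagnose(j_train=0.01, j_cv=0.25))  # low training error, large gap
```

A model can show both problems at once, in which case both flags are set.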
- if $\lambda$ is too small, it will lead to overfitting (high variance)
- if $\lambda$ is too large, it will lead to underfitting (high bias)
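The effect of $\lambda$ can be seen by sweeping it and comparing $J_{train}$ with $J_{cv}$. The sketch below assumes NumPy and uses regularized linear regression on degree-8 polynomial features, solved in closed form; the synthetic sine data, the degree, and the list of $\lambda$ values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def poly_features(x, degree=8):
    """Columns 1, x, x^2, ..., x^degree (column 0 is the bias term)."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize (1/2m) * sum(residual^2) + (lam/2m) * sum(w_j^2), bias excluded."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # do not regularize the bias b
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

def cost(X, y, w):
    """Unregularized squared-error cost, used for both J_train and J_cv."""
    return float(np.mean((X @ w - y) ** 2) / 2)

# Synthetic data (assumption): a noisy sine curve on [-1, 1].
x_train = rng.uniform(-1, 1, 30)
x_cv = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, x_train.shape)
y_cv = np.sin(3 * x_cv) + rng.normal(0, 0.2, x_cv.shape)
X_train, X_cv = poly_features(x_train), poly_features(x_cv)

results = []
for lam in [1e-6, 1e-3, 1e-1, 10.0, 1e3]:
    w = fit_ridge(X_train, y_train, lam)
    results.append((lam, cost(X_train, y_train, w), cost(X_cv, y_cv, w)))
    print(f"lambda={lam:g}  J_train={results[-1][1]:.4f}  J_cv={results[-1][2]:.4f}")
```

$J_{train}$ can only grow as $\lambda$ increases, because the penalty pulls the weights away from the best training fit. $J_{cv}$ is typically lowest at an intermediate value, which is the one the cross-validation set would select.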
- fix high variance:
  - get more training examples
  - try a smaller set of features
  - remove some of the higher-order terms
  - increase $\lambda$
- fix high bias:
  - get additional features
  - add polynomial features
  - decrease $\lambda$
- a bigger network is a more complex model, so it helps reduce high bias
- more data helps reduce high variance
- it turns out that a bigger neural network (one that might otherwise overfit), when well regularized, usually does as well as or better than a smaller one