"Variance refers to the amount by which f̂ would change if we
estimated it using a different training data set. Since the training data
are used to fit the statistical learning method, different training data sets
will result in a different f̂. But ideally the estimate for f should not vary
too much between training sets. However, if a method has high variance
then small changes in the training data can result in large changes in f̂. In
general, more flexible statistical methods have higher variance."
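
To make the variance idea concrete, here is a small simulation sketch. Everything in it is my own assumption for illustration and not from the book: the "true" function (a sine curve), the noise level, the sample size, and the choice of 1-nearest-neighbour as the flexible method. It refits a straight line and a 1-nearest-neighbour rule on many fresh training sets and measures how much each one's prediction at a single point moves around.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical "true" f, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def draw_training_set(rng, n=30, noise_sd=0.3):
    # One training set: x uniform on [0, 1], y = f(x) + Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    # Inflexible method: ordinary least-squares straight line.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    # Flexible method: predict the y of the single closest training x.
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict


def prediction_variance(fit, x0=0.5, n_sets=500, seed=0):
    # Refit on many independent training sets and record f_hat(x0) each time.
    rng = random.Random(seed)
    preds = [fit(*draw_training_set(rng))(x0) for _ in range(n_sets)]
    return statistics.pvariance(preds)


var_linear = prediction_variance(fit_linear)
var_1nn = prediction_variance(fit_1nn)
print("variance of f_hat(0.5), linear fit:", var_linear)
print("variance of f_hat(0.5), 1-NN fit:  ", var_1nn)
```

Under these assumptions the 1-nearest-neighbour prediction should swing noticeably more from one training set to the next than the straight line does, which is what the quote means by a more flexible method having higher variance.
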
"On the other hand, bias refers to the error that is introduced by approximating
a real-life problem, which may be extremely complicated, by a much
simpler model. For example, linear regression assumes that there is a linear
relationship between Y and X1, X2, ..., Xp. It is unlikely that any real-life
problem truly has such a simple linear relationship, and so performing linear
regression will undoubtedly result in some bias in the estimate of f."
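
Bias can be sketched the same way. Again the setup is assumed by me, not taken from the book: a sine curve stands in for the "real-life" f, and the noise level, the sample sizes, and the evaluation point x0 = 0.25 are arbitrary. The code averages the fitted line's prediction over many training sets and compares that average with the true value; the gap is the bias at x0.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical non-linear "true" f, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def fit_linear(xs, ys):
    # Ordinary least-squares straight line through the training data.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def average_prediction(x0, n, n_sets=500, noise_sd=0.3, seed=0):
    # Average f_hat(x0) over many independent training sets of size n.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        preds.append(fit_linear(xs, ys)(x0))
    return statistics.fmean(preds)


x0 = 0.25  # a point where the sine curve is far from any straight line
bias_small_n = average_prediction(x0, n=30) - true_f(x0)
bias_large_n = average_prediction(x0, n=300) - true_f(x0)
print("bias at x0 with n=30: ", bias_small_n)
print("bias at x0 with n=300:", bias_large_n)
```

In this sketch the gap should stay large even when the training sets are ten times bigger. More data reduces variance, but it cannot remove the error that comes from forcing a straight line onto a curved relationship.
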
From ISLR