By the numbers...: How to determine how good your regression model is?

Sunday, January 26, 2014

How to determine how good your regression model is?

It comes down to the trade-off between model variance and model bias.

What's model variance?
Suppose we let our model fit the data as closely as possible.
If we randomly pick a certain % of our data and fit it as closely as possible, we get model 1.
We repeat this n times, each time with a fresh random subset, and end up with n models.
The model variance is basically the variance of the predicted values (y) from these n models for a given x.
Typically, a less flexible model, say a linear one, has lower model variance.

What's model bias?
That's the difference between the average predicted y from the n models for a given x and the actual y for the same x.
Typically, a less flexible model, say a linear one, has higher model bias.
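Here is a rough sketch of the resampling procedure described above, to make the two definitions concrete. Everything specific in it is an assumption made for illustration: the data is synthetic (a sine curve plus noise), the "actual y" is taken from the known curve (which you would not have with real data), and the names true_f, bias_and_variance, and x0 are made up. It compares a linear fit with a degree-9 polynomial at a single x.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return np.sin(2 * np.pi * x)

# Synthetic data set: noisy observations of true_f.
x_all = rng.uniform(0.0, 1.0, size=200)
y_all = true_f(x_all) + rng.normal(0.0, 0.5, size=200)

def bias_and_variance(degree, x0=0.25, n_models=500, frac=0.15):
    """Fit n_models polynomials on random subsets and summarize predictions at x0."""
    subset_size = int(frac * len(x_all))
    preds = np.empty(n_models)
    for i in range(n_models):
        # Randomly pick a fraction of the data and fit one model to it.
        idx = rng.choice(len(x_all), size=subset_size, replace=False)
        model = Polynomial.fit(x_all[idx], y_all[idx], degree)
        preds[i] = model(x0)
    variance = preds.var()             # spread of the n predictions at x0
    bias = preds.mean() - true_f(x0)   # average prediction minus actual value
    return bias, variance

bias_lin, var_lin = bias_and_variance(degree=1)
bias_flex, var_flex = bias_and_variance(degree=9)
print(f"linear:   bias={bias_lin:+.3f}  variance={var_lin:.4f}")
print(f"degree 9: bias={bias_flex:+.3f}  variance={var_flex:.4f}")
```

On this kind of data the linear model should show the larger bias and the degree-9 model the larger variance, though the exact numbers depend on the seed, the noise level, and the x you look at.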

We'd like to arrive at a better model by trading off model variance against model bias. We don't want our model to be so flexible that it overfits the data and introduces a huge model variance, but we also don't want it to be so rigid that the predicted values end up too far from the actual values.

How to tell if you are overfitting?
Use out-of-sample data.
Let's say you fit the training data and come up with a model with a very small mean squared error on the training data.
Have this model predict the out-of-sample data and compare the predictions with the actual results. If the mean squared error on the out-of-sample data is large, much larger than on the training data, you are probably overfitting your training data.
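The check above can be sketched in a few lines. This is only an illustration under assumed conditions: the data is synthetic (a noisy sine curve), the 30/200 train/test split is arbitrary, and the polynomial degrees 1, 3, and 15 are picked just to contrast rigid and flexible models.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Synthetic data (an assumption for illustration): noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, size=230)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=230)

# Keep a small training set; the rest is out-of-sample data the model never sees.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def mse(model, xs, ys):
    """Mean squared error of the model's predictions on (xs, ys)."""
    return float(np.mean((model(xs) - ys) ** 2))

results = {}
for degree in (1, 3, 15):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_mse, test_mse = results[degree]
    print(f"degree {degree:2d}: train MSE={train_mse:.4f}  out-of-sample MSE={test_mse:.4f}")
```

The training error can only go down as the degree goes up, so it cannot tell you when to stop. The thing to watch is the gap between the two columns: a tiny training error next to a much larger out-of-sample error is the overfitting signature.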
