Friday, March 7, 2014

State-of-the-art prediction models

Random forest
1) simple decision tree - easy to interpret, but usually not very accurate in its predictions
2) bagging
- generate many data sets by bootstrapping the training data, grow a tree on each data set; the new model is the average of the outputs of the individual trees
- the idea is that the average of B independent estimates, each having variance sigma squared, has variance sigma squared / B
- i.e., we can reduce the model variance by averaging the results of many trees (see the variance derivation after this list)
3) Random forest
- the bagged trees described above are highly correlated with one another, so averaging them doesn't reduce the variance as much as the independence argument suggests
- random forest reduces the correlation among trees by considering only a random subset of the predictors (typically the square root of the full set) at each split
- by reducing the correlation among trees, we remove more of the model variance
- wouldn't restricting the predictors at each split hurt each tree's accuracy? Not really: every predictor still gets a chance at some depth of the tree, and at the same time predictors that would otherwise be dominated by a few strong predictors get a better chance to contribute (a code sketch follows this list)
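To make the variance argument concrete, here is the standard textbook identity (e.g. as in The Elements of Statistical Learning; it is not specific to this post). For B identically distributed tree predictions f_b(x), each with variance sigma squared and pairwise correlation rho,

$$
\mathrm{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

As B grows the second term vanishes, but the first term rho * sigma squared stays. That is exactly why random forest attacks rho: decorrelating the trees shrinks the part of the variance that averaging alone cannot remove.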
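And a minimal code sketch of the three models using scikit-learn. The dataset, tree counts, and parameter values here are illustrative assumptions, not anything from the notes above:

```python
# Illustrative sketch: single tree vs. bagging vs. random forest.
# All data and parameter choices are assumptions for demonstration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 1) A single deep decision tree: easy to interpret, but high variance.
tree = DecisionTreeClassifier(random_state=0)

# 2) Bagging: average 100 trees, each grown on a bootstrap sample.
#    Every split still considers ALL predictors, so the trees stay correlated.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            random_state=0)

# 3) Random forest: same bootstrap idea, but each split considers only a
#    random subset of predictors (sqrt of the full set), decorrelating trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)

for name, model in [("tree", tree), ("bagging", bagging), ("forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:8s} accuracy: {scores.mean():.3f}")
```

On a typical run the ensemble models beat the single tree, with the forest at or above the bagged trees, which is the variance-reduction story above playing out.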

Boosting
Fit a sequence of shrunken trees on the RESIDUAL:
Build a tree with depth = 1, say
Shrink the tree by a factor of lambda (0.01, say) and add the shrunken tree to the model
Recompute the residual and fit the next tree on the new residual
Repeat the above for the number of trees you want
The idea is that we learn slowly, each shrunken tree correcting a small part of the remaining residual (a minimal sketch of this loop follows below)
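Here is a minimal sketch of that residual-fitting loop, written by hand rather than with a library so each step above is visible. The regression data and the exact values of n_trees and lambda are assumptions for illustration:

```python
# Illustrative sketch of boosting depth-1 regression trees on the residuals.
# The data, n_trees, and lam values are assumptions for demonstration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

n_trees, lam = 1000, 0.01          # number of trees and shrinkage factor
prediction = np.zeros_like(y)      # the model starts at zero,
residual = y.copy()                # so the initial residual is y itself
trees = []

for _ in range(n_trees):
    stump = DecisionTreeRegressor(max_depth=1)  # depth-1 tree, as above
    stump.fit(X, residual)                      # fit the CURRENT residual
    prediction += lam * stump.predict(X)        # add the shrunken tree
    residual = y - prediction                   # recompute the residual
    trees.append(stump)

# Final model: f(x) = sum over all trees of lam * tree(x)
print("training MSE:", np.mean(residual ** 2))
```

Note how lam = 0.01 forces the model to learn slowly: no single stump can correct much of the residual on its own, which is the whole point of the shrinkage.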
