Wednesday, March 5, 2014

why (not) linear regression?

why?
  • simple
  • easy to interpret (more important than you can imagine, especially when you need to explain the model to someone)

why not?
  • the relationship is never linear (well, almost never)
    • for example, wage tends to rise roughly linearly with age at first, but it flattens out past a certain age
  • try a smoothing spline, which can capture non-linearity.
    • Limit the degrees of freedom to avoid excessive model variance (i.e. overfitting)
      • smooth.spline(age,wage,df=16) 
    • "we can use LOO cross-validation to select the smoothing parameter for us automatically" 
      • smooth.spline(age,wage,cv=TRUE)
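
A rough sketch of the two calls above, next to a plain linear fit. The age/wage values here are simulated stand-ins (made-up numbers, not a real wage data set), and the names lin_fit, fit_df, and fit_cv are just for illustration:

```r
# Simulated stand-in for the age/wage data (hypothetical numbers, not real data):
# wage rises with age early on, then levels off.
set.seed(1)
age  <- sample(18:80, 500, replace = TRUE)
wage <- 60 + 50 * (1 - exp(-(age - 18) / 12)) + rnorm(500, sd = 10)

# Plain linear regression, for comparison
lin_fit <- lm(wage ~ age)

# Smoothing spline with the effective degrees of freedom fixed at 16
fit_df <- smooth.spline(age, wage, df = 16)

# Smoothing spline with the smoothing parameter picked by leave-one-out CV.
# R warns that CV with repeated x values is doubtful; ages repeat here,
# so the warning is suppressed for this sketch.
fit_cv <- suppressWarnings(smooth.spline(age, wage, cv = TRUE))

# Effective degrees of freedom that CV settled on
fit_cv$df

# Residual sum of squares on the training data: linear fit vs. spline
rss_lin    <- sum(resid(lin_fit)^2)
rss_spline <- sum((wage - predict(fit_df, age)$y)^2)
c(linear = rss_lin, spline = rss_spline)
```

The spline will always look at least as good as the line on the training data, so the RSS comparison says nothing about overfitting by itself; that is what the df cap and the cross-validation are for.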
