Monday, February 17, 2014

How to find the variability of your estimate (aka standard error)?

summary of the lm function will give u the answer, but it's standard error of the TRAINING data.
for X1, that's 0.02593

How to find the standard error of TEST data?

bootstrap - concept is to
  1. randomly sample the TRAINING data with replacement to produce a new data set
  2. do the above over and over again and you'll have many data sets
  3. for each data set, you get an estimate and you'll end up with many estimates
  4. standard deviation of the sample estimates will be your standard errors
(quite clever!)

> summary(lm(y~X1+X2, Xy))

Call:
lm(formula = y ~ X1 + X2, data = Xy)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.44171 -0.25468 -0.01736  0.33081  1.45860 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.26583    0.01988  13.372  < 2e-16 ***
X1           0.14533    0.02593   5.604 2.71e-08 ***
X2           0.31337    0.02923  10.722  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.5451 on 997 degrees of freedom
Multiple R-squared: 0.1171, Adjusted R-squared: 0.1154 
F-statistic: 66.14 on 2 and 997 DF,  p-value: < 2.2e-16 

> coef(summary(lm(y~X1+X2, Xy[1:10,])))
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -5.691507  1.4521776 -3.919292 0.0057542755
X1          -3.573758  0.6417983 -5.568350 0.0008434796
X2          13.189553  2.8350903  4.652252 0.0023356811
> coef(summary(lm(y~X1+X2, Xy[1:10,])))["X1", "Estimate"]
[1] -3.573758

> est.fn=function(data, index){
+ coef(summary(lm(y~X1+X2, data[index,])))["X1", "Estimate"]}
> est.fn(Xy, 1:10)
[1] -3.573758
> est.fn(Xy, 1:1000)
[1] 0.1453263
> boot.out=boot(Xy,est.fn,R=1000)
> boot.out

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Xy, statistic = est.fn, R = 1000)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.1453263 0.0002518276   0.0302769
> plot(boot.out) 


 
 


for time series, ie the data are not idd, we do boostrap in a block of certain size:

> tsboot(Xy, est.fn, 1000, sim="fixed", l=100)
BLOCK BOOTSTRAP FOR TIME SERIES
Fixed Block Length of 100 
Call:tsboot(tseries = Xy, statistic = est.fn, R = 1000, l = 100, sim = "fixed")

Bootstrap Statistics :     original      bias    std. errort1* 0.1453263 0.001404265   0.1983411

No comments:

Post a Comment