Regression diagnostics. — regression.diagnostics • rnorsk

The assumption of homoskedasticity is examined using the Breusch-Pagan test (Gujarati, 2012, pp. 86-87). Because the null hypothesis (H0) is that the residuals are homoskedastic, a p-value below 0.05 indicates a heteroskedasticity problem in the model.
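To make the decision rule concrete, here is a minimal base-R sketch of the studentized (Koenker) variant of the Breusch-Pagan test. It is an illustration under stated assumptions, not necessarily how regression.diagnostics computes the test internally; the names `aux`, `bp_stat`, `bp_df`, `bp_p` and `bp_problem` are made up for this example.

```r
# Model taken from the Examples section.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# Auxiliary regression: squared residuals on the model's regressors.
X  <- model.matrix(mod)[, -1, drop = FALSE]  # drop the intercept column
e2 <- residuals(mod)^2
aux <- lm(e2 ~ X)

# Studentized statistic: n * R^2 of the auxiliary regression,
# compared against a chi-square with one df per regressor.
bp_stat <- nobs(mod) * summary(aux)$r.squared
bp_df   <- ncol(X)
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

# Decision rule from the text: p < 0.05 signals heteroskedasticity.
bp_problem <- bp_p < 0.05
```

The 0.05 cutoff corresponds to the `crit.bp` default in the usage signature.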

Usage

regression.diagnostics(
  mod,
  crit.bp = 0.05,
  crit.ncv = 0.05,
  crit.vif = 5,
  crit.shapiro = 0.01,
  crit.reset = 0.05,
  crit.linktest = 0.05,
  crit.cook = 1,
  crit.outlier = 0.05,
  crit.dwt = 0.05
)

Arguments

mod: lm-model object

Value

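A tibble with one row per check and the columns assumption, variable, test, statistic, p.value, crit, problem and decision.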

Details

The assumption of no severe multicollinearity is examined using variance inflation factor (VIF) values. A VIF above 5.0 is taken as a sign of severe multicollinearity in the model (Studenmund, 2006, p. 271).
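As a rough illustration of what a VIF measures, the sketch below computes it by hand in base R as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The additive iris model and the names `vif_by_hand` and `severe` are assumptions made for this example; they are not part of the package.

```r
# Additive model on the built-in iris data (no interaction terms).
mod <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
X <- model.matrix(mod)[, -1, drop = FALSE]  # predictors without the intercept

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others.
vif_by_hand <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

# Decision rule from the text: VIF > 5 signals severe multicollinearity.
severe <- vif_by_hand > 5
```

The cutoff of 5 corresponds to the `crit.vif` default. Interaction terms inflate these values mechanically, which is worth keeping in mind when reading the first example's output.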
The assumption of normally distributed residuals is examined using the Shapiro-Wilk W test. Because the null hypothesis (H0) is that the residuals are normally distributed, a p-value below 0.01 indicates that they are not. I propose 0.01 as the cutoff because H0 is rejected at 0.05 in almost every case. Further, the Shapiro-Wilk W test is, like any other test, sensitive to large sample sizes. I still suggest additionally examining the residual plots.
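A minimal sketch of this check with base R's `shapiro.test()`, assuming the test is applied to the raw residuals (the package may use a different residual type). The names `sw` and `sw_problem` are illustrative only.

```r
# Model taken from the Examples section.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# Shapiro-Wilk W test on the raw residuals.
sw <- shapiro.test(residuals(mod))

# Decision rule from the text: p < 0.01 signals non-normal residuals.
sw_problem <- sw$p.value < 0.01

# Complementary visual check: normal Q-Q plot of the residuals.
if (interactive()) {
  qqnorm(residuals(mod))
  qqline(residuals(mod))
}
```

The Q-Q plot is one way to follow the advice above about also looking at the residual plots.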
The assumption of a correctly specified model is examined using the linktest (Stata Manual, pp. 1041-1044). A statistically significant _hatsq, the squared prediction (p < 0.05), indicates a specification problem.
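The idea behind the linktest can be sketched in base R: refit the response on the fitted values and their square, then inspect the p-value of the squared term. This is an approximation of the Stata procedure written for illustration; the names `hat`, `hatsq`, `link_fit`, `link_p` and `link_problem` are assumptions of this example.

```r
# Model taken from the Examples section.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# Prediction and squared prediction (Stata calls these _hat and _hatsq).
hat   <- fitted(mod)
hatsq <- hat^2
y     <- model.response(model.frame(mod))

# Refit the response on the prediction and its square.
link_fit <- lm(y ~ hat + hatsq)

# Decision rule from the text: a significant squared term (p < 0.05)
# signals a specification problem.
link_p <- coef(summary(link_fit))["hatsq", "Pr(>|t|)"]
link_problem <- link_p < 0.05
```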
The assumption of an appropriate functional form is examined using Ramsey's regression specification error test (RESET) (Wooldridge, pp. 303-305). Because the null hypothesis (H0) is that the functional form is appropriate, a p-value below 0.05 indicates a functional form problem.
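RESET can be sketched in base R as an F-test for adding powers of the fitted values to the model. The version below assumes the second and third powers are used, which is a common choice but not confirmed for this package. The names `d`, `aug`, `reset`, `reset_F`, `reset_p` and `reset_problem` are illustrative.

```r
# Model taken from the Examples section.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# Add the squared and cubed fitted values to the model data.
d <- model.frame(mod)
d$yhat2 <- fitted(mod)^2
d$yhat3 <- fitted(mod)^3

# Augmented model: original terms plus the two powers.
aug <- lm(Sepal.Length ~ Sepal.Width * Petal.Length + yhat2 + yhat3, data = d)

# F-test of the augmented model against the original one.
reset   <- anova(mod, aug)
reset_F <- reset$F[2]
reset_p <- reset$`Pr(>F)`[2]

# Decision rule from the text: p < 0.05 signals a functional form problem.
reset_problem <- reset_p < 0.05
```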
Influence depends on both leverage and the extent to which the observation is an outlier. Cook's distance (D) is used to locate influential observations. An observation with D > 1 is often considered an influential case and should thus be removed from the analysis (Pardoe, 2006, p. 171).
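The D > 1 rule can be tried directly with base R's `cooks.distance()`. The sketch below reuses the data with injected outliers from the Examples section; the names `d` and `influential` are made up for this illustration.

```r
# Data with two injected outliers, as in the Examples section.
cars1 <- cars[1:30, ]
cars_outliers <- data.frame(speed = c(19, 19), dist = c(190, 1806))
cars2 <- rbind(cars1, cars_outliers)

mod <- lm(speed ~ dist, data = cars2)

# Cook's distance for every observation.
d <- cooks.distance(mod)

# Decision rule from the text: D > 1 marks an influential observation.
influential <- which(d > 1)
influential
```

In the Examples output for this data, observation 32 is the one reported under the Cook's distance criterion. The cutoff of 1 corresponds to the `crit.cook` default.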

Examples

mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)
regression.diagnostics(mod)
#> there are higher-order terms (interactions) in this model
#> consider setting type = 'predictor'; see ?vif
#> Tests of linear model assumptions
#> ---------------------------------
#> 
#> 7/11 (63.6 %) checks failed
#> 
#> 
#> Identified problems: 
#> 	heteroskedasticity
#> 	multicollinearity
#> 	model specification
#> 	functional form
#> Summary:
#> # A tibble: 11 × 8
#>    assumption          variable  test  statistic  p.value  crit problem decision
#>    <chr>               <chr>     <chr>     <dbl>    <dbl> <dbl> <chr>   <chr>   
#>  1 heteroskedasticity  global    stud…   13.0     0.00460  0.05 Problem -       
#>  2 heteroskedasticity  global    Non-…   10.2     0.00138  0.05 Problem -       
#>  3 multicollinearity   Sepal.Wi… Vari…    6.46   NA        5    Problem -       
#>  4 multicollinearity   Petal.Le… Vari…   81.8    NA        5    Problem -       
#>  5 multicollinearity   Sepal.Wi… Vari…   69.2    NA        5    Problem -       
#>  6 normality           global    Shap…    0.992   0.565    0.01 No Pro… +       
#>  7 model specification global    Stat…    0.119   0.0114   0.05 Problem -       
#>  8 functional form     global    RESE…    5.85    0.00360  0.05 Problem -       
#>  9 outliers            global    Cook…    0.142  NA        1    No Pro… +       
#> 10 outliers            global    Bonf…    3.13    0.314    0.05 No Pro… +       
#> 11 autocorrelation     global    Durb…   -0.0346  0.842    0.05 No Pro… +       
#> 
#> Outliers:
#> -----------
#> Cook's distance (criterion=1.00): No outliers
#> Outlier test (criterion=0.05): No outliers
#> 

cars1 <- cars[1:30, ]  # original data
cars_outliers <- data.frame(speed=c(19,19), dist=c(190, 1806))  # introduce outliers.
cars2 <- rbind(cars1, cars_outliers)  # data with outliers.
mod <- lm(speed ~ dist, data = cars2)
regression.diagnostics(mod)
#> Tests of linear model assumptions
#> ---------------------------------
#> 
#> 5/9 (55.6 %) checks failed
#> 
#> 
#> Identified problems: 
#> 	model specification
#> 	functional form
#> 	outliers
#> 	autocorrelation
#> Summary:
#> # A tibble: 9 × 8
#>   assumption          variable test    statistic  p.value  crit problem decision
#>   <chr>               <chr>    <chr>       <dbl>    <dbl> <dbl> <chr>   <chr>   
#> 1 heteroskedasticity  global   studen…     0.371  5.42e-1  0.05 No Pro… +       
#> 2 heteroskedasticity  global   Non-co…     0.364  5.46e-1  0.05 No Pro… +       
#> 3 multicollinearity   NA       Varian…    NA     NA       NA    No Pro… +       
#> 4 normality           global   Shapir…     0.964  3.56e-1  0.01 No Pro… +       
#> 5 model specification global   Stata …    -1.85   6.19e-4  0.05 Problem -       
#> 6 functional form     global   RESET …    14.5    4.77e-5  0.05 Problem -       
#> 7 outliers            global   Cook's…   454.    NA        1    Problem -       
#> 8 outliers            global   Bonfer…     3.68   2.99e-2  0.05 Problem -       
#> 9 autocorrelation     global   Durbin…     0.814  0        0.05 Problem -       
#> 
#> Outliers:
#> -----------
#> Cook's distance (criterion=1.00):
#>  index   cooksd
#>     32 454.4288
#> Outlier test (criterion=0.05):
#>     rstudent unadjusted p-value Bonferroni p
#> 32 -3.684385         0.00093564      0.02994