The assumption of homoskedasticity is examined using the Breusch-Pagan test (Gujarati, 2012, pp. 86-87). The null hypothesis (H0) is that the residuals are homoskedastic, so a p-value < 0.05 indicates a heteroskedasticity problem in the model.
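The idea behind the test can be sketched in base R. This is a minimal illustration that assumes the studentized (Koenker) variant, in which the squared residuals are regressed on the model's predictors; it is not necessarily how regression.diagnostics() computes the statistic internally, and the object names are illustrative only.

```r
# Illustrative sketch (assumption: studentized/Koenker Breusch-Pagan variant).
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

u2 <- residuals(mod)^2                      # squared residuals
Z  <- model.matrix(mod)[, -1, drop = FALSE] # predictors without the intercept column

aux     <- lm(u2 ~ Z)                       # auxiliary regression
bp_stat <- nobs(mod) * summary(aux)$r.squared
bp_df   <- ncol(Z)
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

# A p-value below crit.bp (default 0.05) would be flagged as a problem.
bp_p < 0.05
```

The statistic is n times the R-squared of the auxiliary regression and is compared with a chi-squared distribution whose degrees of freedom equal the number of predictors.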
regression.diagnostics(
mod,
crit.bp = 0.05,
crit.ncv = 0.05,
crit.vif = 5,
crit.shapiro = 0.01,
crit.reset = 0.05,
crit.linktest = 0.05,
crit.cook = 1,
crit.outlier = 0.05,
crit.dwt = 0.05
)
mod: an lm model object.
Return value: a tibble.
The assumption of no severe multicollinearity is examined using variance inflation factor (VIF) values. A VIF value above 5.0 is taken as a sign of severe multicollinearity in the model (Studenmund, 2006, p. 271).
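For a model in which every term has a single coefficient, the VIFs equal the diagonal of the inverse correlation matrix of the predictor columns. The following base-R sketch uses that identity for illustration; the package itself may rely on a dedicated VIF function, and vif_values is a hypothetical name.

```r
# Illustrative sketch: VIFs from the inverse correlation matrix of the predictors.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

X <- model.matrix(mod)[, -1, drop = FALSE]  # drop the intercept column
vif_values <- diag(solve(cor(X)))           # one VIF per predictor column

# Predictors whose VIF exceeds crit.vif (default 5) would be flagged.
vif_values[vif_values > 5]
```

Interaction terms are usually strongly correlated with their constituent main effects, so high VIFs in such models do not always signal a substantive problem.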
The assumption of normally distributed residuals is examined using the Shapiro-Wilk W test. The null hypothesis (H0) is that the residuals are normally distributed, so a p-value < 0.01 indicates that they are not. I propose 0.01 as the cutoff because H0 is rejected at 0.05 in almost every case. Moreover, like any other significance test, the Shapiro-Wilk W test is sensitive to large sample sizes, so I still suggest examining the residual plots as well.
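A minimal sketch of this check using base R's shapiro.test() on the raw residuals. Whether the package tests raw or standardized residuals is not stated here, so treat the choice of residuals() as an assumption.

```r
# Illustrative sketch: Shapiro-Wilk test on the model residuals.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

res <- residuals(mod)
sw  <- shapiro.test(res)

# A p-value below crit.shapiro (default 0.01) would be flagged as a problem.
sw$p.value < 0.01

# Visual check, as recommended above (uncomment to draw a normal Q-Q plot):
# qqnorm(res); qqline(res)
```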
The assumption of a correctly specified model is examined using the linktest (Stata Manual, pp. 1041-1044). A statistically significant _hatsq term (p < 0.05), the squared linear prediction, indicates a specification problem.
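The logic of the linktest can be reproduced in base R by regressing the response on the fitted values and their square. This sketch follows the description of Stata's linktest and is not necessarily the package's own code; hat and hatsq are illustrative names mirroring Stata's _hat and _hatsq.

```r
# Illustrative sketch of a linktest: refit the response on the prediction
# and the squared prediction, then inspect the squared term.
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

y     <- model.response(model.frame(mod))
hat   <- fitted(mod)
hatsq <- hat^2

link_fit <- lm(y ~ hat + hatsq)
hatsq_p  <- coef(summary(link_fit))["hatsq", "Pr(>|t|)"]

# A p-value below crit.linktest (default 0.05) would be flagged as a problem.
hatsq_p < 0.05
```

If the model is correctly specified, the squared prediction should carry no additional explanatory power once the prediction itself is included.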
The assumption of an appropriate functional form is examined using Ramsey's regression specification error test (RESET) (Wooldridge, pp. 303-305). The null hypothesis (H0) is that the functional form is appropriate, so a p-value < 0.05 indicates a functional form problem.
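RESET can be sketched in base R by adding powers of the fitted values to the model and testing them jointly with an F test. The sketch assumes the common choice of second and third powers; the package may use a dedicated RESET function with different settings.

```r
# Illustrative sketch of RESET (assumption: powers 2 and 3 of the fitted values).
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

dat <- iris
dat$yhat2 <- fitted(mod)^2
dat$yhat3 <- fitted(mod)^3

# Augmented model: original terms plus the powers of the fitted values.
aug <- lm(Sepal.Length ~ Sepal.Width * Petal.Length + yhat2 + yhat3, data = dat)

reset   <- anova(mod, aug)       # joint F test of the two added terms
reset_F <- reset$F[2]
reset_p <- reset$`Pr(>F)`[2]

# A p-value below crit.reset (default 0.05) would be flagged as a problem.
reset_p < 0.05
```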
Influence depends on both leverage and the extent to which an observation is an outlier. Cook's distance (D) is used to locate influential observations. An observation with D > 1 is often considered an influential case and should thus be removed from the analysis (Pardoe, 2006, p. 171).
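A minimal sketch of the Cook's distance check with base R's cooks.distance(), using the same contaminated cars data as the second example further down. The names fit, d and influential are illustrative only.

```r
# Illustrative sketch: flag observations whose Cook's distance exceeds 1.
cars2 <- rbind(
  cars[1:30, ],
  data.frame(speed = c(19, 19), dist = c(190, 1806))  # two artificial outliers
)
fit <- lm(speed ~ dist, data = cars2)

d <- unname(cooks.distance(fit))  # one distance per observation, by position
influential <- which(d > 1)       # crit.cook defaults to 1

influential
```

Because Cook's distance combines the size of the residual with leverage, an extreme value of the predictor can dominate the fit even when its residual is moderate.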
mod <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)
regression.diagnostics(mod)
#> there are higher-order terms (interactions) in this model
#> consider setting type = 'predictor'; see ?vif
#> Tests of linear model assumptions
#> ---------------------------------
#>
#> 7/11 (63.6 %) checks failed
#>
#>
#> Identified problems:
#> heteroskedasticity
#> multicollinearity
#> model specification
#> functional form
#> Summary:
#> # A tibble: 11 × 8
#> assumption variable test statistic p.value crit problem decision
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 heteroskedasticity global stud… 13.0 0.00460 0.05 Problem -
#> 2 heteroskedasticity global Non-… 10.2 0.00138 0.05 Problem -
#> 3 multicollinearity Sepal.Wi… Vari… 6.46 NA 5 Problem -
#> 4 multicollinearity Petal.Le… Vari… 81.8 NA 5 Problem -
#> 5 multicollinearity Sepal.Wi… Vari… 69.2 NA 5 Problem -
#> 6 normality global Shap… 0.992 0.565 0.01 No Pro… +
#> 7 model specification global Stat… 0.119 0.0114 0.05 Problem -
#> 8 functional form global RESE… 5.85 0.00360 0.05 Problem -
#> 9 outliers global Cook… 0.142 NA 1 No Pro… +
#> 10 outliers global Bonf… 3.13 0.314 0.05 No Pro… +
#> 11 autocorrelation global Durb… -0.0346 0.842 0.05 No Pro… +
#>
#> Outliers:
#> -----------
#> Cook's distance (criterion=1.00): No outliers
#> Outlier test (criterion=0.05): No outliers
#>
cars1 <- cars[1:30, ] # original data
cars_outliers <- data.frame(speed=c(19,19), dist=c(190, 1806)) # introduce outliers.
cars2 <- rbind(cars1, cars_outliers) # data with outliers.
mod <- lm(speed ~ dist, data = cars2)
regression.diagnostics(mod)
#> Tests of linear model assumptions
#> ---------------------------------
#>
#> 5/9 (55.6 %) checks failed
#>
#>
#> Identified problems:
#> model specification
#> functional form
#> outliers
#> autocorrelation
#> Summary:
#> # A tibble: 9 × 8
#> assumption variable test statistic p.value crit problem decision
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 heteroskedasticity global studen… 0.371 5.42e-1 0.05 No Pro… +
#> 2 heteroskedasticity global Non-co… 0.364 5.46e-1 0.05 No Pro… +
#> 3 multicollinearity NA Varian… NA NA NA No Pro… +
#> 4 normality global Shapir… 0.964 3.56e-1 0.01 No Pro… +
#> 5 model specification global Stata … -1.85 6.19e-4 0.05 Problem -
#> 6 functional form global RESET … 14.5 4.77e-5 0.05 Problem -
#> 7 outliers global Cook's… 454. NA 1 Problem -
#> 8 outliers global Bonfer… 3.68 2.99e-2 0.05 Problem -
#> 9 autocorrelation global Durbin… 0.814 0 0.05 Problem -
#>
#> Outliers:
#> -----------
#> Cook's distance (criterion=1.00):
#> index cooksd
#> 32 454.4288
#> Outlier test (criterion=0.05):
#> rstudent unadjusted p-value Bonferroni p
#> 32 -3.684385 0.00093564 0.02994