### R worksheet: influential datapoints

Influential datapoints are datapoints whose inclusion has an outsize influence on the estimated coefficients. Functions like dfbeta() and which.influence() let you test whether you have such datapoints. What should you do if you find some? Report the coefficients for your whole dataset, state that you had influential datapoints, and show how the coefficients would change without each of them.

    # influential datapoints in regression

    # linear regression
    library(languageR)  # provides the 'regularity' data used further below

    # Hinton "smiling" data again,
    # which was created explicitly to demonstrate outsize influence of datapoints
    hinton.smile = data.frame(smile.time = c(0.4, 0.8, 0.8, 1.2, 1.4, 1.8, 2.2, 2.6, 3.0),
                              sold = c(16, 12, 20, 16, 34, 30, 26, 22, 38))
    lm.obj = lm(smile.time ~ sold, data = hinton.smile)
    summary(lm.obj)
    # coefficients:
    # Intercept -0.07252
    # sold       0.06941

    # We are looking for datapoints whose elimination would lead to
    # a change in a coefficient that is t times the size of the coefficient,
    # where some recommend t = 0.5, others t = 0.2.

    # dfbeta gives the adjustment for each coefficient for each datapoint.
    # We transform its output to a data frame so we can more conveniently
    # check whether any datapoint leads to an outsized adjustment.
    dfbeta.df = data.frame(dfbeta(lm.obj))
    # large adjustment: |adjustment| > 0.2 * |coefficient|,
    # checked here for the coefficient of sold only
    dfbeta.df[abs(dfbeta.df$sold) > 0.2 * abs(coef(lm.obj)["sold"]), ]

    # with ols, we would use the functions
    # which.influence and show.influence
    library(rms)
    # We have to specify x = TRUE and y = TRUE to be able to run which.influence
    ols.obj = ols(smile.time ~ sold, data = hinton.smile, x = TRUE, y = TRUE)
    which.influence(ols.obj, cutoff = 0.2)
    # inspecting the values for influential datapoints
    show.influence(which.influence(ols.obj, cutoff = 0.2), hinton.smile)

    # logistic regression: we can again use
    # which.influence and show.influence
    lrm.obj = lrm(Regularity ~ WrittenFrequency + Auxiliary,
                  data = regularity, x = TRUE, y = TRUE)
    which.influence(lrm.obj)
    # again, influential datapoints
    show.influence(which.influence(lrm.obj), regularity)
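
The reporting step described above (coefficients for the whole dataset, plus how they change without each influential datapoint) could be sketched in base R as follows. This is only one possible way to do it: the 0.2 threshold is the looser of the two cutoffs mentioned in the worksheet, and the names `slope`, `flagged`, `refits` and `report` are made up for this illustration.

```r
# Same Hinton "smiling" data as in the worksheet
hinton.smile = data.frame(
  smile.time = c(0.4, 0.8, 0.8, 1.2, 1.4, 1.8, 2.2, 2.6, 3.0),
  sold       = c(16, 12, 20, 16, 34, 30, 26, 22, 38))
lm.obj = lm(smile.time ~ sold, data = hinton.smile)

# Flag rows whose deletion changes the slope by more than 0.2 * |slope|
# (illustrative cutoff; 0.5 is the stricter alternative)
slope = coef(lm.obj)["sold"]
flagged = which(abs(dfbeta(lm.obj)[, "sold"]) > 0.2 * abs(slope))

# Refit the model without each flagged row in turn
refits = do.call(rbind, lapply(flagged, function(i) {
  coef(lm(smile.time ~ sold, data = hinton.smile[-i, ]))
}))

# One table: full-data coefficients first, then one row per deletion
report = rbind(coef(lm.obj), refits)
rownames(report) = c("all data", sprintf("without row %d", flagged))
print(round(report, 4))
```

Each row after the first shows the coefficients you would have obtained had that single datapoint been left out, which is the comparison to include alongside the full-data estimates.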