R code: model comparison

# Multiple possible models for predicting a dependent variable: Which one should we choose?

# This is a problem for linear regression. Here are three models for Hinton's study time data.

studytimedata = data.frame(participant = c(1,2,3,4,5,6,7,8,9,10),

study.time = c(40,43,18,10,25,33,27,17,30,47),


iq = c(118,128,110,114,138,120,106,124,132,130))

ml1 = lm(exam.score ~ study.time, data = studytimedata)

ml2 = lm(exam.score ~ iq, data = studytimedata)

ml3 = lm(exam.score ~ study.time + iq, data = studytimedata)


# We can inspect the models using "summary". But what do we conclude from this inspection?

# We have the same problem for logistic regression, demonstrated here for the problem of

# predicting the dative alternation -- what influences whether we will see "John gave Mary the book"

# or "John gave the book to Mary"?



md1 = lrm(RealizationOfRecipient ~ AnimacyOfRec, data = dative)

md2 = lrm(RealizationOfRecipient ~ DefinOfRec, data = dative)

md3 = lrm(RealizationOfRecipient ~ AnimacyOfRec + DefinOfRec, data = dative)

md4 = lrm(RealizationOfRecipient ~ SemanticClass, data = dative)

md5 = lrm(RealizationOfRecipient ~ SemanticClass + AnimacyOfRec + DefinOfRec, data = dative)

md6 = lrm(RealizationOfRecipient ~ LengthOfTheme, data = dative)

# Model comparison: linear regression, nested models. Use F-test (ANOVA)

anova(ml1, ml3)

# Model comparison: logistic regression, nested models. Here, we can use likelihood ratio.

# lrm() returns the model deviance in the "deviance" entry.

# This is a vector with two members: deviance for the model with only the intercept,

# and deviance for the models with all its parameters. We are interested in the latter.

dev.1 = md1$deviance[2]

dev.5 = md5$deviance[2]

# log likelihood ratio: deviance of simpler model - deviance of more complex model.

# Approximately chi-square distributed.

# Null hypothesis: no difference between the two models.

# lrtest() is a function of the library rlm that does likelihood ratio tests.

lrtest(md1, md3)

lrtest(md1, md5)

# In non-nested cases, we can use the Akaike Information Criterion, AIC.


# >>> Lower AIC is better <<<


# Here it is for linear regression




# And here it is for logistic regression