In this worksheet, we use the data frame inaugural (columns separated by commas).

Measuring central tendency

Mean:

    mean(inaugural$length)
    # which is the same as:
    sum(inaugural$length) / length(inaugural$length)

Median:

    median(inaugural$length)
    # which is the same as:
    quantile(inaugural$length, probs = 0.5)

Called without probs, quantile() reports the quartiles; with probs you can ask for other quantiles, such as the deciles:

    > quantile(inaugural$length)
         0%     25%     50%     75%    100%
     147.00 1544.00 2380.00 3172.25 9165.00
    > quantile(inaugural$length, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1))
        0%    10%    20%    30%    40%    50%    60%    70%    80%    90%
     147.0 1223.5 1478.0 1770.0 1935.0 2380.0 2585.0 2873.5 3693.0 4406.5
      100%
    9165.0

[ This worksheet previously also explained how to compute the mode, the value that appears most often in a data set. However, this is nontrivial in R, and gets us into many details of contingency tables. If you really need the mode, you can find its computation in the R code snippets. ]

Spread

Range:

    range(inaugural$length)

Average absolute deviation from the mean:

    mean(abs(inaugural$length - mean(inaugural$length)))

Variance:

    var(inaugural$length)

Standard deviation:

    sd(inaugural$length)
    # which is the same as:
    sqrt(var(inaugural$length))

Visualization

hist() and the command truehist() from the MASS package show histograms. To use truehist() you may first have to install the package MASS. You can do this in RStudio in the menu Tools, entry Install Packages.

    par(mfrow = c(1,2))
    hist(inaugural$length)
    library(MASS)
    truehist(inaugural$length)

The first line sets the canvas up for plotting two things at once, next to one another: par() sets general parameters for the graphics system, mfrow sets up the plotting canvas with a given number of rows and columns, and c(1,2) says that we want one row with two columns, so two plots next to one another. par(mfrow = c(2,3)) would set up the plotting canvas for 6 plots: 2 rows of 3 columns each. And par(mfrow = c(1,1)) resets the canvas to just show one plot.
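Instead of resetting the canvas by hand with par(mfrow = c(1,1)), you can keep whatever layout was active before and restore it afterwards. The following is a minimal sketch of that pattern; the vector x is made-up data for illustration, not the inaugural data, and old_par is just a name chosen here.

```r
# Made-up data for illustration only (not the inaugural data)
set.seed(1)
x <- rexp(200, rate = 1 / 2000)

# par() returns the previous values of the settings it changes,
# so we can store them and restore them later
old_par <- par(mfrow = c(1, 2))   # one row, two columns

hist(x, main = "hist()")
plot(density(x), main = "density()")

# Restore the layout that was active before (by default, one plot per canvas)
par(old_par)
```

This way the code does not need to assume that the earlier layout was c(1,1).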
Instead of plotting the overall distribution of the data as a histogram, which bins the data, you can also do a density plot, which does not bin the data and instead estimates a smooth density curve:

    plot(density(inaugural$length))

boxplot() shows the first and third quartile as a box with the median as a line through the box. By default, each whisker extends to the most extreme data point that lies no more than 1.5 times the length of the box away from the box (you can change that with the range argument), and outliers further away than that are shown as dots.

    boxplot(inaugural$length)

Over to you
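As a warm-up, the whisker rule for boxplot() described above can be checked numerically. The sketch below uses boxplot.stats(), which returns the numbers a boxplot is drawn from, on a small made-up vector (not the inaugural data); the names x, s, box_length, and upper_fence are chosen here for illustration.

```r
# Made-up values for illustration; 40 is deliberately far from the rest
x <- c(2, 4, 5, 6, 7, 8, 9, 11, 40)

# boxplot.stats() computes the boxplot numbers without plotting;
# its default coef = 1.5 matches the default whisker rule of boxplot()
s <- boxplot.stats(x)

s$stats   # lower whisker, box bottom, median, box top, upper whisker: 2 5 7 9 11
s$out     # points that would be drawn as dots: 40

# Points beyond box top + 1.5 * box length count as outliers
box_length  <- s$stats[4] - s$stats[2]        # 9 - 5 = 4
upper_fence <- s$stats[4] + 1.5 * box_length  # 9 + 6 = 15
```

The upper whisker ends at 11, the largest value that is still below the fence of 15, and not at the fence itself.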
Problems using the dative dataset

The dative dataset is available in the package languageR. Once you have installed the package (again, using "Install Packages" in the menu "Tools" in RStudio), you make it available using

    library(languageR)
    head(dative)

This dataset analyzes corpus occurrences of ditransitive verbs: In English you can say both "John gave Mary the book" and "John gave the book to Mary". Are these two truly interchangeable, or are there cases when people prefer one form over the other? The column RealizationOfRecipient is the outcome we are interested in: "NP" stands for the form "John gave Mary the book", and "PP" stands for "John gave the book to Mary".

Using the dative dataset:
