
R worksheet: plotting

For this worksheet we will use an extended version of the inaugural dataset in which I have added a column for "we". It is available here (with whitespace-separated columns, in case you want to read it in with a command rather than the GUI; note that read.csv() expects commas by default, so read.table() with header = TRUE is the better fit for this file).

Below, I assume that you have read it in as a data frame called "inaugural". (Rename it from "inauguralX" to "inaugural".)
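As a rough sketch of reading a whitespace-separated file from the command line: the example below writes a tiny stand-in file with made-up values (the president labels and counts are hypothetical, not taken from the real dataset) and reads it with read.table(). With the real download, you would pass its path instead of toy.file.

```r
# Toy stand-in for the downloaded file. All values here are made up.
toy.file <- tempfile(fileext = ".txt")
writeLines(c("year president freedom duties",
             "1950 P1 2 4",
             "1970 P2 5 1",
             "1990 P3 3 2"),
           toy.file)

# header = TRUE takes the column names from the first line;
# the default separator of read.table() is any whitespace.
inauguralX <- read.table(toy.file, header = TRUE)

# "Renaming" a data frame is just assigning it to a new name.
inaugural <- inauguralX
str(inaugural)
```

Calling str() afterwards is a quick way to confirm that the columns were split correctly and that the counts were read as numbers.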

If you want a plot for official purposes like a research paper, you need to have proper labels on the x and y axes, and ideally also a label for the whole graph. Here is an example, plotting counts of "freedom" versus "duties" with the year on the x axis, and with x and y labels:


plot(inaugural$year, inaugural$freedom, type="l", col="red", xlab = "year in which speech was given", ylab = "word count", main = "Word counts in inaugural speeches")
lines(inaugural$year, inaugural$duties, type="l", col="blue")

Also, you need a legend that states what each line represents:

legend(1800, 25, legend = c("freedom", "duties"), col = c("red", "blue"), pch=15, cex = 0.8)

This command places a legend with its top-left corner at x value 1800 and y value 25, showing the words "freedom" and "duties" with a filled box (point character "pch" 15) next to each of them. The boxes are red and blue, respectively. The text size, "cex", is reduced to 80% of normal.

You can look up graphical parameters by typing "?par". The point characters are explained under the entry for "points", so look them up using "?points".
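If you would rather see the point characters than read about them, one option is to draw them all in a row. This is only a small sketch; the variable name pch.values is my own choice, and it covers the numbered symbols 0 through 25 described under "?points".

```r
# Draw point characters 0 through 25 side by side.
pch.values <- 0:25
plot(pch.values, rep(1, length(pch.values)), pch = pch.values,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "",
     main = "Point characters 0 to 25")

# Print each pch number just above its symbol.
text(pch.values, rep(1.2, length(pch.values)), labels = pch.values, cex = 0.7)
```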

The graphs so far are not friendly to colorblind people, who may not see the difference between the red and the blue lines. Here are some options:
  • Use different line types. lty = "dashed" gives you a dashed line. Other options include "dotted" and "dotdash". Look up "?par" to see all options.
  • Use type="b" to get both points and lines, and use a different point character for each line by setting the parameter pch.
    • Using "?points" you can see a number of pretty point types available: pch=0 is an unfilled box, pch=1 an unfilled circle, and so on.
    • You can also set pch="F" to use the letter "F" as a point. For our example, here is how to use "F" for freedom and "D" for duties as the points:
      plot(inaugural$year, inaugural$freedom, type="b", col="red", xlab = "year in which speech was given", ylab = "word count", main = "Word counts in inaugural speeches", pch = "F")
      lines(inaugural$year, inaugural$duties, type="b", col="blue", pch="D")
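The options above can be combined, and the legend can mirror them. The sketch below uses a small made-up data frame called toy (hypothetical values, not the real counts) so that it runs on its own. It draws one solid and one dashed line with different point characters, and passes the same lty and pch values to legend(). Setting ylim from both columns is my own addition: plot() sizes the y axis from the first series only, so a larger second series could otherwise run off the canvas.

```r
# Made-up stand-in data; replace toy with inaugural for the real plot.
toy <- data.frame(year    = c(1950, 1970, 1990, 2010),
                  freedom = c(2, 5, 3, 6),
                  duties  = c(4, 1, 2, 0))

# Make the y axis wide enough for both series.
y.range <- range(toy$freedom, toy$duties)

plot(toy$year, toy$freedom, type = "b", lty = "solid", pch = 0,
     ylim = y.range,
     xlab = "year in which speech was given", ylab = "word count",
     main = "Word counts (toy data)")
lines(toy$year, toy$duties, type = "b", lty = "dashed", pch = 1)

# The legend repeats the line types and point characters used above,
# so it stays readable without color.
legend("topleft", legend = c("freedom", "duties"),
       lty = c("solid", "dashed"), pch = c(0, 1), cex = 0.8)
```

Here the legend is positioned with the keyword "topleft" instead of x and y coordinates, which avoids having to guess values that fit the data.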

The R command text() draws text at the given x and y coordinates. This can sometimes be a fun visualization option.
To compare how often each (recent) president uses "freedom" versus "duties", we can use the count of freedom as the x axis and the count of duties as the y axis, and plot each president's name at the matching coordinates. We only use speeches more recent than 1960, otherwise the plot gets too busy.

inaug.new = inaugural[inaugural$year > 1960,]
plot(inaug.new$freedom, inaug.new$duties, type = "n", xlab = "freedom", ylab = "duties")
text(inaug.new$freedom, inaug.new$duties, labels = inaug.new$president)

This makes a data frame inaug.new of recent speeches. It then plots nothing -- that is what type = "n" does. The plot command just sets up the canvas to have the right size (so it fits all counts of freedom on the x axis, and all counts of duties on the y axis), and labels the axes. We need to do that because "text" does not start a new canvas, it superimposes on the previous one.
The third command, "text", then prints each president's name at the matching coordinates. For example, Obama is at coordinates (3,2) because his speech contains "freedom" three times and "duties" twice.
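To see what the row selection does before anything is plotted, it can help to look at the intermediate pieces. The following sketch again uses a made-up data frame (toy, with placeholder president labels P1 to P4 and invented counts), so the printed coordinates are illustrative only.

```r
# Made-up stand-in data with placeholder president labels.
toy <- data.frame(year      = c(1950, 1970, 1990, 2010),
                  president = c("P1", "P2", "P3", "P4"),
                  freedom   = c(2, 5, 3, 6),
                  duties    = c(4, 1, 2, 0))

# The comparison yields one TRUE/FALSE value per row.
is.recent <- toy$year > 1960
print(is.recent)

# Indexing with [rows, ] keeps the rows marked TRUE and all columns.
toy.new <- toy[is.recent, ]

# These are the (x, y) coordinates at which each label will appear.
print(toy.new[, c("president", "freedom", "duties")])

# Empty canvas sized to the data, then the labels on top of it.
plot(toy.new$freedom, toy.new$duties, type = "n",
     xlab = "freedom", ylab = "duties")
text(toy.new$freedom, toy.new$duties, labels = toy.new$president)
```

Printing the same three columns of inaug.new is a simple way to check a coordinate claim such as the one about Obama above.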


Now over to you:

  1. Plot the counts of "I" and the counts of "we" across the years: The x axis should show the years, and the y axis counts. Label your axes, and label the whole graph.
  2. Add a legend.
  3. If you haven't already done so, experiment with different ways to make the plot friendly to colorblind viewers (and to people who view it in grayscale printout).
  4. Plot the names of all recent presidents (choose how recent -- it doesn't have to be 1960, maybe you can fit more or you need to show less) using their count of "I" as the x axis and their count of "we" as the y axis.
  5. Add two new columns to the data frame that show the relative frequency of "I" and of "we" in each speech. Redo the plotting (both (3) and (4)) with relative frequencies.
