We save this new data frame to
The inaugural data frame does not contain
the party affiliation of the presidents. If we had that, we could do a
number of additional analyses by party, for example of speech length or
of how often presidents use terms like "freedom" or "democracy". This is
a good opportunity to demonstrate how to merge data frames in R.
For convenience, we will use a smaller amount of data. Here is a small dataset of "recent inaugural addresses":
It contains 13 entries. We would like to make a new data frame that maps inaugural address years to the affiliation of the president. But how do you make a new data frame by hand (rather than reading one in from a file)?
Here is how: You use the function data.frame( ). Inside the parentheses, you specify each column of the data frame. For example, to make a data frame with one column called "A" and one called "B", you could type:
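A minimal sketch of such a call; the variable name `toy` is only an illustrative choice:

```r
# Build a data frame by hand: each argument names a column
# and supplies its values as a vector created with c( ).
toy <- data.frame(A = c(1, 2, 3), B = c(20, 30, 40))
toy
```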
So the column A contains the sequence of values 1, 2, 3 (in that order), and the column B contains the sequence 20, 30, 40.
Now we are ready for the data frame mapping inaugural address years to the affiliation of each president:
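A sketch of what such a mapping can look like. To keep it short, it lists only six inaugural years rather than all 13, and the name `parties` is an assumption made for this illustration:

```r
# Map inaugural years to the party of the president sworn in that year.
# "D" = Democrat, "R" = Republican.
parties <- data.frame(
  year  = c(1993, 1997, 2001, 2005, 2009, 2013),
  party = c("D", "D", "R", "R", "D", "D")
)
parties
```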
The "year" column in this data frame is a sequence of numbers that are the
same as in inaug.new, and the "party" column is a sequence of either "D"
or "R".
Now we want to merge the two data frames by their common column "year". And that is exactly what we say:
This produces a new data frame, and we can give it a name:
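As a self-contained sketch of the merge, here are two small stand-in data frames. The names `speeches`, `parties`, and `merged` are placeholders, and the token counts are invented rather than taken from the real addresses:

```r
# Stand-in for the inaugural data: one row per address.
# The token counts are invented placeholder values.
speeches <- data.frame(
  year      = c(2001, 2005, 2009, 2013),
  president = c("Bush", "Bush", "Obama", "Obama"),
  tokens    = c(1500, 2000, 2400, 2100)
)

# Stand-in for the year-to-party mapping.
parties <- data.frame(
  year  = c(2001, 2005, 2009, 2013),
  party = c("R", "R", "D", "D")
)

# Join the two data frames on their shared "year" column
# and keep the result under a name of its own.
merged <- merge(speeches, parties, by = "year")
merged
```

The result keeps the columns of both inputs, with "year" appearing only once.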
What merge( ) has done is to link every row in inaug.new to the row of the party data frame that has the same "year" value.
We could have saved ourselves some work if we had defined party
affiliations for presidents, not years, since some presidents appear
multiple times in our data frame. (And we even have two different
presidents with the same name and the same affiliation in our data.) So
we could also say:
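A sketch of a merge keyed on the president's name instead of the year. The column name `president` and both data frames are assumptions for illustration; the toy mapping relies on the two presidents named Bush sharing one affiliation:

```r
speeches <- data.frame(
  year      = c(1989, 1993, 1997, 2001, 2005),
  president = c("Bush", "Clinton", "Clinton", "Bush", "Bush")
)

# One row per president name rather than one row per year.
party.by.president <- data.frame(
  president = c("Bush", "Clinton"),
  party     = c("R", "D")
)

# Each speech row picks up the party of its president,
# so a name that occurs several times is matched every time.
merged <- merge(speeches, party.by.president, by = "president")
merged
```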
It actually does the right thing: It links each row in
We can just use a constraint:
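A sketch of such a constraint on a toy merged data frame (the name `merged` and the token counts are placeholders). The logical test inside the square brackets keeps only the rows for which it is true:

```r
merged <- data.frame(
  year   = c(2001, 2005, 2009, 2013),
  party  = c("R", "R", "D", "D"),
  tokens = c(1500, 2000, 2400, 2100)   # invented values
)

# Keep only the rows whose party is "D"; the empty slot after
# the comma means "all columns".
merged[merged$party == "D", ]

# The same constraint can pick out part of a single column,
# for example to add up an absolute count.
sum(merged$tokens[merged$party == "D"])
```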
(Although if we were really interested in this, we should use relative frequencies, not absolute frequencies.)
Or, if we want to do a lot of processing on the Republicans versus the Democrats separately, we can give names to the two sub-data frames:
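A sketch of naming the two subsets, again on a toy data frame; `democrats` and `republicans` are illustrative names:

```r
merged <- data.frame(
  year  = c(2001, 2005, 2009, 2013),
  party = c("R", "R", "D", "D")
)

# Store each party's rows under its own name so that later
# steps do not have to repeat the constraint.
democrats   <- merged[merged$party == "D", ]
republicans <- merged[merged$party == "R", ]

nrow(democrats)
nrow(republicans)
```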
Or better with relative frequencies:
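One way to read "relative frequency" here is the count of a term divided by the total number of tokens in the same speeches. A sketch under that assumption, with a hypothetical `freedom` count column and invented numbers throughout:

```r
merged <- data.frame(
  party   = c("R", "R", "D", "D"),
  freedom = c(5, 25, 3, 2),            # invented term counts
  tokens  = c(1500, 2000, 2400, 2100)  # invented speech lengths
)
democrats   <- merged[merged$party == "D", ]
republicans <- merged[merged$party == "R", ]

# Share of all tokens that are the term, computed per party.
rel.democrats   <- sum(democrats$freedom) / sum(democrats$tokens)
rel.republicans <- sum(republicans$freedom) / sum(republicans$tokens)
rel.democrats
rel.republicans
```

Dividing by the token total makes the two parties comparable even when one of them has more or longer speeches in the data.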
And there is a third option: We can use the R function
This says: compute
Here is the mean length of speeches separately by party:
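One base R function that fits this description is aggregate( ); whether it is the one meant here is an assumption. A sketch on toy data with invented speech lengths:

```r
merged <- data.frame(
  party  = c("R", "R", "D", "D"),
  tokens = c(1500, 2000, 2400, 2100)   # invented speech lengths
)

# aggregate(x, by, FUN): split x into the groups given by `by`,
# then apply FUN to each group. The result is a data frame with
# one row per group; the computed values land in a column named "x".
mean.by.party <- aggregate(merged$tokens, list(party = merged$party), mean)
mean.by.party
```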
Other options: you can also define a function yourself and pass it as the third parameter.
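A sketch of passing a self-defined function as the third argument, here to aggregate( ), which is an assumed choice. The helper `token.range` and the speech lengths are made up for illustration:

```r
merged <- data.frame(
  party  = c("R", "R", "D", "D"),
  tokens = c(1500, 2000, 2400, 2100)   # invented speech lengths
)

# A self-defined summary: the spread between the longest
# and the shortest speech in a group.
token.range <- function(x) {
  max(x) - min(x)
}

# The function is passed by name, without parentheses.
range.by.party <- aggregate(merged$tokens, list(party = merged$party), token.range)
range.by.party
```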