Worksheet: Reading and writing files in Python
The aim of this worksheet is to show you how to read and write files in Python.
Getting a sample file
As an example of a file to read, we will use a relatively small, unannotated corpus from Project Gutenberg, http://www.gutenberg.org/wiki/Main_Page
Project Gutenberg is an online collection of texts whose copyright has expired. It contains texts in many languages.
We will work with the First Project Gutenberg Collection of Edgar Allan Poe at http://www.gutenberg.org/etext/1062.
Please download the file, making sure to choose a Plain Text version. The file will be called pg1062.txt. Put it on your Desktop.
Reading a file
You should now have the file pg1062.txt on your Desktop. Accessing this file in Python is easy -- the most complicated part is going to be figuring out how to tell Python about the file's location on your system. On my Unix system, the following lines of Python will print the whole file contents to your screen, line by line:
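A minimal sketch of such lines, written for Python 3. The path is an assumption pieced together from the user name mentioned in this worksheet, the helper name print_file is made up for illustration, and the "utf-8" encoding is a guess that suits most Project Gutenberg plain-text files.

```python
import os

# Assumed path: adjust the user name and folder to match your own system.
filename = "/Users/katrinerk/Desktop/pg1062.txt"

def print_file(path):
    """Print the contents of the file at `path`, line by line."""
    f = open(path, encoding="utf-8")
    for line in f:
        # Each line already ends in a newline, so tell print() not to add another.
        print(line, end="")
    f.close()

if os.path.exists(filename):
    print_file(filename)
else:
    print("No file found at", filename)
```

If you see the "No file found" message, the path does not match the place where you saved the file.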
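If you have a Mac, you will probably just need to substitute your user name for "katrinerk". If you are running Linux, you will have to put "/home/YOURUSERNAME/Desktop/pg1062.txt" instead. If you have a Windows system, you probably need to write the path with backslashes and a leading "r", something like r"C:\Users\YOURUSERNAME\Desktop\pg1062.txt" (the exact folder depends on your Windows version).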
Note the "r" before the opening double quote: It tells Python not to interpret the "\" as the beginning of some special code.
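To see what the "r" changes, here is a small sketch that compares the same text with and without it. The folder and file names are made up for the demonstration; they were chosen because "\n" and "\t" are special codes (newline and tab) in ordinary Python strings.

```python
# Hypothetical Windows-style path, used only to show the effect of the "r".
plain = "C:\new_folder\table.txt"   # "\n" becomes a newline, "\t" becomes a tab
raw = r"C:\new_folder\table.txt"    # every backslash stays a backslash

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))          # the raw string is two characters longer
```

Only the raw string is a usable path; in the plain one, both backslashes have been swallowed by the special codes.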
Any time you read a file, the lines of Python code you write for that will be almost the same.
Think of a file as something like a box: You have to open it first, then you can access its contents, and then you close it.
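The three steps of the box picture can be sketched as follows. Both function names are made up for illustration. The second version uses Python's with statement, which closes the file for you when the indented block ends; it is an alternative you may find convenient, not something this worksheet requires.

```python
def count_characters_explicit(path):
    """Open, access, close: each step written out by hand."""
    f = open(path, encoding="utf-8")    # 1. open the box
    total = 0
    for line in f:                       # 2. access its contents
        total += len(line)
    f.close()                            # 3. close the box
    return total

def count_characters_with(path):
    """Same result; the with statement closes the file automatically."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += len(line)
    return total
```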
Three ways of reading from a file
You have already seen this:
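As a reminder, here is a sketch of that first way: looping over the file one line at a time. The helper name first_lines is made up for this worksheet. It stops as soon as it has collected enough lines, which the loop lets you do without reading the rest of the file.

```python
def first_lines(path, how_many):
    """Return the first `how_many` lines of the file as a list of strings."""
    collected = []
    f = open(path, encoding="utf-8")
    for line in f:
        if len(collected) == how_many:
            break                      # we have enough; skip the rest of the file
        collected.append(line)         # each line still ends with "\n"
    f.close()
    return collected
```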
You can also read the whole file into a single string, using read().
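A minimal sketch of this second way, with a made-up helper name. The string that comes back contains the entire file, including all the newline characters between lines.

```python
def read_whole_file(path):
    """Return the complete contents of the file as one string."""
    f = open(path, encoding="utf-8")
    text = f.read()        # everything in the file, newlines included
    f.close()
    return text
```

Calling len() on the result tells you how many characters the file contains.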
Or you can read one single line only, using readline().
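A minimal sketch of this third way, again with a made-up helper name. Each call to readline() picks up where the previous one stopped, and once the file has no lines left it returns an empty string.

```python
def read_first_two_lines(path):
    """Return the first and second line of the file as two strings."""
    f = open(path, encoding="utf-8")
    first = f.readline()     # reads up to and including the first newline
    second = f.readline()    # continues from where the first call stopped
    f.close()
    return first, second
```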
Counting lines, and variables that contain True/False
Try it for yourself: Can you count the lines in the file pg1062.txt? (Solution below.)
Here is the solution:
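One possible solution, sketched as a small function; the name count_lines is made up, and other solutions work just as well. It keeps a counter that starts at zero and goes up by one for every line the loop sees.

```python
def count_lines(path):
    """Return the number of lines in the file at `path`."""
    count = 0
    f = open(path, encoding="utf-8")
    for line in f:
        count += 1         # one more line seen
    f.close()
    return count
```

Call it with the path to your copy of pg1062.txt and print the result.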
The file contains multiple stories by E.A. Poe. Suppose we want to count only the lines in the file that pertain to “The Raven”.
What we would like to do is this:
Read the file, line by line.
If “The Raven” has not started yet: do nothing.
When we have detected that “The Raven” has started, count lines.
When we detect that “The Raven” has ended, stop counting.
To that end, we use a variable to which we explicitly assign a truth value, True or False.
Writing to a file
Here is how you write to a file in Python.
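A minimal sketch, written for Python 3. The helper name write_lines and the scratch file name are made up for illustration. The "w" passed to open() means "write"; be aware that it replaces any existing file of the same name.

```python
import os
import tempfile

def write_lines(path, lines):
    """Write each string in `lines` to the file at `path`, one per line."""
    f = open(path, "w", encoding="utf-8")   # "w" = write mode
    for line in lines:
        f.write(line + "\n")                # write() does not add a newline itself
    f.close()                               # flush any data still waiting to be written

# Demo: write two lines to a scratch file in the system's temporary folder.
out_path = os.path.join(tempfile.gettempdir(), "worksheet_demo.txt")
write_lines(out_path, ["first line", "second line"])
print("Wrote", out_path)
```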
Here the “close()” is essential! Your operating system (and, by extension, Python) writes data to files in larger chunks. That is, it may hold data back until you have issued many write commands, and only when you close the file does it make sure that all remaining data is written. This is a source of nasty errors: if you write a file, don't close it, and subsequently try to read its contents, they may just not be there yet.
Sometimes you would like to store a Python data structure in a file so that you can later read it back into a Python variable without having to convert it to and from strings. This is called "pickling". Here's how you do it:
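A minimal sketch using Python's standard pickle module, written for Python 3. The helper names, the example dictionary, and the scratch file name are made up for illustration. Note the "wb" and "rb": pickled data is binary, so the file has to be opened in binary mode. Only unpickle files that you trust, since loading a pickle can run code.

```python
import os
import pickle
import tempfile

def save_object(obj, path):
    """Store a Python object in the file at `path`."""
    f = open(path, "wb")       # "wb" = write, binary
    pickle.dump(obj, f)
    f.close()

def load_object(path):
    """Read back an object stored with save_object()."""
    f = open(path, "rb")       # "rb" = read, binary
    obj = pickle.load(f)
    f.close()
    return obj

# Demo with a made-up dictionary of word counts.
word_counts = {"raven": 11, "nevermore": 11}
pickle_path = os.path.join(tempfile.gettempdir(), "worksheet_demo.pickle")
save_object(word_counts, pickle_path)
restored = load_object(pickle_path)
print(restored == word_counts)
```

The restored value is a dictionary again, not a string, so you can use it right away.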