Worksheet: Reading and writing files in Python

The aim of this worksheet is to show you how to read and write files in Python.

Getting a sample file

As an example of a file to read, we will use a relatively small, unannotated corpus from Project Gutenberg, http://www.gutenberg.org/wiki/Main_Page

Project Gutenberg is an online collection of texts whose copyright has expired. It contains texts in many languages.

We will work with the First Project Gutenberg Collection of Edgar Allan Poe at http://www.gutenberg.org/etext/1062.

Please download the file, making sure to choose a Plain Text version. The file will be called pg1062.txt. Put it on your Desktop.

Reading a file

You should now have the file pg1062.txt on your Desktop. Accessing this file in Python is easy -- the most complicated part is going to be figuring out how to tell Python about the file's location on your system. On my Unix system, the following lines of Python will print the whole file contents to your screen, line by line:

f = open("/Users/katrinerk/Desktop/pg1062.txt")

for line in f:

    print( line )

f.close()

If you have a Mac, you will probably just need to substitute your user name for "katrinerk". If you are running Linux, you will have to put "/home/YOURUSERNAME/Desktop/pg1062.txt" instead. If you have a Windows system, you probably need to write

f = open(r"C:\Users\YOURUSERNAME\Desktop\pg1062.txt")

Note the "r" before the opening double quote: It tells Python not to interpret the "\" as the beginning of some special code.
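If you would rather not type the full path by hand, Python can build it for you. The following is only a sketch: it assumes that the file sits in a folder named "Desktop" directly inside your home directory, which is the usual setup but not guaranteed on every system. The variable name desktop_file is made up for this illustration.

```python
from pathlib import Path

# Path.home() gives the current user's home directory on Mac, Linux and Windows.
# The "/" operator joins the pieces with the right separator for your system.
desktop_file = Path.home() / "Desktop" / "pg1062.txt"

print(desktop_file)

# Only try to read the file if it is actually at that location.
if desktop_file.exists():
    f = open(desktop_file)
    for line in f:
        print(line)
    f.close()
else:
    print("No file found at", desktop_file)
```

If the second message is printed, the file is somewhere else, and you will have to adjust the path.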

Any time you read a file, the lines of Python code you write for that will be almost the same:

# “open” takes as its argument a file name, which may include directory information.

# Do not forget to start with the “open” command!

# You need the file object that it returns to read the file.

f = open("/Users/katrinerk/Desktop/pg1062.txt")

# “open” returns a file object. This, then, can be used to access the file contents.

>>> type(f)

<class '_io.TextIOWrapper'>

# We can iterate through the lines of a file as if it were a list.

# Note that "line" is just a variable name. I could have named it anything else.

# “line” is a variable that will be filled by each line of the file in turn.

for line in f:

    print( line )

# After reading the file, you close the file object.

# This is not strictly necessary if you are only reading the file

# -- if you are writing, it is necessary -- but it is good practice.

f.close()

Think of a file as something like a box: You have to open it first, then you can access its contents, and then you close it.
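Python also offers a way to have the box closed for you, the "with" statement. The sketch below is not part of the steps above, just an alternative you may run into. So that it runs anywhere, it first creates a small throwaway file; the file name "sample.txt" and its contents are made up for this illustration.

```python
import os
import tempfile

# Create a small throwaway file to read from.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "sample.txt")
out = open(filename, "w")
print("first line", file=out)
print("second line", file=out)
out.close()

# "with" opens the file and closes it again automatically
# when the indented block ends.
lines_seen = []
with open(filename) as f:
    for line in f:
        lines_seen.append(line)

print(lines_seen)
print(f.closed)   # True: Python has closed the file for us
```

Note that each line still carries its end-of-line character "\n", which is why print( line ) shows an empty line after every line of the file.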

Three ways of reading from a file

You have already seen this:

# read a file line by line

f = open("/Users/katrinerk/Desktop/pg1062.txt")

for line in f:

    print( line )

f.close()

You can also read the whole file into a single string, using read():

# read the whole file in one go, as a string

f = open("/Users/katrinerk/Desktop/pg1062.txt")

myfilecontents = f.read()

f.close()

Or you can read one single line only, using readline():

# read the next line of the file

f = open("/Users/katrinerk/Desktop/pg1062.txt")

line1 = f.readline() # reads the first line

# at this point the “file reading pointer”

# points to the second line

line2 = f.readline() # reads the second line

line3 = f.readline() # reads the third line

f.close()
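To see the three ways side by side, here is a small sketch that you can run without the Poe file. It creates a throwaway three-line file first; the file name "three_lines.txt" and all variable names are made up for this illustration.

```python
import os
import tempfile

# Create a throwaway file with three lines.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "three_lines.txt")
f = open(filename, "w")
print("one", file=f)
print("two", file=f)
print("three", file=f)
f.close()

# 1. Line by line, with a for loop.
by_loop = []
f = open(filename)
for line in f:
    by_loop.append(line)
f.close()

# 2. The whole file as one string.
f = open(filename)
whole = f.read()
f.close()

# 3. One line per call to readline().
f = open(filename)
first = f.readline()
second = f.readline()
third = f.readline()
at_end = f.readline()   # nothing left to read: we get an empty string
f.close()

print(by_loop)
print(repr(whole))
print(repr(first), repr(at_end))
```

The last readline() call shows what happens when the "file reading pointer" has reached the end of the file: you get the empty string rather than an error.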

Counting lines, and variables that contain True/False

Try it for yourself: Can you count the lines in the file pg1062.txt ? (Solution below.)

Here is the solution:

num_lines = 0

f = open("/Users/katrinerk/Desktop/pg1062.txt")

for line in f:

    num_lines = num_lines + 1

f.close()

print( num_lines )

The file contains multiple stories by E.A. Poe. Suppose we want to count only the lines in the file that pertain to “The Raven”.

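What we would like to do is this: Read the file, line by line. Start counting when we reach a line that begins with "The Raven", and stop counting when we reach a line that begins with "The Masque of the Red Death", the title of the next piece. To remember whether we are currently inside the poem, we use a variable that contains either True or False.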

# num_lines will count lines.

num_lines = 0           

# inside_raven will signal if we are inside the poem "The Raven"

inside_raven = False                             

f = open("/Users/katrinerk/Desktop/pg1062.txt")   # Contains several stories by E.A. Poe

for line in f:

    if not inside_raven and line.startswith("The Raven"):

            # “The Raven” starts here. Start counting.

            inside_raven = True

            num_lines = num_lines + 1

    elif inside_raven and line.startswith("The Masque of the Red Death"):

            # “The Raven” has ended. Remember this in the variable inside_raven

            inside_raven = False

    elif inside_raven:

            # We are inside “The Raven”. Go on counting.

            num_lines = num_lines + 1

f.close()

print( num_lines )
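If you want to check the True/False logic without the full file, you can try it on a handful of lines. The sketch below wraps the same loop in a function and runs it on a short list of strings. The function name count_lines_between and the sample lines are made up for this illustration; they are not taken from the Gutenberg file.

```python
def count_lines_between(lines, start_marker, end_marker):
    """Count the lines from the one starting with start_marker
    up to, but not including, the one starting with end_marker."""
    num_lines = 0
    inside = False
    for line in lines:
        if not inside and line.startswith(start_marker):
            # The section starts here. Start counting.
            inside = True
            num_lines = num_lines + 1
        elif inside and line.startswith(end_marker):
            # The section has ended. Stop counting.
            inside = False
        elif inside:
            # We are inside the section. Go on counting.
            num_lines = num_lines + 1
    return num_lines

# A made-up miniature "file": a list of lines.
sample = [
    "Some earlier text\n",
    "The Raven\n",
    "\n",
    "first line of the poem\n",
    "second line of the poem\n",
    "The Masque of the Red Death\n",
    "first line of the story\n",
]

raven_count = count_lines_between(sample, "The Raven", "The Masque of the Red Death")
print(raven_count)   # 4: the title, the empty line, and two lines of the poem
```

Note that the title line and any empty lines are counted too, just as in the loop over the real file.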

Writing files

Here is how you write to a file in Python.

# Again, we make a file object with “open”. Only this time we give two arguments.

# The second one is “w” for “write”.

# So we have to decide at the time when we open a file whether we want to read it

# or write to it.

f = open("myoutfile", "w")

# We use the "print" command to write to a file, but with the additional 

# parameter file=f.

# Note that f is the variable in which we put the file object.

# If I had named the file object "bob", it would have been "print(..., file = bob)"

print("Hello", file=f)

print("Writing another line to the file.", file = f)

print("Here’s a number:", 5, file= f)

# And close the file.

f.close()

Here the “close()” is essential! Python and your operating system write data to files in larger chunks, so the data may not actually reach the file until you have issued many “print” commands. Closing the file makes sure that all remaining data is written. This is a source of nasty errors: if you write a file, don't close it, and subsequently try to read its contents, some or all of the data may simply not be there yet.
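Here is a small sketch of the safe order of steps: write, close, and only then read the file again. It writes to a temporary folder so that it does not clutter your working directory; the variable names are made up for this illustration.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
outname = os.path.join(tmpdir, "myoutfile")

# Step 1: write to the file.
f = open(outname, "w")
print("Hello", file=f)
print("Here's a number:", 5, file=f)

# Step 2: close it, so that all data is really written.
f.close()

# Step 3: only now open the file again for reading.
f = open(outname)
contents = f.read()
f.close()

print(contents)
```

Because print separates its arguments with a space and ends each call with a new line, the file contains two lines, "Hello" and "Here's a number: 5".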

"Pickles"

Sometimes you would like to save a Python data structure in a file so that you can later read it back into a Python variable without having to convert to and from strings. This is called "pickling". Here's how you do it:

# pickling functionality is not loaded by default; you have to import it

import pickle

mylist = [ 1,2,3,4,5]

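# Pickled data is binary, not text, so open the file with "wb" (write binary).

f = open("file_mylist.data", "wb")

pickle.dump(mylist, f)

f.close()

# To read the data back, open the file with "rb" (read binary).

f = open("file_mylist.data", "rb")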

newlist = pickle.load(f)

f.close()
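Pickling is not restricted to lists. As a sketch, here is the same round trip with a dictionary, written to a temporary folder. The dictionary contents and all names are made up for this illustration. Note the "wb" and "rb" modes: pickle writes binary data, so in Python 3 the file has to be opened in binary mode.

```python
import os
import pickle
import tempfile

tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "file_mydict.data")

# A made-up dictionary to store.
mydict = {"apple": 3, "pear": 1, "colors": ["red", "green"]}

# Write the dictionary to the file, in binary mode.
f = open(filename, "wb")
pickle.dump(mydict, f)
f.close()

# Read it back, again in binary mode.
f = open(filename, "rb")
newdict = pickle.load(f)
f.close()

print(newdict)
print(newdict == mydict)   # True: same contents as before
```

One word of caution: only load pickle files that you created yourself or that come from a source you trust, since loading a pickle can run code stored in the file.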