Worksheet: First steps in Python
Start Python either by bringing up the GUI, or by typing python in your Unix terminal. Alternatively, you can invoke idle, an integrated development environment for Python. This gives you a Python shell. Here you can input Python commands, and see the result directly on the screen.
Using Python as a calculator
The shell evaluates expressions. (The text after ">>>" is what you type. Text that appears without leading ">>>" is the answer you receive.)
>>> 2 + 3
5
>>> 43/2
21.5
>>> "Hello world"
'Hello world'
>>> "Hello" + " world"
'Hello world'
Python knows about basic arithmetic expressions. It uses "+" not only to add numbers, but also to paste strings together.
Some other mathematical functions are not loaded automatically; you have to request them:
>>> import math
>>> math.sqrt(3)
1.7320508075688772
We import the math package, then we have access to the sqrt function. Since it comes from the math package, we need to write math.sqrt(). (That way, if you happen to also import another package in which sqrt() is a function that prints 20 exclamation marks on the screen, you can distinguish the two.)
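Here is a small sketch of why the "math." prefix is useful. It uses cmath, a second standard module that was not mentioned above and is chosen here only for illustration. It also has a function called sqrt, but one that works with complex numbers. The variable names are made up for this example.

```python
import math
import cmath  # standard module for complex numbers; it has its own sqrt

# Both modules define sqrt, and the prefix tells Python which one we mean.
real_root = math.sqrt(9)
complex_root = cmath.sqrt(-9)

print(real_root)     # 3.0
print(complex_root)  # 3j
```

Without the prefixes, Python could not know which of the two functions you mean.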
Variables
On your hard drive, you store data in files. Each file has a name by which you can retrieve the data. In programming languages, you also often need to store data, for example the result of some calculation that you intend to use in another calculation later. And again, you need to give names to the stored data, so you can retrieve it later. In a programming language, you make a variable in which you store data, and you give it a name by which you can retrieve the data.
>>> myvar = 2+3
>>> myvar
5
>>> myvar2 = myvar * 3
>>> myvar2
15
>>> myvar = 0.3 * math.log(0.3) + 0.2 * math.log(0.2)
>>> myvar - myvar2
-15.683079423784601
>>> new_var = myvar - myvar2
>>> new_var
-15.683079423784601
We store the value 5 (the value of the expression "2+3") in myvar by typing "myvar = 2+3". We can then retrieve the stored data by its name: If we type the name of the variable, Python supplies the stored value. We can also use the variable as a stand-in for its value: "myvar * 3" now has the same value as "5 * 3".
When you update a file, you are storing a new value under its existing name. You can do the same with a variable. In the line "myvar = 0.3 * ..." we are storing a new value in the same variable that we had before, myvar.
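One point that is easy to miss: a variable that was computed from another variable does not change when the other one is updated later. The following sketch reuses the names from above; the value 100 is arbitrary.

```python
myvar = 2 + 3
myvar2 = myvar * 3   # computed right now, from the current value 5

myvar = 100          # store a new value under the old name

print(myvar)         # 100
print(myvar2)        # 15, unchanged by the update of myvar
```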
You choose the names for the variables you use. What can you choose the name of a variable to be?
Variable names can contain letters, digits, and underscores.
They must not start with a number.
They must not be identical to one of the "reserved words" that Python has already defined.
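If you are unsure about a name, you can let Python check the rules above for you. This is only a sketch: isidentifier() and the keyword module are standard Python, but they are not part of this worksheet, and the example names are made up.

```python
import keyword

# isidentifier() checks the first two rules (allowed characters, no leading digit).
ok_name = "new_var".isidentifier()            # True
starts_with_digit = "2fast".isidentifier()    # False
contains_hyphen = "my-var".isidentifier()     # False

# keyword.iskeyword() checks the third rule (reserved words).
class_is_reserved = keyword.iskeyword("class")   # True
sum_is_reserved = keyword.iskeyword("sum")       # False

print(ok_name, starts_with_digit, contains_hyphen)
print(class_is_reserved, sum_is_reserved)
```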
Warning: Some names that are already defined in Python can still be used as variable names, for example "sum" (a function that calculates the sum of multiple values). But if you re-define such a name as a variable, you can't use it in its original sense anymore. That is, if you define "sum = 2+3", then you can no longer access the function that calculates the sum of multiple values: you have basically overwritten the previous value of the variable "sum". This is a very common bug.
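You can try out this bug safely. The sketch below goes slightly beyond the worksheet: it uses try/except to catch the error, the standard builtins module to reach the original function, and del to remove the variable again. The variable names are made up.

```python
import builtins

sum = 2 + 3                  # from now on, "sum" is the number 5

try:
    sum([1, 2, 3])           # tries to use the number 5 as a function
except TypeError as err:
    error_message = str(err)

print(error_message)         # 'int' object is not callable

# The original function still lives in the builtins module.
rescued = builtins.sum([1, 2, 3])
print(rescued)               # 6

del sum                      # remove our variable again
```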
You will often need to update a variable in this way:
>>> counter = 1
>>> counter = counter + 1
>>> counter
2
This increments the value of the variable counter.
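Because this kind of update is so frequent, Python has a shorthand for it. The following sketch shows the "+=" and "*=" operators, which are not introduced above; they do the same thing as the long form.

```python
counter = 1
counter = counter + 1   # long form: counter is now 2
counter += 1            # shorthand for counter = counter + 1: now 3
counter *= 10           # shorthand for counter = counter * 10: now 30

print(counter)          # 30
```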
Try it for yourself: Make up a variable name of your choosing, and store in it the value of the expression 2**4. Inspect your variable to see what it contains. Then reduce its value by 1. Now make up a second variable, and set it to have the same value as the first one.
Data types: strings and numbers
We have encountered data of (at least) three data types so far:
Integers
Floating point numbers
Strings
Different data types come with different operations. Integers and floating point numbers can be added, subtracted, divided, ... Strings can be concatenated, you can count letters in them, and so on:
>>> 123 - 20
103
>>> "hello" - "world"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'str'
>>> len("hello")
5
>>> len(123)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
By the way: When something goes wrong, Python gives you an error message. Please read this message! It will help you figure out what went wrong. Especially in the beginning, you will see a lot of error messages. But don't worry, you will see fewer of them very soon.
There are, however, builtin functions in Python that convert data from one type to another.
>>> int(3.14)
3
>>> float(3)
3.0
>>> float(42)
42.0
>>> str(42)
'42'
>>> float("2435.1")
2435.1
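A few details of these conversions are worth seeing once. The sketch below is an illustration with made-up variable names; round() and try/except are standard Python but are not covered in this worksheet.

```python
cut_off = int(3.99)            # 3: int() drops the fractional part, it does not round
cut_off_negative = int(-3.99)  # -3
rounded = round(3.99)          # 4
from_text = int("42")          # 42

# A string that looks like a float cannot be converted to an integer directly.
try:
    int("3.14")
except ValueError as err:
    problem = str(err)

two_steps = int(float("3.14"))  # 3: convert to a float first, then to an integer

print(cut_off, cut_off_negative, rounded, from_text, two_steps)
print(problem)
```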
If you divide two integers, the result is a float (no matter whether the division leaves a remainder or not):
>>> 5/2
2.5
>>> 2/1
2.0
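If you want an integer result instead, Python has two further operators that are not used elsewhere in this worksheet: "//" divides and drops the fractional part, and "%" gives the remainder. A short sketch, with made-up variable names:

```python
exact = 7 / 2        # 3.5, always a float
whole = 7 // 2       # 3, an integer
remainder = 7 % 2    # 1, what is left over after 3 * 2

as_float = 6 / 3     # 2.0
as_int = 6 // 3      # 2

print(exact, whole, remainder)
print(as_float, as_int)
```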
You can also ask Python what the type of a piece of data is. The type of a variable is the type of its contents.
>>> type(1)
<class 'int'>
>>> type("hello world")
<class 'str'>
>>> type(type)
<class 'type'>
>>> xyz = "hello world"
>>> type(xyz)
<class 'str'>
The builtin function "print"
The builtin function "print" prints its argument(s) to the screen. When given a mathematical expression, it prints the result. When given a variable, it prints its contents.
>>> print( "hello world" )
hello world
>>> print( 2+3 )
5
>>> myvar = 0.3 * math.log(0.3) + 0.2 * math.log(0.2)
>>> print(myvar)
-0.683079423784601
If you give more than one argument to "print", it prints them with one space in between. (You can instruct print() to put something other than a single space between them; we will get back to this.)
>>> print("hello", "world")
hello world
>>> myvar = 0.3 * math.log(0.3) + 0.2 * math.log(0.2)
>>> print( "The answer is", myvar )
The answer is -0.683079423784601
>>> print(2, 3, 4, 5)
2 3 4 5
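As a brief preview of how the separator can be changed, here is a sketch using the "sep" and "end" options of print(). The value 5 is only a placeholder to keep the example short.

```python
myvar = 5

print("The answer is", myvar)             # The answer is 5
print("The answer is", myvar, sep=": ")   # The answer is: 5
print(2, 3, 4, 5, sep="")                 # 2345

# "end" replaces the line break that print() normally adds at the end.
print("hello", end=" ")
print("world")                            # hello world
```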
Strings
Strings in Python are arbitrary sequences of characters, enclosed in either "..." or '...'. (Most of the time, it doesn't matter if you use single or double quotes. Just make sure you use the same type of quotes at the beginning and the end.)
To make a string that runs over more than one line, use """...""": that is, three double quotes, then your string, then three double quotes. (You can also use three single quotes on either side, but you have to use the same kind of quotes at both ends.) For example, here is a string that holds the first two paragraphs of the Wikipedia entry on Python.
>>> wikipedia_on_python = """Python is a general-purpose, high-level programming language[6] whose design philosophy emphasizes code readability.[7] Python claims to combine "remarkable power with very clear syntax",[8] and its standard library is large and comprehensive. Its use of indentation for block delimiters is unique among popular programming languages.
...
... Python supports multiple programming paradigms, primarily but not limited to object-oriented, imperative and, to a lesser extent, functional programming styles. It features a fully dynamic type system and automatic memory management, similar to that of Scheme, Ruby, Perl, and Tcl. Like other dynamic languages, Python is often used as a scripting language, but is also used in a wide range of non-scripting contexts. Using third-party tools, Python code can be packaged into standalone executable programs. Python interpreters are available for many operating systems."""
>>> len(wikipedia_on_python)
905
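The line breaks inside such a string are part of its contents: each one is stored as a single character, written "\n", and len() counts it. A small sketch with a made-up two-line string:

```python
poem = """first line
second line"""

length = len(poem)              # 22: 10 characters + 1 line break + 11 characters
line_breaks = poem.count("\n")  # 1

print(length, line_breaks)
print(poem)                     # shown on two lines
```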
As we want to process natural language text, strings and string manipulations are going to be very important. Luckily for us, Python has a lot of built-in functionality for doing things with strings.
Some built-in string functions
A note on notation: Some functions in Python are written like functions in mathematics: function name, then brackets, then arguments, for example
len("hello")
Other functions are written in a different format:
<argument>.<function>(<more arguments>)
for example
"hello".capitalize()
For now, just know that there are these two formats, and know that you need to remember which function is written in which fashion.
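The following sketch puts the two formats side by side and shows what happens if you mix them up. It uses try/except, which this worksheet does not cover, only to catch the resulting error; the variable names are made up.

```python
word = "hello"

length = len(word)                    # first format: function(argument)
capitalized = word.capitalize()       # second format: argument.function()
shouted_length = len(word.upper())    # the two formats can be combined

# len is written in the first format only, so word.len() is an error.
try:
    word.len()
    wrong_format_worked = True
except AttributeError:
    wrong_format_worked = False

print(length, capitalized, shouted_length, wrong_format_worked)
```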
Here are some useful string functions. Try them out to see what they do.
"hippopotamus".count("p")
"KNIGHT".lower()
"new".upper()
"new".capitalize()
" a lot of spaces, then some text ".strip()
"armadillo".replace("mad", "...")
Also, as we have mentioned above, you can use "+" to concatenate strings.
Here is a string function that we will use a lot. It splits text on whitespace, returning something called a list, which we will discuss more later. We apply it to the first sentence of the Wikipedia page on Monty Python. As you can see, the result of split() is almost a separation of the sentence into words -- what does it get wrong, and why?
>>> "Monty Python (sometimes known as The Pythons) was a British surreal comedy group who created their influential Monty Python's Flying Circus, a British television comedy sketch show that first aired on the BBC on 5 October 1969.".split()
['Monty', 'Python', '(sometimes', 'known', 'as', 'The', 'Pythons)', 'was', 'a', 'British', 'surreal', 'comedy', 'group', 'who', 'created', 'their', 'influential', 'Monty', "Python's", 'Flying', 'Circus,', 'a', 'British', 'television', 'comedy', 'sketch', 'show', 'that', 'first', 'aired', 'on', 'the', 'BBC', 'on', '5', 'October', '1969.']
Some string operations return either True or False. (This is another data type, called a Boolean.) For example, the operator "in" tests for substrings.
>>> "eros" in "rhinoceros"
True
>>> "nose" in "rhinoceros"
False
Note that you can make the word "nose" from the letters of "rhinoceros", but that is not what "in" tests.
>>> "truism".endswith("ism")
True
>>> "inconsequential".startswith("pro")
False
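All of these tests distinguish upper-case from lower-case letters. A common way around this, sketched here with a made-up example string, is to convert the text to lower case before testing.

```python
title = "Monty Python"

exact_case = "Python" in title                   # True
wrong_case = "python" in title                   # False: "P" and "p" differ
ignoring_case = "python" in title.lower()        # True

starts_exact = title.startswith("monty")         # False
starts_lowered = title.lower().startswith("monty")   # True

print(exact_case, wrong_case, ignoring_case)
print(starts_exact, starts_lowered)
```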
You can find Python documentation for Python 3.x at https://docs.python.org/3/. The subpage that you will probably use most often is the Python Standard Library at https://docs.python.org/3/library/index.html, which documents builtin functions and standard available packages. String functions are described at https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str
Try it for yourself: Use the string functions above, and the documentation on the Standard Library page, to answer the following questions. Some of the functions you need may only be on the Standard Library page.
Split the following sequence at each occurrence of an "!":
"ab!cde!!f!ghij!!!"
How many letters are in the word "jabberwocky"?
Concatenate the following strings into a single sequence, making sure to also include a whitespace in between them: "hello", "world"
Test whether all characters in the following strings are digits: "123456", "123.456"
How many vowels are there in the word "onomatopoeia"? (Warning: there is no "beautiful" solution for this. Go the slow route.)
Accessing parts of a string
You can use indexing to access individual letters or substrings of a string.
>>> "rhinoceros"[3]
'n'
>>> "rhinoceros"[0]
'r'
The 4th character in "rhinoceros" is an "n", and the first is "r". Note that Python indices start at 0, so "rhinoceros"[0] gets you the first character. Also note that indices use square brackets, not round ones.
The following carves out a slice of a string:
>>> "rhinoceros"[2:5]
'ino'
The slice starts at the third letter (index 2), and ends before the 6th letter.
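A handy consequence of "ends before" is that the length of a slice is simply the end index minus the start index. The sketch below spells this out for the slice above; the variable names are made up.

```python
word = "rhinoceros"
start = 2
end = 5

piece = word[start:end]       # 'ino'
piece_length = len(piece)     # 3, which is end - start

first_included = word[start]      # 'i'
last_included = word[end - 1]     # 'o'
first_left_out = word[end]        # 'c'

print(piece, piece_length)
print(first_included, last_included, first_left_out)
```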
What happens if you try to access a single character beyond the end of the string? What happens if you do the same with a slice? Try out
"art"[3]
"art"[1:4]
Do you think the following are valid indices? Try them out.
"art"[-1]
"art"[-3:-1]
And how about the following slices?
"polyphonic"[2:]
"chimera"[:4]
Try it for yourself: What other words can you form out of the letters in "rhinoceros"? Find at least 3 words, and construct them in Python using indices to pick letters and using "+" to concatenate them. For example,
>>> r = "rhinoceros"
>>> r[3:5] + r[-1] + r[-4]
'nose'