Worksheet: Sorting in Python
Sorting lists
The method sort() sorts a list in place:
>>> mylist = ["armadillo", "zebra", "guppy", "cat"]
>>> mylist.sort()
>>> mylist
['armadillo', 'cat', 'guppy', 'zebra']
The function sorted(), on the other hand, leaves the original list unchanged, but returns a sorted list.
>>> mylist = ["armadillo", "zebra", "guppy", "cat"]
>>> sorted(mylist)
['armadillo', 'cat', 'guppy', 'zebra']
>>> mylist
['armadillo', 'zebra', 'guppy', 'cat']
The parameter reverse=True inverts the sorting order.
>>> sorted(mylist, reverse = True)
['zebra', 'guppy', 'cat', 'armadillo']
>>>
Sorting Python dictionaries by their values
Suppose we have a dictionary of word counts, for example
counts = { "the" : 3, "cat":2, "and":1, "chased":1, "dog":1}
If we try to sort it using the same formulation as for lists above, we get the dictionary keys in alphabetical order:
>>> sorted(counts)
['and', 'cat', 'chased', 'dog', 'the']
If we sort key/value pairs, we still get them in alphabetical order of the keys:
>>> sorted(counts.items())
[('and', 1), ('cat', 2), ('chased', 1), ('dog', 1), ('the', 3)]
We need to change the way that sorted() looks at the items that it is sorting. In particular, we want to map each key/value pair to the value, as this is what sorted() should consider for sorting. To do this, we define a function that maps pairs to the second element, and hand it on to sorted() as the sorting key function:
def second_of_pair(tuple):
return tuple[1]
>>> second_of_pair( ("a", 1) )
1
>>> second_of_pair( ("the", 2) )
2
>>> sorted(counts.items(), key = second_of_pair)
[('and', 1), ('chased', 1), ('dog', 1), ('cat', 2), ('the', 3)]
To get the highest counts first, we do
>>> sorted(counts.items(), key = second_of_pair, reverse = True)
[('the', 3), ('cat', 2), ('and', 1), ('chased', 1), ('dog', 1)]
We only need the function second_of_pair() in one place: as a sorting key for sorted(). For cases like these, Python has "nameless functions". We can write
sorted(counts.items(), key = lambda pair: pair[1], reverse = True)
to get the same result as the previous call to sorted(). Note that while the separate function definition of second_of_pair() had to use the "return" statement, the lambda formulation cannot use it.