Sorting in Python

Introduction

Python has sorting methods and functions built in. We’re going to cover these, how they work, and how you can use them to sort things in the order you desire.

sorted(iterable, key=func, reverse=False)

sorted is a built-in function that can sort any iterable. It always returns a list.

By default, sorted will order items in the iterable from smallest to greatest values.

>>> sorted((1,3,2))
[1, 2, 3]

It also uses stable sort. This means that if two items compare the same, then they will appear in the result in the same order they appeared in the iterable.

>>> sorted((0, 0.0, 1, -1))
[-1, 0, 0.0, 1]

Note that sorted will raise a TypeError if two of the items won’t compare with the < symbol. In general, make sure all of the elements of the sequence are the same-ish type, and that the type is ordered. IE, sorting complex numbers won’t work because there is no ordering for them.

>>> sorted((1, 'a'))
... TypeError...
>>> 1 < 'a'
... TypeError...

There are two named parameters: key and reverse.

reverse will sort the list in reverse order. Note that it is still stable sort, and it won’t reverse the order of items that compare the same.

>>> sorted((0, 0.0, 1, -1), reverse=True)
[1, 0, 0.0, -1]

You can also specify the key parameter, a function.

The key function will be applied to each element, and the result of the key function on the element will be the value that is used for sorting.

Let’s say we want to sort a list of strings based on their integer value.

>>> sorted(("100", "12", "21", "1"), key=int)
['1', '12', '21', '100']

Notice that I didn’t have to write a new function or lamnda. int worked perfectly fine!

Perl’s System

In perl 5, there was no key parameter, and so if you wanted to sort by a derived value, you had to use the Schwartzian Transform.

  1. Map the items into pairs of (sort key, value). The sort key is derived from the value.

  2. Sort the pairs.

  3. Change the pairs back into the original value.

Here’s what a Schwartzian Transform might look like in Python:

map(lambda a,b: b, sorted(map(lambda b: (b[1], b))))

The limitation of this system is that it isn’t a true stable sort. Note that when comparing the pairs, the first value is compared, and when equal, the second value is then compared.

Using a key function and comparing only the results of that function allows for a true stable sort.

Pointlessness of cmp

Python 2.7 had a cmp parameter, which is a function with two parameters, returning -1, 0, 1.

cmp has to be called many times, once for each compare between two elements of the sequence. This greatly slows down the sort function, particularly when cmp was simply comparing integers or floats.

key parameter

min and max have a key parameter which takes a function like sorted does.

  • min(iterable or *args, key=func), max(iterable or *args, key=func): In determining whether a value is greater or less than another to determine which is the greatest or least, this will compare the values themselves. If you want to compare on some other value, use the key parameter.

Let’s not forget filter and map, which have a parameter that behaves, somewhat, like the key parameter.

  • filter(func, iterable): Filter uses the key parameter to determine whether to keep the item or not. If the key function returns false, then the item is not kept. Note that the original values are returned, not the value returned from the key function.

  • map(func, iterable, …): map also uses the key parameter, but it returns the result of it rather than the original value. Map() is sort of a special case.

list.sort method

If you have a list, you can do in-place sorting with the sort method. This returns nothing, to remind you that the sort is done in-place. The parameters are exactly the same,

Example: Sorting by casefold()

While we could have used a lambda (lambda name: name.casefold()) we can also just use the method itself as the key: str.casefold.

Typically, we would prefer doing things this way as it is even more explicit and less verbose than a lambda.

>>> sorted(("jack", "Jim"))
['Jim', 'jack']
>>> sorted(("jack", "Jim"), key=str.casefold)
['jack', 'Jim']

Example: Sorting by name, age, height, score

Suppose we have a tuple of tuples. Each tuple represents a person and has the shape:

(name, age, height, score)

Here is some sample data, but you could make up your own:

people = (
  ("Jim", 21, 174.0, 2100),
  ("jim", 22, 174.0, 2800),
  ("JACK", 21, 162.0, 900),
  ("jack", 25, 185.0, 900),
)

name is their name. age is their age. height is how tall they are in cm. And score is their current score. (This could be a test or it could be a video game score.)

Note that Python compares tuples by comparing the first values, then the second, etc… So the natural sorted order is by name. If two people have the same name, then their age is used. If they have the same name and age, then they are sorted by height. Score is only used if everything else is the same.

Note that if we were to sort based on the name, we would get “ASCII-betical” order, which puts capitals before lowercase. This is probably undesirable. We should use casefold() so that we can compare strings regardless of their case.

Stable Sorting allows Multiple Sorting

In Stable Sorting, if two values compare equally, then they will appear in the same order they showed up in. Python does stable sorting.

In unstable sorting, their order may be messed up.

Stable sorting is important because it allows you to sort by two different fields. For instance, if I want to sort by name and then score, I would first sort by score. Then I would sort by name. If two people share the same name, then the ordering we did with score will be preserved.

sorted_by_score = sorted(people,
    key=lambda name, age, height, score: score)
sorted_by_name_score = sorted(sorted_by_score,
    key=lambda name, age, height, score: name.casefold())

Note that we could also sort only a single time by the combination of those two fields:

sorted_by_name_score = sorted(
    people,
    key=lambda name, age, height, score:
        (name.casefold(), score)))

But let’s say we wanted to sort by name A-Z and by high score, not low score. In which case, we need to do two sorts.

sorted_by_score = sorted(people,
    key=lambda name, age, height, score: score,
    reversed=True)
sorted_by_name_score = sorted(sorted_by_score,
    key=lambda name, age, height, score: name.casefold())

Alternatively, we could’ve used -score since score is a number and the minus. This is not always an option for fields. For instance, what is -name?