Sorting in Python¶
Introduction¶
Python has sorting methods and functions built in. We’re going to cover these, how they work, and how you can use them to sort things in the order you desire.
sorted(iterable, key=func, reverse=False)¶
sorted
is a built-in function that can sort any iterable. It always
returns a list.
By default, sorted
will order items in the iterable from smallest to
greatest values.
>>> sorted((1,3,2))
[1, 2, 3]
It also uses stable sort. This means that if two items compare the same, then they will appear in the result in the same order they appeared in the iterable.
>>> sorted((0, 0.0, 1, -1))
[-1, 0, 0.0, 1]
Note that sorted
will raise a TypeError
if two of the items won’t
compare with the <
symbol. In general, make sure all of the elements of
the sequence are the same-ish type, and that the type is ordered. IE, sorting
complex numbers won’t work because there is no ordering for them.
>>> sorted((1, 'a'))
... TypeError...
>>> 1 < 'a'
... TypeError...
There are two named parameters: key
and reverse
.
reverse
will sort the list in reverse order. Note that it is still stable
sort, and it won’t reverse the order of items that compare the same.
>>> sorted((0, 0.0, 1, -1), reverse=True)
[1, 0, 0.0, -1]
You can also specify the key
parameter, a function.
The key function will be applied to each element, and the result of the key function on the element will be the value that is used for sorting.
Let’s say we want to sort a list of strings based on their integer value.
>>> sorted(("100", "12", "21", "1"), key=int)
['1', '12', '21', '100']
Notice that I didn’t have to write a new function or lamnda. int
worked
perfectly fine!
Perl’s System¶
In perl 5, there was no key parameter, and so if you wanted to sort by a derived value, you had to use the Schwartzian Transform.
Map the items into pairs of
(sort key, value)
. The sort key is derived from the value.Sort the pairs.
Change the pairs back into the original value.
Here’s what a Schwartzian Transform might look like in Python:
map(lambda a,b: b, sorted(map(lambda b: (b[1], b))))
The limitation of this system is that it isn’t a true stable sort. Note that when comparing the pairs, the first value is compared, and when equal, the second value is then compared.
Using a key function and comparing only the results of that function allows for a true stable sort.
Pointlessness of cmp¶
Python 2.7 had a cmp
parameter, which is a function with two parameters,
returning -1, 0, 1.
cmp
has to be called many times, once for each compare between two
elements of the sequence. This greatly slows down the sort function,
particularly when cmp
was simply comparing integers or floats.
key parameter¶
min and max have a key
parameter which takes a function like sorted does.
min(iterable or *args, key=func)
,max(iterable or *args, key=func)
: In determining whether a value is greater or less than another to determine which is the greatest or least, this will compare the values themselves. If you want to compare on some other value, use the key parameter.
Let’s not forget filter and map, which have a parameter that behaves, somewhat, like the key parameter.
filter(func, iterable): Filter uses the key parameter to determine whether to keep the item or not. If the key function returns false, then the item is not kept. Note that the original values are returned, not the value returned from the key function.
map(func, iterable, …): map also uses the key parameter, but it returns the result of it rather than the original value. Map() is sort of a special case.
list.sort method¶
If you have a list, you can do in-place sorting with the sort method. This returns nothing, to remind you that the sort is done in-place. The parameters are exactly the same,
Example: Sorting by casefold()¶
While we could have used a lambda (lambda name: name.casefold()
) we can
also just use the method itself as the key: str.casefold
.
Typically, we would prefer doing things this way as it is even more explicit and less verbose than a lambda.
>>> sorted(("jack", "Jim"))
['Jim', 'jack']
>>> sorted(("jack", "Jim"), key=str.casefold)
['jack', 'Jim']
Example: Sorting by name, age, height, score¶
Suppose we have a tuple of tuples. Each tuple represents a person and has the shape:
(name, age, height, score)
Here is some sample data, but you could make up your own:
people = (
("Jim", 21, 174.0, 2100),
("jim", 22, 174.0, 2800),
("JACK", 21, 162.0, 900),
("jack", 25, 185.0, 900),
)
name
is their name. age
is their age. height
is how tall they are
in cm. And score
is their current score. (This could be a test or it could
be a video game score.)
Note that Python compares tuples by comparing the first values, then the second, etc… So the natural sorted order is by name. If two people have the same name, then their age is used. If they have the same name and age, then they are sorted by height. Score is only used if everything else is the same.
Note that if we were to sort based on the name, we would get “ASCII-betical”
order, which puts capitals before lowercase. This is probably undesirable. We
should use casefold()
so that we can compare strings regardless of their
case.
Stable Sorting allows Multiple Sorting¶
In Stable Sorting, if two values compare equally, then they will appear in the same order they showed up in. Python does stable sorting.
In unstable sorting, their order may be messed up.
Stable sorting is important because it allows you to sort by two different fields. For instance, if I want to sort by name and then score, I would first sort by score. Then I would sort by name. If two people share the same name, then the ordering we did with score will be preserved.
sorted_by_score = sorted(people,
key=lambda name, age, height, score: score)
sorted_by_name_score = sorted(sorted_by_score,
key=lambda name, age, height, score: name.casefold())
Note that we could also sort only a single time by the combination of those two fields:
sorted_by_name_score = sorted(
people,
key=lambda name, age, height, score:
(name.casefold(), score)))
But let’s say we wanted to sort by name A-Z and by high score, not low score. In which case, we need to do two sorts.
sorted_by_score = sorted(people,
key=lambda name, age, height, score: score,
reversed=True)
sorted_by_name_score = sorted(sorted_by_score,
key=lambda name, age, height, score: name.casefold())
Alternatively, we could’ve used -score
since score is a number and the
minus. This is not always an option for fields. For instance, what is
-name
?