Lists and Bytearrays¶
Lists and Bytearrays are the first examples of mutable objects in Python. Mutable objects can change their values. This has repurcussions as we’ll see later, but it also introduces the possibility of creating non-pure functions by modifying the arguments passed in.
List Object Type¶
A list in Python is a mutable array. That means that you can add or remove elements from the array. A tuple, on the other hand, is an immutable array. Once you’ve created the tuple, that’s it, it cannot be changed. You can create new tuples from it, but you cannot modify it.
Note
If you want to learn more about what an array is, I made a video just for you. You can watch it through the link that should appear here.
A bytearray is a mutable array as well, except the values are integers between 0 and 255. A bytearray is to bytes what a list is to tuples.
A string is, for all intents and purposes, an array of unicode code points. There is no mutable string type, and for good reasons. If you want to change a string, just create a new one from the old one.
List Syntax¶
In Python, you define a list much like you do for a tuple, except you can only
use square brackets [ ]
[] # Empty list
[1] # No comma neeeded
[1,2,3]
Note that square brackets are also used for subscription and slicing syntax. It’s not hard to keep them separate, though.
a[5] # Subscription a[5,6] # Subscription with a tuple (5,6) as the index.
a[5:6:1] # Slicing
Note
I have heard brackets called many different things. Here are their official names, along with some nicknames I have heard:
()
: Parentheses. (Pronounced “-eeze” at the end.) Memory: “this” is a “parenthesis”. “These” are “parentheses”.(
\x28
: Left or Opening Parenthesis)
\x29
: Right or Closing Parenthesis.
[]
: Square brackets. (I used to call these “square braces”.)[
\x5B
: Left or Opening Square Bracket.]
\x5D
: Right or Closing Square Bracket.
{}
: Curly Brackets or Braces.{
\x7B
: Left or Opening Curly Bracket; Left Brace}
\x7D
: Right or Closing Curly Bracket; Right Brace
<>
: Angle Brackets.<
\x3C
: Less-than Sign or Left Angle Bracket>
\x3E
: Greater-than Sign or Right Angle Bracket.
: Bra-Ket
: Bra (as in “bracket”)
: Ket
Note that there are angle brackets in other code tables in Unicode. I won’t describe them here as they aren’t used for programming but instead for typography.
List Comprehension¶
I’ve introduced comprehensions in a previous video, and we learned how to write a generator expression. As a refresher, a generator expression looks like this:
( comprehension )
Where the comprehension is the following:
expression for target_list in iterable
You can add addition for blocks if you want. You can even add if blocks. Review my video on comprehensions for the full details.
This is the syntax for a list comprehension:
[ comprehension ]
Note that this is almost identical to the generator expression. The list comprehension use square brackets while the generator expression use parentheses.
The list comprehension builds a list with elements from the comprehension. This is entirely different than the generator expression, which does not build a tuple but instead returns a generator.
Let’s look at a simple example:
a = [i**2 + j for i in range(3) for j in range(2)]
# a = [0, 1, 1, 2, 4, 5]
Caution
If the iterables in the for blocks are a generator, then the sequence should terminate. Otherwise, Python will try to exhaust the sequence.
List Operators¶
List operators are similar to the tuple operators. I’ll just list them here again to remind you:
x in s
,x not in s
: Membership test.s + t
: Concatenation. Creates a new list. NOTE: You cannot concatenate two different types together. This raises aTypeError
.s*n
,n*s
: Duplication. Creates a new list made up ofn
s
’s. Note thatn
must be an integer.s[i]
: Subscription.s[i:j]
,s[i:j:k]
: Slices.
List Subscription Assignment¶
List subscriptions can be assigned to with an assign statement.
On the left hand side of an assign statement, you can use subscription on the targets.
a = [1,2,3]
a[1] = 5
# a is now [1,5,3]
a[0], a[2] = (2,9)
# a is now [2, 5, 9]
Note that this can be recursive, so if the element is itself a list, you can assign to a subscription of that list.
# b is a list of lists
b = [[1,2], [3,4], [5,6]]
b[0][1] = 7
# b is now [[1,7], [3,4], [5,6]]
Negative indexes work like you’d expect, of course.
a = [1, 2, 3]
a[-1] = 5
# a is not [1, 2, 5]
Python will raise an IndexError
if the index is out of range.
List Slice Assignment¶
Just like you can assign to list subscriptions, you can also assign to list slices.
The behavior is slightly different from list subscription assignment. What
Python does is it removes the sliced elements and replaces them with the
elements on the RHS. Thus, the RHS must be an iterable! If it is not an
iterable, a TypeError
will be raised.
a = [1,2,3]
a[1:2] = (4,5)
# a is now 1, 4, 5, 3]
If you assign to an empty slice, then you are simply inserting the sequence in the list at that place.
a = [1,2,3]
a[1:1] = range(4, 7)
# a is now 1, 4, 5, 6, 2, 3]
Note that you can replace all the elements of a list:
a = [1,2,3]
a[:] = range(4, 7)
# a is now [4, 5, 6]
If your slice is out of range, that’s ok. If it goes beyond the end, then the elements will be appended to the end. If it goes beyond the beginning, then they will be prepended to the beginning.
>>> a = [1, 4, 5, 3]
>>> a[7:] = [1,2,3]
>>> a
[1, 4, 5, 3, 1, 2, 3]
>>> a[:-9]
[]
>>> a[:-9] = [9,9,9]
>>> a
[9, 9, 9, 1, 4, 5, 3, 1, 2, 3]
>>>
Caution
If the right hand side is a generator, then the sequence should terminate. Otherwise, Python will try to exhaust the sequence.
Removing Elements with del¶
The del
operator removes variables from the namespace. You can also use it
on list subscriptions or slices to remove elements from a list.
a = 5
del a
# a is no longer defined
a = [1,2,3]
del a[1]
# a is now [1,2]
del a[:]
# a is now []
As you might expect, deleting a subscription that is out of range will raise
an IndexError
, while deleting a slice that is out of range will do
nothing.
List Functions¶
Because a list is an iterable, anything that takes an iterable will take a list as well.
This function is new but its name and purpose shouldn’t be surprising.
list()
: Returns a new list.list(iterable)
: Instantiates and copies the iterable.
These are the same as other sequences:
all(s)
any(s)
enumerate(s, start=0)
filter(func, s)
iter(s)
len(s)
map(func, s, ...)
max(s)
min(s)
reversed(s)
sorted(s)
sum(s, start=0)
tuple(s)
zip(s, ...)
List Methods¶
There are a number of methods for lists, but not as many as for strings and bytes.
s.append(x)
: Addsx
to the end.s.extend(t)
: Adds elements of t to s. (Different from concatenation.)s.insert(i,x)
: Insertsx
ati
.s[i:i] = [x]
s.clear()
: Removes all the elements from the list.s.copy()
: Creates a copy.s.pop()
: Removes the last element and returns it.IndexError
is empty.s.pop(i)
: Removes thei``th element and returns it. ``IndexError
if out of range.s.remove(x)
: Removes the firstx
ins
.ValueError
if not found.s.count(x)
: Number ofx
’ss.index(x, start, stop)
: Returns the index ofx
.start
andstop
are optional and match the behavior for tuples.s.reverse()
: Reverses the elements ofs
in-place. ReturnsNone
.s.sort(key=None, reverse=False)
: Sorts the list in-place.
Adding More Elements¶
append
, extend
and insert
are used to add elements to the list.
Note that you can just as easily use subscription or slice assignment to
accomplish the same.
s.append(x)
is the same ass[len(s):len(s)] = [x]
.s.extend(t)
is the same ass[len(s):len(s)] = t
.s.insert(i, x)
is the same ass[i:i] = x
.
Note
We should be carefuly with our words here. We often say “add an element to the list”, which is perfectly understandable and means “append an element to the list”. However, this is confused with the same “add” that we do with numbers. Numbers represent a set of things. Adding a number to a number is just combining those two sets. And so we’re really doing the same thing from a mathematical perspective, just with different kinds of numbers. However, in computer science, numbers are a bit pattern, and adding a number to a number means changing the bit pattern to reflect what we understand by that. Lists are an array of elements, and those elements have values which may be numbers.
Confusion arises when we confuse elements with values. If I say, “Add 2
to the list s
”, you might think 2+s
, which is a TypeError
in
Python because you cannot add an int to a list! Obviously, I meant, “Add an
element with the value 2
to the list s
” which makes perfect sense.
This may sound entirely pedantic, and it is, but I think it’s important that
we not muddle our terms and get lazy with our language, particularly when
dealing with people who aren’t perfectly fluent in the concepts and the
language. So I encourage you to replace “add” with “append” wherever you
can, so that the meaning is perfectly clear. “Append 2
to the list
s
” has no confusion whatsoever.
Removing Elements¶
pop
is used to take elements off the end, or from any index. remove
looks up the item and then removes it.
pop
raises IndexError
if it is out of range, and remove
raises
ValueError
if it can’t find any such item.
pop
and append
are used to make stacks and queues, which I will talk
about shortly.
Note that remove
does not return the value removed.
Oftentimes, I want to remove an item from a list but I don’t know if that item
is even present. I’ll just wrap the remove
call in a try block and catch
the resulting ValueError
. The else
block of the try statement is
useful for handling the code when it did exist and it was removed.
Note that it does not remove all of the items that match. It only removes
the first one. If you want to remove all of them, then you have to keep
calling remove
until there is none left. Here’s a sample of code that
would do that:
def remove_all(s, x):
while True:
try:
s.remove(x)
except ValueError:
break
Sort and Reverse¶
Sort and Reverse will be covered in an upcoming video where I discuss sorting. It deserves its own topic.
Copying¶
It’s not too often that you want to copy an array, but if you do, there are several ways to do it.
s.copy()
s[:]
[x for x in s]
Of all of the above, I find copy
to be the best. It is used for other data
structures that cannot be copied so easily.
The others I consider degenerate cases of more general behavior. It’s nice to have, but I prefer not seeing it in code.
Clearing¶
There are several ways to clear out an array without deleting it altogether. You might wonder when this is a good idea. The truth is that I rarely, if ever, have done this. Typically, when I clear out an array it is because I intend to replace it without something else, so I just skip the clearing step and just replace it with the other thing.
Here are the methods:
s.clear()
s[:] = []
del s[:]
Of all of the above, I find clear
to be the most clear. The other cases
exist as sort of degenerate cases of more general behavior. Again, it’s nice
that Python is mathematically complete, but it’s not something I encourage.
Stacks¶
In Python, lists are often used as a stack. A stack is an array where you can only add elements to the end and pop them off the end.
We sometimes refer to stacks as “FILO”, meaning “First In, Last Out.”
Here are the two functions. Note that we simply append
and pop
values
from the “stack” which is really a list.
def push(stack, x):
"""Adds x to the stack."""
stack.append(x)
def pop(stack):
"""Removes the last item from the stack. Raises IndexError if there are
no more items."""
return stack.pop() # or stack.pop(-1) if you want to be explicit.
Queues¶
In Python, lists are also often used as a queue. A queue is an array where items are pushed to the end, but popped from the beginning. This is “FIFO” behavior – “First In First Out.”
As before, we can implement the two functions rather trivially with append
and pop(0)
. The queue is simply a list.
def push(queue, x):
"""Adds x to the queue."""
stack.append(x)
def pop(queue):
"""Returns the first item in the queue. Raises IndexError if there are
no more items."""
return stack.pop(0)
Bytearrays¶
Bytearrays behave exactly like bytes but allow for assignment to subscriptions and slices like lists.
bytearray(source, encoding='utf-8', errors='strict')
: Creates a new bytearray object.
Like bytes, bytearray elements are integers and must be between 0 and 255. ValueError if you try to assign out-of-range values.
Note that bytearrays don’t seem to have all of the methods that lists do. For instance, it has reverse but not sort. I haven’t used bytearrays much if ever, so I wouldn’t know if that’s a good thing or a bad thing.
When I do want to use an array of integers, I’ll either use a straight-up
list, or I’ll use something like numpy
to do all the hard work for me.
For Loops and Mutable Sequences¶
Because you can modify a list while you iterate across it, you might run into unexpected behavior.
This typically arises when you do something like this:
for i, v in enumerate(a):
a[i:i] = [1,2,3]
This leads to unpredictable and surprising behavior because the list is being modified as it is being iterated across. Will such a for loop ever terminate?
I have done something like the following more times than I’d like to admit:
def remove_all(s, x):
for i, v in enumerate(s):
if v == x: del s[i]
If you don’t see the problem with the above, try this:
s = [1, 2, 1, 1, 3, 4, 1, 1, 1, 5, 6, 1]
remove_all(s, 1)
print(s)
If you don’t understand what happened, let’s walk through the code:
When i=0 and v=1, it removes the first element.
When i=1, v=1. It skipped the 2. We remove the second element.
When i=2, v=3.
When i=3, v=4.
When i=4, v=1. It removes the fifth element.
When i=5, v=1. It removes the sixth element.
When i=6, v=1. It removes the last element.
Now, we could fix this code by iterating in reverse using the reversed
function. But I’d rather avoid the problem altogether so I don’t have to think
very hard about it.
Note
I was once heavily quoted on a team for saying, “If you wrote your code this way, then I can be really stupid.” The thought process was that I don’t want to have to be super-smart to understand your code. Difficult code is hard to understand. Even when you do understand it, it takes work, work you wouldn’t have to do if the code were simple. I’d rather be lazy and write simple code that is easy to understand, so that the people who have to read my code, debug it, and then fix it, don’t have to work very hard or be very smart to do it right.
One of my goals when I write code is to make sure the code is a lot dumber than I am. If I’m only as smart as my code, then I won’t be able to find smarter things that make the code better, or worse, find the bugs that only smarter people would see.
Here’s one way to do it:
def remove_all(s, x):
s[:] = [v for v in s if v != x]
Or spelled out:
def remove_all(s, x):
new_s = []
for v in s:
if v != x:
new_s.append(v)
s[:] = new_s
The general pattern is to just avoid modifying the list while iterating over it. Build up a new list if you need to.
On Function Calls and Mutable Objects¶
Up until now, all the objects we’ve looked at are immutable. One of the consequences of mutable objects is that we can pass them into functions, and the functions can modify those objects. Let’s look at an example:
def append(l, v):
"""Appends the value v to the list l"""
l.append(v)
a = [1,2,3]
append(a, 4)
# a is now [1,2,3,4]
Although this is a trivial example (as the append
method already exists on
the list object) it shows a new possibility in the functions we write.
Some misconceptions you might have if you have programmed in other languages might cause the above code to be confusing. Let’s remember a few principles of how Python variables and objects work as it relates to this case.
Variables do Not Contain Values¶
Variables do not contain values. They contain references to objects, and the objects themselves have values associated with them. Thus, when you are passing variables into Python, you’re really passing the references to the objects, not the variables. Thus, Python might be classified as a “pass by reference” language, as opposed to “pass by value”. (In a “pass by value” language, such as C/C++, the value is copied and passed into the function. Python does not copy values. If it copies anything, it is the reference that is copied and passed in.)
Functions Cannot Change Variables¶
Python functions cannot change the namespace of the code around the
function call. That is, append
does not chance a
. a
still
references a list, the same list in fact. The list, not the variable, was
changed.
The above rules should make it clear that you can’t write code that does the following:
a = 5
foo(a)
# a is now 6!
However, given the above facts, we often get lazy and say something like,
“This function changes a
to append the value v
.” This is not
necessarily very precise, but it is well-understood that what is meant is
“This function changes the list that a
references to append the value
v
.” After all, we’re following the mathematical concept of equality. a
is a reference the list it is a reference to, and so a
is in fact that
list.