Lists and Bytearrays

Link with Playlist

Lists and Bytearrays are the first examples of mutable objects in Python. Mutable objects can change their values. This has repurcussions as we’ll see later, but it also introduces the possibility of creating non-pure functions by modifying the arguments passed in.

List Object Type

A list in Python is a mutable array. That means that you can add or remove elements from the array. A tuple, on the other hand, is an immutable array. Once you’ve created the tuple, that’s it, it cannot be changed. You can create new tuples from it, but you cannot modify it.

Note

If you want to learn more about what an array is, I made a video just for you. You can watch it through the link that should appear here.

A bytearray is a mutable array as well, except the values are integers between 0 and 255. A bytearray is to bytes what a list is to tuples.

A string is, for all intents and purposes, an array of unicode code points. There is no mutable string type, and for good reasons. If you want to change a string, just create a new one from the old one.

List Syntax

In Python, you define a list much like you do for a tuple, except you can only use square brackets [ ]

[] # Empty list
[1] # No comma neeeded
[1,2,3]

Note that square brackets are also used for subscription and slicing syntax. It’s not hard to keep them separate, though.

a[5] # Subscription a[5,6] # Subscription with a tuple (5,6) as the index.
a[5:6:1] # Slicing

Note

I have heard brackets called many different things. Here are their official names, along with some nicknames I have heard:

  • (): Parentheses. (Pronounced “-eeze” at the end.) Memory: “this” is a “parenthesis”. “These” are “parentheses”.

    • ( \x28: Left or Opening Parenthesis

    • ) \x29: Right or Closing Parenthesis.

  • []: Square brackets. (I used to call these “square braces”.)

    • [ \x5B: Left or Opening Square Bracket.

    • ] \x5D: Right or Closing Square Bracket.

  • {}: Curly Brackets or Braces.

    • { \x7B: Left or Opening Curly Bracket; Left Brace

    • } \x7D: Right or Closing Curly Bracket; Right Brace

  • <>: Angle Brackets.

    • < \x3C: Less-than Sign or Left Angle Bracket

    • > \x3E: Greater-than Sign or Right Angle Bracket.

  • \langle a \vert b \rangle: Bra-Ket

    • \langle a \rvert: Bra (as in “bracket”)

    • \lvert a \rangle: Ket

Note that there are angle brackets in other code tables in Unicode. I won’t describe them here as they aren’t used for programming but instead for typography.

List Comprehension

I’ve introduced comprehensions in a previous video, and we learned how to write a generator expression. As a refresher, a generator expression looks like this:

( comprehension )

Where the comprehension is the following:

expression for target_list in iterable

You can add addition for blocks if you want. You can even add if blocks. Review my video on comprehensions for the full details.

This is the syntax for a list comprehension:

[ comprehension ]

Note that this is almost identical to the generator expression. The list comprehension use square brackets while the generator expression use parentheses.

The list comprehension builds a list with elements from the comprehension. This is entirely different than the generator expression, which does not build a tuple but instead returns a generator.

Let’s look at a simple example:

a = [i**2 + j for i in range(3) for j in range(2)]
# a = [0, 1, 1, 2, 4, 5]

Caution

If the iterables in the for blocks are a generator, then the sequence should terminate. Otherwise, Python will try to exhaust the sequence.

List Operators

List operators are similar to the tuple operators. I’ll just list them here again to remind you:

  • x in s, x not in s: Membership test.

  • s + t: Concatenation. Creates a new list. NOTE: You cannot concatenate two different types together. This raises a TypeError.

  • s*n, n*s: Duplication. Creates a new list made up of n s’s. Note that n must be an integer.

  • s[i]: Subscription.

  • s[i:j], s[i:j:k]: Slices.

List Subscription Assignment

List subscriptions can be assigned to with an assign statement.

On the left hand side of an assign statement, you can use subscription on the targets.

a = [1,2,3]
a[1] = 5
# a is now [1,5,3]
a[0], a[2] = (2,9)
# a is now [2, 5, 9]

Note that this can be recursive, so if the element is itself a list, you can assign to a subscription of that list.

# b is a list of lists
b = [[1,2], [3,4], [5,6]]
b[0][1] = 7
# b is now [[1,7], [3,4], [5,6]]

Negative indexes work like you’d expect, of course.

a = [1, 2, 3]
a[-1] = 5
# a is not [1, 2, 5]

Python will raise an IndexError if the index is out of range.

List Slice Assignment

Just like you can assign to list subscriptions, you can also assign to list slices.

The behavior is slightly different from list subscription assignment. What Python does is it removes the sliced elements and replaces them with the elements on the RHS. Thus, the RHS must be an iterable! If it is not an iterable, a TypeError will be raised.

a = [1,2,3]
a[1:2] = (4,5)
# a is now 1, 4, 5, 3]

If you assign to an empty slice, then you are simply inserting the sequence in the list at that place.

a = [1,2,3]
a[1:1] = range(4, 7)
# a is now 1, 4, 5, 6, 2, 3]

Note that you can replace all the elements of a list:

a = [1,2,3]
a[:] = range(4, 7)
# a is now [4, 5, 6]

If your slice is out of range, that’s ok. If it goes beyond the end, then the elements will be appended to the end. If it goes beyond the beginning, then they will be prepended to the beginning.

>>> a = [1, 4, 5, 3]
>>> a[7:] = [1,2,3]
>>> a
[1, 4, 5, 3, 1, 2, 3]
>>> a[:-9]
[]
>>> a[:-9] = [9,9,9]
>>> a
[9, 9, 9, 1, 4, 5, 3, 1, 2, 3]
>>>

Caution

If the right hand side is a generator, then the sequence should terminate. Otherwise, Python will try to exhaust the sequence.

Removing Elements with del

The del operator removes variables from the namespace. You can also use it on list subscriptions or slices to remove elements from a list.

a = 5
del a
# a is no longer defined

a = [1,2,3]
del a[1]
# a is now [1,2]

del a[:]
# a is now []

As you might expect, deleting a subscription that is out of range will raise an IndexError, while deleting a slice that is out of range will do nothing.

List Functions

Because a list is an iterable, anything that takes an iterable will take a list as well.

This function is new but its name and purpose shouldn’t be surprising.

  • list(): Returns a new list.

  • list(iterable): Instantiates and copies the iterable.

These are the same as other sequences:

  • all(s)

  • any(s)

  • enumerate(s, start=0)

  • filter(func, s)

  • iter(s)

  • len(s)

  • map(func, s, ...)

  • max(s)

  • min(s)

  • reversed(s)

  • sorted(s)

  • sum(s, start=0)

  • tuple(s)

  • zip(s, ...)

List Methods

There are a number of methods for lists, but not as many as for strings and bytes.

  • s.append(x): Adds x to the end.

  • s.extend(t): Adds elements of t to s. (Different from concatenation.)

  • s.insert(i,x): Inserts x at i. s[i:i] = [x]

  • s.clear(): Removes all the elements from the list.

  • s.copy(): Creates a copy.

  • s.pop(): Removes the last element and returns it. IndexError is empty.

  • s.pop(i): Removes the i``th element and returns it. ``IndexError if out of range.

  • s.remove(x): Removes the first x in s. ValueError if not found.

  • s.count(x): Number of x’s

  • s.index(x, start, stop): Returns the index of x. start and stop are optional and match the behavior for tuples.

  • s.reverse(): Reverses the elements of s in-place. Returns None.

  • s.sort(key=None, reverse=False): Sorts the list in-place.

Adding More Elements

append, extend and insert are used to add elements to the list. Note that you can just as easily use subscription or slice assignment to accomplish the same.

  • s.append(x) is the same as s[len(s):len(s)] = [x].

  • s.extend(t) is the same as s[len(s):len(s)] = t.

  • s.insert(i, x) is the same as s[i:i] = x.

Note

We should be carefuly with our words here. We often say “add an element to the list”, which is perfectly understandable and means “append an element to the list”. However, this is confused with the same “add” that we do with numbers. Numbers represent a set of things. Adding a number to a number is just combining those two sets. And so we’re really doing the same thing from a mathematical perspective, just with different kinds of numbers. However, in computer science, numbers are a bit pattern, and adding a number to a number means changing the bit pattern to reflect what we understand by that. Lists are an array of elements, and those elements have values which may be numbers.

Confusion arises when we confuse elements with values. If I say, “Add 2 to the list s”, you might think 2+s, which is a TypeError in Python because you cannot add an int to a list! Obviously, I meant, “Add an element with the value 2 to the list s” which makes perfect sense.

This may sound entirely pedantic, and it is, but I think it’s important that we not muddle our terms and get lazy with our language, particularly when dealing with people who aren’t perfectly fluent in the concepts and the language. So I encourage you to replace “add” with “append” wherever you can, so that the meaning is perfectly clear. “Append 2 to the list s” has no confusion whatsoever.

Removing Elements

pop is used to take elements off the end, or from any index. remove looks up the item and then removes it.

pop raises IndexError if it is out of range, and remove raises ValueError if it can’t find any such item.

pop and append are used to make stacks and queues, which I will talk about shortly.

Note that remove does not return the value removed.

Oftentimes, I want to remove an item from a list but I don’t know if that item is even present. I’ll just wrap the remove call in a try block and catch the resulting ValueError. The else block of the try statement is useful for handling the code when it did exist and it was removed.

Note that it does not remove all of the items that match. It only removes the first one. If you want to remove all of them, then you have to keep calling remove until there is none left. Here’s a sample of code that would do that:

def remove_all(s, x):
    while True:
        try:
            s.remove(x)
        except ValueError:
            break

Sort and Reverse

Sort and Reverse will be covered in an upcoming video where I discuss sorting. It deserves its own topic.

Copying

It’s not too often that you want to copy an array, but if you do, there are several ways to do it.

  1. s.copy()

  2. s[:]

  3. [x for x in s]

Of all of the above, I find copy to be the best. It is used for other data structures that cannot be copied so easily.

The others I consider degenerate cases of more general behavior. It’s nice to have, but I prefer not seeing it in code.

Clearing

There are several ways to clear out an array without deleting it altogether. You might wonder when this is a good idea. The truth is that I rarely, if ever, have done this. Typically, when I clear out an array it is because I intend to replace it without something else, so I just skip the clearing step and just replace it with the other thing.

Here are the methods:

  1. s.clear()

  2. s[:] = []

  3. del s[:]

Of all of the above, I find clear to be the most clear. The other cases exist as sort of degenerate cases of more general behavior. Again, it’s nice that Python is mathematically complete, but it’s not something I encourage.

Stacks

In Python, lists are often used as a stack. A stack is an array where you can only add elements to the end and pop them off the end.

We sometimes refer to stacks as “FILO”, meaning “First In, Last Out.”

Here are the two functions. Note that we simply append and pop values from the “stack” which is really a list.

def push(stack, x):
    """Adds x to the stack."""
    stack.append(x)

def pop(stack):
    """Removes the last item from the stack. Raises IndexError if there are
    no more items."""
    return stack.pop() # or stack.pop(-1) if you want to be explicit.

Queues

In Python, lists are also often used as a queue. A queue is an array where items are pushed to the end, but popped from the beginning. This is “FIFO” behavior – “First In First Out.”

As before, we can implement the two functions rather trivially with append and pop(0). The queue is simply a list.

def push(queue, x):
    """Adds x to the queue."""
    stack.append(x)

def pop(queue):
    """Returns the first item in the queue. Raises IndexError if there are
    no more items."""
    return stack.pop(0)

Bytearrays

Bytearrays behave exactly like bytes but allow for assignment to subscriptions and slices like lists.

  • bytearray(source, encoding='utf-8', errors='strict'): Creates a new bytearray object.

Like bytes, bytearray elements are integers and must be between 0 and 255. ValueError if you try to assign out-of-range values.

Note that bytearrays don’t seem to have all of the methods that lists do. For instance, it has reverse but not sort. I haven’t used bytearrays much if ever, so I wouldn’t know if that’s a good thing or a bad thing.

When I do want to use an array of integers, I’ll either use a straight-up list, or I’ll use something like numpy to do all the hard work for me.

For Loops and Mutable Sequences

Because you can modify a list while you iterate across it, you might run into unexpected behavior.

This typically arises when you do something like this:

for i, v in enumerate(a):
    a[i:i] = [1,2,3]

This leads to unpredictable and surprising behavior because the list is being modified as it is being iterated across. Will such a for loop ever terminate?

I have done something like the following more times than I’d like to admit:

def remove_all(s, x):
    for i, v in enumerate(s):
        if v == x: del s[i]

If you don’t see the problem with the above, try this:

s = [1, 2, 1, 1, 3, 4, 1, 1, 1, 5, 6, 1]
remove_all(s, 1)
print(s)

If you don’t understand what happened, let’s walk through the code:

  • When i=0 and v=1, it removes the first element.

  • When i=1, v=1. It skipped the 2. We remove the second element.

  • When i=2, v=3.

  • When i=3, v=4.

  • When i=4, v=1. It removes the fifth element.

  • When i=5, v=1. It removes the sixth element.

  • When i=6, v=1. It removes the last element.

Now, we could fix this code by iterating in reverse using the reversed function. But I’d rather avoid the problem altogether so I don’t have to think very hard about it.

Note

I was once heavily quoted on a team for saying, “If you wrote your code this way, then I can be really stupid.” The thought process was that I don’t want to have to be super-smart to understand your code. Difficult code is hard to understand. Even when you do understand it, it takes work, work you wouldn’t have to do if the code were simple. I’d rather be lazy and write simple code that is easy to understand, so that the people who have to read my code, debug it, and then fix it, don’t have to work very hard or be very smart to do it right.

One of my goals when I write code is to make sure the code is a lot dumber than I am. If I’m only as smart as my code, then I won’t be able to find smarter things that make the code better, or worse, find the bugs that only smarter people would see.

Here’s one way to do it:

def remove_all(s, x):
    s[:] = [v for v in s if v != x]

Or spelled out:

def remove_all(s, x):
    new_s = []
    for v in s:
        if v != x:
            new_s.append(v)
    s[:] = new_s

The general pattern is to just avoid modifying the list while iterating over it. Build up a new list if you need to.

On Function Calls and Mutable Objects

Up until now, all the objects we’ve looked at are immutable. One of the consequences of mutable objects is that we can pass them into functions, and the functions can modify those objects. Let’s look at an example:

def append(l, v):
    """Appends the value v to the list l"""
    l.append(v)

a = [1,2,3]
append(a, 4)
# a is now [1,2,3,4]

Although this is a trivial example (as the append method already exists on the list object) it shows a new possibility in the functions we write.

Some misconceptions you might have if you have programmed in other languages might cause the above code to be confusing. Let’s remember a few principles of how Python variables and objects work as it relates to this case.

Variables do Not Contain Values

Variables do not contain values. They contain references to objects, and the objects themselves have values associated with them. Thus, when you are passing variables into Python, you’re really passing the references to the objects, not the variables. Thus, Python might be classified as a “pass by reference” language, as opposed to “pass by value”. (In a “pass by value” language, such as C/C++, the value is copied and passed into the function. Python does not copy values. If it copies anything, it is the reference that is copied and passed in.)

Functions Cannot Change Variables

Python functions cannot change the namespace of the code around the function call. That is, append does not chance a. a still references a list, the same list in fact. The list, not the variable, was changed.

The above rules should make it clear that you can’t write code that does the following:

a = 5
foo(a)
# a is now 6!

However, given the above facts, we often get lazy and say something like, “This function changes a to append the value v.” This is not necessarily very precise, but it is well-understood that what is meant is “This function changes the list that a references to append the value v.” After all, we’re following the mathematical concept of equality. a is a reference the list it is a reference to, and so a is in fact that list.