Dictionaries

Python has an elegant solution for the hash table called the dictionary. It is used internally in many places, not the least of which are global and local namespaces. They are also used to keep track of an object’s attributes.

You’ll find many uses for dictionaries, and thanks to their O(1) behavior, they’ll be useful for large and small data sets.

Definition

A dictionary is a hash table. A hash table is a computer science concept that allows you to map keys to values by storing key-value pairs in an array in an optimized way such that you can quickly find any entry by its key. This requires that the key be hashable as the dictionary uses the hash function to calculate where the key should be stored in the array.

Python calls the key-value pair an “item” rather than an element. I’ll try to be consistent in this video with that nomenclature.

hash function

Note that the key must be a hashable object, meaning that you can pass it to the hash function. Hashable objects typically include all immutables.

The hash function will always return the same value for two objects that compare equally. IE, hash(0) == hash(0.0). It is possible that the hash function will return the same value for two unequal objects, but this is pretty unlikely.

You can also hash tuples. Each item in the tuple must also be hashable. Remember, any immutable object in Python is likely hashable!

Syntax

Python dictionaries are described by the following syntax:

{}

Two curly brackets are an empty dictionary.

{
  key: value,
  key: value,
}

It’s important to note that the key is an expression which is typically a string. The key can also be a variable or a formula that yields a value.

That is, if you want to use a str as a key (which is very common), you’ll have to quote the string. (Some languages only allow strings as keys, and don’t allow you to specify an expression for the key.)

{
    'cow': 'moo',
    'sheep': 'baa',
    'pig': 'oink',
}

The value can be, of course, anything, including a list or tuple or even another dictionary. The value does not need to be hashable or immutable or anything like that.

{
    ('red', 27): {
        'blue': 19,
        'orange': [1, 2, 3]
    }
}

It is considered good practice to have dictionaries that only have one common type for the keys and one common type for the values, but this not a requirement. It’s not uncommon to see people violate this for various reasons.

Importantly, if two keys compare equally, then they are considered the same key. For instance, if I created a dictionary with a key of 1 and then later assigned a value to the key of 1.0, then that would overwrite the key of 1.

Accordingly, if I were to accidentally specify the same key twice, Python wouldn’t consider that an error. It will just record the last value as the value for that key.

{
    1: 'bird',
    1: 'cow',
}

Final commas are optional. Typically I include them when the dictionary spans multiple lines, and don’t if it doesn’t.

Dictionary Comprehension

You can specify a dictionary with a dictionary comprehension. It is slightly different from other comprehensions in that the expression must be a key:value pair.

{k:v for ...}

As with list comprehensions, this creates a new dictionary with the keys and values specified by the key : value pairs.

Double-Star Expressions

In Python, you can create dictionaries from dictionaries. Consider the following syntax:

a = {1:"a"}
b = {2:"b", **a}

What the double-stars do is unwraps the dictionary a and folds it into b.

You can have as many as you want, in any order, and you can even put key:value pairs in between or after them if you like.

{'z': 26, **a, 'old': 'new', **{'red': 'green'}, 27: 19}

If you end up specifying the same key more than once, the last value for the key will be used.

This is comparable to the Star Expressions for iterables:

a = [1]
b = [2, *a]

You could use a single-star expression with a dictionary, but the results may surprise you. When iterating over a dictionary, Python iterates only over the keys, not the values or even key-value pairs. We’ll cover this later.

Double-Star Arguments

Similar to iterables and single-star arguments, you can specify named parameters in a function by using double-star arguments.

This is typically used to build up the parameters to a function before calling it.

args = {'key':int, 'reverse': True}
sorted(s, **args)

When calling a function, you can specify as many double-star arguments as you like, and even mix them with named parameters, but they must occur after all of the ordered arguments. Note that the keys must be strings. (Invalid identifiers are discouraged, but possible.)

Double-Star Parameters

In the function definition, when specifying the parameters, you can collect all the other named parameters using this syntax:

def foo(**kwargs):
    pass

Here, kwargs will contain all of the named parameters in the function call. Note that if the parameters were passed in with double-star arguments, the keys may not be valid identifiers. They must be strings, however.

Note that in the function specification, the double-star target must be the final target. No more named parameters are allowed after it.

You’ll typically see function signatures collect both args and kwargs, especially when writing wrappers for decorators:

def foo(*args, **kwargs):
    pass

A function with this signature will accept any number of ordered and named parameters.

Dictionary Operators

Dictionaries don’t support many operators.

  • k in d: Tests whether k is a key in the dictionary d. (You may have expected k to be the value or even a key-value pair, but this is not so.

  • k not in d: Tests whether k is not a key in the dictionary d.

  • d[k]: (Subscription) Retrieves the value associated with the key k in the dictionary d. If the key is missing, then it raises a KeyError. If the key is not hashable, then it will raise a TypeError.

  • a == b: Tests whether all the keys and values are the same for the two dictionaries.

  • a != b: Tests whether any key is missing from one or the other dictionary, or any value is different.

Dictionaries do not have a slicing syntax. If you tried to pass a slice into a dictionary, it would treat it as if it were a key and try to lookup a value associated with the slice as the key. A slice is not hashable so this would fail with a TypeError.

Dictionary Assignment

Just like you can assign to a list subscription, you can assign to a dictionary subscription.

Unlike a list subscription, if the key is missing, it will add it for you.

d = {}
d["a"] = 1
# d = {"a":1}
d["a"] = 2
# d = {"a":2}

The only error you might see is if you attempted to add a value with a key that is unhashable. That would raise a TypeError.

Trying to assign to a dictionary slice will just raise a TypeError as slices are unhashable.

Keep in mind that you can use a dictionary subscription as a target with parallel assignment, just as for list subscription. You can also combine it with list subscription, for instance, if you had a dictionary where the value was a list and the elements of the list were dictionaries or whatnot.

Removing Elements

You can use the dictionary subscription assignment to add or modify keys in a dictionary, but you’ll need the del statement to remove items.

del a["a"]

Just like for the assign statement, you can combine this with list subscription in whatever way you desire, depending on the underlying object.

Dictionary Functions

The key function for dictionaries is dict.

  • dict() returns a brand new, empty dictionary.

  • dict(k1=v1, k2=v2, ...) will create a new dictionary according to the kwargs you specify. Note that the keys must be identifiers. This may or may not be desirable.

  • dict(d, **kwargs): Creates a new dictionary with the same keys and values as the dictionary d, with additional keys and values per the kwargs.

  • dict(s, **kwargs): Creates a new dictionary with the keys and values specified by pairs in the iterable s. The pairs are (key, value) and can be tuples or lists or whatever. If there are additional kwargs, they are added to the resulting dictionary.

A dictionary is a sequence, a sequence of its keys.

  • len(d) returns the number of elements in the dictionary d.

  • iter(d) returns an iterator over the keys of the dictionary. We’ll talk about the order of the keys shortly.

You can retrieve the namespaces with these functions:

  • globals(): Returns a dictionary representing the global symbol table.

  • locals(): Returns a dictionary representing the local symbol table.

You should not expect that modifying these dictionaries will modify the namespace.

Dictionary Methods

There aren’t too many dictionary methods. You should memorize them all as they are all useful.

These first three return dictionary view objects that go over all of the items of the dictionary.

  • d.keys(): Returns an dictionary view object over the keys of the dictionary d, same as iter(d).

  • d.values(): Returns an dictionary view object over the values of the dictionary d.

  • d.items(): Returns an dictionary view object over the key-value pairs of the dictionary d.

These dictionary view objects are not copies of the dictionary. They will reflect any changes to the underlying dictionary.

d = {}
values = d.values()
d.update(red='blue')
a = list(values) # --> ['red']
d.update(green='red')
b = list(values) # --> ['red', 'green']

You can pass these into len() to get the number of items in the dictionary, or iter() to get an iterator over the items.

You can also use x in v to see if x is in the view. Note that for the key view, x is a key. For the value view, x is a value. And for the items view, x is a (key, value) pair.

Note the order of the items in these views. The order will always be consistent. That means that I can iterate across the keys() and the values() and be guaranteed that they will be correct. As of Python 3.7, this order is guaranteed to be the insertion order, but in previous versions, it was indeterminate and would change as you added or removed items.

You can remove elements from the dictionary using one of the following methods in addition to the del statement.

  • d.clear(): Removes all the items from the dictionary.

  • d.pop(key): Removes the item with the key key from the dictionary and returns its value. Raises KeyError if there is no such key.

  • d.pop(key, default): Does the same as above, except returning default if the key is not present.

  • d.popitem(): Removes and returns a (key, value) pair from the dictionary. If the dictionary is empty, raises a KeyError. In Python 3.7, the current version, the items are popped in LIFO order. In previous versions, the order was indeterminate.

There is the special method get(). Note that it never raises a KeyError. setdefault() behaves much like get() except it will also add in the missing key.

  • d.get(key): Returns the value for the key, returning None if it is not present.

  • d.get(key, default): Returns the value for the key or default if it is not present.

  • d.setdefault(key): Returns the value for the key, returning None if it is not present. It also inserts the key with the value None if it was missing.

  • d.setdefault(key, default): Returns the value for the key, returning default if it is not present. It also inserts the key with the value default if it was missing.

Compare these with subscription:

d['missing'] # KeyError
d.get('missing') # None
d.get('missing', 17) # 17
d.setdefault('missing') # None, and d['missing'] is now None
d.setdefault('missing', 17) # 17, and d['missing'] is now 17

A few miscellaneous methods:

  • d.update(...): This has the same arguments as dict() and it will overwrite existing keys in the dictionary with the keys described by the parameters.

  • d.copy(): Returns a shallow copy of the dictionary. That is, a new dictionary as if you did dict(d).

I frequently use update and copy if I want to build up a dictionary that has defaults set.

def dict_from_defaults(**kwargs):
    new_dict = defaults.copy()
    new_dict.update(kwargs)
    return new_dict

Dictionaries in Practice

Dictionaries are probably the most useful data structure in Python, and I use them almost all the time.

  • Gathering named parameters for a function call.

  • Caches.

  • Additional namespaces.

  • Mappings and look-up tables. IE, http_error_code['Not Found'] --> 404 or http_code_name[404] --> 'Not Found'.

You can use dictionaries as a sort of poor-man’s “switch” statement. A “switch” statement is sort of like a series of if-elif-elif-elif blocks, where you have a large number of elifs. Since the program will have to walk through all of the if conditions in order to find the one branch to execute, you can speed things up a bit.

Say, for instance, you are writing a video game. You need to translate the many possible keys the user can press, as well as key combinations, into actions. You probably also want to allow the players to reassign key combos. An easy way to do this properly is with a dictionary. For instance:

key_codes = {
  'w': move_up,
  'a': move_left,
  's': move_down,
  'd': move_right,
}

key_codes[key]() # Execute the code for the keypress

I think you can see the power of dictionaries in this sort of scenario.

The beautiful thing about dictionaries is lookup, assignment, deletion is always O(1). That means you can have a thousand different key combinations, and it will always jump right to the one you have, or tell you it’s missing, skipping over all of the rest. Check out my video on hash tables to understand how this works.

Modifying While Iterating

I should mention one word of caution similar to the word of caution for iterating over a list: If you modify the dictionary while iterating over it, bad things might happen, such as a RuntimeError. So don’t do that.

If you want to modify the dictionary, build up a new dictionary, or remember which keys you want to change and to what value and which you want to remove.

Keys are Not Attributes

Some other programming languages treat their dictionaries as objects, and the keys are attributes on the objects. Other languages are schizophrenic where you can treat your dictionaries as both objects and dictionaries.

I strongly prefer Python’s method. Object attributes are not confused with dictionary item – keys, and values.

There are ways to create a dictionary where you can look up the values by attribute access, but I have found them to cause more problems than they solve. I recommend not doing that.

We’ll talk more about what attributes are and how they work when we talk about object-oriented programming. I felt this was important enough to bring up now, however.

Get Imaginative with the Keys

One word of encouragement: Feel free to experiment with various types of keys. For instance, you can use tuples of hashable values as a key.

When you allow yourself the freedom to use keys that can contain multiple values, you can create some amazing dictionaries, all with O(1) lookup!