Dictionaries¶
Python has an elegant solution for the hash table called the dictionary. It is used internally in many places, not the least of which are global and local namespaces. They are also used to keep track of an object’s attributes.
You’ll find many uses for dictionaries, and thanks to their O(1) behavior, they’ll be useful for large and small data sets.
Definition¶
A dictionary is a hash table. A hash table is a computer science concept that allows you to map keys to values by storing key-value pairs in an array in an optimized way such that you can quickly find any entry by its key. This requires that the key be hashable as the dictionary uses the hash function to calculate where the key should be stored in the array.
Python calls the key-value pair an “item” rather than an element. I’ll try to be consistent in this video with that nomenclature.
hash function¶
Note that the key must be a hashable object, meaning that you can pass it to
the hash
function. Hashable objects typically include all immutables.
The hash function will always return the same value for two objects that
compare equally. IE, hash(0) == hash(0.0)
. It is possible that the hash
function will return the same value for two unequal objects, but this is
pretty unlikely.
You can also hash tuples. Each item in the tuple must also be hashable. Remember, any immutable object in Python is likely hashable!
Syntax¶
Python dictionaries are described by the following syntax:
{}
Two curly brackets are an empty dictionary.
{
key: value,
key: value,
}
It’s important to note that the key is an expression which is typically a string. The key can also be a variable or a formula that yields a value.
That is, if you want to use a str as a key (which is very common), you’ll have to quote the string. (Some languages only allow strings as keys, and don’t allow you to specify an expression for the key.)
{
'cow': 'moo',
'sheep': 'baa',
'pig': 'oink',
}
The value can be, of course, anything, including a list or tuple or even another dictionary. The value does not need to be hashable or immutable or anything like that.
{
('red', 27): {
'blue': 19,
'orange': [1, 2, 3]
}
}
It is considered good practice to have dictionaries that only have one common type for the keys and one common type for the values, but this not a requirement. It’s not uncommon to see people violate this for various reasons.
Importantly, if two keys compare equally, then they are considered the same
key. For instance, if I created a dictionary with a key of 1
and then
later assigned a value to the key of 1.0
, then that would overwrite the
key of 1
.
Accordingly, if I were to accidentally specify the same key twice, Python wouldn’t consider that an error. It will just record the last value as the value for that key.
{
1: 'bird',
1: 'cow',
}
Final commas are optional. Typically I include them when the dictionary spans multiple lines, and don’t if it doesn’t.
Dictionary Comprehension¶
You can specify a dictionary with a dictionary comprehension. It is slightly different from other comprehensions in that the expression must be a key:value pair.
{k:v for ...}
As with list comprehensions, this creates a new dictionary with the keys and values specified by the key : value pairs.
Double-Star Expressions¶
In Python, you can create dictionaries from dictionaries. Consider the following syntax:
a = {1:"a"}
b = {2:"b", **a}
What the double-stars do is unwraps the dictionary a
and folds it into
b
.
You can have as many as you want, in any order, and you can even put key:value pairs in between or after them if you like.
{'z': 26, **a, 'old': 'new', **{'red': 'green'}, 27: 19}
If you end up specifying the same key more than once, the last value for the key will be used.
This is comparable to the Star Expressions for iterables:
a = [1]
b = [2, *a]
You could use a single-star expression with a dictionary, but the results may surprise you. When iterating over a dictionary, Python iterates only over the keys, not the values or even key-value pairs. We’ll cover this later.
Double-Star Arguments¶
Similar to iterables and single-star arguments, you can specify named parameters in a function by using double-star arguments.
This is typically used to build up the parameters to a function before calling it.
args = {'key':int, 'reverse': True}
sorted(s, **args)
When calling a function, you can specify as many double-star arguments as you like, and even mix them with named parameters, but they must occur after all of the ordered arguments. Note that the keys must be strings. (Invalid identifiers are discouraged, but possible.)
Double-Star Parameters¶
In the function definition, when specifying the parameters, you can collect all the other named parameters using this syntax:
def foo(**kwargs):
pass
Here, kwargs
will contain all of the named parameters in the function
call. Note that if the parameters were passed in with double-star arguments,
the keys may not be valid identifiers. They must be strings, however.
Note that in the function specification, the double-star target must be the final target. No more named parameters are allowed after it.
You’ll typically see function signatures collect both args and kwargs, especially when writing wrappers for decorators:
def foo(*args, **kwargs):
pass
A function with this signature will accept any number of ordered and named parameters.
Dictionary Operators¶
Dictionaries don’t support many operators.
k in d
: Tests whether k is a key in the dictionary d. (You may have expectedk
to be the value or even akey-value
pair, but this is not so.k not in d
: Tests whether k is not a key in the dictionary d.d[k]
: (Subscription) Retrieves the value associated with the keyk
in the dictionaryd
. If the key is missing, then it raises aKeyError
. If the key is not hashable, then it will raise aTypeError
.a == b
: Tests whether all the keys and values are the same for the two dictionaries.a != b
: Tests whether any key is missing from one or the other dictionary, or any value is different.
Dictionaries do not have a slicing syntax. If you tried to pass a slice into a
dictionary, it would treat it as if it were a key and try to lookup a value
associated with the slice as the key. A slice is not hashable so this would
fail with a TypeError
.
Dictionary Assignment¶
Just like you can assign to a list subscription, you can assign to a dictionary subscription.
Unlike a list subscription, if the key is missing, it will add it for you.
d = {}
d["a"] = 1
# d = {"a":1}
d["a"] = 2
# d = {"a":2}
The only error you might see is if you attempted to add a value with a key
that is unhashable. That would raise a TypeError
.
Trying to assign to a dictionary slice will just raise a TypeError
as
slices are unhashable.
Keep in mind that you can use a dictionary subscription as a target with parallel assignment, just as for list subscription. You can also combine it with list subscription, for instance, if you had a dictionary where the value was a list and the elements of the list were dictionaries or whatnot.
Removing Elements¶
You can use the dictionary subscription assignment to add or modify keys in a
dictionary, but you’ll need the del
statement to remove items.
del a["a"]
Just like for the assign statement, you can combine this with list subscription in whatever way you desire, depending on the underlying object.
Dictionary Functions¶
The key function for dictionaries is dict
.
dict()
returns a brand new, empty dictionary.dict(k1=v1, k2=v2, ...)
will create a new dictionary according to the kwargs you specify. Note that the keys must be identifiers. This may or may not be desirable.dict(d, **kwargs)
: Creates a new dictionary with the same keys and values as the dictionaryd
, with additional keys and values per thekwargs
.dict(s, **kwargs)
: Creates a new dictionary with the keys and values specified by pairs in the iterables
. The pairs are(key, value)
and can be tuples or lists or whatever. If there are additionalkwargs
, they are added to the resulting dictionary.
A dictionary is a sequence, a sequence of its keys.
len(d)
returns the number of elements in the dictionaryd
.iter(d)
returns an iterator over the keys of the dictionary. We’ll talk about the order of the keys shortly.
You can retrieve the namespaces with these functions:
globals()
: Returns a dictionary representing the global symbol table.locals()
: Returns a dictionary representing the local symbol table.
You should not expect that modifying these dictionaries will modify the namespace.
Dictionary Methods¶
There aren’t too many dictionary methods. You should memorize them all as they are all useful.
These first three return dictionary view objects that go over all of the items of the dictionary.
d.keys()
: Returns an dictionary view object over the keys of the dictionaryd
, same asiter(d)
.d.values()
: Returns an dictionary view object over the values of the dictionaryd
.d.items()
: Returns an dictionary view object over the key-value pairs of the dictionaryd
.
These dictionary view objects are not copies of the dictionary. They will reflect any changes to the underlying dictionary.
d = {}
values = d.values()
d.update(red='blue')
a = list(values) # --> ['red']
d.update(green='red')
b = list(values) # --> ['red', 'green']
You can pass these into len()
to get the number of items in the
dictionary, or iter()
to get an iterator over the items.
You can also use x in v
to see if x is in the view. Note that for the key
view, x
is a key. For the value view, x
is a value. And for the items
view, x
is a (key, value) pair.
Note the order of the items in these views. The order will always be
consistent. That means that I can iterate across the keys()
and the
values()
and be guaranteed that they will be correct. As of Python 3.7,
this order is guaranteed to be the insertion order, but in previous versions,
it was indeterminate and would change as you added or removed items.
You can remove elements from the dictionary using one of the following methods
in addition to the del
statement.
d.clear()
: Removes all the items from the dictionary.d.pop(key)
: Removes the item with the keykey
from the dictionary and returns its value. RaisesKeyError
if there is no such key.d.pop(key, default)
: Does the same as above, except returningdefault
if the key is not present.d.popitem()
: Removes and returns a (key, value) pair from the dictionary. If the dictionary is empty, raises aKeyError
. In Python 3.7, the current version, the items are popped in LIFO order. In previous versions, the order was indeterminate.
There is the special method get()
. Note that it never raises a
KeyError
. setdefault()
behaves much like get()
except it will also
add in the missing key.
d.get(key)
: Returns the value for the key, returningNone
if it is not present.d.get(key, default)
: Returns the value for the key ordefault
if it is not present.d.setdefault(key)
: Returns the value for the key, returningNone
if it is not present. It also inserts the key with the valueNone
if it was missing.d.setdefault(key, default)
: Returns the value for the key, returningdefault
if it is not present. It also inserts the key with the valuedefault
if it was missing.
Compare these with subscription:
d['missing'] # KeyError
d.get('missing') # None
d.get('missing', 17) # 17
d.setdefault('missing') # None, and d['missing'] is now None
d.setdefault('missing', 17) # 17, and d['missing'] is now 17
A few miscellaneous methods:
d.update(...)
: This has the same arguments asdict()
and it will overwrite existing keys in the dictionary with the keys described by the parameters.d.copy()
: Returns a shallow copy of the dictionary. That is, a new dictionary as if you diddict(d)
.
I frequently use update
and copy
if I want to build up a dictionary
that has defaults set.
def dict_from_defaults(**kwargs):
new_dict = defaults.copy()
new_dict.update(kwargs)
return new_dict
Dictionaries in Practice¶
Dictionaries are probably the most useful data structure in Python, and I use them almost all the time.
Gathering named parameters for a function call.
Caches.
Additional namespaces.
Mappings and look-up tables. IE,
http_error_code['Not Found'] --> 404
orhttp_code_name[404] --> 'Not Found'
.
You can use dictionaries as a sort of poor-man’s “switch” statement. A “switch” statement is sort of like a series of if-elif-elif-elif blocks, where you have a large number of elifs. Since the program will have to walk through all of the if conditions in order to find the one branch to execute, you can speed things up a bit.
Say, for instance, you are writing a video game. You need to translate the many possible keys the user can press, as well as key combinations, into actions. You probably also want to allow the players to reassign key combos. An easy way to do this properly is with a dictionary. For instance:
key_codes = {
'w': move_up,
'a': move_left,
's': move_down,
'd': move_right,
}
key_codes[key]() # Execute the code for the keypress
I think you can see the power of dictionaries in this sort of scenario.
The beautiful thing about dictionaries is lookup, assignment, deletion is always O(1). That means you can have a thousand different key combinations, and it will always jump right to the one you have, or tell you it’s missing, skipping over all of the rest. Check out my video on hash tables to understand how this works.
Modifying While Iterating¶
I should mention one word of caution similar to the word of caution for
iterating over a list: If you modify the dictionary while iterating over it,
bad things might happen, such as a RuntimeError
. So don’t do that.
If you want to modify the dictionary, build up a new dictionary, or remember which keys you want to change and to what value and which you want to remove.
Keys are Not Attributes¶
Some other programming languages treat their dictionaries as objects, and the keys are attributes on the objects. Other languages are schizophrenic where you can treat your dictionaries as both objects and dictionaries.
I strongly prefer Python’s method. Object attributes are not confused with dictionary item – keys, and values.
There are ways to create a dictionary where you can look up the values by attribute access, but I have found them to cause more problems than they solve. I recommend not doing that.
We’ll talk more about what attributes are and how they work when we talk about object-oriented programming. I felt this was important enough to bring up now, however.
Get Imaginative with the Keys¶
One word of encouragement: Feel free to experiment with various types of keys. For instance, you can use tuples of hashable values as a key.
When you allow yourself the freedom to use keys that can contain multiple values, you can create some amazing dictionaries, all with O(1) lookup!