Sets

A mathematical set is a collection of unique items. No two items in a set can be the same.

You might think such a theoretical concept has no place in a programming language, but I have found them quite useful.

Definition

A Python set is unordered and can contain only hashable objects. The set is mutable.

Python also has a frozenset, which is an immutable set.

Note that Python uses an underlying hash table to keep track of unique keys, so lookup, adding, and removing values are all O(1) in time and memory.

Syntax

Sets are specified with the following syntax:

{e1, e2, e2}
{e}

Note that you cannot specify an empty set with opening and closing curly brackets: {}. This is an empty dictionary.

Note also the difference between sets and dictionary syntax: Sets do not have : in them.

Specifying the same value multiple times is not an error. The duplicates will just be ignored.

Set Comprehension

You can specify a set comprehension with the following syntax:

{ comprehension }

Unlike the dictionary comprehension, the expression in the comprehension is not a key: value pair. It is just a value. This looks exactly like a list comprehension, except instead of square brackets, it is using curly brackets.

Now that you know the set comprehension, you now have the complete set of comprehensions:

( comprehension ) # generator expression
[ comprehension ] # list comprehension
{ comprehension } # set comprehension
{ key:value for ... } # dictionary comprehension
func(comprehension) # function-call comprehension

Congratulations!

Set Operators

  • x in s: Tests whether the value x is in the set s.

  • x not in s: Tests whether the value x is not in the set s.

You can test for equality:

  • a == b: True if all the values in a are in b, and all the values in b are in a. Mathematically, a <= b and a >= b.

  • a != b: True if a has any value b does not have, or vice-versa.

The comparators do work with sets:

  • a < b: True if every value in a is also in b but a != b.

  • a <= b: True if every value in a is also in b, which would also be true if they were equal. (Same as a.issubset(b).)

  • a >= b: True if every value of b is also in a, which would also be true if they were equal. (Same as a.issuperset(b).)

  • a > b: True if every value of b is in a but a != b.

We also have some operators, including the bit-wise operators.

  • a | b: Returns a new set with elements that are either in a or b. Same as a.union(b).

  • a & b: Returns a new set with elements that are in both a and b. Same as a.intersection(b).

  • a - b: Returns a new set with elements in a but not in b. Same as a.difference(b).

  • a ^ b: Returns a new set with elements in a but not in b, and elements in b but not in a. Same as a.symmetric_difference(b).

The set operators are hard to remember, just like I have a hard time remembering the set operator notation in predicate logic. I tend to just use their names instead.

Set Functions

  • set(): Returns a new empty set.

  • set(iterable): Returns a new set with values from iterable.

  • frozenset(): Returns a new empty frozenset.

  • frozenset(iterable): Returns a new frozenset with values from iterable.

  • len(s): Returns the number of values in the set.

Since sets are iterable, you can use them anywhere you can use an iterable. I won’t list the functions here again.

Set Methods

There are quite a few methods on a set, but they are pretty easy to remember.

We can always create a set copy:

  • a.copy(): Creates a copy of the set.

These methods compare the set with another set and return True or False:

  • a.isdisjoint(b): True if there are no values in common beween a and b.

  • a.issubset(b): True if all of the values in a are in b. This would be true if the two sets were the same as well.

  • a.issuperset(b): True if all the values of b are in a. This would be true if the two sets were the same as well.

These methods combine two or more sets together:

  • a.union(*others): Returns a new set with all the elements of a and all the elements of each set in others. In other words, elements that are in any of the sets.

  • a.intersection(*others): Returns a new set with only the elements in a and each set in others. In other words, elements that are in all of the sets.

  • a.difference(*others): Returns a new set with elements in a but not in any of the other sets.

  • a.symmetric_difference(b): Returns a new set with elements in a but not in b, and in b but not in a.

The following methods only work for sets, not frozensets.

These set methods will update the set based on other sets:

  • a.update(*others): Modifies a to include all the elements of the other sets.

  • a.intersection_update(*others): Modifies a to only keep the elements that are in all of the other sets.

  • a.difference_update(*others): Modifies a to remove elements found in the other sets.

  • a.symmetric_difference_update(b): Modifies a to only keep the elements that are not in b, and add the elements that are in b but not in a.

You can add elements to a set very easily. It is not an error to add the same item more than once.

  • a.add(value): Adds the value to the set.

There are a few methods to remove elements from a set:

  • a.remove(value): Removes the value from the set, raising a KeyError if it is missing.

  • a.discard(value): Removes the value from the set, doing nothing if it is missing.

  • a.pop(): Removes an arbitary element from the set, raising a KeyError if it is empty.

  • a.clear(): Removes all the elements from the set.

Using Sets

It may not be immediately obvious, but many algorithms have some sort of set operation in them.

Let’s look at a few.

  • Getting a unique list of items: If you are only interested in unique things, then create a set from the iterable.

  • Stuff that is here but not there: If you want items that are in one sequence but not the other, you probably want to do a set difference.

  • Checking that everything is present: If you want to make sure everything you’re looking for is available, then you’re testing for subset.

There are many, many more scenarios.

As you program, you’ll recognize the situations that arise where sets are ideal.

People who are experienced in programming should at least be familiar with the algorithms necessary to accomplish the set operations. They might not know they are doing set operations, but they have certainly run into situations where they arise and figured out a good solution. I often ask more senior programmers about set operations (without calling them that) to see how familiar they are with the ideas behind them. Younger programmers who don’t have much experience likely haven’t run into many situations involving the set operations and so they fumble around, while senior programmers will either quickly recognize them as set operations or rely on their previous experience to solve the problem at hand.

So if you want to pass yourself off as a senior programmer in my eyes, you’ll learn the set operations, how to implement them, where they are useful, and what their names are.