Set
Learning Objectives
At the end of this sub-unit, students should
- understand set.
- know the basic set operations.
Unordered Collection
So far, all our collections are ordered. Is there any collection that is unordered? The most basic one is a set. Like mathematical set, we do not care about where the element is in the set as long as we know that the element is in the set. In fact, like mathematical set, Python set do no contain duplicate values. The notation for Python set is also similar to mathematical set. Both uses curly brackets.
Syntax
The syntax for set is as follows.
- Empty Set:
set()
({}
is already used by dictionary) - Singleton:
{ expr }
(the comma at the end is not required, but possible) - General:
{ expr1, expr2, expr3 }
(can have as many expressions as needed)
Set
We may think that an empty set is denoted by {}
.
Unfortunately, that is not the case.
This is because there is another data type that was implemented before set that uses curly bracket too.
That data type is dictionary.
So unfortunately, {}
creates an empty dictionary.
To construct an empty set, we need to use set()
.
Set
Set Operation
The standard set operations are available in Python except for the complement operation.
It is impossible to have a complement of a set of {1}
because there are an infinite number of integers possible in Python.
Remember, integer have an arbitrary precision in Python.
If we have two sets \(S_1\) and \(S_2\), the visualization of the operations are shown below. In all cases, Python ensures all the values in the resulting sets are distinct. The deduplication is performed automatically by Python.
Set Operation
This is on top of a typical collection operations like in
and not in
.
Set Operation
There is an implicit convention in Python that the operator +
is intended to be an ordered combination operation like concatenation.
When we concatenate 'abc'
with 'def'
, the order matters.
'abc' + 'def'
produces 'abcdef'
while 'def' + 'abc'
produces 'defabc'
.
On the other hand, the operator |
is an unordered combination.
So {1, 2, 3} + {4, 5, 6}
does not work on set because by convention it should be ordered while set is unordered.
Unfortunately, given that a set is unordered, it does not make sense to ask what is the first element in the set. So indexing does not work on set. Similarly, slicing depends on indexing, so it does not work on set too.
Set Operations
Comparison
So here is the first difference from ordered collection: the notion of equality.
The philosophy of Python is that two objects are equal if they carry the same meaning.
Of course this is not always followed due to the limitation of the representation.
For instance, 0.1 + 0.2 == 0.3
is False
and 1 == '1'
is also False
.
In the case of set, two sets are equal if they have exactly the same elements in any order.
So we expect {1, 2, 3} == {3, 2, 1}
to return True
.
Similarly, since Python is performing deduplication, we expect {1, 2, 3} == {3, 2, 1, 2, 3}
to also return True
.
Set Equality
To make things more concrete, we can explain the definition of equality on set by mimicking the behavior with list.
We say that two list lst1
and lst2
that mimic the behavior of a set is equal if for every element in lst1
exist in lst2
and every element in lst2
exist in lst1
.
Put it another way, we can say that lst1
is not equal to lst2
if one (or both) of the following is satisfied.
- There is an element in
lst1
that does not exist inlst2
. - There is an element in
lst2
that does not exist inlst1
.
Set Equality
What about other relational operator?
We can also map these to mathematical operations as follows.
Consider two sets s1
and s2
that corresponds to \(S_1\) and \(S_2\).
Python | Mathematics |
---|---|
s1 < s2 | \(S_1 \subset S_2\) |
s1 <= s2 | \(S_1 \subseteq S_2\) |
s1 > s2 | \(S_1 \supset S_2\) |
s1 >= s2 | \(S_1 \supseteq S_2\) |
This connection to subset operation should be used with care. A common mistake is shown below.
Incomparable
When dealing with number, string, or tuple, we can replace not (x < y)
with x >= y
.
However, this property is not true for set.
Note that given two sets x
and y
, we cannot simply say that one of the following will always be true.
x < y
x == y
x > y
It is possible that all of them produce False
.
This means that x
and y
are incomparable.
Using Set
Set is useful when we need our data to be distinct. We had an example of this when we want to find the average intake. We needed the number of distinct years. Our solution was to check if the year is already recorded in the list. If not, we record the year.
Using set, the solution is easier to read.
In fact, it can be shorter at the cost of some readability.
Pangram
Pangram
A pangram or holoalphabetic sentence is a sentence using every letter of a given alphabet at least once. Pangrams have been used to display typefaces, test equipment, and develop skills in handwriting, calligraphy, and typing.
The most famous pangram is probably the following short story about a fox and a dog.
The quick brown fox jumps over the lazy dog.
It has all characters in the alphabets from A to Z.
The kind of pangram we are interested is slightly different.
We want a more restricted set of alphabets.
For example, if the set of alphabets is 'agimnopr'
, then 'programming'
forms a pangram over the alphabet.
So now, to check if a word is a pangram, we need to check that each character in the set of alphabets appears in the word.
But this is incomplete.
We also need to check if each character in the word appears in the set of alphabets.
Otherwise, we have an invalid word.
For instance, using only the characters in the set of alphabet 'agimnopr'
, the word 'programmer'
cannot be formed.
We can quickly see that a pangram occurs when the set of characters in the set of alphabets is equal to the set of characters in the word.
As a side note, we can quickly convert any sequence into a set of element using the set(seq)
constructor.
So set('agimnopr')
gives us {'a', 'g', 'i', 'm', 'n', 'o', 'p', 'r'}
.
The solution can then be written simply as follows.