Dictionary
Learning Objectives
At the end of this sub-unit, students should
- understand dictionary.
- know the basic dictionary operations.
Looking for Something
In many cases, we have a particular set of data that is guaranteed to be unique like national identification number (e.g., NRIC). This value is typically artificially generated and controlled so uniqueness can be guaranteed. Since this value is unique, we can use this as an identification term called the key.
In the context of NUS, we have matric numbers as the unique identifier for every student. Consider a student data consisting of the following information in a tuple.
- Matric Number:
- A string (i.e.,
str
), guaranteed to be unique. - Accessor:
get_matric(student)
.
- A string (i.e.,
- Name:
- A string (i.e.,
str
). - Accessor:
get_name(student)
.
- A string (i.e.,
- Year:
- An integer (i.e.,
int
). - Accessor:
get_year(student)
.
- An integer (i.e.,
If we have a list of students, how can we find the name of a student with a particular matric number? Since we have learnt about linear and binary search, we can choose either one. But if the list is not sorted based on the matric number, we have to use linear search. So the code can be the following. We will assume that the student can always be found.
This can be quite tedious but kind of necessary. Our search uses matric number but a list is indexed by number. So there is a mismatch here. It will be useful to be able to index using anything other than number.
We can do this ourselves by writing a function that convert a given matric number into a non-negative number.
So in the case of a matric number with the format of A_______X
where _______
is a number, we can simply use this number.
Assuming that our list of student is created correctly such that a student with matric number of A0031288R
is placed at the index 31288
and assuming that our list is big enough to hold all students, we can search for the name for a particular student as follows.
So we do not have to search the list of students (i.e., for student in students
) but we need to compute the index instead.
But there are a lot of things we have to ensure to make it work.
- How do we ensure that it works for all matric number.
- What happen if two matric numbers produces the same number (i.e., a collision).
- Can we make the amount of space tight (e.g., if we use index 0 and 999999, we should not create 1000000 space).
This is not simple to create, but the idea is quite simple to explain.
If we look at a sequence (e.g., list, tuple, string, etc) as a mapping from non-negative integer to value (i.e., int -> object
), we want a mapping from value to value (i.e., object -> object
).
We call the left object
as the key and the right object
as the value.
There is already a support in Python for this called a dictionary. A dictionary is a mapping from key to value. As usual, the value can be any value. But there is a restriction on what the key can be. The key can only be immutable object1. The advantage of using a dictionary is that searching based on the key is much faster than performing linear search.
The graph above shows the average time to search for a particular value. The blue color is for ordered sequence where we are required to search linearly whereas the orange color is for unordered sequence where we can simply "jump" to a particular key. The actual value of the time is irrelevant as it depends on the machine but the relative speed and the general trend as the size increases matter more.
Syntax
The syntax for dictionary is as follows.
- Empty Dictionary:
{}
(alternativelydict()
) - General:
{ key1: val1, key2: val2, key3: val3 }
(can have as many expressions as needed)key1
,key2
, andkey3
must produce an immutable value.val1
,val2
, andval3
can produce any value.
- Alternative:
dict([(key1, val1), (key2, val2), (key3, val3)])
- The sequence can be other types (e.g.,
((key1, val1), (key2, val2), (key3, val3))
). - The element of the sequence must be a pair.
- The sequence can be other types (e.g.,
Properties
There are some basic properties we must satisfy to create a dictionary. The two properties below are about the keys.
Unique Keys
Similar to how set holds unique elements, if we look at the keys of a dictionary, they should behave like a set. But a dictionary is better than set because in a dictionary, we can use the key to retrieve the value. In a set, we can only ask if the element is inside the set.
However, for a fast retrieval to work, there cannot be a duplicate key. Otherwise, we would not know which value to retrieve. If we think about this in terms of a list, there should not be two elements at a particular index2 otherwise we do not know what to do.
It used to be the case that the order in the dictionary should be treated as random. But from Python 3.8 onwards, dictionary retains the order of insertion.
Immutable Keys
Keys of a dictionary must be immutable. As such, we cannot use mutable values such as list or dictionary as a key.
It does not mean that the key must be "simple". It can be complex. For instance, the key itself can be a tuple.
Immutable Keys
However, for complex data, we must ensure that it cannot have mutable values no matter how deep the nesting is. So if our complex data is a tuple and it contains a list, then we still cannot use it as a key.
Nested Mutable Keys
Why Immutable Key?
Why must the key be immutable?
The danger of mutable keys can be illustrated below.
First, we want the key to be unique.
So we do not want two values that are equal according to ==
operator to exist at the same time.
So imagine if we can do the following.
We can visualize this with a table.
Key | Value |
---|---|
[1] | 1 |
[2] | 2 |
So far so good. Now since a list is immutable, we can do the following as a continuation of the previous code.
How can we visualize this as a table?
Key | Value |
---|---|
[2] | 1 |
[2] | 2 |
Now we have a problem.
If we ask for the value that corresponds to the key [2]
, what will be the value?
We can no longer guarantee that the keys are unique.
Dictionary Operations
Now that we know how to create a dictionary, we can describe other operations involving dictionaries. Our initial dictionary will be the following.
We can visualize this as a table of key-value pair.
Key | Value |
---|---|
apple | 450 |
kiwi | 412 |
Checking Key
The most basic operation we can perform on a dictionary is to check if a key exist.
There are many ways to do this, but the simplest way is to use the in
operator.
Notice how there are two collections in a single dictionary.
We have a collection of keys and a collection of values.
Since there are two collections, the in
operator can only check one of them.
Using the idea of matric number above, the checking for existence of matric number is easier than existence of name.
In a more general term, checking if a key exist is easier than checking if a value exist.
So the in
operator checks for existence of keys.
Checking Key
Retrieval
As we have mentioned earlier, a dictionary is a mapping from key to value. So to retrieve the value, we will need to supply the key. The syntax used is similar to tuple/list indexing except that we do not need to use only integers.
What happen if the key does not exist in the dictionary?
Similar to how we get an error if an index does not exist in the sequence, we will also get an error.
In sequence, we will get an IndexError
because the mapping is from index.
So in a dictionary, we will get a KeyError
instead.
How do we ensure safe retrieval?
We will need to first check if the key exist.
If the key exist, we perform the retrieval, otherwise we skip the operation.
Alternatively, there is a method we can use for this called the dct.get(...)
method.
There are two variants of this method.
dct.get(key)
: Returnsdct[key]
ifkey
exists as a key, otherwise returnsNone
.dct.get(key, default)
: Returnsdct[key]
ifkey
exists as a key, otherwise returnsdefault
.
Safe Retrieval
Update/Insert
Since dictionary is a mutable collection, we can use the syntax dct[key] = val
to update the dictionary.
Unlike a list, a dictionary is also auto-expanding.
If the key does not exist, one will be created by the above update.
Otherwise, the value is updated.
In both cases, after the statement, the given key-value pair will exist in the dictionary.
Update/Insert
The visualization using table will look like the following.
Key | Value |
---|---|
apple | 450 |
kiwi | 412 |
Key | Value |
---|---|
apple | 450 |
kiwi | 412 |
pear | 1010 |
Dictionary and Variables
There is a number of similarities between dictionaries and variable scopes.
Consider a global scope.
We can represent this as a dictionary where the key is a variable name.
In this case, KeyError
when the key does not exist becomes NameError
.
This is in fact the return value of the globals()
built-in function.
The mapping is shown below.
Scope | Dictionary |
---|---|
Variable Name | Key (a string) |
Variable Value | Value mapped from Key |
NameError |
KeyError |
Delete
Deleting a key-value pair from a dictionary is more difficult.
The difficulty comes from what we meant by the term "delete".
Here, what we want is to encounter KeyError
when we want to retrieve the value of a key after we delete the key.
Given that expected behavior, we cannot simply assign the value of 0 because that is still a value.
Even assigning the value of nothing (i.e., dct[key] = None
) because the key still exist but the value becomes None
.
The operator that we need to use here is called the delete
operator.
Delete
Here, we can see that after the deletion using delete
operator, we will get KeyError
.
Delete
Equality
Similar to set, dictionary equality does not care about ordering because dictionary is an unordered collection.
So when we compare two dictionary, we simply compare if the keys are equal and for all keys, they have the same values.
Unfortunately, unlike set, there are no <
, <=
, >
, and >
operators for dictionary.
Equality
Dictionary Methods
Similar to list, there are also quite a number of dictionary methods. Here, we will show the usage and will display the summary afterwards.
Dictionary Methods
The last three also show how we can iterate over dictionary. However, there is no need to remember them because there is an even simpler way to iterate over dictionary. We will show 4 ways to do them.
Note that using dct.values()
contains the least number of information.
This is because we know only the values and not the keys.
There is a kind of asymmetry in the information here.
If we know the key, we can know the value easily.
But the reverse is not true.
In the case of dct.items()
, each item
is a tuple containing exactly two elements.
Our example simply print them, but we can retrieve each independently via item[0]
and item[1]
.
So the choice is up to us.
We would recommend familiarizing ourselves with the behavior of the simplest loop without the use of keys
, values
, or items
method.
This is sufficient because by iterating over all the keys, we can also know all the values.
-
Unfortunately, it is not a true immutable object as it is near impossible to do in Python. It is an approximation of an immutable object. It is an object that has a definition for the
__hash__
method. ↩ -
If the element is a tuple, there is still one element just that the element contains multiple values. ↩