Dictionary

Learning Objectives

At the end of this sub-unit, students should

understand dictionary.
know the basic dictionary operations.

Looking for Something

In many cases, we have a particular set of data that is guaranteed to be unique like national identification number (e.g., NRIC). This value is typically artificially generated and controlled so uniqueness can be guaranteed. Since this value is unique, we can use this as an identification term called the key.

In the context of NUS, we have matric numbers as the unique identifier for every student. Consider a student data consisting of the following information in a tuple.

Matric Number:
- A string (i.e., str), guaranteed to be unique.
- Accessor: get_matric(student).
Name:
- A string (i.e., str).
- Accessor: get_name(student).
Year:
- An integer (i.e., int).
- Accessor: get_year(student).

If we have a list of students, how can we find the name of a student with a particular matric number? Since we have learnt about linear and binary search, we can choose either one. But if the list is not sorted based on the matric number, we have to use linear search. So the code can be the following. We will assume that the student can always be found.

def find_name(students, matric):
  for student in students:
    if get_matric(student) == matric:
      return get_name(student)

This can be quite tedious but kind of necessary. Our search uses matric number but a list is indexed by number. So there is a mismatch here. It will be useful to be able to index using anything other than number.

We can do this ourselves by writing a function that convert a given matric number into a non-negative number. So in the case of a matric number with the format of A_______X where _______ is a number, we can simply use this number. Assuming that our list of student is created correctly such that a student with matric number of A0031288R is placed at the index 31288 and assuming that our list is big enough to hold all students, we can search for the name for a particular student as follows.

def find_name(students, matric):
  idx = 0
  for digit in matric[1:-1]:  # remove leftmost and rightmost letters
    idx = (idx * 10) + int(digit)
  return get_name(students[idx])

So we do not have to search the list of students (i.e., for student in students) but we need to compute the index instead. But there are a lot of things we have to ensure to make it work.

How do we ensure that it works for all matric number.
What happen if two matric numbers produces the same number (i.e., a collision).
Can we make the amount of space tight (e.g., if we use index 0 and 999999, we should not create 1000000 space).

This is not simple to create, but the idea is quite simple to explain. If we look at a sequence (e.g., list, tuple, string, etc) as a mapping from non-negative integer to value (i.e., int -> object), we want a mapping from value to value (i.e., object -> object). We call the left object as the key and the right object as the value.

There is already a support in Python for this called a dictionary. A dictionary is a mapping from key to value. As usual, the value can be any value. But there is a restriction on what the key can be. The key can only be immutable object¹. The advantage of using a dictionary is that searching based on the key is much faster than performing linear search.

Time01

The graph above shows the average time to search for a particular value. The blue color is for ordered sequence where we are required to search linearly whereas the orange color is for unordered sequence where we can simply "jump" to a particular key. The actual value of the time is irrelevant as it depends on the machine but the relative speed and the general trend as the size increases matter more.

Syntax

The syntax for dictionary is as follows.

{❬ expression ❭ : ❬ expression ❭, ..., ❬ expression ❭ : ❬ expression ❭}

Empty Dictionary: {} (alternatively dict())
General: { key1: val1, key2: val2, key3: val3 } (can have as many expressions as needed)
- key1, key2, and key3 must produce an immutable value.
- val1, val2, and val3 can produce any value.
Alternative: dict([(key1, val1), (key2, val2), (key3, val3)])
- The sequence can be other types (e.g., ((key1, val1), (key2, val2), (key3, val3))).
- The element of the sequence must be a pair.

Properties

There are some basic properties we must satisfy to create a dictionary. The two properties below are about the keys.

Unique Keys

Similar to how set holds unique elements, if we look at the keys of a dictionary, they should behave like a set. But a dictionary is better than set because in a dictionary, we can use the key to retrieve the value. In a set, we can only ask if the element is inside the set.

However, for a fast retrieval to work, there cannot be a duplicate key. Otherwise, we would not know which value to retrieve. If we think about this in terms of a list, there should not be two elements at a particular index² otherwise we do not know what to do.

It used to be the case that the order in the dictionary should be treated as random. But from Python 3.8 onwards, dictionary retains the order of insertion.

Unique Keys

>>> dct = {1: 'A', 1: 'B'}
>>> dct   # only {1: 'B'} is kept
{1: 'B'}

Immutable Keys

Keys of a dictionary must be immutable. As such, we cannot use mutable values such as list or dictionary as a key.

Mutable Keys

>>> {[1]: 1}
TypeError: unhashable type: 'list'

It does not mean that the key must be "simple". It can be complex. For instance, the key itself can be a tuple.

Immutable Keys

>>> {(1, 2): 'one and two'}
{(1, 2): 'one and two'}
>>> {1: [1, 2, 3]}  # value can be mutable
{1: [1, 2, 3]}

However, for complex data, we must ensure that it cannot have mutable values no matter how deep the nesting is. So if our complex data is a tuple and it contains a list, then we still cannot use it as a key.

Nested Mutable Keys

>>> {([1], 2): 'cannot!'}
TypeError: unhashable type: 'list'
>>> x = [1]
>>> {(x, 2): 'also cannot!'}
TypeError: unhashable type: 'list'

Why Immutable Key?

Why must the key be immutable? The danger of mutable keys can be illustrated below. First, we want the key to be unique. So we do not want two values that are equal according to == operator to exist at the same time. So imagine if we can do the following.

lst1 = [1]
lst2 = [2]
dct = {lst1: 1, lst2: 2}

We can visualize this with a table.

Key	Value
`[1]`	1
`[2]`	2

So far so good. Now since a list is immutable, we can do the following as a continuation of the previous code.

lst1[0] = 2

How can we visualize this as a table?

Key	Value
`[2]`	1
`[2]`	2

Now we have a problem. If we ask for the value that corresponds to the key [2], what will be the value? We can no longer guarantee that the keys are unique.

Dictionary Operations

Now that we know how to create a dictionary, we can describe other operations involving dictionaries. Our initial dictionary will be the following.

stock = { 'apple': 450, 'kiwi': 412 }

We can visualize this as a table of key-value pair.

Key	Value
apple	450
kiwi	412

Checking Key

The most basic operation we can perform on a dictionary is to check if a key exist. There are many ways to do this, but the simplest way is to use the in operator. Notice how there are two collections in a single dictionary. We have a collection of keys and a collection of values.

Since there are two collections, the in operator can only check one of them. Using the idea of matric number above, the checking for existence of matric number is easier than existence of name. In a more general term, checking if a key exist is easier than checking if a value exist. So the in operator checks for existence of keys.

Checking Key

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> 'apple' in stock    # 'apple' exists as key
True
>>> 'orange' in stock   # 'orange' does not exist as key
False
>>> 450 in stock        # cannot search for value
False

Retrieval

As we have mentioned earlier, a dictionary is a mapping from key to value. So to retrieve the value, we will need to supply the key. The syntax used is similar to tuple/list indexing except that we do not need to use only integers.

Retrieval

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> stock['apple']
450

What happen if the key does not exist in the dictionary? Similar to how we get an error if an index does not exist in the sequence, we will also get an error. In sequence, we will get an IndexError because the mapping is from index. So in a dictionary, we will get a KeyError instead.

Retrieval

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> stock['orange']
KeyError: 'orange'

How do we ensure safe retrieval? We will need to first check if the key exist. If the key exist, we perform the retrieval, otherwise we skip the operation. Alternatively, there is a method we can use for this called the dct.get(...) method. There are two variants of this method.

dct.get(key): Returns dct[key] if key exists as a key, otherwise returns None.
dct.get(key, default): Returns dct[key] if key exists as a key, otherwise returns default.

Safe Retrieval

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> stock.get('orange')     # nothing printed because the result is None
>>> stock.get('orange', 0)
0
>>> if 'orange' in stock:
...   print(stock['orange'])
... else:
...   print('no orange')
... 
no orange

Update/Insert

Since dictionary is a mutable collection, we can use the syntax dct[key] = val to update the dictionary. Unlike a list, a dictionary is also auto-expanding. If the key does not exist, one will be created by the above update. Otherwise, the value is updated. In both cases, after the statement, the given key-value pair will exist in the dictionary.

Update/Insert

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> stock
{'apple': 450, 'kiwi': 412}
>>> stock['pear'] = 1010
>>> stock
{'apple': 450, 'kiwi': 412, 'pear': 1010}

The visualization using table will look like the following.

Before UpdateAfter Update

Key	Value
apple	450
kiwi	412

Key	Value
apple	450
kiwi	412
pear	1010

Dictionary and Variables

There is a number of similarities between dictionaries and variable scopes. Consider a global scope. We can represent this as a dictionary where the key is a variable name. In this case, KeyError when the key does not exist becomes NameError. This is in fact the return value of the globals() built-in function. The mapping is shown below.

Scope	Dictionary
Variable Name	Key (a string)
Variable Value	Value mapped from Key
`NameError`	`KeyError`

Delete

Deleting a key-value pair from a dictionary is more difficult. The difficulty comes from what we meant by the term "delete". Here, what we want is to encounter KeyError when we want to retrieve the value of a key after we delete the key.

Given that expected behavior, we cannot simply assign the value of 0 because that is still a value. Even assigning the value of nothing (i.e., dct[key] = None) because the key still exist but the value becomes None.

The operator that we need to use here is called the delete operator.

Delete

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> stock
{'apple': 450, 'kiwi': 412}
>>> stock['apple'] = None
>>> stock
{'apple': None, 'kiwi': 412}
>>> del stock['apple']
>>> stock
{'kiwi': 412}

Here, we can see that after the deletion using delete operator, we will get KeyError.

Delete

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> del stock['apple']
>>> stock['apple']
KeyError: 'apple'

Equality

Similar to set, dictionary equality does not care about ordering because dictionary is an unordered collection. So when we compare two dictionary, we simply compare if the keys are equal and for all keys, they have the same values. Unfortunately, unlike set, there are no <, <=, >, and > operators for dictionary.

Equality

>>> {'A': 1, 'B': 2} == {'B': 2, 'A': 1}
True
>>> {'A': 1, 'B': 2} == {'B': 2, 'A': 1, 'C': 3}
False

Dictionary Methods

Similar to list, there are also quite a number of dictionary methods. Here, we will show the usage and will display the summary afterwards.

Dictionary Methods

>>> stock = { 'apple': 450, 'kiwi': 412 }
>>> copied = stock.copy()
>>> stock
{'apple': 450, 'kiwi': 412}
>>> stock.clear()
>>> stock
{}
>>> copied
{'apple': 450, 'kiwi': 412}
>>> copied.keys()
dict_keys(['apple', 'kiwi'])
>>> copied.values()
dict_values([450, 412])
>>> copied.items()
dict_items([('apple', 450), ('kiwi', 412)])

The last three also show how we can iterate over dictionary. However, there is no need to remember them because there is an even simpler way to iterate over dictionary. We will show 4 ways to do them.

For-Loop with Dictionary

for key in stock:  # iterate keys
  print(key, stock[key])

1 2	`apple 450 kiwi 412`

For-Loop with Keys

for key in stock.keys():  # iterate keys
  print(key, stock[key])

1 2	`apple 450 kiwi 412`

For-Loop with Values

for val in stock.values():  # iterate values
  print(val)  # least information

1
2

450
412

For-Loop with Items

for item in stock.items():  # iterate items
  print(item[0], item[1])

1 2	`('apple', 450) ('kiwi', 412)`

Note that using dct.values() contains the least number of information. This is because we know only the values and not the keys. There is a kind of asymmetry in the information here. If we know the key, we can know the value easily. But the reverse is not true.

In the case of dct.items(), each item is a tuple containing exactly two elements. Our example simply print them, but we can retrieve each independently via item[0] and item[1].

So the choice is up to us. We would recommend familiarizing ourselves with the behavior of the simplest loop without the use of keys, values, or items method. This is sufficient because by iterating over all the keys, we can also know all the values.

Unfortunately, it is not a true immutable object as it is near impossible to do in Python. It is an approximation of an immutable object. It is an object that has a definition for the __hash__ method. ↩
If the element is a tuple, there is still one element just that the element contains multiple values. ↩