Multiple Representation

Learning Objectives

At the end of this sub-unit, students should

appreciate having multiple representation for a single ADT.
know how to represent an ADT in multiple ways.

Multiset

We know that Python has an implementation of a set. In a set, there are no duplicate values. What if we want to keep duplicate values? Mathematically, this is represented as a multiset. In a multiset, the count of each element --called multiplicity-- matters but the order does not matter.

For example, \(\{a, b, b, a, b\}\) is equal to \(\{b, a, b, a, b\}\) but not equal to \(\{a, b\}\). We can check this by looking at the multiplicity. In both \(\{a, b, b, a, b\}\) and \(\{b, a, b, a, b\}\), the multiplicity of \(a\) is 2 and the multiplicity of \(b\) is 3. So we have the same multiplicity for each element.

On the other hand, the multiplicity of \(a\) is 1 and the multiplicity of \(b\) is 1 for \(\{a, b\}\). Since there is a different multiplicity, this is not equal to \(\{b, a, b, a, b\}\) if we treat this as a multiset. Our task is to implement the following operations for multiset.

make_empty_mset(): Returns an empty multiset.
multiplicity_of(elem, mset): Returns the multiplicity of the element elem in the multiset mset. The multiplicity may be 0.
union(mset1, mset2): Returns the union of the two multisets.

Note that for union, we need to define it carefully. Since the multiplicity is important, the multiplicity of each element \(a\) in the result should be the sum of the multiplicity of \(a\) in both operands.

Mset01

With that, we can now present some possible representation of a multiset.

List

List is an ordered sequence so the order matter. But if we are using a list to represent a multiset, the order should not matter. We can of course sort the list so that some operations are easier, but we will choose the simplest representation instead.

Now that we know that we want to represent a multiset as a list, we can decide on what the empty multiset is. This should correspond to an empty list because in both cases, there should not be any element.

def make_empty_mset():
  return []

Now, consider the \(\{a, b, b, a, b\}\) from above. How will this look like as a list? A possible representation is the following.

mset = ['a', 'b', 'b', 'a', 'b']

With this concrete example, we can rephrase the problem of finding the multiplicity of a given element is the problem of counting the number of occurrences of the element in the list. We will solve this with a simple accumulator pattern.

def multiplicity_of(elem, mset):
  res = 0
  for item in mset:
    if item == elem:
      res = res + 1
  return res

Finally, to perform a union of two multiset, we can simply concatenate the two list!

def union(mset1, mset2):
  return mset1 + mset2

Putting things together, we get the following representation of multiset as list.

Multiset as List

def make_empty_mset():
  return []

def multiplicity_of(elem, mset):
  res = 0
  for item in mset:
    if item == elem:
      res = res + 1
  return res

def union(mset1, mset2):
  return mset1 + mset2

Dictionary

Can we solve this more easily with a dictionary? That depends on what we meant by easy. If we want to represent this as a dictionary, we immediately know that an empty multiset is really just an empty dictionary.

def make_empty_mset():
  return {}

What about multiplicity? Well, again, we have to try thinking about what \(\{a, b, b, a, b\}\) looks like. It is not quite clear how we can use a dictionary for this. One way is to simply mimic a list. We will use an integer as the key.

mset = {0: 'a', 1: 'b', 2: 'b', 3: 'a', 4: 'b'}

This does not look nice. Is there another way? Since the count of each element is important, we can use the element as the key and the count as the value. So \(\{a, b, b, a, b\}\) will be representated as the following.

mset = {'a': 2, 'b': 3}

Now we can see that finding the multiplicity is trivial. But remember, we need to ensure that there is no error if the multiplicity is 0. We can use mset.get(key, 0) to ensure this.

def multiplicity_of(elem, mset):
  return mset.get(elem, 0)

Finally, in the case of union, we need to create a new empty multiset. We then need to copy the key from both multiset. At the same time as we are copying, we need to sum the count from both. In the solution below, we do not care if we assign the value multiple times as long as we are copying from both. This is why we do not need check if the key already exist in the result.

def union(mset1, mset2):
  res = {}
  for key in mset1:  # copy all from mset1
    res[key] = mset1.get(key, 0) + mset2.get(key, 0)
  for key in mset2:  # copy all from mset2
    res[key] = mset1.get(key, 0) + mset2.get(key, 0)
  return res

Putting things together, we get the following representation of multiset as dictionary. The main thing to note is that some operations become simpler as a list and some operations become simpler as a dictionary.

Multiset as Dictionary

def make_empty_mset():
  return {}

def multiplicity_of(elem, mset):
  return mset.get(elem, 0)

def union(mset1, mset2):
  res = {}
  for key in mset1:  # copy all from mset1
    res[key] = mset1.get(key, 0) + mset2.get(key, 0)
  for key in mset2:  # copy all from mset2
    res[key] = mset1.get(key, 0) + mset2.get(key, 0)
  return res

Matrix

Let us look at some other ADT with a possible multiple representation.

Dense Matrix

A dense matrix representation is implemented using 2-dimensional array. We have such a representation before. The reason why this is called a dense matrix is because we store everything including all the zeros.

What happen if everything is zero? We still need to store everything. Can we do better? Yes, the idea is to not store zeros. This is called a sparse matrix which we will discuss next.

Sparse Matrix

To avoid storing the zero, we need to know the following information.

The number of rows.
- Previously, we can know this by len(m).
The number of columns.
- Previously, we can know this by len(m[0]).
The element at a particular row and column.
- Previously, we can know this by m[row][col] assuming zero-based value.

In this representation, if the element at a particular row and column is zero, we do not even need to have that in our representation. Consider the following matrix.

\[ \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} \]

We can compare the two representation as follows.

Dense Matrix

m = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1]
]

Sparse Matrix

m = {
  'row': 3, 'col': 3,
  'data': {
    (0, 0): 1, (1, 1): 1, (2, 2): 1
  }
}

We can see that for this particular case, the sparse matrix seems to be more complicated. But imagine we have 100 rows and 100 columns where we only have 3 non-zero elements. The representation for sparse matrix becomes simpler as the data will only contain 3 values.

We need to talk a little bit more about the data. Note that the data is represented as a dictionary. This is quite a special dictionary. The key in this dictionary is the pair (row, col) and the value is the value of the matrix at the given row and col.

Let us give a different example to clearly show the representation. Consider the following matrix.

\[ \begin{vmatrix} 1 & 0 & 2 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{vmatrix} \]

Dense Matrix

m = [
  [1, 0, 2],
  [0, 0, 0],
  [3, 0, 0]
]

Sparse Matrix

m = {
  'row': 3, 'col': 3,
  'data': {
    (0, 0): 1, (0, 2): 2, (2, 0): 3
  }
}

We can now write the functions for matrix that we have written before. To illustrate the benefit of this representation, we will show the implementation for zero matrix, identity matrix, and transpose. Try to solve these on your own before looking at the solution below.

Sparse Matrix

def zeros(n):
  return {
    'row': n, 'col': n,
    'data': {}
  }

def identity(n):
  res = zeros(n)
  for i in range(n):
    res['data'][(i, i)] = 1
  return res

def transpose(m):
  t_data = {}
  data = m['data']
  for key in data:
    t_data[(key[1], key[0])] = data[key]
    # initial key is (x, y)
    # after transpose, the key is (y, x)
  m['data'] = t_data