Using Dictionary

Learning Objectives

At the end of this sub-unit, students should

know how to solve problems using dictionary.

Table as Dictionary

We know that dictionary represents a mapping from key to value with the restriction that the key is an immutable value. To represent this visually, we use table. There is one table that we used to solve problems before. We used table to compute the price of a burger.

Ingredient	Price
Bread	1
Beef	7
Chicken	4
Vegetables	2
Mushrooms	3
Cheese	1
Onions	1
Hash Brown	2

The computation of the price invoke a single function ingredient_price(ingredient) that we left as an exercise. This is because the solution is rather tedious using a sequence of if-statements. The function compute_price(burger) is reproduced below.

def compute_price(burger):
  price = 0
  for ingredient in burger:
    price = price + ingredient_price(ingredient)
  return price

How can we implement ingredient_price easily? If we do not want to use a lot of if-statements, we can represent this as a dictionary. So what this means is that if-statements involving only equality can be replaced with a dictionary.

Ingredient Price

PRICE = {
  'Bread': 1, 'Beef': 7, 'Chicken': 4, 'Vegetable': 2,
  'Mushroom': 3, 'Cheese': 1, 'Onions': 1, 'Hash Brown': 2
}
def ingredient_price(ingredient):
  return PRICE[ingredient]

Anagram

Problem Description

An anagram is a word formed by rearranging the letters of a different word. In other words, we say that two words are an anagram of one another if they can be formed out of the same letters. Note that two empty strings are an anagram of one another.

Task

Write the function is_anagram(word1, word2) that returns True if word1 and word2 are anagrams of one another. Otherwise, the function returns False.

Assumptions

word1 and word2 are strings that are possibly empty (i.e., len(word1) >= 0 and len(word2) >= 0).
word1 and word2 contain only lowercase alphabets.

To solve this, we can decompose the problem into the following.

Find the count of each letters in word1.
Find the count of each letters in word2.
Check if the counts are equal.

To count each letter, we can use many representation. Since we are learning about dictionary, we can represent this as a dictionary where the keys are the letters in the word and the values are the count of the letters.

Anagram

def is_anagram(word1, word2):
  count1, count2 = {}, {}
  for char in word1:
    count1[char] = count1.get(char, 0) + 1
  for char in word2:
    count2[char] = count2.get(char, 0) + 1
  return count1 == count2

An alternative representation is to map each letter into an index. We can map a to 0, b to 1, and so on until z to 25. Try implementing this on your own.

Grouping

The last problem we are trying to solve is the problem of grouping. We needed grouping to partition data in a table which we then aggregate. Our previous solution uses a combination of filter and recursion. This may not be easy to do. So is there an alternative way?

Luckily, we know about dictionary and we can use it to partition the data. Recap that we are partitioning the table using a combination of some columns. All rows with the same values on these columns will be in the same partition. They become a "mini-table".

So the idea is to use the combination of these columns as the key in a dictionary. Then we do the following procedure.

Loop through the rows in the table.
Construct the key from the row.
Check if the key exists in the dictionary.
- If not, we create one.
Store the data in the appropriate key.

Now, using this dictionary, we can aggregate the data. Let us consider the partition using the value from columns with index 1 and 2. First, since we cannot use mutable data as key, we need to use a tuple (row[1], row[2]). Then we store the data as dct[(row[1], row[2])].append(row). This gives us the following code structure.

def grouping(data);
  # Partition
  dct = {}                  # partition dictionary
  for row in data:          # loop through the rows
    key = (row[1], row[2])  # construct the key
    if key not in dct:      # construct key if not exist
      dct[key] = []
    dct[key].append(row)    # create a mini-table

  # Aggregation
  for key in dct:
    for elem in dct[key]:
      # aggregation

Let us now use dictionary to perform grouping to solve the average intake problem.

Average Intake

def cleanup(row):
  return [int(row[0]), row[1] == 'F', row[2], int(row[3])]

def grouping(data):
  # Partition
  dct = {}
  for row in data:
    if row[0] not in dct:
      dct[row[0]] = []
    dct[row[0]].append(row)

  # Aggregation
  return len(dct)  # this is quite trivial for this problem

def average_intake(filename, faculty):
  data = list(map(cleanup, read_csv(filename)[1:]))        # remove header and cleanup
  data = list(filter(lambda row: row[2] == faculty, data)) # only keep data from the given faculty

  # problem solving
  intk = list(map(lambda row: row[3], data))
  return sum(intk) / grouping(data)