Searching

Learning Objectives

At the end of this sub-unit, students should

understand the idea of linear searching.
understand the idea of binary search.

Linear Search

Consider an arbitrary list. How can we know if it contains a certain element? Well, luckily, in Python, there is an operator for that called the in operator.

So that is straightforward. What if we want to find the positive index of the element? And if the element does not exist, we say that the index is -1. Since this is part of the negative index, we have not violated the problem specification.

So how can we do this? If there is no assumption about the elements in the list, then there is only thing we can do. We must search all the possible index.

Linear Search

def linear_search(lst, val):
  for idx in range(len(lst)):
    if lst[idx] == val:
      return idx
  return -1

Can we improve on this? Unfortunately not. We can do a little thought experiment. Say we can improve on this by never searching one of the index. It can be any index. What can be a problem?

The problem is if the value is in that index we skipped. So we cannot skip this particular index. Since we made no assumption about which index we skip, it means that we cannot skip any index at all. Since we cannot skip any index, we have to check all index.

This is an example of informal argument we need to do to convince ourselves that our code is correct.

Binary Search

The reason we cannot improve is because we have no other assumption about the list. In the worst-case, the value is always in the last place we look.

Quote

"I lost some time once. It's always in the last place you look for it."

Neil Gaiman

So to improve linear search, we need to make assumptions on the elements in the list. The assumption we want to impose on the list is that the list is sorted. For simplicity, we will say that the list is in ascending order. This implies that there is no two elements are equal. So let us imagine that we are a computer and we try to search for a particular value 65 inside 7 boxes.

Linear search will open all boxes from left to right or in any other order. To improve on this, we assume that the boxes are arranged in ascending order. We do not want to open all boxes. What can we do?

Open the middle box.
This middle box is smaller than 65.
- We can exclude all boxes to the left of this since we know that they must be smaller than 65.
We open the middle of the remaining box.
This middle box is larger than 65.
- We can exclude all boxes to the right of this since we know that they must be larger than 65.
We open the middle of the remaining box.

So we managed to find the box in 3 steps. What happen if 65 is not even in the box? If 65 is not in the box, we do not have any more boxes to consider.

Let us generalize this concept. If we consider all possible values in a straight line, the idea of a binary search is as follows.

First, we let our left hand point at the smallest possible index and our right hand point at the largest possible index. We then check the middle value. For simplicity, use floor division to compute the middle value.

Binary01

If the middle value is smaller than what we want, the value we want must be between middle and right hand. So we move the left hand to the middle.

Binary02

Then we open the middle value again.

Binary03

If the middle value is larger than what we want, the value we want must be between left hand and middle. So we move the right hand to the middle.

Binary04

Then we open the middle value again.

Binary05

We repeat this until we found the value.

Binary06

If our hand crossed, then we know that the value is not in there.

Binary07

The code is shown below.

Binary Search

def binary_search(lst, val):
  left, right = 0, len(lst) - 1   # starting hands
  while left <= right:            # hands crossing
    middle = (left + right) // 2  # compute middle
    if lst[middle] == val:        # lucky!
      return middle               #   found
    if lst[middle] < val:         # smaller
      left = middle + 1           #   move left to middle (exclude middle)
    else:                         # larger
      right = middle - 1          #   move right to middle (exclude middle)
  return -1                       # not found