Binary Search
Table of Contents
Beginning
The Algorithm Parts
- Precondition: The input array (A) is sorted in non-decreasing order and the key \(k\) is in \(A\).
 - Postcondition: The index of the matching element is output
 - Loop Invariant: The key is in our current sublist: \(k \in A[left \ldots right]\).
 - Basic Step: Find the midpoint of the sub-array, adjust the sub-array to use the midpoint such that the key is in the new sub-array.
 - Exit Condition: \(left \le right\).
 - Make Progress: After each loop the sublist is half the size of the previous sublist.
 - Maintain the Loop Invariant: Pick the half of the sublist whose boundary value would allow the key.
 - Establish the Loop Invariant: The initial sublist is the original list (\(left = 0, right=A.length -1\)).
 - Worst Case Runtime: Since the input is repeatedly halved, the worst-case is \(\Theta(\log_{2} n)\).
 
Rather than saying "if k is in A" over and over, the search precondition requires that \(k\) is in \(A\) and if it isn't it's considered a failed search - which doesn't mean the algorithm is incorrect, just that the pre-condition wasn't met. We have to handle the other case, but for the purpose of defining the algorithm we'll set a stricter pre-condition.
Implementation
Imports
# python
from collections import namedtuple
from collections.abc import Sequence
from functools import partial
import random
# pypi
from expects import be_false, be_none, equal, expect, raise_error
from joblib import Parallel, delayed
import altair
import numpy
import pandas
from graeae import Timer
from graeae.visualization.altair_helpers import output_path, save_chart
Set Up
TIMER = Timer()
OUTPUT_PATH = output_path("binary-search")
save_it = partial(save_chart, output_path=OUTPUT_PATH)
SearchOutput = namedtuple("SearchOutput", ["index", "count"])
Binary Search One
The Pseudocode
\begin{algorithm}
\caption{BinarySearch}
\begin{algorithmic}
\INPUT Array A of items in non-decreasing order, search key
\OUTPUT If key is in array, the index of the item that matches
\PROCEDURE{BinarySearch}{A, key}
\STATE left $\gets$ 0
\STATE right $\gets$ A.length - 1
\WHILE {right > left}
\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$
\IF {key $\le$ A[middle]}
\STATE right $\gets$ middle
\ELSE
\STATE left $\gets$ middle + 1
\ENDIF
\ENDWHILE
\IF {key = A[left]}
\RETURN left
\ELSE
\RETURN NotFound
\ENDIF
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
The Function
def binary_search(element: int, elements: Sequence) -> SearchOutput:
    """Does an iterative binary search for the key
    Args:
     element: item in the source to search for
     elements: sorted (non-decreasing) collection with the key
    Returns:
     SearchOutput(index of the key in the source or None if not found, count of comparisons)
    """
    left, right = 0, len(elements) - 1
    count = 0
    if not len(elements) or not (elements[left] <= element <= elements[right]):
        return (None, 1)
    while right > left:
        assert elements[left] <= element <= elements[right]
        count += 1
        middle = (left + right)//2
        left, right = ((left, middle) if element <= elements[middle]
                       else (middle + 1, right))
    location = left if (elements[left] == element) else None
    return (location, count)
def test_it(searcher):
    test_case = list(range(11))
    expect(searcher(element=5, elements=test_case)[0]).to(equal(5))
    items = list(sorted(random.sample(range(100), k=50)))
    for expected, item in enumerate(items):
        expect(searcher(element=item, elements=items)[0]).to(equal(expected))
    expect(searcher(element=-5, elements=items)[0]).to(be_none)
    last = items[-1]
    items[-1] = last + 100
    expect(searcher(101, items)[0]).to(be_none)
    expect(searcher(5, [])[0]).to(be_none)
    expect(searcher(5, [5])[0]).to(equal(0))
    return
test_it(binary_search)
Levitin's Version
The version I entered above is one that I found on the web (PDF Lecture Notes), and is roughly what CLRS has. The one in Levitin's Book is clearer to me but has one more comparison.
The Pseudocode
\begin{algorithm}
\caption{BinarySearchTwo}
\begin{algorithmic}
\INPUT Array A of items in non-decreasing order, search key in A
\OUTPUT The index of the item that matches key
\PROCEDURE{BinarySearchTwo}{A, key}
\STATE left $\gets$ 0
\STATE right $\gets$ A.length - 1
\WHILE {left $\le$ right}
\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$
\IF {key = A[middle]}
  \RETURN middle
\ELIF {key < A[middle]}
\STATE right $\gets$ middle - 1
\ELSE
\STATE left $\gets$ middle + 1
\ENDIF
\ENDWHILE
\IF {key = A[left]}
\RETURN left
\ELSE
\RETURN NotFound
\ENDIF
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
The Function
def binary_search_two(element, elements) -> SearchOutput:
    """Iterative Binary Search
    Args:
     element: item in the source
     elements: sorted collection to search
    Returns:
     index where key is found in the source or False if not found
    Raises:
     AssertionError: key is out of bounds for the source
    """
    left, right = 0, len(elements) - 1
    count = 0
    location = None
    while left <= right and location is None:
        middle = (left + right)//2
        if element == elements[middle]:
            location = middle
        left, right = ((left, middle - 1) if element <= elements[middle] else
                       (middle + 1, right))
    return (location, count)
test_it(binary_search_two)
Recursive Version
Although it is pretty straightforward as an iterative function, divide and conquer lends itself to recursion so, just for completeness, here's a recursive version of the binary search.
The Pseudocode
\begin{algorithm}
\caption{Recursive Binary Search}
\begin{algorithmic}
\REQUIRE Input Array is in non-decreasing order
\INPUT Array A , search key, left and right indices to limit the search
\OUTPUT The index of the item that matches the key
\PROCEDURE{BinarySearchRecursive}{A, key, left, right}
\IF {left > right}
  \RETURN NotFound
\ENDIF
\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$
\IF {key = A[middle]}
 \RETURN middle
\ELIF {key < A[middle]}
 \STATE right $\gets$ middle - 1
\ELSE
 \STATE left $\gets$ middle + 1
\ENDIF
\RETURN \textsc{BinarySearchRecursive}(elements, key, left, right)
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
The Implementation
def recursive_binary_search(elements: Sequence, key: int,
                            left: int, right: int,
                            count: int=0):
    """Recursive binary search
    Args:
     elements: sorted sequence with element in it
     key: item in elements to search for
     left: left index of the current sub-list to search
     right: right index of the current sub-list to search
     count: number of times we run the comparison (for plotting)
    Returns:
     (index, count) - index of the element in the elements and the comparison count
    """
    count += 1
    if left > right:
        # we missed it, the element isn't in the elements
        return None, count
    middle = (left + right)//2
    if elements[middle] == key:
        # we found it
        return middle, count
    # move one of the boundaries to the middle
    if key < elements[middle]:
        right = middle - 1
    else:
        left = middle + 1
    return recursive_binary_search(elements, key, left, right, count)
This is a helper to get the recursive call started and to handle empty lists or search terms that are outside of the range of the list.
def search(element: int, elements: Sequence) -> tuple:
    """calls the recursive binary search
    Args:
     element: an element in source to search for
     elements: sorted sequence of items     
    """
    left, right = 0, len(elements) - 1
    if not len(elements) or  not elements[left] <= element <= elements[right]:
        return (None, 1)
    return recursive_binary_search(elements, element, left, right)
test_it(search)
Some Plotting
Left and Right
Let's look at how the search updates the left and right boundaries. First we'll need a function that records the locations.
def binary_search_points(key, source) -> tuple:
    """Iterative Binary Search
    Args:
     key: item in the source
     source: sorted collection to search
    Returns:
     tuple of left-right locations
    Raises:
     AssertionError: key is out of bounds for the source
    """
    left, right = 0, len(source) - 1
    lefts = [left]
    rights = [right]
    while right > left:
        assert source[left] <= key <= source[right]
        middle = (left + right)//2
        (left, right) = ((left, middle) if key <= source[middle]
                         else (middle + 1, right))
        lefts.append(left)
        rights.append(right)
    return lefts, rights
Now we'll plot it.
items = list(range(101))
key = items[24]
lefts, rights = binary_search_points(key, items)
data = pandas.DataFrame(dict(Left=lefts, Right=rights, Split=list(range(len(lefts)))))
melted = data.melt(id_vars=["Split"], value_vars=["Left", "Right"],
                   var_name="left_or_right",
                   value_name="Index")
base = altair.Chart(melted)
lines = base.mark_line().encode(
    x="Split:O",
    y="Index",
    color="left_or_right"
)
points = base.mark_point().encode(
    x="Split:O",
    y="Index",
    color="left_or_right"
)
chart = (lines + points).properties(
    title="Binary Search Left-Right Boundaries",
    width=800,
    height=550
).interactive()
save_it(chart=chart, name="binary-search")
So, it isn't really so pretty as with the sorting plots. As the plot confirms, the left and right slowly narrow to find the item in the list.
Runtime
Let's see how the number of loops goes up with the size of the search space.
EXPONENT = 5
numba_search = njit(binary_search)
sizes = tuple(range(1, 10**EXPONENT + 1, 1000))
random_source = [numpy.sort(random.integers(low=0, high=count, size=count))
                    for count in sizes]
random_things = [(random.choice(elements), elements)
                    for elements in random_source]
worst_things = [(elements[0], elements) for elements in random_source]
with TIMER:
    random_output = Parallel(n_jobs=-1)(
        delayed(numba_search)(element, elements)
        for (element, elements) in random_things)
    worst_output = Parallel(n_jobs=-1)(
        delayed(numba_search)(element, elements)
        for (element, elements) in worst_things)
Started: 2022-01-16 21:33:07.668315 Ended: 2022-01-16 21:33:08.905390 Elapsed: 0:00:01.237075
data = pandas.DataFrame(dict(
    Count=sizes,
    Random=[output[1] for output in random_output],
    First=[output[1] for output in worst_output]
))
melted = data.melt(id_vars=["Count"], value_vars=["Random", "First"],
                   var_name="Location", value_name="Bisections")
theoretical = pandas.DataFrame(dict(Count=sizes, Theoretical=numpy.log2(sizes)))
Now, to plot.
points = altair.Chart(melted).mark_point().encode(
    x="Count",
    y="Bisections",
    color="Location")
line = altair.Chart(theoretical).mark_line().encode(
    x="Count",
    y="Theoretical",
    tooltip=[altair.Tooltip("Count", format=","),
             altair.Tooltip("Theoretical", format=".2f")],
)
chart = (line + points).properties(
    title="Binary Search Bisections",
    width=800,
    height=525,
).interactive()
save_chart(chart=chart, name="binary-search-comparisons",
           output_path=OUTPUT_PATH)
See Also
- Loop Invariants [Internet]. [cited 2022 Jan 15]. Available from: https://www.cs.cornell.edu/courses/cs2112/2018fa/lectures/lec_loopinv/
 - Introduction to the Design & Analysis of Algorithms (Levitin)