Binary Search

Beginning

The Algorithm Parts

  • Precondition: The input array (A) is sorted in non-decreasing order and the key \(k\) is in \(A\).
  • Postcondition: The index of the matching element is output
  • Loop Invariant: The key is in our current sublist: \(k \in A[left \ldots right]\).
  • Basic Step: Find the midpoint of the sub-array, adjust the sub-array to use the midpoint such that the key is in the new sub-array.
  • Exit Condition: \(left \le right\).
  • Make Progress: After each loop the sublist is half the size of the previous sublist.
  • Maintain the Loop Invariant: Pick the half of the sublist whose boundary value would allow the key.
  • Establish the Loop Invariant: The initial sublist is the original list (\(left = 0, right=A.length -1\)).
  • Worst Case Runtime: Since the input is repeatedly halved, the worst-case is \(\Theta(\log_{2} n)\).

Rather than saying "if k is in A" over and over, the search precondition requires that \(k\) is in \(A\) and if it isn't it's considered a failed search - which doesn't mean the algorithm is incorrect, just that the pre-condition wasn't met. We have to handle the other case, but for the purpose of defining the algorithm we'll set a stricter pre-condition.

Implementation

Imports

# python
from collections import namedtuple
from collections.abc import Sequence
from functools import partial

import random

# pypi
from expects import be_false, be_none, equal, expect, raise_error
from joblib import Parallel, delayed

import altair
import numpy
import pandas

from graeae import Timer
from graeae.visualization.altair_helpers import output_path, save_chart

Set Up

TIMER = Timer()
OUTPUT_PATH = output_path("binary-search")
save_it = partial(save_chart, output_path=OUTPUT_PATH)

SearchOutput = namedtuple("SearchOutput", ["index", "count"])

Binary Search One

The Pseudocode

\begin{algorithm}
\caption{BinarySearch}
\begin{algorithmic}
\INPUT Array A of items in non-decreasing order, search key
\OUTPUT If key is in array, the index of the item that matches
\PROCEDURE{BinarySearch}{A, key}
\STATE left $\gets$ 0
\STATE right $\gets$ A.length - 1

\WHILE {right > left}
\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$

\IF {key $\le$ A[middle]}
\STATE right $\gets$ middle
\ELSE
\STATE left $\gets$ middle + 1
\ENDIF
\ENDWHILE

\IF {key = A[left]}
\RETURN left
\ELSE
\RETURN NotFound
\ENDIF
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The Function

def binary_search(element: int, elements: Sequence) -> SearchOutput:
    """Does an iterative binary search for the key

    Args:
     element: item in the source to search for
     elements: sorted (non-decreasing) collection with the key

    Returns:
     SearchOutput(index of the key in the source or None if not found, count of comparisons)
    """
    left, right = 0, len(elements) - 1
    count = 0

    if not len(elements) or not (elements[left] <= element <= elements[right]):
        return (None, 1)

    while right > left:
        assert elements[left] <= element <= elements[right]
        count += 1
        middle = (left + right)//2

        left, right = ((left, middle) if element <= elements[middle]
                       else (middle + 1, right))

    location = left if (elements[left] == element) else None
    return (location, count)
def test_it(searcher):
    test_case = list(range(11))
    expect(searcher(element=5, elements=test_case)[0]).to(equal(5))

    items = list(sorted(random.sample(range(100), k=50)))

    for expected, item in enumerate(items):
        expect(searcher(element=item, elements=items)[0]).to(equal(expected))

    expect(searcher(element=-5, elements=items)[0]).to(be_none)

    last = items[-1]
    items[-1] = last + 100
    expect(searcher(101, items)[0]).to(be_none)

    expect(searcher(5, [])[0]).to(be_none)

    expect(searcher(5, [5])[0]).to(equal(0))
    return

test_it(binary_search)

Levitin's Version

The version I entered above is one that I found on the web (PDF Lecture Notes), and is roughly what CLRS has. The one in Levitin's Book is clearer to me but has one more comparison.

The Pseudocode

\begin{algorithm}
\caption{BinarySearchTwo}
\begin{algorithmic}
\INPUT Array A of items in non-decreasing order, search key in A
\OUTPUT The index of the item that matches key
\PROCEDURE{BinarySearchTwo}{A, key}
\STATE left $\gets$ 0
\STATE right $\gets$ A.length - 1

\WHILE {left $\le$ right}
\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$

\IF {key = A[middle]}
  \RETURN middle
\ELIF {key < A[middle]}
\STATE right $\gets$ middle - 1
\ELSE
\STATE left $\gets$ middle + 1
\ENDIF
\ENDWHILE

\IF {key = A[left]}
\RETURN left
\ELSE
\RETURN NotFound
\ENDIF
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The Function

def binary_search_two(element, elements) -> SearchOutput:
    """Iterative Binary Search

    Args:
     element: item in the source
     elements: sorted collection to search

    Returns:
     index where key is found in the source or False if not found

    Raises:
     AssertionError: key is out of bounds for the source
    """
    left, right = 0, len(elements) - 1
    count = 0
    location = None
    while left <= right and location is None:
        middle = (left + right)//2

        if element == elements[middle]:
            location = middle

        left, right = ((left, middle - 1) if element <= elements[middle] else
                       (middle + 1, right))
    return (location, count)
test_it(binary_search_two)

Recursive Version

Although it is pretty straightforward as an iterative function, divide and conquer lends itself to recursion so, just for completeness, here's a recursive version of the binary search.

The Pseudocode

\begin{algorithm}
\caption{Recursive Binary Search}
\begin{algorithmic}
\REQUIRE Input Array is in non-decreasing order
\INPUT Array A , search key, left and right indices to limit the search
\OUTPUT The index of the item that matches the key
\PROCEDURE{BinarySearchRecursive}{A, key, left, right}

\IF {left > right}
  \RETURN NotFound
\ENDIF

\STATE middle $\gets \left \lfloor \frac{\textrm{left} + \textrm{right}}{2} \right \rfloor$

\IF {key = A[middle]}
 \RETURN middle
\ELIF {key < A[middle]}
 \STATE right $\gets$ middle - 1
\ELSE
 \STATE left $\gets$ middle + 1
\ENDIF

\RETURN \textsc{BinarySearchRecursive}(elements, key, left, right)

\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The Implementation

def recursive_binary_search(elements: Sequence, key: int,
                            left: int, right: int,
                            count: int=0):
    """Recursive binary search

    Args:
     elements: sorted sequence with element in it
     key: item in elements to search for
     left: left index of the current sub-list to search
     right: right index of the current sub-list to search
     count: number of times we run the comparison (for plotting)

    Returns:
     (index, count) - index of the element in the elements and the comparison count
    """
    count += 1
    if left > right:
        # we missed it, the element isn't in the elements
        return None, count

    middle = (left + right)//2

    if elements[middle] == key:
        # we found it
        return middle, count

    # move one of the boundaries to the middle
    if key < elements[middle]:
        right = middle - 1
    else:
        left = middle + 1
    return recursive_binary_search(elements, key, left, right, count)

This is a helper to get the recursive call started and to handle empty lists or search terms that are outside of the range of the list.

def search(element: int, elements: Sequence) -> tuple:
    """calls the recursive binary search

    Args:
     element: an element in source to search for
     elements: sorted sequence of items     
    """
    left, right = 0, len(elements) - 1

    if not len(elements) or  not elements[left] <= element <= elements[right]:
        return (None, 1)
    return recursive_binary_search(elements, element, left, right)
test_it(search)

Some Plotting

Left and Right

Let's look at how the search updates the left and right boundaries. First we'll need a function that records the locations.

def binary_search_points(key, source) -> tuple:
    """Iterative Binary Search

    Args:
     key: item in the source
     source: sorted collection to search

    Returns:
     tuple of left-right locations

    Raises:
     AssertionError: key is out of bounds for the source
    """
    left, right = 0, len(source) - 1
    lefts = [left]
    rights = [right]

    while right > left:
        assert source[left] <= key <= source[right]
        middle = (left + right)//2

        (left, right) = ((left, middle) if key <= source[middle]
                         else (middle + 1, right))
        lefts.append(left)
        rights.append(right)
    return lefts, rights

Now we'll plot it.

items = list(range(101))
key = items[24]
lefts, rights = binary_search_points(key, items)

data = pandas.DataFrame(dict(Left=lefts, Right=rights, Split=list(range(len(lefts)))))

melted = data.melt(id_vars=["Split"], value_vars=["Left", "Right"],
                   var_name="left_or_right",
                   value_name="Index")
base = altair.Chart(melted)

lines = base.mark_line().encode(
    x="Split:O",
    y="Index",
    color="left_or_right"
)

points = base.mark_point().encode(
    x="Split:O",
    y="Index",
    color="left_or_right"
)

chart = (lines + points).properties(
    title="Binary Search Left-Right Boundaries",
    width=800,
    height=550
).interactive()

save_it(chart=chart, name="binary-search")

Figure Missing

So, it isn't really so pretty as with the sorting plots. As the plot confirms, the left and right slowly narrow to find the item in the list.

Runtime

Let's see how the number of loops goes up with the size of the search space.

EXPONENT = 5
numba_search = njit(binary_search)

sizes = tuple(range(1, 10**EXPONENT + 1, 1000))
random_source = [numpy.sort(random.integers(low=0, high=count, size=count))
                    for count in sizes]
random_things = [(random.choice(elements), elements)
                    for elements in random_source]

worst_things = [(elements[0], elements) for elements in random_source]

with TIMER:
    random_output = Parallel(n_jobs=-1)(
        delayed(numba_search)(element, elements)
        for (element, elements) in random_things)

    worst_output = Parallel(n_jobs=-1)(
        delayed(numba_search)(element, elements)
        for (element, elements) in worst_things)
Started: 2022-01-16 21:33:07.668315
Ended: 2022-01-16 21:33:08.905390
Elapsed: 0:00:01.237075
data = pandas.DataFrame(dict(
    Count=sizes,
    Random=[output[1] for output in random_output],
    First=[output[1] for output in worst_output]
))

melted = data.melt(id_vars=["Count"], value_vars=["Random", "First"],
                   var_name="Location", value_name="Bisections")

theoretical = pandas.DataFrame(dict(Count=sizes, Theoretical=numpy.log2(sizes)))

Now, to plot.

points = altair.Chart(melted).mark_point().encode(
    x="Count",
    y="Bisections",
    color="Location")

line = altair.Chart(theoretical).mark_line().encode(
    x="Count",
    y="Theoretical",
    tooltip=[altair.Tooltip("Count", format=","),
             altair.Tooltip("Theoretical", format=".2f")],
)
chart = (line + points).properties(
    title="Binary Search Bisections",
    width=800,
    height=525,
).interactive()

save_chart(chart=chart, name="binary-search-comparisons",
           output_path=OUTPUT_PATH)

Figure Missing

See Also