Binary Search

Cloistered Monkey

2018-07-04 17:28

Binary Search

Binary search takes a sorted array and repeatedly divides it in half, checking whether an item searched for is at the mid-point. If you were to just traverse the array, you would have a runtime of \(O(n)\). But because you are using divide and conquer (repeatedly halving to reduce the search space), you have a runtime of \(O(\log n)\).

Imports

# python standard library
from math import log

# pypi
from expects import (
    equal,
    expect,
)

Constants

NOT_FOUND = -1

Linear Search

The naive way is to just traverse the list.

def linear_search(a, x, verbose=True):
    """Brute-force search

    Args:
     a (list): source to search
     x : Item to search for
     verbose (bool): Emit number of loops

    Returns:
     int: index of x in a or -1 if not found
    """
    counter = 0
    for i in range(len(a)):
	if verbose:
	    counter += 1
	    print("Loop {}".format(counter))
	if a[i] == x:
	    return i
    return NOT_FOUND

Interestingly, if you submit this it will time out.

Failed case #32/36: time limit exceeded (Time used: 9.98/5.00, memory used: 40480768/536870912.)

The Algorithm

This is given as part of the problem statement.

def binary_search(K, q, verbose=False):
    """Finds the index of an item in the list

    Args:
     K (list): A sorted list of integers
     q (int): the item to search for
     verbose (bool): if true, emit the number of loops done

    Returns:
     int: index of q in K or -1 if not found
    """
    min_index, max_index = 0, len(K) - 1
    counter = 0
    while max_index >= min_index:
	if verbose:
	    counter += 1
	    print("In loop {} - {}:{}".format(counter, min_index, max_index))

	mid_index = (min_index + max_index)//2

	if K[mid_index] == q:
	    return mid_index
	elif K[mid_index] < q:
	    min_index = mid_index + 1
	else:
	    max_index = mid_index - 1
    return NOT_FOUND

Testing

q = 9
K = [1, 3, 7, 8, 9, 12, 15]
expected = 4
actual = binary_search(K, q, True)
expect(actual).to(equal(expected))

Looking at the output you can see that after finding that the middle element didn't match q it searched the upper half of the list (by raising the Minimum Index to 4) and then found q at the point where the Minimum and Maximum Index were equal.

We can compare this to the linear search.

actual = linear_search(K, q)
expect(actual).to(equal(expected))

It works but it takes a little longer. You can see the theoretical (Big-O) runtimes are different as well.

print("O(n): {}".format(len(K)))
print("O(log n): {:.2f}".format(log(len(K), 2)))

Sorted Array Multiple Search Problem

Because you need at least a linear runtime to read in the inputs, the grader can't tell that our search took less time than it took to read in the list. Because of this they set up a slightly harder problem to solve which can be graded.

Problem	Search multiple keys in a sorted sequence of keys
Input	A sorted array \(K=[k_0,\ldots,k_{n-1}]\) of integers and \(Q=[q_0,\ldots,q_{n-1}\)]
Output	For each \(q_i\), its index in \(K\) or \(-1\) if it isn't in \(K\)
Constraints	\(1\le n, m \le 10^4,1 \le k_i \le 10^9\) for all \(0\le i < n;1 \le q_j \le 10^9\) for all \(0 \le j

Implementation

def multiple_search(source, keys):
    """Searches the source for the keys

    Args:
     source (list): sorted list of search items
     keys (list): items to search for in the source

    Returns:
     list: indices of keys in source
    """
    return [binary_search(source, key) for key in keys]

I wrote this based on the problem statement, but if you look at the sample code they actually do the iteration themselves so you only need to implement binary_search.

Sample

Inputs

1 5 8 12 13
8 1 23 1 11

Outputs

2 0 -1 0 -1

class TestKeys:
    source = 'source'
    search_terms = 'search-terms'
    expected = 'outputs'

TEST_CASES = dict(
    one={
	TestKeys.source: [1, 3, 7, 8, 9, 12, 15],
	TestKeys.search_terms: [9, 56, 3, 55, 1],
	TestKeys.expected: [4, -1, 1, -1, 0],
    },
    two={
	TestKeys.source: [1, 5, 8, 12, 13],
	TestKeys.search_terms: [8, 1, 23, 1, 11],
	TestKeys.expected: [2, 0, -1, 0, -1],
    }
)

Testing

for example, case in TEST_CASES.items():
    print(example)
    expected = case[TestKeys.expected]
    actual = multiple_search(case[TestKeys.source],
			     case[TestKeys.search_terms])
    expect(actual).to(equal(expected))

Grading

The binary search improves quite a bit over the linear search, passing the grader.

Good job! (Max time used: 0.69/5.00, max memory used: 40230912/536870912.)