CLRS Partition

Introduction

This is part of a series that starts with this post.

Imports and Setup

# python
from collections.abc import MutableSequence
from functools import partial

import random

# pypi
from expects import be_true, contain_exactly, equal, expect

import altair
import pandas

# software under test
from bowling.sort.quick import clrs_partition

# monkey
from graeae.visualization.altair_helpers import output_path, save_chart
SLUG = "clrs-partition"
OUTPUT_PATH = output_path(SLUG)
save_it = partial(save_chart, output_path=OUTPUT_PATH)

The Algorithm

The CLRS version seems a little clearer to follow although they use the last element as the pivot element instead of the first which threw me off for a bit. It uses a single for loop which moves anything less than the pivot element to the lower partition as it traverses the elements.

\begin{algorithm}
\caption{Partition (CLRS)}
\begin{algorithmic}
\INPUT An array and left and right locations defining a subarray
\OUTPUT The sub-array from left to right is partitioned and the partition location is returned

\PROCEDURE{Partition}{A, left, right}

\STATE PivotElement $\gets$ A[right]
\STATE LowerBound $\gets$ left - 1

\FOR {UpperBound $\in$ \{left $\ldots$ right - 1\}}
 \IF {A[UpperBound] $\leq$ PivotElement}
   \STATE LowerBound = LowerBound + 1
   \STATE \textsc{Swap}(A[LowerBound], A[UpperBound])
 \ENDIF
\ENDFOR

\STATE pivot $\gets$ LowerBound + 1
\STATE \textsc{Swap}(A[pivot], A[right])
\RETURN pivot
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The LowerBound is the index of the last element less than or equal to the pivot and LowerBound + 1 is the first element greater than the pivot. The UpperBound is the index of the last item greater than the pivot.

The LowerBound is like a demon that guards the lower partition and the UpperBound is another demon that plows through the array, throwing any elements that belong in the lower partition back to the LowerBound demon who throws the first element in the upper partition back up to the UpperBound and moves up to guard the expanded lower partition.

A Worked Example

We can take a look at how this works using a table. I'll use LB for the LowerBound index, UB for the UpperBound index, and x for the PivotElement to keep the table from getting too wide (hopefully). The array to partition is [5, 7, 9, 4, 6] with a zero-based index so left = 0, right = 4 and x (the pivot element) is 6.

Here are the values for the variables as we step through the for-loop.

UB A[UB] A[UB] \(\leq\) x LB A
0 5 True 0 5, 7, 9, 4, 6
1 7 False 0 5, 7, 9, 4, 6
2 9 False 0 5, 7, 9, 4, 6
3 4 True 1 5, 4, 9, 7, 6

As long and the element in the array we're checking is less than or equal to the Pivot Element we increment the LowerBound along with the Upper Bound since the element belongs in the lower partition. If the Lower and Upper bound indexes are equal, than they agree on where it is so nothing happens when we do the swap (or you could say they swap in place, maybe). But while the element checked is larger than the Pivot Element the Upper Bound Index goes up but the Lower Bound doesn't so when we next hit a case where the element is less than or equal to the Pivot Element, we know it's out of place and needs to be swapped with the element currently just after the lower partition.

Once we're out of the loop we then swap out the Pivot Element and the element to the right of the Lower Bound (so the first element of the Upper Bound) and return the location where the Pivot Element ended up.

  • pivot = 2
  • A = [5, 4, 6, 7, 9]

An Odd Case

What happens if the last element is the largest element?

  • A = [9, 6, 25, 4, 100]
  • x = 100
UB A[UB] A[UB] \(\leq\) x LB A
0 9 True 0 9, 6, 25, 4, 100
1 6 True 1 9, 6, 25, 4, 100
2 25 True 2 9, 6, 25, 4, 100
3 4 True 3 9, 6, 25, 4, 100

And in the end we have a pivot of \(LB + 1 = 4\) (the last element) with the lower partition being everything but the last element and no elements in the upper partition. If the array happened to be already sorted than any attempt to partition a sub-array would end up with a similar output with an empty upper partition. This doesn't really matter here, but when we use it in quicksort it will.

Since nothing happens when an element being checked is greater than the pivot element, if the pivot element happens to be the smallest item in the array we'd have a similar case with an empty lower partition, the pivot element as the first element, and the rest of the elements in the upper partition, so starting with an array that's in reversed-sorted-order would also always end up with empty partitions no matter how we choose the sub-arrays.

The Implementation

According to wikipedia, the version CLRS uses is a version of the Lomuto Partition Scheme, created by Nico Lomuto.

def partition_clrs(collection: MutableSequence, left: int, right: int) -> int:
    """Partitions the collection around the last element

    Args:
     collection: the list to partition
     left: index of the first element in the sub-list to partition
     right: index of the last element in the sub-list to partition

    Returns:
     the index of the pivot element
    """
    pivot_element = collection[right]
    lower_bound = left - 1
    for upper_bound in range(left, right):
        if collection[upper_bound] <= pivot_element:
            lower_bound += 1
            (collection[lower_bound],
             collection[upper_bound]) = (collection[upper_bound],
                                         collection[lower_bound])
    pivot = lower_bound + 1
    (collection[pivot],
     collection[right]) = (collection[right],
                           collection[pivot])
    return pivot

Some Checks

The First Example

This is the worked example I gave.

start = [5, 7, 9, 4, 6]
test = start.copy()
expected = [5, 4, 6, 7, 9]
first_expected_pivot = 2

pivot = clrs_partition(test, 0, 4)

expect(pivot).to(equal(first_expected_pivot))
expect(test).to(contain_exactly(*expected))

And to make sure the sub-list works (as opposed to using the whole list).

left, right = [100, 20], [999, 888, 777]
test = left + start.copy() + right

pivot = clrs_partition(test, 2, 6)

# all we did was shift the sub-list to spots to the right
expect(pivot).to(equal(first_expected_pivot + 2))

# only the sub-list should be partitioned
expect(test).to(contain_exactly(*(left + expected + right)))

The Pivot Is the Biggest Element

If the last element (the pivot) is the biggest element then partitioning doesn't do anything to the list.

start = [9, 6, 25, 4, 100]
test = start.copy()

pivot = clrs_partition(test, 0, 4)

# the pivot should be the last element
expect(pivot).to(equal(4))

# nothing changes in the list
expect(test).to(contain_exactly(*start))

Small Inputs

Make sure it can handle collections of small size.

start = [0]
pivot = clrs_partition(start, 0, 0)
expect(pivot).to(equal(0))

start = [1, 2]
pivot = clrs_partition(start, 0, 1)
expect(pivot).to(equal(1))

Big Inputs

This is the same test as given to the Levitin version except we need to move the test-value to the end of the input list.

prefix = random.choices(range(100), k=100)
middle = 100
suffix = random.choices(range(101, 201), k=100)
test = prefix + suffix + [middle]

output = clrs_partition(test, 0, len(test) - 1)
expect(output).to(equal(middle))
expect(test[output]).to(equal(middle))
expect(all(item < middle for item in test[:output])).to(be_true)
expect(all(item > middle for item in test[output + 1:])).to(be_true)

A CLRS Tracker

This should be the same function (as clrs_partition) but it collects the locations of the elements within the list as they get swapped around.

def partition_tracker(collection: MutableSequence, 
                      left: int, right: int) -> tuple:
    """Partitions the collection around the last element

    Args:
     collection: the list to partition
     left: index of the first element in the sub-list to partition
     right: index of the last element in the sub-list to partition

    Returns:
     locations dict, lower_bounds, upper_bounds
    """
    locations = {value: [index] for index, value in enumerate(collection)}

    pivot_element = collection[right]
    lower_bound = left - 1

    lower_bounds = [lower_bound]
    for upper_bound in range(left, right):
        if collection[upper_bound] <= pivot_element:
            lower_bound += 1
            (collection[lower_bound],
             collection[upper_bound]) = (collection[upper_bound],
                                         collection[lower_bound])
        for index, item in enumerate(collection):
            locations[item].append(index)
        lower_bounds.append(lower_bound)
    pivot = lower_bound + 1
    (collection[pivot],
     collection[right]) = (collection[right],
                           collection[pivot])
    for index, item in enumerate(collection):
        locations[item].append(index)
    lower_bounds.append(lower_bound)
    return locations, lower_bounds
def partition_track_plotter(locations, lower_bounds, title, filename):
    frame = pandas.DataFrame(locations)
    re_indexed = frame.reset_index().rename(columns={"index": "Step"})

    melted = re_indexed.melt(id_vars=["Step"], var_name="Element",
                             value_name="Location")

    lower_frame = pandas.DataFrame({"Lower Bound": lower_bounds})
    re_lowered = lower_frame.reset_index().rename(columns={"index": "Step"})
    low_melted = re_lowered.melt(id_vars=["Step"], var_name="Element",
                                 value_name="Location")


    last_location = melted.Location.max()

    elements = altair.Chart(melted).mark_line().encode(
        x=altair.X("Step:Q", axis=altair.Axis(tickMinStep=1)),
        y=altair.Y("Location:Q", axis=altair.Axis(tickMinStep=1),
                   scale=altair.Scale(domain=(-1, last_location))),
        color=altair.Color("Element:O", legend=None),
        tooltip=["Step", "Element", "Location"]
    )

    lower = altair.Chart(low_melted).mark_line(color="red").encode(
        x=altair.X("Step:Q", axis=altair.Axis(tickMinStep=1)),
        y=altair.Y("Location:Q", axis=altair.Axis(tickMinStep=1),
                   scale=altair.Scale(domain=(-1, last_location))),
        tooltip=["Step", "Location"]
    )

    chart = (elements + lower).properties(
        title=title,
        width=800, height=520
    )

    save_it(chart, filename)
    return

A Backwards Case

First, a plot of a list that starts out with all the elements greater than the pivot followed by all the elements less than the pivot.

middle = 20
first_half = list(range(middle))
second_half = list(range(middle + 1, 2 * middle))

random.shuffle(first_half)
random.shuffle(second_half
)
items = second_half + first_half + [middle]

locations, lower_bounds = partition_tracker(items, 0, len(items) - 1)

partition_track_plotter(locations, lower_bounds, "CLRS Worst-Case Swapping", "clrs-worst-case")

Figure Missing

What we have here is that the first half of the steps are going over the items greater than the pivot so we never get past the conditional in the loop, thus nothing gets moved around. Then at the halfway point we start going over all the items bigger than the pivot so every item from that point gets swapped to the lower partition. Then in the final step we're out of the loop and the pivot gets moved to the middle of the partitions.

The red-line marks the last item in the lower partition. Even though I randomized the items, since we aren't sorting the values, just moving them backwards and forwards around the partitioning, it doesn't affect what happens.

A More Random Case

Let's try something a little more random.

middle = 20
first_half = list(range(middle))
second_half = list(range(middle + 1, 2 * middle))
items = first_half + second_half
random.shuffle(items)
items.append(middle)

locations, lower_bounds = partition_tracker(items, 0, len(items) - 1)

partition_track_plotter(locations, lower_bounds,
                        title="Randomized Input",
                        filename="partitioning-plot")

Figure Missing

Not a whole lot more interesting, but it shows how it normally works with the function moving things that have a lower value than the pivot element down to where the red line is (indicating the lower partition) whenever it's encountered as the loop is traversed, then at the end the pivot element gets swapped with the element that's just above the red line.

Sources