Selection Sort

Umm… Selection Sort?

Yes, this continues the look at Brute-Force Sorting, begun with the The Bubble Sort. With the selection sort you repeatedly select the smallest item in the unsorted section of the data and place it at the end of the sorted data until you have sorted all the items. How To Think About Algorithms classifies it as a More of the Output algorithm, one of the types of Iterative Algorithms, while Introduction to the Design & Analysis of Algorithms (Levitin) classifies it as a Brute Force algorithm.

Imports

# python
from functools import partial
from random import shuffle

# pypi
from numba import njit
from expects import contain_exactly, equal, expect
from joblib import Parallel, delayed
from numpy.random import default_rng

import altair
import numpy
import pandas

# this project
from bowling.sort.bubble.bubble import bubble
from bowling.sort.selection import selection_counter
from bowling.sort.selection.selection import selection_swaps

# my stuff
from graeae.visualization.altair_helpers import output_path, save_chart
from graeae import Timer

Set Up

random = default_rng(2021)
TIMER = Timer()

SLUG = "selection-sort"
OUTPUT_PATH = output_path(SLUG)
save_it = partial(save_chart, output_path=OUTPUT_PATH)

Parts of the Algorithm

Item Answer
Specification This is a Sorting Problem.
Basic Steps We're going to repeatedly select the smallest item in the unselected.
Measure Progress We'll know we're progressing by the number of elements we've selected (k).
Loop Invariant None of the selected elements is larger than any element in the unselected and the selected items are in sorted order. The unselected items contain the largest elements in the list.
What are the Main Steps? 1. Find the smallest unsorted element
  2. Put the selected element at the end of the previously selected element(s)
Making Progress The number of selected items (k) always goes up after each loop.
Loop Invariant Maintenance - The previous Loop Invariant tells us that all the unselected items were at least as large or larger than the largest of the selected items.
  - We chose the largest item of the unselected so it's as big or bigger than the biggest selected item.
  - Putting the selected item at the end of the previously selected items maintains the Loop Invariant.
Establish the Loop Invariant At the start, no items have been selected so the Loop Invariant is vacuously true.
Exit-Condition We stop when all the items have been selected.
Ending - Exit-Condition: All items have been selected
  - Loop Invariant: All selected are sorted.
  - So our output has all the original items and they are now sorted, satisfying the Postconditions.

Running Time

\begin{align} C(n) &= \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1\\ &= \frac{n(n-1)}{2} \in \Theta{n^2} \end{align}

The Pseudocode

\begin{algorithm}
\caption{SelectionSort}
\begin{algorithmic}
\INPUT An array of orderable items
\OUTPUT The array sorted in ascending order
\PROCEDURE{SelectionSort}{$A$}
  \FOR{$i \gets 0$ to $n - 2$}
    \STATE $min \gets i$
    \FOR{$j \gets i + 1$ to $n - 1$}
      \IF{$A[j] < A[min]$}
       \STATE $min \gets j$
      \ENDIF
    \ENDFOR
    \STATE Swap $A[i]$ and $A[min]$
  \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The outer loop (where i is set) tracks the number of items selected so far (which is the same thing as the size of the sorted section at the beginning of the list). The inner loop goes over the remaining, previously unselected, items in the list and looks for the index of the smallest item, it then puts that smallest item at the beginning of the unselected items and moves what was at the beginning to where the smallest item was. Once the swap is done, the sub-list on the left-side of the list is now bigger by one so the outer loop increments and then we search for the next smallest item, and so on until all the items have been selected (except for the last item) and the list is sorted.

The Implementations

Selection Sort

This will be a straight translation of the pseudocode (or straight-ish).

from collections.abc import MutableSequence
from collections import namedtuple
from typing import Any, Dict
SelectionOutput = namedtuple("SelectionOutput",
                             ["element_count",
                              "comparisons",
                              "swaps",
                              "elements"])
Swaps = Dict[int, list[int]]
Sortable = MutableSequence[Any]
def selection_counter(elements: Sortable) -> SelectionOutput:
    """Does the selection sort on the elements

    Args:
     elements: list of orderable objects

    Returns:
     (number of elements, comparisons, swaps)
    """
    number_of_elements = len(elements)
    comparisons = swaps = 0

    for start_of_unselected in range(number_of_elements - 1):
        smallest_unselected = start_of_unselected
        for next_unselected in range(start_of_unselected + 1,
                                     number_of_elements):
            comparisons += 1
            if elements[next_unselected] < elements[smallest_unselected]:
                smallest_unselected = next_unselected
        swaps += 1
        elements[start_of_unselected], elements[smallest_unselected] = (
            elements[smallest_unselected], elements[start_of_unselected]
        )
    return SelectionOutput(element_count=number_of_elements,
                           comparisons=comparisons,
                           swaps=swaps,
                           elements=elements)

Some Checks

def check(collection: list, n: int, comparisons: int, swaps: int) -> None:
    """Check that the sort worked

    Args:
     collection: the sorted collection
     n: number of elements in the collection
     comparisons: number of comparisons made
     swaps: number of swaps made

    Raises:
     AssertionError: some check didn't match
    """
    expect(n).to(equal(len(collection)))
    runtime = (n * (n - 1))/2
    expect(comparisons).to(equal(runtime))
    expect(swaps).to(equal(n - 1))
    expect(list(collection)).to(contain_exactly(*list(sorted(collection))))
    return

test = [1, 2, 3]
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

test = [4, 3, 2, 1]
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

COUNT = 1000
test = random.integers(low=0, high=COUNT, size=COUNT)
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

Run It

So, let's see how it does. We'll set the selection sort up as a numba function and set up the things to sort so that we can compare it to the bubble sort.

numba_selection = njit(selection_counter)
things_to_sort = [random.integers(low=0, high=count, size=count)
                  for count in range(1, 10**5 + 1, 1000)]
with TIMER:
    elements_comparisons_and_swaps = Parallel(n_jobs=-1)(
        delayed(numba_selection)(thing_to_sort)
        for thing_to_sort in things_to_sort)
Started: 2022-01-11 00:05:00.866956
Ended: 2022-01-11 00:05:41.085584
Elapsed: 0:00:40.218628

Let's plot the comparisons and swaps.

SIZE, COMPARISONS, SWAPS = 0, 1, 2
unzipped = list(zip(*elements_comparisons_and_swaps))
count_frame = pandas.DataFrame({"Elements": unzipped[SIZE],
                                "Selection Comparisons": unzipped[COMPARISONS],
                                "Selection Swaps": unzipped[SWAPS]})
base = altair.Chart(count_frame).mark_point().encode(
    x = "Elements",
)

comparisons = base.encode(
    y="Selection Comparisons",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Selection Comparisons", format=","),
             altair.Tooltip("Selection Swaps", format=",")]
).properties(title="Selection Sort Comparisons", width=800, height=250)

swaps = base.mark_point(color="DarkRed").encode(
    x="Elements",
    y="Selection Swaps",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Selection Comparisons", format=","),
             altair.Tooltip("Selection Swaps", format=",")]
).properties(title="Selection Sort Swaps", width=800, height=250)

chart = (comparisons & swaps)

save_it(chart, "selection-sort-comparisons")

Figure Missing

It's important to note the scale of the y-axes here - when I tried putting the comparisons and swaps on the same plot it was pretty much impossible to see the slope of the swaps, even when zoomed way in. Unlike the Bubble Sort, the Selection Sort's swaps have a linear growth instead of a quadratic growth.

Looking at the Swaps

Here's where it might be a little more interesting. We can do the same exercise we did with the bubble sort and plot the actual swaps to see if we can see the sorting in action.

def selection_swaps(elements: Sortable) -> Swaps:
    """Keeps track of the element indexes as they are swapped

    Args:
     elements: list of orderable elements

    Returns:
     dict mapping element to list of indices where it was in the elements list
    """
    swaps = {element: [index] for index, element in enumerate(elements)}

    number_of_elements = len(elements)

    for start_of_unselected in range(number_of_elements - 1):
        smallest_unselected = start_of_unselected

        for next_unselected in range(start_of_unselected + 1,
                                     number_of_elements):
            if elements[next_unselected] < elements[smallest_unselected]:
                smallest_unselected = next_unselected

        elements[start_of_unselected], elements[smallest_unselected] = (
            elements[smallest_unselected], elements[start_of_unselected]
        )

        # record the location of the elements
        for index, element in enumerate(elements):
            swaps[element].append(index)
    return swaps

Because we're tracking the swaps with a dict there can't be any repetitions in the inputs, so I'll use python instead of numpy to make the randomized input since it seems clearer to me.

COUNT = 50

inputs = list(range(COUNT))
shuffle(inputs)
swaps = selection_swaps(inputs)

track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort",
                         value_name="Location In Array", id_vars="Swap")
chart = altair.Chart(melted).mark_line().encode(
    x="Swap",
    y="Location In Array",
    color="Value To Sort:O",
    tooltip=["Swap", "Location In Array", "Value To Sort"]
).properties(
    title="Selection Sort Swaps",
    width=800,
    height=525,
).interactive()

save_it(chart, "selection-sort-swaps")

Figure Missing

Since I put in more inputs than I did with the Bubble Sort, the actual swaps aren't so easy to see, here, but the point of this plot is to show the (imaginary) diagonal line running from the bottom left corner up to the upper right. This shows why it's called a "More of the Output" algorithm - with each loop (represented by a "Swap" on the X-axis) one more sorted item is added to the beginning of the list (the bottom of the chart) from the unsorted part (the section above the imaginary diagonal of the chart) until you end up with a sorted list.

Since the Selection Sort always checks all the items in the "unsorted" section, the lengths of the lines above the diagonal as they move up and down don't really have a significance, they're just single swaps. What matters more is that the section below the diagonal is "sorted" and when picking the next item the algorithm has to check every element above the diagonal to find the next smallest item.

Worst Case

COUNT = 50

inputs = list(reversed(range(COUNT)))
swaps = selection_swaps(inputs)

track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort",
                         value_name="Location In Array", id_vars="Swap")
chart = altair.Chart(melted).mark_line().encode(
    x="Swap",
    y="Location In Array",
    color="Value To Sort:O",
    tooltip=["Swap", "Location In Array", "Value To Sort"]
).properties(
    title="Selection Sort Locations (Worst Case)",
    width=800,
    height=525,
).interactive()

save_it(chart, "selection-sort-locations-worst-case")

Figure Missing

This produces kind of an interesting figure… Since the input list is in exactly the backwards order (from largest element to smallest), and the selection sort traverses the list from start to end and swaps out each element with the smallest element to the right of the current element, we are always swapping out the largest element remaining for the smallest remaining element, so although the algorithm doesn't specifically try to do it, it's putting both the smallest unsorted element in place and the largest unsorted element in place with each swap, so it's actually done sorting at the halfway point. We could, then create a short-circuit version of this as well, to make it quit if there's no element in the unsorted smaller than the elements sorted so far, but as with the bubble sort, this would most-likely only help in outside cases like these.

Sources