Selection Sort

Umm… Selection Sort?

Yes, this continues the look at Brute-Force Sorting, begun with the The Bubble Sort. With the selection sort you repeatedly select the smallest item in the unsorted section of the data and place it at the end of the sorted data until you have sorted all the items. How To Think About Algorithms classifies it as a More of the Output algorithm, one of the types of Iterative Algorithms, while Introduction to the Design & Analysis of Algorithms (Levitin) classifies it as a Brute Force algorithm.

Imports

# python
from functools import partial
from random import shuffle

# pypi
from numba import njit
from expects import contain_exactly, equal, expect
from joblib import Parallel, delayed
from numpy.random import default_rng

import altair
import numpy
import pandas

# this project
from bowling.sort.bubble.bubble import bubble
from bowling.sort.selection import selection_counter
from bowling.sort.selection.selection import selection_swaps

# my stuff
from graeae.visualization.altair_helpers import output_path, save_chart
from graeae import Timer

Set Up

random = default_rng(2021)
TIMER = Timer()

SLUG = "selection-sort"
OUTPUT_PATH = output_path(SLUG)
save_it = partial(save_chart, output_path=OUTPUT_PATH)

Parts of the Algorithm

Item Answer
Specification This is a Sorting Problem.
Basic Steps We're going to repeatedly select the smallest item in the unselected.
Measure Progress We'll know we're progressing by the number of elements we've selected (k).
Loop Invariant None of the selected elements is larger than any element in the unselected and the selected items are in sorted order. The unselected items contain the largest elements in the list.
What are the Main Steps? 1. Find the smallest unsorted element
  2. Put the selected element at the end of the previously selected element(s)
Making Progress The number of selected items (k) always goes up after each loop.
Loop Invariant Maintenance - The previous Loop Invariant tells us that all the unselected items were at least as large or larger than the largest of the selected items.
  - We chose the largest item of the unselected so it's as big or bigger than the biggest selected item.
  - Putting the selected item at the end of the previously selected items maintains the Loop Invariant.
Establish the Loop Invariant At the start, no items have been selected so the Loop Invariant is vacuously true.
Exit-Condition We stop when all the items have been selected.
Ending - Exit-Condition: All items have been selected
  - Loop Invariant: All selected are sorted.
  - So our output has all the original items and they are now sorted, satisfying the Postconditions.

Running Time

\begin{align} C(n) &= \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1\\ &= \frac{n(n-1)}{2} \in \Theta{n^2} \end{align}

The Pseudocode

\begin{algorithm}
\caption{SelectionSort}
\begin{algorithmic}
\INPUT An array of orderable items
\OUTPUT The array sorted in ascending order
\PROCEDURE{SelectionSort}{$A$}
  \FOR{$i \gets 0$ to $n - 2$}
    \STATE $min \gets i$
    \FOR{$j \gets i + 1$ to $n - 1$}
      \IF{$A[j] < A[min]$}
       \STATE $min \gets j$
      \ENDIF
    \ENDFOR
    \STATE Swap $A[i]$ and $A[min]$
  \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

The outer loop (where i is set) tracks the number of items selected so far (which is the same thing as the size of the sorted section at the beginning of the list). The inner loop goes over the remaining, previously unselected, items in the list and looks for the index of the smallest item, it then puts that smallest item at the beginning of the unselected items and moves what was at the beginning to where the smallest item was. Once the swap is done, the sub-list on the left-side of the list is now bigger by one so the outer loop increments and then we search for the next smallest item, and so on until all the items have been selected (except for the last item) and the list is sorted.

The Implementations

Selection Sort

This will be a straight translation of the pseudocode (or straight-ish).

from collections.abc import MutableSequence
from collections import namedtuple
from typing import Any, Dict
SelectionOutput = namedtuple("SelectionOutput",
                             ["element_count",
                              "comparisons",
                              "swaps",
                              "elements"])
Swaps = Dict[int, list[int]]
Sortable = MutableSequence[Any]
def selection_counter(elements: Sortable) -> SelectionOutput:
    """Does the selection sort on the elements

    Args:
     elements: list of orderable objects

    Returns:
     (number of elements, comparisons, swaps)
    """
    number_of_elements = len(elements)
    comparisons = swaps = 0

    for start_of_unselected in range(number_of_elements - 1):
        smallest_unselected = start_of_unselected
        for next_unselected in range(start_of_unselected + 1,
                                     number_of_elements):
            comparisons += 1
            if elements[next_unselected] < elements[smallest_unselected]:
                smallest_unselected = next_unselected
        swaps += 1
        elements[start_of_unselected], elements[smallest_unselected] = (
            elements[smallest_unselected], elements[start_of_unselected]
        )
    return SelectionOutput(element_count=number_of_elements,
                           comparisons=comparisons,
                           swaps=swaps,
                           elements=elements)

Some Checks

def check(collection: list, n: int, comparisons: int, swaps: int) -> None:
    """Check that the sort worked

    Args:
     collection: the sorted collection
     n: number of elements in the collection
     comparisons: number of comparisons made
     swaps: number of swaps made

    Raises:
     AssertionError: some check didn't match
    """
    expect(n).to(equal(len(collection)))
    runtime = (n * (n - 1))/2
    expect(comparisons).to(equal(runtime))
    expect(swaps).to(equal(n - 1))
    expect(list(collection)).to(contain_exactly(*list(sorted(collection))))
    return

test = [1, 2, 3]
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

test = [4, 3, 2, 1]
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

COUNT = 1000
test = random.integers(low=0, high=COUNT, size=COUNT)
n, comparisons, swaps, _ = selection_counter(test)
check(test, n, comparisons, swaps)

Run It

So, let's see how it does. We'll set the selection sort up as a numba function and set up the things to sort so that we can compare it to the bubble sort.

numba_selection = njit(selection_counter)
things_to_sort = [random.integers(low=0, high=count, size=count)
                  for count in range(1, 10**5 + 1, 1000)]
with TIMER:
    elements_comparisons_and_swaps = Parallel(n_jobs=-1)(
        delayed(numba_selection)(thing_to_sort)
        for thing_to_sort in things_to_sort)
Started: 2022-01-11 00:05:00.866956
Ended: 2022-01-11 00:05:41.085584
Elapsed: 0:00:40.218628

Let's plot the comparisons and swaps.

SIZE, COMPARISONS, SWAPS = 0, 1, 2
unzipped = list(zip(*elements_comparisons_and_swaps))
count_frame = pandas.DataFrame({"Elements": unzipped[SIZE],
                                "Selection Comparisons": unzipped[COMPARISONS],
                                "Selection Swaps": unzipped[SWAPS]})
base = altair.Chart(count_frame).mark_point().encode(
    x = "Elements",
)

comparisons = base.encode(
    y="Selection Comparisons",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Selection Comparisons", format=","),
             altair.Tooltip("Selection Swaps", format=",")]
).properties(title="Selection Sort Comparisons", width=800, height=250)

swaps = base.mark_point(color="DarkRed").encode(
    x="Elements",
    y="Selection Swaps",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Selection Comparisons", format=","),
             altair.Tooltip("Selection Swaps", format=",")]
).properties(title="Selection Sort Swaps", width=800, height=250)

chart = (comparisons & swaps)

save_it(chart, "selection-sort-comparisons")

Figure Missing

It's important to note the scale of the y-axes here - when I tried putting the comparisons and swaps on the same plot it was pretty much impossible to see the slope of the swaps, even when zoomed way in. Unlike the Bubble Sort, the Selection Sort's swaps have a linear growth instead of a quadratic growth.

Looking at the Swaps

Here's where it might be a little more interesting. We can do the same exercise we did with the bubble sort and plot the actual swaps to see if we can see the sorting in action.

def selection_swaps(elements: Sortable) -> Swaps:
    """Keeps track of the element indexes as they are swapped

    Args:
     elements: list of orderable elements

    Returns:
     dict mapping element to list of indices where it was in the elements list
    """
    swaps = {element: [index] for index, element in enumerate(elements)}

    number_of_elements = len(elements)

    for start_of_unselected in range(number_of_elements - 1):
        smallest_unselected = start_of_unselected

        for next_unselected in range(start_of_unselected + 1,
                                     number_of_elements):
            if elements[next_unselected] < elements[smallest_unselected]:
                smallest_unselected = next_unselected

        elements[start_of_unselected], elements[smallest_unselected] = (
            elements[smallest_unselected], elements[start_of_unselected]
        )

        # record the location of the elements
        for index, element in enumerate(elements):
            swaps[element].append(index)
    return swaps

Because we're tracking the swaps with a dict there can't be any repetitions in the inputs, so I'll use python instead of numpy to make the randomized input since it seems clearer to me.

COUNT = 50

inputs = list(range(COUNT))
shuffle(inputs)
swaps = selection_swaps(inputs)

track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort",
                         value_name="Location In Array", id_vars="Swap")
chart = altair.Chart(melted).mark_line().encode(
    x="Swap",
    y="Location In Array",
    color="Value To Sort:O",
    tooltip=["Swap", "Location In Array", "Value To Sort"]
).properties(
    title="Selection Sort Swaps",
    width=800,
    height=525,
).interactive()

save_it(chart, "selection-sort-swaps")

Figure Missing

Since I put in more inputs than I did with the Bubble Sort, the actual swaps aren't so easy to see, here, but the point of this plot is to show the (imaginary) diagonal line running from the bottom left corner up to the upper right. This shows why it's called a "More of the Output" algorithm - with each loop (represented by a "Swap" on the X-axis) one more sorted item is added to the beginning of the list (the bottom of the chart) from the unsorted part (the section above the imaginary diagonal of the chart) until you end up with a sorted list.

Since the Selection Sort always checks all the items in the "unsorted" section, the lengths of the lines above the diagonal as they move up and down don't really have a significance, they're just single swaps. What matters more is that the section below the diagonal is "sorted" and when picking the next item the algorithm has to check every element above the diagonal to find the next smallest item.

Worst Case

COUNT = 50

inputs = list(reversed(range(COUNT)))
swaps = selection_swaps(inputs)

track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort",
                         value_name="Location In Array", id_vars="Swap")
chart = altair.Chart(melted).mark_line().encode(
    x="Swap",
    y="Location In Array",
    color="Value To Sort:O",
    tooltip=["Swap", "Location In Array", "Value To Sort"]
).properties(
    title="Selection Sort Locations (Worst Case)",
    width=800,
    height=525,
).interactive()

save_it(chart, "selection-sort-locations-worst-case")

Figure Missing

This produces kind of an interesting figure… Since the input list is in exactly the backwards order (from largest element to smallest), and the selection sort traverses the list from start to end and swaps out each element with the smallest element to the right of the current element, we are always swapping out the largest element remaining for the smallest remaining element, so although the algorithm doesn't specifically try to do it, it's putting both the smallest unsorted element in place and the largest unsorted element in place with each swap, so it's actually done sorting at the halfway point. We could, then create a short-circuit version of this as well, to make it quit if there's no element in the unsorted smaller than the elements sorted so far, but as with the bubble sort, this would most-likely only help in outside cases like these.

Sources

Bubble-Sort Revisited

This is a continuation of a series on the Bubble Sort.

In the Beginning

In the previous bubble sort post I implemented the basic bubble sort, but there is a variation of it where you short-circuit the search if the collection is already sorted which is what I'll look at here. The short-circuiting means that the bubble sort can sometimes do better than other methods, but only when the data is mostly sorted so it's still not a practical algorithm. Why do this, then? Eh, why not?

Some Setup

Imports

# python
from functools import partial

# pypi
# from bokeh.models import HoverTool
from expects import be_below, contain_exactly, equal, expect
from joblib import Parallel, delayed
from numba import njit
from numba.typed import List
from numpy.random import default_rng

import altair
import numpy
import pandas

# this project
from graeae.visualization.altair_helpers import output_path, save_chart
from bowling.sort.bubble import bubble, BubbleOutput

# my other stuff
from graeae import Timer

Stuff For Later

TIMER = Timer()
SLUG = "double-bubble-sort"
OUTPUT_PATH = output_path(SLUG)
random = default_rng(2021)

The Sorters

Bubba Sort

This will be the short-circuiting version of the bubble sort.

Bubba

This is the short-circuiter. It works pretty much the same way with just an extra if-statement to break out of the loop if nothing was swapped.

def bubba(elements) -> BubbleOutput:
    """Sorts the list in place

    Args:
     elements: list of (in-place) sortable elements

    Returns:
     number of elements, count of comparisons
    """
    all_but_one = len(elements) - 1
    comparisons = swaps = 0
    for items_sorted in range(all_but_one):
        swapped_at_least_once = False
        for in_front_of_us in range(all_but_one - items_sorted):
            comparisons += 1
            to_the_right = in_front_of_us + 1
            if elements[to_the_right] < elements[in_front_of_us]:
                (elements[in_front_of_us],
                 elements[to_the_right]) = (elements[to_the_right],
                                          elements[in_front_of_us])
                swapped_at_least_once = True
                swaps += 1
        if not swapped_at_least_once:
            break
    return BubbleOutput(len(elements), comparisons, swaps, elements)

Bubble

I showed how I implement the bubble sort in the previous bubble sort post so, since I imported it up above, I'm just going to run it.

Test It Out

Once again, let's just make sure everything works.

inputs = [1, 2, 3, 4, 7, 6, 5]

expected = list(sorted(inputs))

test_1, test_2 = inputs.copy(), inputs.copy()

bubba_output = bubba(test_1)
original_output = bubble(test_2)

expect(test_1).to(contain_exactly(*expected))
expect(test_2).to(contain_exactly(*expected))

n = len(inputs)
worst = (n * (n - 1))/2
expect(bubba_output.comparisons).to(be_below(worst))
expect(bubba_output.comparisons).to(equal(15))

expect(bubba_output.swaps).to(equal(original_output.swaps))
expect(original_output.comparisons).to(equal(worst))

# try a bigger input
inputs = random.choice(list(range(100)), size=100)
expected = list(sorted(inputs))

test_1, test_2 = inputs.copy(), inputs.copy()

bubba_output = bubba(test_1)
original_output = bubble(test_2)

expect(list(test_1)).to(contain_exactly(*expected))
expect(list(test_2)).to(contain_exactly(*expected))

n = len(inputs)
worst = (n * (n - 1))/2
expect(bubba_output.comparisons).to(be_below(worst))
expect(original_output.comparisons).to(equal(worst))

expect(bubba_output.swaps).to(equal(original_output.swaps))
expect(original_output.comparisons).to(equal(worst))

Counting Comparisons

Since the actual method of sorting is the same the swaps should be the same in either case (the new version doesn't quit until there's no more swaps to be done) so I'll just look at the comparisons and see if it made any real difference.

Set Up Numba

bubba = njit(bubba, nogil=True)
bubble = njit(bubble, nogil=True)

Run The Bubble Counter

things_to_sort = [random.choice(list(range(count)), size=count)
                  for count in range(1, 10**5+ 1, 1000)]

with TIMER:
    bubbles_counts_and_comparisons = Parallel(n_jobs=-1)(
        delayed(bubble)(thing_to_sort)
        for thing_to_sort in things_to_sort)
Started: 2022-01-12 23:40:18.730737
Ended: 2022-01-12 23:41:51.143272
Elapsed: 0:01:32.412535

Run the Bubba Counter

with TIMER:
    bubbas_counts_and_comparisons = Parallel(n_jobs=-2)(
        delayed(bubba)(thing_to_sort)
        for thing_to_sort in things_to_sort)
Started: 2022-01-12 23:41:58.766180
Ended: 2022-01-12 23:43:38.475355
Elapsed: 0:01:39.709175

Time-wise it seems to have done about the same as the original bubble sort.

SIZE, COMPARISONS = 0, 1
bubble_unzipped = list(zip(*bubbles_counts_and_comparisons))
bubba_unzipped = list(zip(*bubbas_counts_and_comparisons))
bubba_frame = pandas.DataFrame({"Elements": bubble_unzipped[SIZE],
                                "Bubble": bubble_unzipped[COMPARISONS],
                                "Bubba": bubba_unzipped[COMPARISONS]})

melted = bubba_frame.melt(id_vars=["Elements"],
                          var_name="Sorter",
                          value_name="Comparisons")
chart = altair.Chart(melted).mark_point().encode(
    x="Elements",
    y="Comparisons",
    color="Sorter",
    tooltip=[altair.Tooltip("Elements", format=","), "Sorter",
             altair.Tooltip("Comparisons", format=",")]
).properties(
    title="Bubble Sort Comparisons (with and without short-circuiting)",
    width=900,
).interactive()

save_chart(chart, "bubble_comparisons", output_path=OUTPUT_PATH, height=400)

Figure Missing

At first glance it looks like there's only one grey set of points, but if you zoom in (using the mouse's scroll wheel) you'll see that the grey points are actually (sometimes) created by adjacent points with the short-circuiting version's point (the blue point) slightly lower than the non-short-ciruiting orange point. So there is a difference, but it's small enough that it isn't easy to see.

Let's see if it becomes more obvious using the difference between the points instead.

bubba_frame["Difference"] = (bubba_frame["Bubble"] -
                             bubba_frame["Bubba"])
chart = altair.Chart(bubba_frame).mark_trail().encode(
    x="Elements:O",
    y="Difference:Q",
    size="Difference:Q",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Difference", format=","),
             altair.Tooltip("Bubble", format=","),
             altair.Tooltip("Bubba", format=",")]
).properties(title="Difference In Comparisons Between the Bubble Sorts",
             height=500,
             width=900).interactive()

save_chart(chart, "bubble_differences", output_path=OUTPUT_PATH, height=600)

Figure Missing

So, it looks like there are sometimes significant differences between the two sorts, but how do they appear given the total number of comparisons made?

bubba_frame["Difference/Comparisons"] = bubba_frame.Difference/bubba_frame.Bubba
sub_bubba = bubba_frame[["Elements", "Difference/Comparisons"]]
chart = altair.Chart(sub_bubba).mark_trail().encode(
    x="Elements:O",
    y="Difference/Comparisons:Q",
    size="Difference/Comparisons:Q",
    tooltip=[altair.Tooltip("Elements", format=","),
             altair.Tooltip("Difference/Comparisons", format=".4f")]
).properties(title="Proportion of Difference Between the Bubble Sorts",
             height=500,
             width=900).interactive()

save_chart(chart, "bubble_difference_proportions", output_path=OUTPUT_PATH,
           height=600)

Figure Missing

It looks like as the amount of sorting that's needed goes up, the difference made by the short-circuiting becomes smaller in comparison to the total number of comparisons made, but even the biggest effect proportional to the amount of comparisons made by the short-circuiting sorter only comes to the difference being 0.2% of the total comparisons.

Best Case, Worst Case

Now that we know that the short-circuiting isn't that big a deal I'll add some plotting showing how our random cases compare to the best-case and worst case sorting inputs.

best_things = [numpy.arange(count, dtype=int) for count in range(1, 10**5+ 1, 1000)]
worst_things = [numpy.flip(things) for things in best_things]

with TIMER:
    worst_output = Parallel(n_jobs=-1)(
        delayed(bubba)(thing) for thing in worst_things
)
Started: 2022-01-13 00:01:27.932132
Ended: 2022-01-13 00:02:08.108278
Elapsed: 0:00:40.176146
with TIMER:
    best_output = Parallel(n_jobs=-1)(
        delayed(bubba)(thing) for thing in best_things
)
Started: 2022-01-13 00:08:59.231664
Ended: 2022-01-13 00:09:00.409103
Elapsed: 0:00:01.177439
worst_unzipped = list(zip(*worst_output))
best_unzipped = list(zip(*best_output))

del bubba_frame["Bubble"]
del bubba_frame["Difference"]
del bubba_frame["Difference/Comparisons"]
bubba_frame = bubba_frame.rename(columns={"Bubba": "Random"})
bubba_frame["Worst Case"] = worst_unzipped[COMPARISONS]
bubba_frame["Best Case"] = best_unzipped[COMPARISONS]

melted = bubba_frame.melt(id_vars=["Elements"],
                          var_name="Input",
                          value_name="Comparisons")
chart = altair.Chart(melted).mark_point().encode(
    x="Elements",
    y="Comparisons",
    color="Input"
).properties(
    width=800,
    height=550,
    title="Bubble Sort Best, Worst, and Random Input Comparisons"
).interactive()

save_chart(chart, "best-worst-random", output_path=OUTPUT_PATH, height=600)

Figure Missing

Once again we run into the problem/case that it looks like the Random input is missing, but if you zoom way in you'll see it does slightly better than the worst-case, but the difference is so small that when you zoom back out they look like they did exactly the same. The best-case does much better, since it only has to pass through the data once to see that it's already sorted and then quit (which isn't really reflected in the plot because our increment of comparisons only happens if the inner loop is entered), but except for this unusual case, random inputs don't do enough better than the worst-case examples to make short-circuiting a noteworthy improvement.

The End, The End

Well, I think I've beaten this dead horse enough for now. The overview for all the posts is here, if you didn't get enough of it. Up next: Selection Sort.

Bubble Sort: An Empirical Look

This is a continuation of a series on the Bubble Sort. In the previous post I made an implementation of the Bubble Sort in python. In this post I'll run it with inputs of varying sizes to see how its runtime changes as well as do a plot to get a sense of how it rearranges the items its sorting as it works.

An Empirical Look

Using math to figure out the number of comparisons is nice and all, but let's do some counting using real code. I'm going to re-implement the bubble sort but this time keeping track of the comparisons and swaps.

Some Setup

# python
from collections.abc import MutableSequence
from collections import namedtuple
from functools import partial

import random
from numpy.random import default_rng

# pypi
from bokeh.models import HoverTool
from expects import be_above_or_equal, contain_only, equal, expect
from joblib import Parallel, delayed
from numba import njit

import hvplot.pandas
import pandas

# my stuff
from graeae import EmbedHoloviews, Timer

Stuff For Later

TIMER = Timer()
SLUG = "bubble-sort-an-empirical-look"
PATH = f"files/posts/{SLUG}/"
Embed = partial(EmbedHoloviews, folder_path=PATH)

Numba Random

numba_random = default_rng(2021)

A Counter

This is going to be a reimplementation of the Bubble Sort function but with variables added to keep track of the counts. I originally did this class-based so I could re-use the code but once I switched from pypy to numba that started to get problematic, and the functions are short anyway so I guess it's all the same.

Some Types

ElementCount = int
ComparisonCount = int
SwapCount = int
SortedElements = MutableSequence
Counts = tuple[ElementCount, ComparisonCount, SwapCount, SortedElements]

BubbleOutput = namedtuple("BubbleOutput",
                          [
                              "element_count", "comparisons",
                              "swaps",
                              "elements"])

The Sorting Function (With Counts)

def bubble(elements: MutableSequence) -> Counts:
    """Sorts the list in place

    Args:
     elements: list of (in-place) sortable elements

    Returns:
     number of elements, count of comparisons, count of swaps, sorted elements
    """
    all_but_one = len(elements) - 1
    comparisons = swaps = 0
    for items_sorted in range(all_but_one):
        for in_front_of_us in range(all_but_one - items_sorted):
            comparisons += 1
            to_the_right = in_front_of_us + 1
            if elements[to_the_right] < elements[in_front_of_us]:
                (elements[in_front_of_us],
                 elements[to_the_right]) = (elements[to_the_right],
                                          elements[in_front_of_us])
                swaps += 1
    return BubbleOutput(len(elements), comparisons, swaps, elements)

One thing to note is that in the algorithm the outer loop stops at \(n-2\), which it also does here, but since the range function doesn't include the last number our argument is only \(n-1\), which might look confusing at first, and might sound even more confusing when you read this, but the point is they're the same even though they might not look exactly the same.

Testing the Counter

def check_non_decreasing(elements: list) -> None:
    """Checks that all the elements in the list are non-decreasing

    Args:
     elements: list of comparable sorted items

    Raises:
     AssertionError if something is out of order
    """
    last_one = elements[0]
    for index in range(1, n):
        this_one = elements[index]
        expect(this_one).to(be_above_or_equal(last_one))
        last_one = this_one
    return

A Random Input

Because we aren't using any kind of short-circuiting to stop once the data is sorted, it will always loop \(\Omega(n^2)\), but the number of swaps will depend on the input.

n = 1000
inputs = random.choices(list(range(n)), k=n)

output = bubble(inputs)

sorted_elements = output.elements
check_non_decreasing(sorted_elements)

expected_runtime = expected_swaps = (n * (n - 1))/2
expect(output.comparisons).to(equal(expected_runtime))

The Worst Case

In the worst case where it's in the exact opposite sorted order (non-increasing instead of non-decreasing) the number of comparisons should equaly the number of swaps.

output = bubble(list(reversed(list(range(n)))))
check_non_decreasing(output.elements)
expect(output.comparisons).to(equal(expected_runtime))
expect(output.swaps).to(equal(expected_swaps))

The Tracker

To visualize what the sort is doing I'm going to update the bubble sort to keep track of the order of the items as it's being sorted. I'm using a dictionary to map the list values to their locations so we can't use inputs where there are repeated elements or more than one list of locations will be mapped to a element value.

IndexHistory = list[int]
ElementValue = int
Swaps = dict[ElementValue, IndexHistory]
def swap_tracker(elements: MutableSequence) -> Swaps:
    """Does the bubble-sort and tracks the locations

    Args:
     elements: list of orderable items

    Returns:
     dict of element value: list of indices it was at during sort
    """
    all_but_one = len(elements) - 1

    swaps = {element: [index] for index, element in enumerate(elements)}

    for items_sorted in range(all_but_one):
        for in_front_of_us in range(all_but_one - items_sorted):
            to_the_right = in_front_of_us + 1
            if elements[to_the_right] < elements[in_front_of_us]:
                (elements[in_front_of_us],
                 elements[to_the_right]) = (elements[to_the_right],
                                          elements[in_front_of_us])
                for index, element in enumerate(elements):
                    swaps[element].append(index)
    return swaps

First a little sanity check just to make sure it still works.

inputs = [6, 3, 4, 1]

swaps = swap_tracker(inputs)
expect(len(swaps)).to(equal(len(inputs)))
n = 100
inputs = list(range(n))
random.shuffle(inputs)

swaps = swap_tracker(inputs)
expect(swaps.keys()).to(contain_only(*inputs))

check_non_decreasing(inputs)

Try Them Out

Comparisons

Let's look at how the comparisons and swaps change as the input gets bigger. To speed this up I'm going to run the sort in numba.

numba_bubble = njit(bubble)
runs = {}
things_to_sort = [numba_random.integers(low=0, high=count, size=count)
                  for count in range(1, 10**5+ 1, 1000)]

with TIMER:
    comparisons_and_swaps = Parallel(n_jobs=-1)(
        delayed(numba_bubble)(thing_to_sort)
        for thing_to_sort in things_to_sort)
Started: 2021-12-22 02:49:12.003456
Ended: 2021-12-22 02:50:44.485043
Elapsed: 0:01:32.481587

Now we'll plot it.

SIZE, COMPARISONS, SWAPS, ELEMENTS = 0, 1, 2, 3
unzipped = list(zip(*comparisons_and_swaps))
bubba_frame = pandas.DataFrame({"Elements": unzipped[SIZE],
                                "Comparisons": unzipped[COMPARISONS],
                                "Swaps": unzipped[SWAPS]})
bubba_frame["n^2"] = bubba_frame["Elements"]**2
tooltips_comparisons = [
    ("Elements", "@Elements{0,}"),
    ("Comparisons", "@Comparisons{0,}")
]

tooltips_swaps = [
    ("Elements", "@Elements{0,}"),
    ("Swaps", "@Swaps{0,}")
]

tooltips_n2 = [
    ("Elements", "@Elements{0,}"),
    ("n^2", "@{n^2}{0,}")
]

hover_comparisons = HoverTool(tooltips=tooltips_comparisons)
hover_swaps = HoverTool(tooltips=tooltips_swaps)
hover_n2 = HoverTool(tooltips=tooltips_n2)

swap_plots = bubba_frame.hvplot(x="Elements", y="Swaps").opts(
    tools=[hover_swaps])
comparison_plots = bubba_frame.hvplot(x="Elements", y="Comparisons").opts(
    tools=[hover_comparisons])
n_squared_plot = bubba_frame.hvplot(x="Elements", y="n^2").opts(
    tools=[hover_n2])

plot = (swap_plots * comparison_plots * n_squared_plot).opts(
    title="Comparisons, Swaps and n-squared Counts",
    height=700, width=800)
output = Embed(plot=plot, file_name="bubble_sort_comparisons")()

Figure Missing

The top line (yellow) is the square of the size of the inputs, the middle (red) is the number of comparisons, and the bottom line (blue) is the number of swaps. If you hover over the lines you can see that each line is roughly double the one below it - there are around twice as many comparisons as swaps for a given input and \(n^2\) is around twice as big as the comparison count for a given input.

Swaps

The comparisons and swaps are pretty much what we expected to see, they just confirm the theoretical assessment, but now let's look at plotting the swaps as they occur to see if we can understand what the bubble sort is doing.

COUNT = 20
inputs = random.sample(list(range(COUNT)), k=COUNT)
swaps = swap_tracker(inputs)

# swaps = {str(key): value for key, value in tracker.swaps.items()}
track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort", value_name="Location In Array", id_vars="Swap")

tooltips = [
    ("Item to Sort", "@{Value To Sort}"),
    ("Swap", "@{Swap}"),
    ("Current Location", "@{Location In Array}")
]

hover = HoverTool(tooltips=tooltips)

ticks = [(index, index) for index in range(COUNT)]
plot = melted.hvplot(x="Swap", y="Location In Array",
                     by="Value To Sort").opts(tools=[hover],
                                              show_legend=False,
                     width=800, height=700, yticks=ticks,
                            title="Bubble Sort Swaps",)


output = Embed(plot=plot, file_name="bubble_sort_swaps")()

Figure Missing

Aside:

HoloViews seems to not let you set the Tooltips if you use multiple columns, which is why I went through all the rigamarole of melting it. If you just plot it as the DataFrame with each column being one of the tracked locations for a sort value (e.g. the column name is '1' and the rows are the positions in the array at each swap) then the plot comes out okay, but the labels are kind of confusing.

The plot is hopefully a useful way to figure out what's going on. If you look at the unsorted values you can see that once they are the largest of the unsorted values, they "bubble up" in a diagonal but straight line. Before this plot I would have said that the largest elements are the ones that get sorted first, but if you look at the plot (assuming I don't re-run it and change the arrangements) and in particular you look at the least-valued elements (1 and 2) you can see that they reach their final position fairly early, just by virtue of being in a position to get pushed down.

Worst Case

The random-input gives an interesting view of how the algorithm might work in practice, but let's look at the worst-case input where the values are in the opposite of the sorted order.

COUNT = 20
inputs = list(reversed(range(COUNT)))
swaps = swap_tracker(inputs)

track_frame = pandas.DataFrame(swaps)
re_indexed = track_frame.reset_index().rename(columns={"index": "Swap"})
melted = re_indexed.melt(var_name="Value To Sort", value_name="Location In Array", id_vars="Swap")


ticks = [(index, index) for index in range(COUNT)]
plot = melted.hvplot(x="Swap", y="Location In Array", cmap="blues",
                     by="Value To Sort").opts(show_legend=False,
                     width=800, height=700, yticks=ticks,
                            title="Bubble Sort Swaps (Worst Case)",)


output = Embed(plot=plot, file_name="bubble_sort_worst_swaps")()

Figure Missing

This image gives an even better sense of the way that the bubble sort works. Since the Bubble Sort uses a left-to-right traversal and swapping as you go, the largest values shoot up to their final positions in straight lines, while the lesser values get pushed down a little with each traversal until they reach the correct position.

Onward

The final post in this series (maybe) will look at a variation on the Bubble Sort that can improve the performance in special cases.

Bubble Sort: The Implementation

This is part of a series on Bubble Sort that began with a look at the Bubble Sort Algorithm and continues now with a python translation of the algorithm.

Some Setup

Imports

# python
from random import choices

# pypi
from expects import contain_exactly, equal, expect

Bubble Sort

The Implementation

Here's a straight-forward translation of the pseudocode to python.

def bubble_sort(unsorted: list) -> list:
    """Sorts the unsorted

    Args:
     unsorted: mutable collect of orderable items

    Returns:
     original list sorted in ascending order
    """
    all_but_one = len(unsorted) - 1
    for items_bubbled_up in range(all_but_one):
        for left_hand in range(all_but_one - items_bubbled_up):
            right_hand = left_hand + 1
            if unsorted[right_hand] < unsorted[left_hand]:
                (unsorted[left_hand],
                 unsorted[right_hand]) = (unsorted[right_hand],
                                          unsorted[left_hand])
    return unsorted

Try It Out

inputs = [6, 3, 4, 5, 0]
expect(bubble_sort(inputs)).to(contain_exactly(0, 3, 4, 5, 6))

population = list(range(1000))
scrambled = choices(expected, k=1000)
expect(bubble_sort(scrambled)).to(contain_exactly(*list(sorted(scrambled))))

Well, that wasn't so exciting, but that's because I'm going to look more at how the sort performs in the next post.

Up Next

In the next post I'll run the function over some inputs to see how it performs.

Bubble Sort: Runtime Explained

What This Is

The number of comparisons for the Bubble Sort is equal to the number of times the loops run, so we get \(\Theta(n^2)\). I originally only did a truncated version of showing how you get \(n^2\) but then I had to struggle a little to remember what was going on when I looked back at it so I'm going to break it down more here.

Start With the Loops

We have an outer loop that runs from \(0\ldots n-2\) and an inner loop that goes from \(0 \dots n - 2 - i \) giving us a summation that looks like this

\[ C(n) = \sum_{i=0}^{n-2} \sum_{j=0}^{n - 2 - i} 1 \]

It's summing \(1\) because there's one comparison in each of the loops.

Decompose the Inner Sum

So first we have to recognize that since we're summing \(1\) (the comparison) what we're doing is adding \(1\) for every item in the summation which is the equivalent of counting how many numbers there are from \(0\) to \(n - 2 - i\). When you have a sequence of consecutive integers (say 5, 6, 7, 8) the way to count the number there are in the sequence is \(\textit{last} - \textit{first} + 1\). So for 5, 6, 7, 8 we have \(8 - 5 + 1 = 4\) so there are four numbers in the sequence. If we apply that to the sequence created by the inner summation we get this

\begin{align} C(n) &= \sum_{i=0}^{n-2} \sum_{j=0}^{n - 2 - i} 1\\ &= \sum_{i=0}^{n-2} (n - 2 - i) - 0 + 1\\ &= \sum_{i=0}^{n-2} n - 1 - i\\ \end{align}

Break Apart the Polynomial

With summations, if you are summing a polynomial, then you can break additions and subtractions apart into separate summations. Additionally, any multiplications can be moved outside of the summation. So now we get

\begin{align} C(n) &= \sum_{i=0}^{n-2} n - 1 - i\\ &= n \sum_{i=0}^{n-2} 1 - \sum_{i=0}^{n-2} 1 - \sum_{i=0}^{n-2} i\\ \end{align}

That Last Term

If you squint at the first two terms of that last line you can see that we're summing up \(1\) just like we did with the inner loop so it's going to be a similar outcome just with a different ending point (and we'll also multiply by n for the first term). But that last one is a little trickier. It's a summation of i rather than \(1\) so it'll be \(0 + 1 + \cdots + (n-1) + (n-2)\). This is a sequence that happens often enough that if you do this sort of thing a lot you'll just remember it, but I don't do it a lot and I'm rather of a forgetful bent anyway so I'll show how I remember to get it. First you have to remember that the sum of terms is the same no matter the order you put the terms in, so the summation comes out the same even when it's reversed. What we'll do is add the sequence with its reverse term by term we to get something like this

\begin{array}{ccccccccc} & 0 & + & 1 & + & \cdots & + & (n - 3) & + & (n - 2) \\ + & (n - 2) & + & (n - 3) & + & \cdots & + & 1 & + & 0 \\ \hline & (n - 2) & + & (n - 2) & + & \cdots & + & (n - 2) & + & (n - 2)\\ \end{array}

And using our counting equation we have

\begin{align} end - start + 1 &= n - 2 - 0 + 1 \\ &= n - 1 \end{align}

So we have \(n - 1\) terms in that sum we got by adding the reverse (each term being \(n - 2\)), but since we added the sequence with its reverse it's now twice as big as it should be so we need to halve it, and the third term of our summation becomes

\[ \frac{(n - 1)(n - 2)}{2} \]

Put the Three Terms Back Together

Now, going back to where we were in the summation - if do the expansion of the first two terms using the counting equation and add the third term from the previous section, we get this

\begin{align} C(n) &= n \sum_{i=0}^{n-2} 1 - \sum_{i=0}^{n-2} 1 - \sum_{i=0}^{n-2} i\\ &= n(n - 2 - 0 + 1) - (n - 2 - 0 + 1) - \frac{(n - 1)(n - 2)}{2}\\ &= n(n - 1) - (n - 1)- \frac{n^2 -2n - n + 2}{2}\\ &= (n^2 - n) - (n - 1) - \frac{n^2 - 3n + 2}{2}\\ &= \frac{2(n^2 - n)}{2} - \frac{2(n - 1)}{2} - \frac{n^2 - 3n + 2}{2}\\ &= \frac{2 n^2 - 2n - 2n + 2 - n^2 + 3n -2}{2}\\ &= \frac{n^2 - n}{2} \in \Theta(n^2) \end{align}

Oy. Now don't forget next time.

Re-Orienting

This is part of a series on Bubble Sort. More specifically, it expands on the post looking at the Bubble Sort Algorithms.

Bubble Sort: The Algorithm

This is part of a series on the Bubble Sort. In this post we'll look at the algorithm without worrying about the implementation details.

\(\def\sc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}}\)

The Parts

  • Specification: This is an instance of The Sorting Problem.
  • Basic Steps: Swap adjacent elements that are out of order.
  • Measure of Progress: The number of items sorted so far.
  • The Loop Invariant: The number of times the loop has been performed (k) is the number of elements at the end of the items which are the largest elements in sorted order.
  • Main Steps: Move from the start of the list swapping adjacent items that are out of order until the largest previously unsorted item is at the beginning of the sorted items.
  • Make Progress: The main step reduces the unsorted by 1.
  • Maintain the Loop Invariant:
    • \(\langle Loop-Invariant \rangle\): k items at the end of the list are sorted.
    • \(\lnot \langle Exit-Condition \rangle\): There is more than one item that is unsorted.
    • \(\textsc{CodeLoop}\): The k+1 item was the largest of the unsorted items and is now at the beginning of the sorted items.
  • Establish the Loop Invariant: Initially no items are swapped.
  • Exit Condition: Stop when there is only one item in the unsorted items.
  • Ending:
    • \(\langle Exit-Condition \rangle\): All but the smallest item are in the upper list.
    • \(\langle Loop-Invariant \rangle\): All the items in the upper list are sorted in non-decreasing order.
    • \(\langle Post-Condition \rangle \): The list is sorted in non-decreasing order.

The Algorithm

Pseudocode

The Bubble Sort algorithm is a Brute-Force sort which works by repeatedly traversing the input array, checking adjacent values and swapping them if they are out of order. This has the effect of "bubbling-up" the largest unsorted value, thus the name, probably (see Astrachan, 2003 for some history on it). Here's the algorithm in pseudocode.

\begin{algorithm}
\caption{BubbleSort}
\begin{algorithmic}
\INPUT An array of orderable items
\OUTPUT The array sorted in ascending order
\PROCEDURE{BubbleSort}{$A$}
  \FOR{$i \gets 0$ to $n - 2$}
    \FOR{$j \gets 0$ to $n - 2 - i$}
      \IF{$A[j+1] < A[j]$}
        \STATE swap $A[j]$ and $A[j+1]$
      \ENDIF
    \ENDFOR
  \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

If you squint a little you'll see that there's only really four lines of code, two for-loops, an if-then conditional, and a "swap", the rest is noise from the algorithmic formatting.

In translation, the \(i\) variable holds the amount of items at the end of the list that have been sorted so far, and the \(j\) variable moves from the start of the list up to the end of the unsorted (since the outer loop goes up by one each time and the inner loop is subtracting the outer loop's value when finding where to stop it goes up one less entry with each loop). Then the conditional says that if the item to the right of where we currently are is smaller than the item in front of us, swap the two items so that the bigger item is on the right.

Diagram

nil

The three colors in the diagram represent:

  • Blue: The outer for-loop
  • Red: The inner for-loop
  • Green: The transitioning to and from the inner loop to the outer loop.

A Simple Example

Let's step through how the Bubble Sort works with five items to sort.

  • The Array To Sort: A = [4, 5, 2, 3, 1]
  • n = 5
  • n - 1 = 4
  • n - 2 = 3

As a Table

# Sorted All Sorted? Last Unsorted     Exit Inner Loop?       Swap?  
i i == n - 1 n - 2 - i   j j > n - 2 - i A[j]   A[j + 1] A[j] > A[j + 1] A
0 \(\ne n - 1\) 3 \(\nless\) 0   4 < 5   4, 5, 2, 3, 1
      \(\nless\) 1   5 > 2 Swap 4, 2, 5, 3, 1
      \(\nless\) 2   5 > 3 Swap 4, 2, 3, 5, 1
      \(\nless\) 3   5 > 4 Swap 4, 2, 3, 1, 5
      < 4 Exit Loop          
1 \(\ne n - 1\) 2 \(\nless\) 0   4 > 2 Swap 2, 4, 3, 1, 5
      \(\nless\) 1   4 > 3 Swap 2, 3, 4, 1, 5
      \(\nless\) 2   4 > 1 Swap 2, 3, 1, 4, 5
      < 3 Exit Loop          
2 \(\ne n - 1\) 1 \(\nless\) 0   2 < 3   2, 3, 1, 4, 5
      \(\nless\) 1   3 > 1 Swap 2, 1, 3, 4, 5
      < 2 Exit Loop          
3 \(\ne n - 1\) 0 \(\nless\) 0   2 > 1 Swap 1, 2, 3, 4, 5
      < 1 Exit Loop          
4 \(== n - 1\) (Done)                  

Some Column Notes:

  • i is the count of items in the sorted section at the end of the array
  • i == n - 1: When there's only one item left in the unsorted section there's nothing to swap so the array is sorted.
  • n - 2 - i is the index of the item at the end of the unsorted section right before the sorted section of the array.
  • j > n - 2 - i: When j moves into the already sorted section restart the inner loop.
  • A[j] > A[j + 1]: If the item to the right of j is smaller than the item at j then swap them.

Assessing the Damage

Although it's nice to know that the sort works we're really not as concerned about how correct it is, as much as we are interested in how it performs. There's two things we can count:

  1. The Number of comparisons
  2. The Number of swaps.

The fact that you have those two loops makes it pretty likely that it's going to be \(\Theta\left(n^2\right)\) but since Bubble Sort is mostly an academic example let's work it out.

Comparisons

The number of comparisons is equal to the number of times the loops run, so we get \(\Theta\left(n^2\right)\). I made some notes on how I got that in this post.

Swaps

The number of swaps will depend on how the inputs are arranged, but in the worst case where the array is sorted backwards, every comparison will produce a swap so you'll end up with the same bounds as the comparisons.

\begin{align} S_{worst-case} &= C(n)\\ &= \frac{n^2 - n}{2} \in \Theta(n^2) \end{align}

Onward

The next post will look at translating the algorithm to python.

Sources

How To Think About Algorithms

Contents

  • Introduction

Part One: Iterative Algorithms and Loop Invariants

  1. Iterative Algorithms: Measures of Progress and Loop Invariants
  2. Examples Using More-of-the-Input Loop Invariants
  3. Abstract Data Types
  4. Narrowing the Search Space: Binary Search
  5. Iterative Sorting Algorithms
  6. Euclid's GCD Algorithm
  7. The Loop Invariant for Lower Bounds

Part Two: Recursion

  1. Abstractions, Technique, and Theory
  2. Some Simple Examples of Recursive Algorithms
  3. Recursion on Trees
  4. Recursive Images
  5. Parsing with Context-Free Grammars

Part Three: Optimization Problems

  1. Definition of Optimization Problems
  2. Graph Search Algorithms
  3. Network Flows and Linear Programming
  4. Greedy Algorithms
  5. Recursive Backtracking
  6. Dynamic Programming Algorithms
  7. Examples of Dynamic Programming
  8. Reductions and NP-Completeness
  9. Randomized Algorithms

Part Four: Appendix

  1. Existential and Universal Quantifiers
  2. Time Complexity
  3. Logarithms and Exponentials
  4. Asymptotic Growth
  5. Adding-Made-Easy Approximations
  6. Recurrence Relations
  7. A Formal Proof of Correctness

Part Five: Exercise Solutions

Conclusion

Citation

  • [HTTAA] Edmonds J. How to think about algorithms. Cambridge ; New York: Cambridge University Press; 2008. 448 p.

Introduction to the Design & Analysis of Algorithms (Levitin)

Levitin A. 2007. Introduction to the design & analysis of algorithms. 2nd ed. Boston: Pearson Addison-Wesley.

Main Chapters

  1. Introduction
  2. Fundamentals of the Analysis of Algorithm Efficiency
  3. Brute Force
  4. Divide-and-Conquer
  5. Decrease-and-Conquer
  6. Transform-and-Conquer
  7. Space and Time Tradeoffs
  8. Dynamic Programming
  9. Greedy Technique
  10. Iterative Improvement
  11. Limitations of Algorithm Power
  12. Coping with the Limitations of Algorithm Power
  13. Useful Formulas for the Analysis of Algorithms
  14. Short Tutorial on Recurrence Relations

Essential Algorithms

Citation

  • Stephens R. Essential algorithms: a practical approach to computer algorithms using Python and C#. 2nd ed. Indianapolis: John Wiley; 2019.

Introduction To Algorithms (CLRS)

Chapters

Foundations

The Role of Algorithms in Computing

  • Algorithms
  • Algorithms as a technology

Getting Started

  • Insertion Sort
  • Analyzing algorithms
  • Designing algorithms

Growth of Functions

  • Asymptotic notation
  • Standard notations and common functions

Divide-and-Conquer

  • The maximum-subarray problem
  • Strassen's algorithm for matrix multiplication
  • The substitution method for solving recurrences
  • The recursion-tree method for solving recurrences
  • The master method for solving recurrences
  • Proof of the master theorem

Probabilistic Analysis and Randomization Algorithms

  • The hiring problem
  • Indicator Random Variables
  • Randomized Algorithms
  • Probabilistic analysis and further uses of indicator random variables

Sorting and Order Statistics

Heapsort

  • Heaps
  • Maintaining the heap property
  • Building a heap
  • The heapsort algorithm
  • Priority queues

Quicksort

  • Description of quicksort
  • Performance of quicksort
  • A randomized version of quicksort
  • Analysis of quicksort

Sorting in Linear Time

  • Lower bounds for sorting
  • Counting sort
  • Radix sort
  • Bucket sort

Medians and Order Statistics

  • Minimum and maximum
  • Selection in expected linear time
  • Selection in worst-case linear time

Data Structures

Elementary Data Structures

  • Stacks and Queues
  • Linked Lists
  • Implementing pointers and objects
  • Representing rooted trees

Hash Tables

  • Direct-address tables
  • Hash tables
  • Hash functions
  • Open addressing
  • Perfect hashing

Binary Search Trees

  • What is a binary search tree?
  • Querying a binary search tree
  • Insertion and deletion
  • Randomly built binary search trees

Red-Black Trees

  • Properties of red-black trees
  • Rotations
  • Insertion
  • Deletion

Augmenting Data Structures

  • Dynamic order statistics
  • How to augment a ddata structure

Advanced Design and Analysis Techniques

Dynamic Programming

  • Rod cutting
  • Matrix-chain multiplication
  • Elements of dynamic programming
  • Longest common subsequence
  • Optimal binary search trees

Greedy Algorithms

  • An activity-selection problem
  • Elements of the greedy strategy
  • Huffman codes
  • Matroids and greedy methods
  • A task-scheduling problem as a matroid

Amortized Analysis

  • Aggregate analysis
  • The accounting method
  • The potential method
  • Dynamic tables

Advanced Data Structures

B-Trees

  • Definition of B-trees
  • Basic operations on B-trees
  • Deleting a key from a B-tree

Fibonacci Heaps

  • Structure of Fibonacci heaps
  • Mergeable-heap operations
  • Decreasing a key and deleting a node
  • Bounding the maximum degree

van Emde Boas Trees

  • Preliminary approaches
  • A recursive structure
  • The van Emde Boas Tree

Data Structures for Disjoint Sets

  • Disjoint-set operations
  • Linked-list representations of disjoint sets
  • Disjoint-set forests
  • Analysis of uninon by rank with path compression

Graph Algorithms

Elementary Graph Algorithms

  • Representations of graphs
  • Breadth-first search
  • Depth-first search
  • Topological sort
  • Strongly connected components

Minimum Spanning Trees

  • Growing a minimum spanning tree
  • The algorithms of Kruskal and Prim

Single-Source Shortest Paths

  • The Bellman-Ford algorithm
  • Single-source shortest paths in directed acyclic graphs
  • Dijkstra's algorithm
  • Difference constraints and shortest paths
  • Proofs of shortest-paths properties

All-Pairs Shortest Paths

  • Shortest paths and matrix multiplication
  • The Floyd-Warshall algorithm
  • Johnson's algorithm for sparse graphs

Maximum Flow

  • Flow networks
  • The Ford-Fulkerson method
  • Maximum bipartite matching
  • Push-relabel algorithms
  • The relabel-to-front algorithm

Selected Topics

Multithreaded Algorithms

  • The basics of dynamic multithreading
  • Multithreaded matrix multiplication
  • Multithreaded merge sort

Matrix Operations

  • Solving systems of linear equations
  • Inverting matrices
  • Symmetric positive-definite matrices and least-squares approximation

Linear Programming

  • Standard and slack forms
  • Formulating problems as linear programs
  • The simplex algorithm
  • Duality
  • The initial basic feasible solution

Polynomials and the FFT

  • Representing polynomials
  • The DFT and FFT
  • Efficient FFT Implementations

Number-Theoretic Algorithms

  • Elementary number-theoretic notions
  • Greatest common divisor
  • Modular arithmetic
  • Solving modular linear equations
  • The Chinese remainder theorem
  • Powers of an element
  • The RSA public-key cryptosystem
  • Primality testing
  • Integer factorization

String Matching

  • The naive string-matching algorithm
  • The Rabin-Karp algorithm
  • String matching with finite automata
  • The Knuth-Morris-Pratt algorithm

Computational Geometry

  • Line-segment properties
  • Determining whether any pair of segments intersects
  • Finding the convex hull
  • Finding the closest pair of points

NP-Completeness

  • Polynomial time
  • Polynomial-time verification
  • NP-completeness and reducibility
  • NP-completeness proofs
  • NP-complete problems

Approximation Algorithms

  • The vertex-coverc problem
  • The traveling-salesman problem
  • The set-covering problem
  • Randomization and linear programming
  • The subset-sum problem

Appendix: Mathematical Background

Summations

  • Summation formulas and properties
  • Bounding summations

Sets, Etc.

  • Sets
  • Relations
  • Functions
  • Graphs
  • Trees

Counting and Probability

  • Counting
  • Probability
  • Discrcete random variables
  • The geometric and binomial distributions
  • The tails of the binomial distribution

Matrices

  • Matrices and matrix operations
  • Basic matrix properties

Citation

  • Cormen TH, editor. Introduction to algorithms. 3rd ed. Cambridge, Mass: MIT Press; 2009. 1292 p.