Heap!

Beginning

What's a Heap?

This is a post that looks at Heaps. According to Introduction To Algorithms (CLRS), a heap is an array that can be thought of as a nearly complete binary tree which satisfies the Heap Property. There are actually two heap properties, one for a Max Heap and one for a Min Heap.

  • Max-Heap Property: \(A[Parent(i)]) \ge A[i]\)
  • Min-Heap Property: \(A[Parent(i)]) \le A[i]\)

Which means that for any node in a MaxHeap, its value is greater than or equal to that of any of its children, and for any node in a MinHeap, the node's value is less than or equal to its children.

The heap has (at least) two attributes.

  • A.length: the number of elements in the array
  • A.heap-size: the number of elements in the heap (not all elements in the array need to be in the heap)

The first element in the heap is the root (for the Max-Heap it is the largest element, for the Min-Heap it is the smallest).

Some Functions

For any Node in the Heap located at i:

\begin{algorithm}
\caption{Parent}
\begin{algorithmic}
\INPUT The index of a Child Node
\OUTPUT The index of the Child's Parent
\PROCEDURE{Parent}{$i$}
 \RETURN {\(\left \lfloor \frac{i}{2} \right \rfloor\)}
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\caption{Left}
\begin{algorithmic}
\INPUT The index of a Parent Node
\OUTPUT The index of the Parent's left child node
\PROCEDURE{Left}{$i$}
 \RETURN 2i
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

These Functions show the relationship between the location of a node and its children in the array that makes up the heap.

Some Requirements

There are also two requirements:

  • Shape Requirement: All levels but the last have to be complete (only the rightmost leaves can be incomplete)
  • Dominance Requirement: Every node is greater than or equal to its children (in the Max-Heap, less than or equal for the Min-Heap)

That Dominance Requirement is also covered by the Heap Property.

Heap Properties

These are from Introduction to the Design & Analysis of Algorithms (Levitin).

  1. There exists exactly one essentially complete binary tree with n nodes. It's a binary tree so its height is \(\lfloor \log_2 n\rfloor\).
  2. The root of a (Max) Heap always has the largest element.
  3. A node of a heap with all its descendants is also a heap.
  4. A heap implemented as an array with the heap's elements stored at \(1 \ldots n\) can be constructed using:
    • the parental node keys will be the first \(\lfloor \frac{n}{2} \rfloor\) elements while the leaf nodes will be in the last \(\lceil \frac{n}{2} \rceil\) positions
    • The childern of a parent node at position i will be in positions 2i and 2i + 1.
    • The parent of a child node at position i will be in position \(\lfloor \frac{i}{2}\rfloor\).

Property four is what was defined in the functions up above. Levitin also gives a slightly different variation on stating the heap property.

\[ H[i] \ge \max{H[2i], H[2i + 1]} \textrm{ for } i= 1, \ldots, \lfloor \frac{n}{2} \rfloor. \]

This is sort of an inversion of the Max-Heap Property stated above. While the Max-Heap Property was given as every node is less than or equal to its parent, this says that every parent is greater than or equal to its children.

The Max Heap

The Imports

# python
from collections.abc import MutableSequence
from functools import partial
from typing import Generator

import random

# pypi
from expects import contain_exactly, equal, expect, raise_error

# software under test
from bowling.data_structures.heap import MaxHeap

from graeae import EmbedHoloviews
SLUG = "max-heap"
path = f"files/posts/{SLUG}/"
Embed = partial(EmbedHoloviews, folder_path=path)

The Class

Imports

# pypi
# https://www.attrs.org/en/stable/index.html
from attrs import define

The Definition

Besides declaring the class definition, the MaxHeap will hold some constants to hopefully make the code easier to read.

@define
class MaxHeap:
    """Build and maintain a max-heap

    If you pass in the heap as a list pre-pend it with Infinity

    Otherwise use ~heap = MaxHeap.from_list(elements)~ to build it
    """
    INFINITY = float("inf")
    NEGATIVE_INFINITY = -INFINITY
    ROOT_NODE = 1

    heap: list = [INFINITY]
    _size: int = None

From List

This is a class method to create the list for the heap. The calculations for the locations in the array of parent and child nodes is easier if we have a 1-based list so I'll pad the list being passed in. Additionally, I'll set the first value to \(\infty\) so that the Heap Property will pass for the root node without needing any special consideration for what its parent is.

@classmethod
def from_list(cls, heap: list):
    """Builds a max-heap instance from the starter list

    Args:
     heap: list of elements to dump on the heap

    Returns:
     MaxHeap instance with the heap list added
    """
    return cls(heap = [cls.INFINITY] + heap)

The Heap Size

Since we padded the list holding the heap the length of the list will never be the same as the number of nodes in the heap. Additionally we'll sometimes manipulate things so things that were in the heap are later excluded but still in the list so the size property will help us to keep track of how big we think our heap is.

@property
def size(self) -> int:
    """The size of the max heap"""
    if self._size is None:
        self._size = len(self.heap) - 1
    return self._size

@size.setter
def size(self, new_size) -> int:
    """Set the size of the max heap

    Args:
     new_size: how much of the list is in the heap

    Raises:
     AssertionError if the size is out of bounds for the list
    """
    assert 0 <= new_size <= self.length
    self._size = new_size
    return

Length

This is the length of the array, not necessarily of the heap. This one's a little tricky, since the padding throws it off by one it would seem that it should be lowered by one, but if you use it to figure out the last index that will mess it up a little. Since we already have the size for the number of nodes I'll just pass the length of the list on and see what happens.

I was debating whether to use __len__ but I decided that this is really an internal measure and size is meant to be the attribute to use. I'm mostly keeping this around so that it matches the CLRS attributes.

@property
def length(self) -> int:
    """The size of the array for the heap

    Warning:
     This includes the padding at the beginning of the list
    """
    return len(self.heap)

Maximum

Since this is a max-heap the largest element is in the root, this just makes getting it more explicit.

@property
def maximum(self):
    """The value in the root node"""
    return self.heap[self.ROOT_NODE]

Finding the Parent, Left-Child, and Right-Child of a Node

These are the implementations of the functions at the start of the post.

def parent(self, node: int) -> int:
    """Find the parent of a node

    Args:
     node: the index of the node to check

    Returns:
     the index of the parent of the node
    """
    return node//2
def left_child(self, parent: int) -> int:
    """Find the left child of a parent

    Args:
     parent: the index of the parent node

    Returns:
     index of the left child of the parent
    """
    return 2 * parent
def right_child(self, parent: int) -> int:
    """Find the right child of a parent

    Args:
     parent: the index of the parent node

    Returns:
     index of the right child of the parent
    """
    return 2 * parent + 1

Heapify A Sub Tree

CLRS just calls this MaxHeapify. But then the bottoms-up heapification of the tree seemed more like it should be called heapify so I called it heapify_subtree to note that it starts at a specific node which might not be the root.

def heapify_subtree(self, node: int):
    """Heapify the tree rooted at the node

    Args:
     node: index of the node to compare to its descendants
    """
    left, right = self.left_child(node), self.right_child(node)
    largest = node

    if left <= self.size and self.heap[left] > self.heap[largest]:
        # the left child was larger than the current parent node
        largest = left

    if right <= self.size and self.heap[right] > self.heap[largest]:
        # the right child is larger than the left and the current parent
        largest = right

    if largest != node:
        # the current parent is out of place, swap it with the larger child
        self.heap[node], self.heap[largest] = (self.heap[largest],
                                               self.heap[node])

        # after the swap the item at "largest" is the value from the 
        # "node" we started with so try it again with this new location
        self.heapify_subtree(largest)
    return

The Call

This makes the MaxHeap callable and heapifies the entire heap using a bottoms-up construction.

def __call__(self):
    """Heapifies the heap

    Raises:
     AssertionError: something bad happened and the Heap Property failed
    """
    for parent in reversed(range(1, self.size//2 + 1)):
        self.heapify_subtree(parent)

    self.check_rep()
    return

Increase a Key

When we change the value of a node, if the value is higher than the previous value it might be in the wrong place in the heap (e.g. it might be bigger than its parent) so we need to traverse upward, swapping it with parents smaller than it, until we find where it should go. CLRS made it a requirement that the new value is larger than the old one, which makes sense in light of the name IncreaseKey, but it seems to me that you could just call it ChangeKey and use a conditional instead of raise an exception, but I'll stick with the error for now.

def increase_key(self, node, key):
    """Increase the node's value

    Args:
     node: index of node in heap to change
     key: new value for the node

    Raises:
     AssertionError if new value isn't larger than the previous value
    """
    assert key > self.heap[node], (f"{key} not greater than previous value {self.heap.node}")
    self.heap[node] = key

    while (node > self.ROOT_NODE and
           self.heap[self.parent(node)] < self.heap[node]):
        self.heap[node], self.heap[self.parent(node)] = (
            self.heap[self.parent(node)], self.heap[node])
        node = self.parent(node)
    return

Insert a Value

CLRS describes insert and increase_key as part of updating a priority queue, but Levitin's description of top-down heap construction seems to use them as an alternative way to create the heap. He describes this method of construction (top-down) as starting with an empty heap and repeatedly inserting elements from the original array until you have a heap.

def insert(self, key):
    """Insert the key into the heap

    Args:
     key: orderable item to insert into the heap
    """
    self.size += 1
    self.heap[self.size - 1] = self.NEGATIVE_INFINITY
    self.increase_key(self.size - 1, key)
    return

Check the Heap Property

This checks that the Heap Property holds for all the nodes.

def check_rep(self) -> None:
    """Checks the heap property

    Raises:
     AssertionError: the heap property has been violated
    """
    for node in range(1, self.size):
        assert self.heap[self.parent(node)] >= self.heap[node], (
            f"Parent node {self.parent(node)} = {self.heap[self.parent(node)]} "
            f"not >= {node}={self.heap[node]}")
    return

Get and Set Item

I threw these in because I kept forgetting that the heap is an attribute of the MaxHeap, but it's only for convenience.

def __getitem__(self, node: int):
    """Gets an item from the heap

    Args: 
     node: index of the heap to get the value
    """
    return self.heap[node]

def __setitem__(self, node, value):
    """Sets the value at the node in the heap

    Args:
     node: index of the heap to set the value
     value: what to set the location in the heap to
    """
    self.heap[node] = value
    return

The Tests

start = [10, 20, 5]
max_heap = MaxHeap.from_list(heap=start)

expect(max_heap.heap).to(equal([max_heap.INFINITY] + start))
expect(max_heap.size).to(equal(3))
expect(max_heap.length).to(equal(4))

expect(max_heap.parent(1)).to(equal(0))
expect(max_heap.parent(2)).to(equal(1))
expect(max_heap.parent(3)).to(equal(1))

expect(max_heap.left(1)).to(equal(2))
expect(max_heap.right(1)).to(equal(3))

def failure(): max_heap.check_rep()

expect(failure).to(raise_error(AssertionError))

expect(max_heap.maximum).to(equal(10))
start = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heap = MaxHeap.from_list(start)
expect(heap.maximum).to(equal(16))

heap.heapify_subtree(2)
expect(heap[2]).to(equal(14))

heap.heapify_subtree(1)
expect(heap.maximum).to(equal(16))
expect(heap[2]).to(equal(14))
expect(heap[4]).to(equal(8))
expect(heap[9]).to(equal(4))
start = [10, 20, 30, 40]
heap = MaxHeap.from_list(start)
heap()
expect(heap.maximum).to(equal(40))
start = [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
heap = MaxHeap.from_list(start)
expect(heap.maximum).to(equal(1))

heap()
expect(heap.maximum).to(equal(16))

Heap Sort

<<heapsort-imports>>

<<heap-sort>>

Imports

# pypi
# https://www.attrs.org/en/stable/index.html
from attrs import define

# this project
from bowling.data_structures.heap import MaxHeap

The Heap Sort Class

The HeapSort uses the fact that a Max Heap always has the largest element at the root and repeatedly puts the root at the end of the list then shrinks the heap so it doesn't include the value that was moved over.

@define
class HeapSort:
    """Sort using a heap

    Args:
     items: collection of items for the sort
    """
    items: list
    _heap: MaxHeap=None

    @property
    def heap(self) -> MaxHeap:
        """The heap of items"""
        if self._heap is None:
            self._heap = MaxHeap.from_list(self.items)
            self._heap()
        return self._heap

    @property
    def without_root(self) -> list:
        """The items without the root """
        return self.heap.heap[self.heap.ROOT_NODE:]

    def __call__(self):
        """sorts the items"""
        self.heap()
        for node in range(self.heap.size, 1, -1):
            self.heap.heap[self.heap.ROOT_NODE], self.heap.heap[node] = (
                self.heap.heap[node],
                self.heap.maximum)
            self.heap.size -= 1
            self.heap()
        return

The Tests

from bowling.sort.heap import HeapSort
k = 100
items = random.choices(range(k), k=k)
sorter = HeapSort(items.copy())

sorter()

items.sort()
expect(sorter.without_root).to(contain_exactly(*items))

A Priority Queue

Although some books mention that MinHeaps are used for priority queues, CLRS shows a MaxHeap version. This involves adding a couple of methods to the MaxHeap so there's no special class.

The Tests

items = [1, 2, 3]
heap = MaxHeap.from_list(items)
heap()

def failure(): heap.increase_key(2, 0)
expect(failure).to(raise_error(AssertionError))

heap.increase_key(2, 5)
expect(heap.maximum).to(equal(5))

items = [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
heap = MaxHeap.from_list(items)
heap()
heap.increase_key(9, 15)
expect(heap[heap.left(1)]).to(equal(15))

heap.insert(20)
expect(heap.size).to(equal(len(items) + 1))
expect(heap.maximum).to(equal(20))

Plotting

from networkx import Graph
import holoviews

graph = Graph()
for node in range(1, len(heap.heap)//2+ 1):
    if heap.left(node) < heap.length:
        graph.add_edge(heap.heap[node], heap.heap[heap.left(node)])
    if heap.right(node) < heap.length:
        graph.add_edge(heap.heap[node], heap.heap[heap.right(node)])
positions = networkx.drawing.nx_pydot.graphviz_layout(graph, prog="dot")

plot = holoviews.Graph.from_networkx(graph, positions)

output = Embed(plot=plot, file_name="heap-plot")()

Figure Missing

End