Including DFS, BFS, Binary Search Trees & Self Balancing Trees.

In part 1 of this series I looked at common search and sort algorithms used on lists. Part 2 focused on hash functions, sets and maps. In this post I will look at *trees*, *depth and breadth first search*, *binary search trees* and *self balancing trees*.

## Basic Terminology

A *tree* is essentially an extension of a linked list. It consists of a set of connected elements, called *nodes*. In contrast to *linked lists* a *node* can be connected to multiple other *nodes* but there can be no cycles. A cycle is more than 1 way to get to a node. The first element is called the *root* and elements that contain no *descendants* are called *leaf nodes*. The connections between *nodes* are called *edges* and a group of *edges* is called a *path*

*Levels* describe how many connections it takes to reach the *root*, which resides at level 1. The *height* of a *node* describes the number of *edges* between it and the furthest *leaf node*. A *leaf node* will always have a height of 0. On the flip side, the *depth* of a *node* is the number *edges* to the *root node*.

## Tree Traversal

There are 2 types of traversal techniques, *depth first search (DFS)* and *breadth first search (BFS)*.

### Depth First Search

*DFS* prioritises searching down child nodes before exploring other nodes on the same level (*siblings*). Several DFS algorithms exist:

#### Pre-Order DFS

This algorithm first checks the node before visiting its children. It starts by visiting the left child and keeps going left, checking off nodes as it goes, until a leaf node is reached. At this point you step back to the parent and traverse to the right child.

#### In-Order DFS

Similar to Pre-Order it traverses down the left first but only checks a node once it has reached a leaf node. Then returns to the parent to check it off before traversing down the right. Its called "In-Order" because it traverses in order from left to right.

#### Post-Order DFS

Post-Order does not check off a node until all of its children have been checked off. Move down the left side until you reach a leaf node then back up and to the right before checking off the parent node.

### Breadth First Search

*BFS* explores all the nodes at the same level before traversing deeper into the tree. *Level order traversal* is a BFS algorithm that conventionally starts on the left and moves to the right.

## Search, Insert & Delete

A Binary tree has at most 2 children and no real order. Searching the tree using either DFS or BFS happens in linear time, $O(n)$.

Inserting a node when there is no order in the tree is done by traversing the tree until an open position is found. Worst case is when the deepest path in the tree is picked to traverse first. In that case you will travel through the number of nodes equal to the *height* of the tree. Each level of a tree has the capacity to store nodes equivalent to a power of 2, each level can have twice as many nodes as the level above. Insertion can happen in $O(log(n))$ time, since that defines the height of the tree.

When deleting a node, the children of that node should be considered, 3 potential scenarios exist:

*The target node has only 1 child*- Simply promote the child to the position of the target.*The target node has 2 children*- Promote either to the target position.*The target node's children have children*- Traverse until a leaf node is found and promote it to the target position.

## Binary Search Trees (BST)

Binary search trees define some order to the nodes. Every child on the left of a node is smaller than the current node and every child on the right is bigger. ($ \text{left_child} \lt \text{current_node} \lt \text{right_child}$). This characteristic of BST's enables quick search by comparing the target number to a node and moving left if its smaller or right if its bigger. Search complexity is equal to the height of the tree, $O(log(n))$. The same applies for insertion.

A complication that can arise is when the tree is *unbalanced*. An *unbalanced BST* has a distribution of nodes that are skewed to one direction, worst case being simply a linked list (all the nodes are on one side). In that scenario search will happen in linear time, $O(n)$.

## Self-Balancing Trees

A *Self-Balancing Tree* is a BST that tries to minimize the number of levels it uses by applying an algorithm during insertion and deletion to keep itself balanced. The most common of these are *Red-Black Trees*.

### Red-Black Tree Rules

Since a Red-Black tree is still a BST, all of the BST rules apply in addition to the following:

**Rule 1:** Each node must have an additional colour property which is either *red* or *black*.

**Rule 2** All Leaf Nodes must *have Black null leaf nodes*. This could be on both right and left or just left or right.

**Rule 3** If a node is red, all its children must be black.

**Rule 4 (Optional)** Root node must be black.

**Rule 5** The path from a node to all its descendant null nodes must contain the same number of black nodes.

### Insertion

Insertion is done by inserting a red node (following normal BST rules) in the tree and then recognising which of the following scenarios apply to the current state of the tree (Note that applying one case could immediately trigger another case):

**Case 1:**Inserting the first node.- Change the colour to black since its the root node.
- Add 2 black null leaf nodes
**Case 2:**Adding a red node where parent is black.- Since a red node is being added, Rule 5 is not violated.
- Add 2 black null leaf nodes
**Case 3:**Adding a red node where the parent is red.- If the parent and its sibling are red, change their colour to black and change the grandparent to red. This maintains the number of black nodes in a path.
- Add 2 black null leaf nodes
- If the rules are violated by changing the grandparent, you can treat it as a newly inserted element and apply the rules.
**Case 4:**Adding a red node where the parent is red and sibling is black. The parent is on the opposite side of the child ( inserted node is a right child and the parent is a left child)- Perform a left rotation
**Case 5:**Adding a red node where the parent is red and its sibling is black, but both the parent and the inserted node are on the same side (both left or right children)- Suppose both parent and inserted node are on the left, rotate parent to the right and switch the colour.

Worst case scenario for insertion happens in $O(log(n))$.

Try out this great tool to practice applying these rules.

## Heaps

A *heap* is a kind of tree where the elements are arranged by increasing (*Min heap*) or decreasing (*Max heap*) order, such that the root element is either the min or max value. This allows lookup of the root node (*peek*) to be in $O(1)$. In a max heap scenario a node must always be greater than its children, the opposite applies for a min heap. In a heap each node can have more than 2 children, but it must always be a *complete tree*, meaning that all the levels are full except potentially the last level. Values are added from the left to the right.

Searching a heap cannot use comparison to decide which way to traverse, worst case is that the entire tree must be traversed which is a linear operation, $O(n)$.

### Heapify & Extract

*Heapify* is an operation used during insertion. Start by adding a new value to the next open spot, then compare its value to the parent, swapping if its larger (max heap scenario). Repeat until the new value is in the right position.

*Extract* operation is applied when the root is removed. Find the right most leaf and place it in the root position. Then compare it to its children and swap where necessary.

Worst case of both *heapify* and *extract* is when a node needs to move all the way up or down the tree, which will result in $O(log(n))$ time complexity.

## References

https://www.cs.usfca.edu/~galles/visualization/RedBlack.html