Johan Zietsman

Johan Zietsman


I'm Johan Zietsman, and welcome to my software blog, where I'm committed to sharing valuable insights, best practices, and in-depth tutorials.

Curious software developer, motorcycle enthusiast, rugby fanatic and an avid gamer. My code always works sometimes.

Share


Tags


Johan Zietsman

Tech Interview CheatSheet - Trees

Overview of Trees, including depth first search (DFS), breadth first search (BFS), Binary Search Trees & Self Balancing Trees.

In part 1 of this series I looked at common search and sort algorithms used on lists. Part 2 focused on hash functions, sets and maps. In this post I will look at trees, depth and breadth first search, binary search trees and self balancing trees.

Basic Terminology

A tree is essentially an extension of a linked list. It consists of a set of connected elements, called nodes. In contrast to linked lists a node can be connected to multiple other nodes but there can be no cycles. A cycle is more than 1 way to get to a node. The first element is called the root and elements that contain no descendants are called leaf nodes. The connections between nodes are called edges and a group of edges is called a path

Levels describe how many connections it takes to reach the root, which resides at level 1. The height of a node describes the number of edges between it and the furthest leaf node. A leaf node will always have a height of 0. On the flip side, the depth of a node is the number edges to the root node.

wide

Tree Traversal

There are 2 types of traversal techniques, depth first search (DFS) and breadth first search (BFS).

Depth First Search

DFS prioritises searching down child nodes before exploring other nodes on the same level (siblings). Several DFS algorithms exist:

Pre-Order DFS

This algorithm first checks the node before visiting its children. It starts by visiting the left child and keeps going left, checking off nodes as it goes, until a leaf node is reached. At this point you step back to the parent and traverse to the right child.

In-Order DFS

Similar to Pre-Order it traverses down the left first but only checks a node once it has reached a leaf node. Then returns to the parent to check it off before traversing down the right. Its called "In-Order" because it traverses in order from left to right.

Post-Order DFS

Post-Order does not check off a node until all of its children have been checked off. Move down the left side until you reach a leaf node then back up and to the right before checking off the parent node.

Breadth First Search

BFS explores all the nodes at the same level before traversing deeper into the tree. Level order traversal is a BFS algorithm that conventionally starts on the left and moves to the right.

Search, Insert & Delete

A Binary tree has at most 2 children and no real order. Searching the tree using either DFS or BFS happens in linear time, $O(n)$.

Inserting a node when there is no order in the tree is done by traversing the tree until an open position is found. Worst case is when the deepest path in the tree is picked to traverse first. In that case you will travel through the number of nodes equal to the height of the tree. Each level of a tree has the capacity to store nodes equivalent to a power of 2, each level can have twice as many nodes as the level above. Insertion can happen in $O(log(n))$ time, since that defines the height of the tree.

When deleting a node, the children of that node should be considered, 3 potential scenarios exist:

Binary Search Trees (BST)

Binary search trees define some order to the nodes. Every child on the left of a node is smaller than the current node and every child on the right is bigger. ($ \text{left_child} \lt \text{current_node} \lt \text{right_child}$). This characteristic of BST's enables quick search by comparing the target number to a node and moving left if its smaller or right if its bigger. Search complexity is equal to the height of the tree, $O(log(n))$. The same applies for insertion.

A complication that can arise is when the tree is unbalanced. An unbalanced BST has a distribution of nodes that are skewed to one direction, worst case being simply a linked list (all the nodes are on one side). In that scenario search will happen in linear time, $O(n)$.

Self-Balancing Trees

A Self-Balancing Tree is a BST that tries to minimize the number of levels it uses by applying an algorithm during insertion and deletion to keep itself balanced. The most common of these are Red-Black Trees.

Red-Black Tree Rules

Since a Red-Black tree is still a BST, all of the BST rules apply in addition to the following:

Rule 1: Each node must have an additional colour property which is either red or black.

Rule 2 All Leaf Nodes must have Black null leaf nodes. This could be on both right and left or just left or right.

Rule 3 If a node is red, all its children must be black.

Rule 4 (Optional) Root node must be black.

Rule 5 The path from a node to all its descendant null nodes must contain the same number of black nodes.

Insertion

Insertion is done by inserting a red node (following normal BST rules) in the tree and then recognising which of the following scenarios apply to the current state of the tree (Note that applying one case could immediately trigger another case):

Worst case scenario for insertion happens in $O(log(n))$.

Try out this great tool to practice applying these rules.

Heaps

A heap is a kind of tree where the elements are arranged by increasing (Min heap) or decreasing (Max heap) order, such that the root element is either the min or max value. This allows lookup of the root node (peek) to be in $O(1)$. In a max heap scenario a node must always be greater than its children, the opposite applies for a min heap. In a heap each node can have more than 2 children, but it must always be a complete tree, meaning that all the levels are full except potentially the last level. Values are added from the left to the right.

Searching a heap cannot use comparison to decide which way to traverse, worst case is that the entire tree must be traversed which is a linear operation, $O(n)$.

Heapify & Extract

Heapify is an operation used during insertion. Start by adding a new value to the next open spot, then compare its value to the parent, swapping if its larger (max heap scenario). Repeat until the new value is in the right position.

Extract operation is applied when the root is removed. Find the right most leaf and place it in the root position. Then compare it to its children and swap where necessary.

Worst case of both heapify and extract is when a node needs to move all the way up or down the tree, which will result in $O(log(n))$ time complexity.

References

https://www.cs.usfca.edu/~galles/visualization/RedBlack.html

Curious software developer, motorcycle enthusiast, rugby fanatic and an avid gamer. My code always works sometimes.

View Comments