monkey codes
monkey codes

Random bits of knowledge and laughable mistakes from a real world code monkey.

Curious software developer, motorcycle enthusiast, rugby fanatic and biltong connoisseur. My code always works sometimes.

Share


Tags


Twitter


monkey codes

Tech Interview CheatSheet - Maps & Hashing

Johan ZietsmanJohan Zietsman

In Part 1 of this series I looked at common search and sorting algorithms used on Lists. This post will look at Hash functions and how they are applied to Sets and Maps to offer constant time lookup performance.

Hash Functions

A Hash function takes a value and produces a number that is often used as an index in an Array.

For example, when storing large random numbers, a simple Hash function could take the last few digits, those most likely to change often, and divide it by some constant using the remainder as an index (Modulo or % operator).

Collisions

Collisions happen when the Hash function returns the same value for 2 different inputs. There are 2 ways to address Collisions:

Load Factor

$$ \text{load_factor} = \text{no_of_entries} \div \text{no_of_buckets}$$

Gives a sense of how full a Hash Table is. A Load Factor of 0.01 means only 10 values are stored in a Hash Table with a 1000 buckets, a huge waste of memory.

Load Factor can be used to determine when to rehash, as it approaches 0 the Hash Table is sparsely populated. On the flip side, as it approaches 1 it is densely populated and more buckets should be added, since a value larger than 1 will certainly cause a Collision.

Question: If a Hash function divides by 100 and uses the remainder as a key, how many buckets does that provide?

Answer: Remainders will range from 0 - 99, thus the function will yield a 100 Buckets.

Sets

Sets are similar to Lists in that it stores a collection of values, but with 2 key distinctions:

A typical implementation of Sets, called HashSets, uses Hash functions to store the elements. This type of Set offers constant time performance ( $O(1)$ ) for basic operations like add, remove and contains.

Hash Maps

One of the main applications of Hash functions are Hash Maps. Both the key and value are stored in the position provided by the function. Since keys in a Map are part of a Set, they are unique, allowing the Hash function to be designed in a way that gives each key its own bucket.

String keys

A String can be converted into a number by using something like ASCII codes. Java favours large hash codes over the risk of collisions by using Hash function similar to this: ($s[x]$ would contain the ASCII code for the character in position x of the String):

$$s[0]\times 31^\text{(n-1)} + s[1]\times 31^\text{(n-2)} +\ ...\ +s[n-1]$$

A simple Hash table implementation in python:

In part 3 I will look at trees.

Curious software developer, motorcycle enthusiast, rugby fanatic and biltong connoisseur. My code always works sometimes.