Recently I am reading books about trees, which are data structures that can be used to stored data elements. Among various types of trees, balanced binary search tree (I will use BBST for short in this post) is a notable one. The time complexity of BBST for insertion, deletion and search are all O(logN).
Firstly, BBST is a binary tree. Every node in the tree has two children, one pointing to the left subtree and one pointing to the right subtree. It also contains one data element.
Secondly, BBST is a binary search tree. Every node in the tree contains the data element that is larger than all the data elements contained in the left subtree, and is smaller than all the data elements contained in the right subtree. This kind of data structure can greatly reduce the effort that is used to search a data element.
Thirdly, BBST is a balanced binary tree. The difference between the heights of the left and right subtree of every node is not more than a fixed value. This can prevent the binary tree from becoming a linked list, which can largely reduce the efficiency of the data structure operation, e.g. insert, delete and search.
AVLTree and RedBlackTree are two popular BBST. Every node in AVLTree contains the height of the node. The difference between the heights of the left and right subtree of every node in AVLTree is not allowed to be more than one. This rule enforces the balance of the AVLTree. The search operation in AVLTree will only take one top-down pass through the tree. However, the insertion and deletion operation in AVLTree will first take one top-down pass through the tree to insert or delete the node, then take another down-top pass through the tree to calculate and update the height of each passed node, and rotate the tree if needed. This extra down-top pass will cause a little penalty in performance. Because of the relatively strict balance characteristics of the AVLTree, the search operation will be very efficient. Let’s assume there are 1,000,000 elements in the AVLTree. The height of the AVLTree is about 20. It will be very efficient to find an element in the AVLTree.
RedBlackTree is another BBST. In China, the question about how the RedBlackTree keeps a binary search tree balanced appears in the interview with software developers now and then. Many developers complained that the question is really hard to answer. Even if I read two books about RedBlackTree, I also could hardly prove why the RedBlackTree could keep a binary search tree balanced. Just as Robert Lofore, the author of the book “Data Structures and Algorithms in Java”, said in the book, some very clever people invented the RedBlackTree. He mentioned that it is not obvious how the rules of the RedBlackTree lead to a balanced tree, but they really do. (^_^) In my opinion, for the theorems or rules that have been widely adopted and proved, we can just use the theorems or rules in the engineering without proving them in person, if they are really hard to prove. Using the theorems or rules in the engineering is relatively a different thing from proving the theorems or rules. Proving a theorem or rule that is mysterious will distract the attention and energy of engineers. We can just make use of it and benefit from it in the engineering.
There are four rules in the RedBlackTree. They are as below:
- each node either has color red or black
- the root node always has color black
- if a node has color red, its children must have color black
- every path from the root node to the leaf node or a null child must have the same number of the black nodes.
As long as the RedBlackTree enforces these rules, the RedBlackTree is a balanced binary search tree. However, the balance of the RedBlackTree is not as good as that of the AVLTree. The height of the RedBlackTree may be 2log(N+1) in the worst case. Even though the balance of the RedBlackTree is not so restrict, the RedBlackTree is still a relatively good BBST. The time complexity of insertion, deletion and search is still O(logN).
There are two ways to implement insertion and deletion for RedBlackTree. One is top-down, while the other is down-top. However, the down-top method is less efficient. It requires a top-down pass to insert or delete the node, then a down-top pass to enforce the red black rules. However, the top-down method only requires one top-down pass to insert or delete the node, and at the same time to enforce the red black rules. There is one interesting thing. I checked the source code of TreeMap in JDK 1.8, after I read books about RedBlackTree. I found that the TreeMap in JDK 1.8 uses the less efficient down-top method, which is a reference implementation introduced by the book ‘Introduction to Algorithms’.
For insertion, I recommend the book ‘Data Structures and Algorithms in Java, second edition’ by Robert Lafore. It uses a whole chapter to describe how to insert an element into a RedBlackTree in detail. It is easy to understand the insertion algorithm. However, it skips the deletion algorithm. For deletion, I recommend the book ‘Data Structures and Problem Solving Using Java, fourth edition’ by Mark Allen Weiss. It briefly describes how to delete an element from a RedBlackTree.
When I use google to search materials about RedBlackTree, I find a remarkable blog post as below:
https://eternallyconfuzzled.com/red-black-trees-c-the-most-common-balanced-binary-search-tree
It uses some tricks to implement the RedBlackTree. For example, it uses a two-element array to store the root node of the left and right subtree, instead of two object references. It also use a variable to represent insertion or deletion index in the array. You may take the blog post for further detail of the implementation.
Because the RedBlackTree can use only one top-down pass to complete the insertion and deletion, and can avoid the height computation that is needed by the AVLTree, the RedBlackTree is faster than the AVLTree for the insertion and deletion. It is a better choice for BBST.