Tree and database index

A binary search tree is a binary tree with a special property, the key in each node must be:

  • greater than all keys stored in the left sub-tree
  • smaller than all keys stored in the right sub-tree

Let’s see what it means visually

The idea

This tree has N=15 elements. Let’s say I’m looking for 208:

  • I start with the root whose key is 136. Since 136<208, I look at the right sub-tree of the node 136.
  • 398>208 so, I look at the left sub-tree of the node 398
  • 250>208 so, I look at the left sub-tree of the node 250
  • 200<208 so, I look at the right sub-tree of the node 200. But 200 doesn’t have a right subtree, the value doesn’t exist (because if it did exist it would be in the right subtree of 200)

Now let’s say I’m looking for 40

  • I start with the root whose key is 136. Since 136>40, I look at the left sub-tree of the node 136.
  • 80>40 so, I look at the left sub-tree of the node 80
  • 40= 40, the node exists. I extract the id of the row inside the node (it’s not in the figure) and look at the table for the given row id.
  • Knowing the row id let me know where the data is precisely on the table and therefore I can get it instantly.

In the end, both searches cost me the number of levels inside the tree. If you read carefully the part on the merge sort you should see that there are log(N) levels. So the cost of the search is log(N), not bad!

Back to our problem

But this stuff is very abstract so let’s go back to our problem. Instead of a stupid integer, imagine the string that represents the country of someone in the previous table. Suppose you have a tree that contains the column “country” of the table:

  • If you want to know who is working in the UK
  • you look at the tree to get the node that represents the UK
  • inside the “UK node” you’ll find the locations of the rows of the UK workers.

This search only costs you log(N) operations instead of N operations if you directly use the array. What you’ve just imagined was a database index.

You can build a tree index for any group of columns (a string, an integer, 2 strings, an integer and a string, a date …) as long as you have a function to compare the keys (i.e. the group of columns) so that you can establish an orderamong the keys (which is the case for any basic types in a database).

B+Tree Index

Although this tree works well to get a specific value, there is a BIG problem when you need to get multiple elementsbetween two values. It will cost O(N) because you’ll have to look at each node in the tree and check if it’s between these 2 values (for example, with an in-order traversal of the tree). Moreover this operation is not disk I/O friendly since you’ll have to read the full tree. We need to find a way to efficiently do a range query. To answer this problem, modern databases use a modified version of the previous tree called B+Tree. In a B+Tree:

  • only the lowest nodes (the leaves) store information (the location of the rows in the associated table)
  • the other nodes are just here to route to the right node during the search.

As you can see, there are more nodes (twice more). Indeed, you have additional nodes, the “decision nodes” that will help you to find the right node (that stores the location of the rows in the associated table). But the search complexity is still in O(log(N)) (there is just one more level). The big difference is that the lowest nodes are linked to their successors.

With this B+Tree, if you’re looking for values between 40 and 100:

  • You just have to look for 40 (or the closest value after 40 if 40 doesn’t exist) like you did with the previous tree.
  • Then gather the successors of 40 using the direct links to the successors until you reach 100.

Let’s say you found M successors and the tree has N nodes. The search for a specific node costs log(N) like the previous tree. But, once you have this node, you get the M successors in M operations with the links to their successors. This search only costs M + log(N) operations vs N operations with the previous tree. Moreover, you don’t need to read the full tree (just M + log(N) nodes), which means less disk usage. If M is low (like 200 rows) and N large (1 000 000 rows) it makes a BIG difference.

But there are new problems (again!). If you add or remove a row in a database (and therefore in the associated B+Tree index):

  • you have to keep the order between nodes inside the B+Tree otherwise you won’t be able to find nodes inside the mess.
  • you have to keep the lowest possible number of levels in the B+Tree otherwise the time complexity in O(log(N)) will become O(N).

I other words, the B+Tree needs to be self-ordered and self-balanced. Thankfully, this is possible with smart deletion and insertion operations. But this comes with a cost: the insertion and deletion in a B+Tree are in O(log(N)). This is why some of you have heard that using too many indexes is not a good idea. Indeed, you’re slowing down the fast insertion/update/deletion of a row in a table since the database needs to update the indexes of the table with a costly O(log(N)) operation per index. Moreover, adding indexes means more workload for the transaction manager (we will see this manager at the end of the article).

For more details, you can look at the Wikipedia article about B+Tree. If you want an example of a B+Tree implementation in a database, look at this article and this article from a core developer of MySQL. They both focus on how innoDB (the engine of MySQL) handles indexes.

Note: I was told by a reader that, because of low-level optimizations, the B+Tree needs to be fully balanced.

results matching ""

    No results matching ""