Buffer-Replacement strategies

Most modern databases (at least SQL Server, MySQL, Oracle and DB2) use an LRU algorithm.

LRU

LRU stands for Least Recently Used. The idea behind this algorithm is to keep in the cache the data that have been recently used and, therefore, are more likely to be used again.

Here is a visual example:

For the sake of comprehension, I’ll assume that the data in the buffer are not locked by latches (and therefore can be removed). In this simple example the buffer can store 3 elements:

  • 1: the cache manager uses the data 1 and puts the data into the empty buffer
  • 2: the CM uses the data 4 and puts the data into the half-loaded buffer
  • 3: the CM uses the data 3 and puts the data into the half-loaded buffer
  • 4: the CM uses the data 9. The buffer is full so data 1 is removed since it’s the last recently used data . Data 9 is added into the buffer
  • 5: the CM uses the data 4. Data 4 is already in the buffer therefore it becomes the first recently used data again .
  • 6: the CM uses the data 1. The buffer is full so data 9 is removed since it’s the last recently used data . Data 1 is added into the buffer

This algorithm works well but there are some limitations. What if there is a full scan on a large table? In other words, what happens when the size of the table/index is above the size of the buffer? Using this algorithm will remove all the previous values in the cache whereas the data from the full scan are likely to be used only once.

Improvements

To prevent this to happen, some databases add specific rules. For example according to Oracle documentation:

“For very large tables, the database typically uses a direct path read, which loads blocks directly […], to avoid populating the buffer cache. For medium size tables, the database may use a direct read or a cache read. If it decides to use a cache read, then the database places the blocks at the end of the LRU list to prevent the scan from effectively cleaning out the buffer cache.”

There are other possibilities like using an advanced version of LRU called LRU-K. For example SQL Server uses LRU-K for K =2.

This idea behind this algorithm is to take into account more history. With the simple LRU (which is also LRU-K for K=1), the algorithm only takes into account the last time the data was used. With the LRU-K:

  • It takes into account the K last times the data was used .
  • A weight is put on the number of times the data was used
  • If a bunch of new data is loaded into the cache, the old but often used data are not removed (because their weights are higher).
  • But the algorithm can’t keep old data in the cache if they aren’t used anymore.
  • So the weights decrease over time if the data is not used .

The computation of the weight is costly and this is why SQL Server only uses K=2. This value performs well for an acceptable overhead.

For a more in-depth knowledge of LRU-K, you can read the original research paper (1993): The LRU-K page replacement algorithm for database disk buffering.

Other algorithms

Of course there are other algorithms to manage cache like

  • 2Q (a LRU-K like algorithm)
  • CLOCK (a LRU-K like algorithm)
  • MRU (most recently used, uses the same logic than LRU but with another rule)
  • LRFU (Least Recently and Frequently Used)

Some databases let the possibility to use another algorithm than the default one.

results matching ""

    No results matching ""