The power of the merge sort

Why is this algorithm so powerful?

Because:

  • You can modify it to reduce its memory footprint: instead of creating new arrays, you directly modify the input array.

Note: this kind of algorithm is called in-place.
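To make the idea concrete, here is a minimal Python sketch of a merge sort that never allocates a second array. The function names are hypothetical, and the merge step shifts elements one slot at a time, which keeps the code short but makes a single merge quadratic in the worst case. Production in-place merges are considerably more involved.

```python
def merge_in_place(a, lo, mid, hi):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] without an auxiliary list."""
    i, j = lo, mid
    while i < j < hi:
        if a[i] <= a[j]:
            # a[i] is already in its final position.
            i += 1
        else:
            # a[j] belongs before a[i]: shift a[i:j] one slot to the right
            # and drop the saved value into the hole.
            value = a[j]
            k = j
            while k > i:
                a[k] = a[k - 1]
                k -= 1
            a[i] = value
            i += 1
            j += 1


def merge_sort_in_place(a, lo=0, hi=None):
    """Sort a[lo:hi] by modifying the list `a` directly."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    merge_sort_in_place(a, lo, mid)
    merge_sort_in_place(a, mid, hi)
    merge_in_place(a, lo, mid, hi)


data = [5, 2, 9, 1, 5, 6]
merge_sort_in_place(data)
print(data)
```

The only extra memory is the recursion stack and a few local variables; the price is the slower merge step.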

  • You can modify it to use disk space and a small amount of memory at the same time without a huge disk I/O penalty. The idea is to load into memory only the parts that are currently being processed. This is important when you need to sort a multi-gigabyte table with a memory buffer of only 100 megabytes.

Note: this kind of algorithm is called external sorting.
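A rough Python sketch of the external sorting idea follows. It assumes the data is a stream of integers and that `chunk_size` (a hypothetical parameter) stands in for the memory buffer: each chunk is sorted in memory and written to a temporary file, then the sorted files are merged while reading only one value per file at a time. A real database works with pages and buffers rather than text lines.

```python
import heapq
import itertools
import os
import tempfile


def _write_run(sorted_values):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in sorted_values)
    return path


def _read_run(path):
    """Stream the integers of a run back from disk, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, chunk_size):
    """Yield the integers of `values` in sorted order.

    At most `chunk_size` values are held in memory while building runs,
    and roughly one value per run while merging.
    """
    it = iter(values)
    run_paths = []
    try:
        while True:
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break
            chunk.sort()
            run_paths.append(_write_run(chunk))
        # heapq.merge lazily merges already-sorted inputs.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


print(list(external_sort([7, 3, 9, 1, 8, 2, 5], chunk_size=3)))
```

Each value is written to disk once and read back once, mostly sequentially, which is why the I/O penalty stays reasonable.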

  • You can modify it to run on multiple processes/threads/servers.

For example, the distributed merge sort is one of the key components of Hadoop (which is THE framework in Big Data).
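The same split, sort, and merge structure can be sketched on a single machine. The example below is only an illustration with a hypothetical `parallel_merge_sort` function and `workers` parameter: it sorts each chunk in a separate thread and merges the results. In standard CPython, threads will not speed up CPU-bound sorting; swapping in `ProcessPoolExecutor` is one way to use several cores, and a distributed system would ship the chunks to different servers instead.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_merge_sort(values, workers=4):
    """Sort `values` by sorting chunks concurrently, then merging them."""
    values = list(values)
    if not values:
        return []
    # Ceiling division so that at most `workers` chunks are created.
    size = -(-len(values) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sorts one chunk independently.
        sorted_chunks = list(pool.map(sorted, chunks))
    # Final step: merge the sorted chunks into one sorted list.
    return list(heapq.merge(*sorted_chunks))


print(parallel_merge_sort([4, 1, 3, 9, 7, 2], workers=3))
```

The chunks are independent, so the sorting phase needs no coordination between workers; only the final merge brings the data back together.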

  • This algorithm can turn lead into gold (true fact!).

This sorting algorithm is used in most (if not all) databases, but it’s not the only one. If you want to know more, you can read this research paper that discusses the pros and cons of the common sorting algorithms in a database.
