Real optimizers

[You can skip to the next part, what I’m going to say is not important]

But all this blabla is very theoretical. Since I’m a developer and not a researcher, I like concrete examples.

Let’s see how the SQLite optimizer works. It’s a lightweight database, so it uses a simple optimization based on a greedy algorithm with extra rules to limit the number of possibilities:

  • SQLite chooses to never reorder tables in a CROSS JOIN operator
  • joins are implemented as nested loop joins
  • outer joins are always evaluated in the order in which they occur
  • Prior to version 3.8.0, SQLite used the “Nearest Neighbor” greedy algorithm when searching for the best query plan

Wait a minute … we’ve already seen this algorithm! What a coincidence!

  • Since version 3.8.0 (released in 2013), SQLite uses the “N Nearest Neighbors” greedy algorithm when searching for the best query plan (a rough sketch of the greedy idea follows)
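To make this concrete, here is a minimal sketch in Java of the greedy idea behind this kind of join ordering (it is not SQLite’s real code): at each step, join the cheapest remaining table with the tables already chosen, and never revisit that choice. The estimateCost function and the row counts are made-up placeholders for what a real optimizer computes from its statistics.

```java
import java.util.*;

// A toy "nearest neighbor" greedy join ordering, NOT SQLite's actual code:
// at each step, append the remaining table that is cheapest to join with
// the tables already chosen, and never reconsider that decision.
public class GreedyJoinOrder {

    // Hypothetical cost estimator. A real optimizer would use statistics
    // (row counts, available indexes, predicate selectivity...).
    static double estimateCost(List<String> alreadyJoined, String next,
                               Map<String, Double> estimatedRows) {
        return estimatedRows.getOrDefault(next, 1_000_000.0);
    }

    static List<String> greedyOrder(Collection<String> tables,
                                    Map<String, Double> estimatedRows) {
        List<String> order = new ArrayList<>();
        Set<String> remaining = new HashSet<>(tables);
        while (!remaining.isEmpty()) {
            String best = null;
            double bestCost = Double.MAX_VALUE;
            for (String candidate : remaining) {
                double cost = estimateCost(order, candidate, estimatedRows);
                if (cost < bestCost) {
                    bestCost = cost;
                    best = candidate;
                }
            }
            order.add(best);        // commit to the locally best choice...
            remaining.remove(best); // ...and never look back (greedy)
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up row counts for three tables
        Map<String, Double> rows = Map.of(
            "PERSON", 1_000_000.0,
            "MORTGAGE", 100_000.0,
            "BANKS", 100.0);
        System.out.println(greedyOrder(rows.keySet(), rows));
        // Prints the smallest table first: [BANKS, MORTGAGE, PERSON]
    }
}
```

The “N Nearest Neighbors” variant keeps the N most promising partial orders at each step instead of just one, which explores a bit more of the search space for a modest extra cost.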

Let’s see how another optimizer does its job. IBM DB2 is like all the other enterprise databases, but I’ll focus on this one since it’s the last one I really used before switching to Big Data.

If we look at the official documentation, we learn that the DB2 optimizer lets you choose between 7 different levels of optimization (a small example of switching levels follows the list):

  • Use greedy algorithms for the joins
    • 0 – minimal optimization: use index scan and nested-loop join, and avoid some query rewriting
    • 1 – low optimization
    • 2 – full optimization
  • Use dynamic programming for the joins
    • 3 – moderate optimization and rough approximation
    • 5 – full optimization, uses all techniques with heuristics
    • 7 – full optimization similar to 5, without heuristics
    • 9 – maximal optimization: spare no effort/expense, consider all possible join orders, including Cartesian products
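Just to make this tangible, here is a small Java/JDBC sketch of switching the optimization class for a session through the CURRENT QUERY OPTIMIZATION special register. It assumes the IBM JDBC driver is on the classpath; the URL, database name, and credentials are placeholders.

```java
import java.sql.*;

// A sketch of changing the DB2 optimization class for the current session.
// The connection URL, database and credentials below are placeholders.
public class SetOptimizationLevel {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:db2://localhost:50000/SAMPLE", "user", "password");
             Statement stmt = con.createStatement()) {
            // CURRENT QUERY OPTIMIZATION is the special register that tells
            // the DB2 optimizer which optimization class to use.
            stmt.execute("SET CURRENT QUERY OPTIMIZATION = 9");
            // Queries on this connection are now planned with class 9
            // (exhaustive join enumeration, Cartesian products included).
        }
    }
}
```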

We can see that DB2 uses both greedy algorithms and dynamic programming. Of course, they don’t share the heuristics they use, since the query optimizer is the core strength of a database.

FYI, the default level is 5. At this level, the optimizer has the following characteristics:

  • All available statistics, including frequent-value and quantile statistics, are used.
  • All query rewrite rules (including materialized query table routing) are applied, except computationally intensive rules that are applicable only in very rare cases.
  • Dynamic programming join enumeration is used, with:
    • Limited use of composite inner relation
    • Limited use of Cartesian products for star schemas involving lookup tables
  • A wide range of access methods is considered, including list prefetch (note: we’ll see what this means), index ANDing (note: a special operation on indexes), and materialized query table routing.

By default, DB2 uses dynamic programming limited by heuristics for the join ordering.
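To show the shape of the dynamic-programming part, here is a minimal Java sketch (not DB2’s actual algorithm): the best cost for joining any subset of tables is built by extending smaller subsets, and each subset is planned only once. The cost model is a made-up placeholder.

```java
import java.util.*;

// A toy dynamic-programming join enumeration, NOT DB2's actual algorithm:
// the best cost of each subset of tables (encoded as a bitmask) is computed
// by extending smaller subsets, and memoized so it is computed only once.
public class DynamicProgrammingJoin {

    // Hypothetical cost of joining one more table to an already planned subset.
    // A real optimizer derives this from statistics and the join method chosen
    // (nested loop, hash join, merge join...).
    static double joinCost(double subsetCost, int table) {
        return subsetCost + 10.0 * (table + 1);
    }

    static double bestPlanCost(int tableCount) {
        int all = 1 << tableCount;                 // one bit per table
        double[] best = new double[all];
        Arrays.fill(best, Double.MAX_VALUE);
        best[0] = 0.0;                             // the empty plan costs nothing
        for (int subset = 0; subset < all; subset++) {
            if (best[subset] == Double.MAX_VALUE) continue;   // unreachable subset
            for (int t = 0; t < tableCount; t++) {
                if ((subset & (1 << t)) != 0) continue;       // table already joined
                int extended = subset | (1 << t);
                double cost = joinCost(best[subset], t);
                if (cost < best[extended]) {
                    best[extended] = cost;                    // keep only the cheapest plan
                }
            }
        }
        return best[all - 1];                      // best cost with every table joined
    }

    public static void main(String[] args) {
        System.out.println("Best cost for 4 tables: " + bestPlanCost(4));
    }
}
```

“Limited by heuristics” means the search space is pruned during this enumeration, for example by restricting composite inner relations and Cartesian products, as the list above says.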

The other clauses (GROUP BY, DISTINCT…) are handled by simple rules.
