Simplified example

We’ve just seen 3 types of join operations.

Now let’s say we need to join 5 tables to have a full view of a person. A PERSON can have:

  • multiple MOBILES
  • multiple MAILS
  • multiple ADRESSES
  • multiple BANK_ACCOUNTS

In other words we need a quick answer for the following query:

SELECT*fromPERSON, MOBILES, MAILS,ADRESSES, BANK_ACCOUNTSWHEREPERSON.PERSON_ID = MOBILES.PERSON_IDANDPERSON.PERSON_ID = MAILS.PERSON_IDANDPERSON.PERSON_ID = ADRESSES.PERSON_IDANDPERSON.PERSON_ID = BANK_ACCOUNTS.PERSON_ID

As a query optimizer, I have to find the best way to process the data. But there are 2 problems:

  • What kind of join should I use for each join?

I have 3 possible joins (Hash Join, Merge Join, Nested Join) with the possibility to use 0,1 or 2 indexes (not to mention that there are different types of indexes).

  • What order should I choose to compute the join?

For example, the following figure shows different possible plans for only 3 joins on 4 tables

So here are my possibilities:

  • 1) I use a brute force approach

Using the database statistics, I compute the cost for every possible plan and I keep the best one. But there are many possibilities. For a given order of joins, each join has 3 possibilities: HashJoin, MergeJoin, NestedJoin. So, for a given order of joins there are 34 possibilities. The join ordering is a permutation problem on a binary tree and there are (2*4)!/(4+1)! possible orders. For this very simplified problem, I end up with 34*(2*4)!/(4+1)! possibilities.

In non-geek terms, it means 27 216 possible plans. If I now add the possibility for the merge join to take 0,1 or 2 B+Tree indexes, the number of possible plans becomes 210 000. Did I forget to mention that this query is VERY SIMPLE?

  • 2) I cry and quit this job

It’s very tempting but you wouldn’t get your result and I need money to pay the bills.

  • 3) I only try a few plans and take the one with the lowest cost.

Since I’m not superman, I can’t compute the cost of every plan. Instead, I can arbitrary choose a subset of all the possible plans, compute their costs and give you the best plan of this subset.

  • 4) I apply smart rules to reduce the number of possible plans.

There are 2 types of rules:

I can use “logical” rules that will remove useless possibilities but they won’t filter a lot of possible plans. For example: “the inner relation of the nested loop join must be the smallest data set”

I accept not finding the best solution and apply more aggressive rules to reduce a lot the number of possibilities. For example “If a relation is small, use a nested loop join and never use a merge join or a hash join”

In this simple example, I end up with many possibilities. But a real query can have other relational operators like OUTER JOIN, CROSS JOIN, GROUP BY, ORDER BY, PROJECTION, UNION, INTERSECT, DISTINCT … which means even more possibilities.

So, how a database does it?

results matching ""

    No results matching ""