Global overview

We’ve just seen the basic components inside a database. We now need to step back to see the big picture.

A database is a collection of information that can easily be accessed and modified. But a simple bunch of files could do the same. In fact, the simplest databases like SQLite are nothing more than a bunch of files. But SQLite is a well-crafted bunch of files because it allows you to:

  • use transactions that ensure data are safe and coherent
  • quickly process data even when you’re dealing with millions of data

More generally, a database can be seen as the following figure:

Before writing this part, I’ve read multiple books/papers and every source had its on way to represent a database. So, don’t focus too much on how I organized this database or how I named the processes because I made some choices to fit the plan of this article. What matters are the different components; the overall idea is that a database is divided into multiple components that interact with each other.

The core components:

  • The process manager: Many databases have a pool of processes/threads that needs to be managed. Moreover, in order to gain nanoseconds, some modern databases use their own threads instead of the Operating System threads.
  • The network manager: Network I/O is a big issue, especially for distributed databases. That’s why some databases have their own manager.
  • File system manager: Disk I/O is the first bottleneck of a database . Having a manager that will perfectly handle the Operating System file system or even replace it is important.
  • The memory manager: To avoid the disk I/O penalty a large quantity of ram is required. But if you handle a large amount of memory, you need an efficient memory manager. Especially when you have many queries using memory at the same time.
  • Security Manager: for managing the authentication and the authorizations of the users
  • Client manager: for managing the client connections

The tools:

  • Backup manager: for saving and restoring a database.
  • Recovery manager: for restarting the database in a coherent state after a crash
  • Monitor manager: for logging the activity of the database and providing tools to monitor a database
  • Administration manager: for storing metadata (like the names and the structures of the tables) and providing tools to manage databases, schemas, tablespaces, …

The query Manager:

  • Query parser: to check if a query is valid
  • Query rewriter: to pre-optimize a query
  • Query optimizer: to optimize a query
  • Query executor: to compile and execute a query

The data manager:

  • Transaction manager: to handle transactions
  • Cache manager: to put data in memory before using them and put data in memory before writing them on disk
  • Data access manager: to access data on disk

For the rest of this article, I’ll focus on how a database manages an SQL query through the following processes:

  • the client manager
  • the query manager
  • the data manager (I’ll also include the recovery manager in this part)

results matching ""

    No results matching ""