Global overview
We’ve just seen the basic components inside a database. We now need to step back to see the big picture.
A database is a collection of information that can easily be accessed and modified. But a simple bunch of files could do the same. In fact, the simplest databases like SQLite are nothing more than a bunch of files. But SQLite is a well-crafted bunch of files because it allows you to:
- use transactions that ensure data are safe and coherent
- quickly process data even when you’re dealing with millions of data
More generally, a database can be seen as the following figure:
Before writing this part, I’ve read multiple books/papers and every source had its on way to represent a database. So, don’t focus too much on how I organized this database or how I named the processes because I made some choices to fit the plan of this article. What matters are the different components; the overall idea is that a database is divided into multiple components that interact with each other.
The core components:
- The process manager: Many databases have a pool of processes/threads that needs to be managed. Moreover, in order to gain nanoseconds, some modern databases use their own threads instead of the Operating System threads.
- The network manager: Network I/O is a big issue, especially for distributed databases. That’s why some databases have their own manager.
- File system manager: Disk I/O is the first bottleneck of a database . Having a manager that will perfectly handle the Operating System file system or even replace it is important.
- The memory manager: To avoid the disk I/O penalty a large quantity of ram is required. But if you handle a large amount of memory, you need an efficient memory manager. Especially when you have many queries using memory at the same time.
- Security Manager: for managing the authentication and the authorizations of the users
- Client manager: for managing the client connections
- …
The tools:
- Backup manager: for saving and restoring a database.
- Recovery manager: for restarting the database in a coherent state after a crash
- Monitor manager: for logging the activity of the database and providing tools to monitor a database
- Administration manager: for storing metadata (like the names and the structures of the tables) and providing tools to manage databases, schemas, tablespaces, …
- …
The query Manager:
- Query parser: to check if a query is valid
- Query rewriter: to pre-optimize a query
- Query optimizer: to optimize a query
- Query executor: to compile and execute a query
The data manager:
- Transaction manager: to handle transactions
- Cache manager: to put data in memory before using them and put data in memory before writing them on disk
- Data access manager: to access data on disk
For the rest of this article, I’ll focus on how a database manages an SQL query through the following processes:
- the client manager
- the query manager
- the data manager (I’ll also include the recovery manager in this part)