I’m on acid

An ACID transaction is a unit of work that ensures 4 things:

  • Atomicity: the transaction is “all or nothing”, even if it lasts 10 hours. If the transaction crashes, the state goes back to before the transaction (the transaction is rolled back).
  • Isolation: if 2 transactions A and B run at the same time, the result of transactions A and B must be the same whether A finishes before/after/during transaction B.
  • Durability: once the transaction is committed (i.e. ends successfully), the data stay in the database no matter what happens (crash or error).
  • Consistency: only valid data (in terms of relational constraints and functional constraints) are written to the database. The consistency is related to atomicity and isolation.

During the same transaction, you can run multiple SQL queries to read, create, update and delete data. The mess begins when two transactions are using the same data. The classic example is a money transfer from an account A to an account B. Imagine you have 2 transactions:

  • Transaction 1 that takes 100$ from account A and gives them to account B
  • Transaction 2 that takes 50$ from account A and gives them to account B

If we go back to the ACID properties:

  • Atomicity
    ensures that no matter what happens during T1 (a server crash, a network failure …), you can’t end up in a situation where the 100$ are withdrawn from A and not given to B (this case is an inconsistent state).

  • I
    solation
    ensures that if T1 and T2 happen at the same time, in the end A will be taken 150$ and B given 150$ and not, for example, A taken 150$ and B given just $50 because T2 has partially erased the actions of T1 (this case is also an inconsistent state).

  • Durability
    ensures that T1 won’t disappear into thin air if the database crashes just after T1 is committed.

  • Consistency
    ensures that no money is created or destroyed in the system.

[You can skip to the next part if you want, what I’m going to say is not important for the rest of the article]

Many modern databases don’t use a pure isolation as a default behavior because it comes with a huge performance overhead. The SQL norm defines 4 levels of isolation:

  • Serializable
    (default behaviour in SQLite): The highest level of isolation. Two transactions happening at the same time are 100% isolated. Each transaction has its own “world”.

  • Repeatable read
    (default behavior in MySQL): Each transaction has its own “world” except in one situation. If a transaction ends up successfully and adds new data, these data will be visible in the other and still running transactions. But if A modifies a data and ends up successfully, the modification won’t be visible in the still running transactions. So, this break of isolation between transactions is only about new data, not the existing ones.

For example, if a transaction A does a “SELECT count(1) from TABLE_X” and then a new data is added and committed in TABLE_X by Transaction B, if transaction A does again a count(1) the value won’t be the same.

This is called a phantom read.

  • Read committed (default behavior in Oracle, PostgreSQL and SQL Server): It’s a repeatable read + a new break of isolation. If a transaction A reads a data D and then this data is modified (or deleted) and committed by a transaction B, if A reads data D again it will see the modification (or deletion) made by B on the data.

This is called a non-repeatable read.

  • Read uncommitted : the lowest level of isolation. It’s a read committed + a new break of isolation. If a transaction A reads a data D and then this data D is modified by a transaction B (that is not committed and still running), if A reads data D again it will see the modified value. If transaction B is rolled back, then data D read by A the second time doesn’t make no sense since it has been modified by a transaction B that never happened (since it was rolled back).

This is called a dirty read.

Most databases add their own custom levels of isolation (like the snapshot isolation used by PostgreSQL, Oracle and SQL Server). Moreover, most databases don’t implement all the levels of the SQL norm (especially the read uncommitted level).

The default level of isolation can be overridden by the user/developer at the beginning of the connection (it’s a very simple line of code to add).

results matching ""

    No results matching ""