Distributed transactions

What is a transaction

In a single database, a transaction is a sequence of one or more operations, such as reading, updating, or deleting data, that are executed together as a single, logical unit of work. Transactions ensure that the database remains consistent and follows certain properties, known as ACID properties:

Atomicity: A transaction is either fully completed or not executed at all. If any part of the transaction fails, the entire transaction is rolled back. Really should be called “abortability”.
Consistency: The database starts in a consistent state and ends in a consistent state after the transaction is completed. More of a property of the app, not the database. Simply tossed in to make the acronym work.
Isolation: Transactions appear to be executed independently, even if they are running concurrently.
Durability: Once a transaction is completed, its changes are permanent in the database. An obvious property, since it’s a database, not volatile memory. This is more of a property of the database in general, mostly to make the acronym work.

We have seen how transactions work in a single database in a previous section. Now let’s take a look at how transactions can be implemented in a distributed system.

What is a distributed transaction

Now, in a distributed system, a distributed transaction is a transaction that involves multiple nodes, services, or databases. For example, imagine an online shopping system where you have separate services for managing user accounts, inventory, and payment processing. When a user places an order, the system must coordinate operations across these different services to ensure that the user's account is updated, the inventory is adjusted, and the payment is processed.

distributed-transaction-sample

In microservices design, we normally have one database per service. Order service, user service and inventory service each have their own database. When a checkout request is processed, each service tries to commit to their own database. The commits into each database should either all succeed or all fail. In other words, the commit should be “atomic” (as in A in ACID). This is called the atomic commit and is the key property of a distributed transaction.

As you might expect, this is a difficult issue to tackle because coordinating commits in a distributed system, which is susceptible to failures, must be carefully managed. Let's look at some more examples.

Trip Reservations

A travel booking service may allow users to book flights, hotels, and rental cars in a single transaction. Its components include:

Flight Reservation System
Hotel Booking System
Car Rental Service

All reservations must either succeed together or fail as a group. If the flight reservation is confirmed but the hotel reservation fails, the system must cancel the flight to maintain consistency. This is where the Saga pattern is often used, with compensating transactions to undo previous reservations.

Bank Transfers

In cross-bank money transfers, funds are moved between two separate banking institutions. Its components include:

Bank A’s System
Bank B’s System
Clearing House
Bank A: Debits the amount from the sender’s account.
Bank B: Credits the amount to the receiver’s account.
Clearing House: Acts as the intermediary to coordinate the transaction.

If Bank A successfully debits the amount but Bank B fails to credit it, the system must rollback the debit. Distributed transactions prevent situations where money disappears during a transfer due to failures at one end.

Now let’s talk about the few ways to implement distributed transactions.

Two Phase Commit

two-phase-commit

Two-Phase Commit (2PC) is a protocol for coordinating distributed transactions in a way that ensures all participants either commit or abort the transaction.

Besides the databases we want to commit to, 2PC uses a new component, the coordinator, that does not normally appear in single-node transactions.

The best way to explain 2PC is to use a wedding ceremony as an analogy, imagine the following scenario:

The wedding ceremony is the transaction, and the different participants (bride, groom, officiant, and guests) are the nodes in the distributed system.

Phase 1: Prepare

The officiant (acting as the coordinator) asks the bride and groom (the participants) if they are ready to commit to the wedding ceremony (the transaction). This is similar to the "prepare" phase in 2PC, where the coordinator asks participants if they are ready to commit the transaction.
The bride and groom both reply with either a "yes" or "no." If both say "yes," it means they are prepared to commit to the wedding ceremony. If either of them says "no," the ceremony cannot proceed.

Phase 2: Commit or Abort

If both the bride and groom agree to proceed from Phase 1, the officiant announces the commitment and instructs the participants (bride, groom, and guests) to finalize the ceremony (similar to the "commit" phase in 2PC). Everyone celebrates, and the wedding is complete.
If either the bride or the groom is not ready to commit from Phase 1, the officiant cancels the ceremony and informs all the participants (similar to the "abort" phase in 2PC). The ceremony does not take place, and everyone leaves.

The Two-Phase Commit protocol works in a similar way, with a coordinator (like the officiant) ensuring that all participants (nodes) either commit the transaction (complete the wedding ceremony) or abort it (cancel the ceremony). This ensures consistency and coordination across the distributed system.

In the case of the e-commerce example, the coordinator would send a prepare command to the order, user and inventory services. Once it received OK from them, it’ll ask them to proceed to the commit phase.

The problem with 2PC

This is a fun and high level explanation. Let’s take a look 2PC in more details.

Once the coordinator receives the “yes” from all nodes, it will proceed to the commit phase. At this point, it writes the decision into its transaction log on disk (the officiant records it in the book). If the coordinator crashes, it can check the log to know which decision it made when it recovers. The record in the transaction log is called the commit point.
Once the coordinator’s decision has been written to the transaction log, the commit or abort request is sent to all the nodes. At this point, there is no going back. The decision has to be enforced even if the request fails or times out. The coordinator needs to retry as many times as possible until it succeeds. If a participant node fails in the 2nd phase, it has to commit after it recovered since it voted yes in the first phase. To use the wedding analogy, once you said yes, the officiant announces “You are now duly married”. The marriage is in effect even if the bride or groom leaves the room at this point.

Now it all sounds really good but do you see the potential problem here?

Participant failure

If a participant fails, the coordinator has to retry sending the request until it succeeds. This blocks the entire operation. If a participant node fails or becomes unreachable during the commit phase, the entire transaction can be blocked. The system may have to wait for the failed node to recover, which can lead to performance degradation and decreased availability.

Coordinator failure

The coordinator is a single-point of failure. The coordinator node is a critical component in the 2PC protocol, and its failure can lead to delays and uncertainty about the transaction's final state. If the coordinator fails during the commit phase, the participants may be left waiting indefinitely or need to rely on timeouts to decide their next actions.

Consider this scenario, the prepare phase succeeds, the coordinator sends commit request to database #2 but it crashes before being able to send to database #1. Now database #1 is stuck in the prepare state. It cannot unilaterally commit because it does not know whether the other databases voted yes or not. The only way is to wait for the coordinator to recover.

coordinator-failure-in-commit-phase

Coordinator failure in commit phase

As a result of these problems, 2PC carries a heavy performance penalty.

Databases that use 2PC

PostgreSQL: PostgreSQL, an open-source relational database management system, supports 2PC for managing distributed transactions. It allows you to coordinate transactions across multiple PostgreSQL instances.
Microsoft SQL Server: Microsoft's SQL Server, a widely-used relational database management system, supports distributed transactions using 2PC. It relies on the Microsoft Distributed Transaction Coordinator (MSDTC) for managing distributed transactions across different SQL Server instances or other resource managers.
Oracle Database: Oracle Database, a popular relational database management system, supports distributed transactions using 2PC. It can coordinate transactions across multiple Oracle instances or even heterogeneous databases.
IBM DB2: IBM DB2, a family of data management products, supports 2PC for managing distributed transactions. It can coordinate transactions across multiple instances of DB2 databases or other resource managers.
MySQL: MySQL, another popular open-source relational database management system, supports 2PC for InnoDB and NDB storage engines. These storage engines allow for distributed transactions across multiple MySQL instances.

The performance penalty of 2PC has been reported to be as much as 10x slower than single-node transactions for MySQL.So many cloud providers choose not to implement it.

Three-phase commit (3PC)

An alternative protocol, known as the Three-Phase Commit (3PC), has been suggested to address some limitations of the 2PC protocol. However, 3PC assumes that the network has limited delays and nodes respond within a specific time frame. In reality, most practical systems deal with unpredictable network delays and varying response times from nodes, making it challenging for 3PC to guarantee atomicity. As a result, while 3PC deserves an honorable mention for its attempt to improve distributed transaction management, we won't delve into its details due to these constraints.

Compensating Transactions and Saga Pattern

magic-the-gathering-

If you are a fan of Magic: The Gathering (a card game) like me, you have probably used a Saga. A Saga is the card that has multiple steps. Each Saga tells the story of a key event from the past as it unfolds during each of your turn.

Just like this card, the Saga Pattern in distributed systems allows you to manage a sequence of local transactions, each with a corresponding compensating transaction, to ensure data consistency across multiple services.

Instead of relying on a coordinator whose failure may block the entire service, the Saga Pattern orchestrates a series of local transactions, and if any of them fail, it triggers compensating transactions to undo the previous steps, maintaining data consistency.

The key components in a Saga Pattern are:

The events and their compensating events.
The message queues (also called event broker, event bus or event channels) that the events pass through.
The microservices that create and subscribe to events.

In the Saga Pattern, each local transaction is often paired with an event, similar to the event sourcing pattern. These events can be stored in a message queue, such as Kafka, and used to communicate between the different services involved in the distributed transaction.

Let's consider an e-commerce example with order, user, and inventory services:

A user places an order for a product.
The order service creates the order and publishes an "Order Created" event to the message queue.
The user service listens for the "Order Created" event, updates the user's account, and publishes a "User Account Updated" event to the message queue.
The inventory service listens for the "User Account Updated" event, checks the product's availability, and, if successful, reduces the stock and publishes a "Stock Reduced" event.

If any of these steps fail, the Saga Pattern triggers compensating transactions to roll back the changes:

If the user service fails to update the account, it publishes an "Account Update Failed" event. The order service listens for this event and cancels the order.
If the inventory service fails to reduce the stock, it publishes a "Stock Reduction Failed" event. The user service listens for this event, reverts the account update, and publishes an "Account Update Reverted" event. Finally, the order service listens for the "Account Update Reverted" event and cancels the order.

compensating-transactions-and-saga-pattern

Compensating Transactions and Saga Pattern

The advantages of the Saga Pattern are:

Asynchronous communication: Sagas rely on asynchronous communication using events and message queues, which allows for better system performance and responsiveness, as services can continue processing other requests without waiting for the completion of the distributed transaction.
Scalability: Since there is no central coordinator and services communicate asynchronously, the Saga Pattern can scale more effectively than the 2PC protocol, making it more suitable for large distributed systems and microservices architectures.
Loose coupling: The Saga Pattern promotes loose coupling between services, as they communicate through events and don't need direct knowledge of each other's internal implementations. This characteristic makes it easier to maintain, evolve, and deploy individual services independently.

The disadvantages are also quite obvious:

Increased complexity: Saga Pattern requires designing and managing multiple local transactions and compensating transactions, as well as handling events and message queues.
Eventual consistency: The Saga Pattern relies on eventual consistency, meaning that the system may not be consistent at all times during the execution of a distributed transaction. While this approach can improve performance and availability, it may not be suitable for scenarios where strong consistency is required.

Choreography vs Orchestration

There are two patterns to implement a Saga Pattern: Choreography and Orchestration.

Choreography

Service A ------------> Event ------------> Service B
    ^                                             |
    |                                             |
    |<-------------- Event -----------------------|

In choreography, services interact with each other in a decentralized manner, without a central coordinator. They communicate using events, where one service publishes an event, and other services listen for those events and react accordingly. Each service is responsible for knowing which events to listen for and what actions to take when they receive them. It's like a group dance, where each dancer knows their own moves and reacts to others' moves without needing a conductor.

Pros:

Decentralized: No single point of failure or bottleneck, which makes the system more resilient and scalable.
Flexibility: Services can evolve independently, as long as they continue to communicate using the agreed-upon events.

Cons:

Increased complexity: Managing dependencies and understanding the overall system flow can be challenging due to the decentralized nature of the interactions.
Error handling: It can be difficult to handle errors and rollback transactions, as there is no central coordinator to manage the process.

Orchestration

               Orchestrator
                    |
                    |--- Command ---> Service A
                    |                      |
                    |<-- Response ---------|
                    |
                    |--- Command ---> Service B
                    |                      |
                    |<-- Response ---------|

In orchestration, there is a central coordinator (often called an orchestrator) that is responsible for managing the interactions between services. The orchestrator sends commands to each service, telling them what actions to perform and when. It's like a conductor directing an orchestra, with each musician waiting for their cues and following the conductor's instructions.

Pros:

Clear transaction flow: The central orchestrator makes it easier to understand the overall system flow and monitor the progress of transactions.
Simplified error handling: The orchestrator can manage errors and rollback transactions more easily, as it has a centralized view of the process.

Cons:

Single point of failure: The orchestrator can become a single point of failure or a performance bottleneck, affecting the system's resilience and scalability.
Reduced flexibility: Changes to the system may require updates to the orchestrator, which can slow down the evolution of individual services.

What to use for your interviews?

Now the real questions is which one do you use when you are asked about distributed transactions in an interview. If your system follows a one-database-per-service pattern, which is what you'd use most of the time in system design, 2PC is not an option. This is because transaction happens across multiple databases and there is no single controller that can coordinate the process.

And that left us with Saga Pattern being a default choice. For simple scenarios, Choreography based Saga is good enough. For more complex operations, using a central coordinator like Orchestrator with a high redundancy backup is the way to go.