Disaster Recovery and System Wide Failure (2PC vs 3PC)

The Two-Phase Commit (2PC) and Three-Phase Commit (3PC) protocols are two of the best-known algorithms for deciding whether to commit or abort a distributed transaction in a Distributed Database Management System (DDBMS).

Two-phase commit (2PC) lets the participating databases return to their former state if an error occurs, which keeps them synchronized. It requires a coordinator, whose role is to reach consensus among a set of processes in two phases. In the first (voting) phase, the coordinator contacts every process, proposes a value (commit this transaction), and asks each process to vote on it. After collecting the votes, the coordinator decides to commit if all processes agreed, or to abort if any process disagreed. In the second (decision) phase, the coordinator contacts all the processes again and tells them the commit or abort decision.
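The two phases above can be sketched as a single-process simulation. The `Participant` class, `two_phase_commit` function, and `Vote`/`Decision` enums are hypothetical names for illustration only; real participants would be remote processes, and a real coordinator would write its decision to a durable log before starting the second phase.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical in-memory cohort used only to illustrate the message flow."""

    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.outcome: Optional[Decision] = None

    def prepare(self) -> Vote:
        # Phase 1: vote YES only if this cohort is able to make its work durable.
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision: Decision) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.outcome = decision


def two_phase_commit(participants: List[Participant]) -> Decision:
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Commit only on a unanimous YES; a single NO forces a global abort.
    decision = Decision.COMMIT if all(v is Vote.YES for v in votes) else Decision.ABORT
    # Phase 2 (decision): send the same outcome to every participant.
    for p in participants:
        p.finish(decision)
    return decision
```

For example, `two_phase_commit([Participant("a", True), Participant("b", False)])` returns `Decision.ABORT`, and both participants end up with that outcome.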

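Three-phase commit (3PC) is likewise a protocol by which all the nodes in a distributed system agree whether to commit a transaction. Unlike two-phase commit, three-phase commit is non-blocking. It inserts an extra step between voting and committing: first the coordinator asks every process whether it can commit; if it receives a yes from all of them, it sends a pre-commit message, which each process acknowledges; only after that does it ask all the processes to commit. If any process votes no, the coordinator aborts the transaction instead.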
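The three phases can be sketched the same way. The phase names CanCommit, PreCommit, and DoCommit are the conventional labels for the steps described above; the `Cohort` class, `State` enum, and `three_phase_commit` function are hypothetical. This sketch assumes every message and acknowledgement arrives, so it shows only the happy path and the unanimous-vote check, not failure handling.

```python
from enum import Enum
from typing import List


class State(Enum):
    INITIAL = "initial"            # has not voted yet
    READY = "ready"                # voted YES in phase 1
    PRECOMMITTED = "precommitted"  # acknowledged the pre-commit in phase 2
    COMMITTED = "committed"
    ABORTED = "aborted"


class Cohort:
    """Hypothetical in-memory participant; a real one would log each state change durably."""

    def __init__(self, name: str, willing: bool) -> None:
        self.name = name
        self.willing = willing
        self.state = State.INITIAL

    def can_commit(self) -> bool:
        # Phase 1: vote YES (move to READY) or vote NO (abort locally).
        self.state = State.READY if self.willing else State.ABORTED
        return self.willing

    def pre_commit(self) -> None:
        # Phase 2: note that the vote was unanimous, then acknowledge.
        self.state = State.PRECOMMITTED

    def do_commit(self) -> None:
        # Phase 3: make the transaction's effects permanent.
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.state = State.ABORTED


def three_phase_commit(cohorts: List[Cohort]) -> State:
    # Phase 1 (CanCommit): ask every cohort for its vote.
    votes = [c.can_commit() for c in cohorts]
    if not all(votes):
        for c in cohorts:
            c.abort()
        return State.ABORTED
    # Phase 2 (PreCommit): announce the unanimous vote; acknowledgements are assumed.
    for c in cohorts:
        c.pre_commit()
    # Phase 3 (DoCommit): only now is the final commit sent.
    for c in cohorts:
        c.do_commit()
    return State.COMMITTED
```

The extra PreCommit state is what matters for recovery: a cohort in that state knows every cohort voted yes, even if it never hears from the coordinator again.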

When terminating a distributed transaction, two-phase commit can get stuck because it is a blocking protocol: in some failure cases the remaining processes cannot resolve the transaction. Once a cohort has sent its agreement message to the coordinator, it must hold the resources associated with the transaction (such as locks) until it receives the coordinator's commit or abort message. If the coordinator fails at that point, the cohorts cannot safely commit or abort on their own and stay blocked until the coordinator recovers.
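The blocking scenario above can be made concrete with the cooperative termination protocol, a standard 2PC extension that this post does not describe: an uncertain cohort asks its peers what they know. The function and enum names are hypothetical. It returns `None` exactly when the cohort is blocked, which happens whenever every reachable peer is just as uncertain, because the failed coordinator may already have committed.

```python
from enum import Enum
from typing import List, Optional


class CohortState(Enum):
    INITIAL = "initial"      # has not voted yet
    READY = "ready"          # voted YES, waiting for the decision (uncertain)
    COMMITTED = "committed"
    ABORTED = "aborted"


def resolve_after_coordinator_failure(peer_states: List[CohortState]) -> Optional[str]:
    """What an uncertain 2PC cohort can decide after the coordinator fails.

    Returns "commit", "abort", or None when the cohort is blocked.
    """
    if CohortState.COMMITTED in peer_states:
        return "commit"  # the coordinator must have decided COMMIT
    if CohortState.ABORTED in peer_states:
        return "abort"   # someone voted NO or learned of an abort
    if CohortState.INITIAL in peer_states:
        # A peer that never voted can still refuse, so COMMIT cannot have been decided.
        return "abort"
    # Every reachable peer is also uncertain (or none is reachable): blocked.
    return None
```

With two peers both in `READY`, the result is `None`: the cohorts must keep holding their locks until the coordinator comes back.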

On the other hand, the three-phase commit protocol eliminates this blocking problem. For example, if the coordinator's messages time out before any cohort has received the pre-commit message, the other processes can unanimously agree that the transaction was aborted, since none of them can have committed yet. The pre-commit phase thus helps recovery when a process fails, or when both the coordinator and a process fail, during the commit phase.

In the event of a system-wide power failure, a process left blocked under two-phase commit might not be able to restore its data to the initial state, because it cannot tell on its own whether the transaction committed. Three-phase commit prevents blocking because it assumes that crashes can be detected accurately. One limitation, however, is that this assumption does not hold with network partitions or asynchronous communication, and under those conditions the protocol cannot guarantee that all processes reach the same decision. In this situation it is also important to take system-wide backups as part of your disaster recovery plan.
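The rule described above, where the surviving processes agree to abort if none of them saw the pre-commit, can be sketched as the decision step of a 3PC termination protocol run by a newly elected coordinator. The function and enum names are hypothetical, and the sketch assumes fail-stop crashes, accurate failure detection, and no network partitions, the same conditions under which 3PC is non-blocking.

```python
from enum import Enum
from typing import List, Optional


class CohortState(Enum):
    INITIAL = "initial"            # has not voted yet
    READY = "ready"                # voted YES, no pre-commit received (uncertain)
    PRECOMMITTED = "precommitted"  # received the pre-commit (committable)
    COMMITTED = "committed"
    ABORTED = "aborted"


def terminate_3pc(survivor_states: List[CohortState]) -> Optional[str]:
    """Decision reached from the states of the cohorts that are still running.

    Assumes fail-stop crashes, accurate failure detection, and no partitions.
    Returns "commit", "abort", or None if there is no survivor to consult.
    """
    if not survivor_states:
        return None  # nobody left to ask: wait for sites to recover and read their logs
    if CohortState.ABORTED in survivor_states:
        return "abort"
    if CohortState.COMMITTED in survivor_states:
        return "commit"
    if CohortState.PRECOMMITTED in survivor_states:
        # Someone saw the pre-commit, so every cohort voted YES: finish the commit.
        return "commit"
    # Nobody saw the pre-commit, so no cohort can have committed yet: abort safely.
    return "abort"
```

Survivors that are all in `READY` now resolve to `"abort"` instead of waiting, which is the difference from the 2PC blocking case described earlier. The empty-list guard covers the case where no process is running to answer the query.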

In conclusion, 3PC is the better protocol both for terminating a distributed transaction and for recovering from a system-wide power failure.

#3PC #ComputerScience #DisasterRecovery #SystemFailure

