I’ve been hacking on a tool to allow resynchronizing an old master server after failover. Please take a look: https://github.com/vmware/pg_rewind.
Two-phase commit was implemented in PostgreSQL in version 8.1, and driver support for it was added to the JDBC driver at the same time. However, we never added support for two features required by the Java Transaction API: transaction interleaving and suspend/resume. Even though they are required by the specification, they are not necessary for reliable two-phase commit, and many other JTA implementations also don’t implement them. They are difficult to implement if the underlying protocol used to talk to the database doesn’t support them natively.
People have been using the driver happily without those features for years, but every now and then someone runs into issues, when trying to configure a new application server to use the PostgreSQL’s XADataSource implementation. With some configuration, you usually can get things to work – major application servers don’t rely on those features, because they are not widely implemented, or at least have an option to do so if it’s not the default.
But what are these missing features, and why are they so difficult to implement?
The Java Transaction API
The way to implement two-phase commit in a JDBC driver is specified by the Java Transaction API specification. The API is modeled after the older X/Open XA specification, which defines the same thing for native Unix applications. That explains some peculiarities of the API; it’s quite different than most Java APIs. I believe the history also explains how the suspend/resume and transaction interleaving features sneaked into the Java standard: they made a lot more sense in a traditional single-threaded Unix process, than they do in a multi-threaded Java environment.
To implement the API, a driver needs to provide a class implementing the XAResource interface. It consists of five basic methods, plus some support methods related to timeouts and heuristic commits, which are optional features. The basic required methods are:
public void start(Xid xid, int flags);
Associates the current JDBC connection with a global transaction.
public void end(Xid xid, int flags);
Disassociates the current JDBC connection from a global transaction.
public int prepare(Xid xid);
The 1st, prepare phase of committing a transaction.
public void commit(Xid xid, boolean onePhase);
The 2nd phase commit.
public void rollback(Xid xid);
Rolls back a transaction.
public Xid recover(int flag);
Returns a list of global transactions that have been prepared but not committed yet. Used for crash recovery.
Basic two-phase commit
Here are the minimum steps to perform a global transaction using the API:
- Do your stuff with the Connection.
Simple, really. If you stick to the above steps, your transaction manager will work with any driver.
To work reliably with any JDBC driver, perform steps 1-4 in the same connection. The specification is more flexible than that, but not all implementations support transferring a global transaction from one connection to another, including PostgreSQL. Do not use the connection for anything else in between, consider the connection to be reserved for that transaction for the whole duration. If the connection is lost after step 4, you can use a new connection to perform the 2nd-phase commit() step, however.
Note that the above steps are performed by the transaction manager, typically included in a Java Application Server. The application developer doesn’t see any of that, he will use a completely different set of APIs to interact with the transaction manager.
According to the JTA specification, it’s possible to temporarily disassociate (suspend) a global transaction from the connection. Ie. you can do this:
- Do some stuff with the Connection.
XAResource.end(xid1, TMSUSPEND); // Suspend the old transaction
XAResource.end(xid2, TMNOFLAGS); // Start a new transaction using the same connection
- Do other stuff with the Connection.
So, the same connection is used for two different transactions, before preparing either one. The problem is that in PostgreSQL, a single database connection can only perform work for a single transaction at a time. If you want to use the connection for another transaction, you have to prepare, commit or rollback the current transaction first. And once you do that, you cannot return to the old transaction, continuing to add more work to it.
This is no problem for an application server. It can simply open a new connection for the 2nd transaction. But the JTA specification requires the driver to support that! There is no way for the driver to say “hey, I can’t do this stuff, please use one connection for one transaction only”. Furthermore, this is completely transparent to the application – it’s the application servers choice to use suspend/resume or not.
There are other complications in the JTA spec, like support for “joining” a connection to an in-progress global transaction. Like suspend/resume, that’s also totally unnecessary for reliable two-phase commit.
The authors of the JTA specification screwed up. Instead of writing a simple API for two-phase commit, they mixed other non-essential features into the specification that have nothing to do with two-phase commit, and made them required. One can argue that there is more to the JTA specification than two-phase commit, but in reality, if you ask any Java App Server administrator, the only reason to ever use an XA-enabled driver is if you want to use two-phase commit.
The result is that some drivers implement the whole specification, while others implement only the minimum set of features required for reliable two-phase commit. But a transaction manager doesn’t know which features are actually implemented – they’re all required by the spec – so to work with real-world drivers, a transaction manager has to be written to the lowest common denominator. Had the JTA authors kept the API simple, and at least made the non-essential features optional, more JDBC implementations could fulfill the letter of the spec, and TM developers could code against the spec instead of an ill-defined subset of the spec.
So, why would an application server use the advanced features of the spec? One use case I’ve seen is to interleave transactions to improve concurrency, but in practice it’s unlikely to give any performance benefit. Even on a DBMS that supports that natively, switching transaction contexts isn’t free – you will most likely achieve the same or better performance by just opening one more connection.
If you’re developing a Transaction Manager, please stick to the basic steps listed above! Don’t use suspend/resume or other advanced features of the JTA specification. This ensures that your software works reliably with the widest possible range of drivers, including the PostgreSQL driver, and saves your users from the headache of debugging strange concurrency bugs.
Today is my last day at EnterpriseDB. I’ve had a good time working there for the past six years, and will miss many of my colleagues. Tomorrow, I’m starting at VMWare. I will continue to be an active member of the PostgreSQL community; VMWare wants to see a healthy community and wants to be part of it.
Thank you EnterpriseDB for all the good time I’ve had, and thank you VMWare for all the good time I will have in the future!