Column Families in CockroachDB
Using column families to reduce contention in CockroachDB
In this article, we will take a closer look at using column families in CockroachDB to reduce contention on concurrent updates.
Column families are a group of columns in a table that are stored as single key-value pairs in the storage layer. By default, there is a single implicit column family created for all columns.
Adding column families is a method to impose a different structure on the table where each row is represented by multiple key-value pairs. This can improve performance for write operations (INSERT, UPDATE and DELETE) and given certain constraints also reduce contention from concurrent transactions, which is the topic of this article.
The test project we're going to use is a basic Spring Boot application that uses plain JPA and Hibernate for data access.
Source Code
The source code for examples of this article can be found on GitHub.
Use Case
The demo application uses a single table to store purchase orders. It comes in two flavors: one with a single-column family and one with two families. Other than that, the tables are identical.
The operations we will perform will also be the same for both tables, but we are going to witness different transaction behavior due to the nature of the operations and the use of column families.
The table schema is as follows:
create table purchase_order1
(
id integer not null default unordered_unique_rowid(),
bill_address1 varchar(255),
bill_address2 varchar(255),
bill_city varchar(255),
bill_country varchar(16),
bill_postcode varchar(16),
bill_to_first_name varchar(255),
bill_to_last_name varchar(255),
date_placed date not null default current_date(),
deliv_to_first_name varchar(255),
deliv_to_last_name varchar(255),
deliv_address1 varchar(255),
deliv_address2 varchar(255),
deliv_city varchar(255),
deliv_country varchar(16),
deliv_postcode varchar(16),
order_status varchar(64),
total_price decimal(18, 2) not null,
primary key (id)
);
create table purchase_order2
(
id integer not null default unordered_unique_rowid(),
bill_address1 varchar(255),
bill_address2 varchar(255),
bill_city varchar(255),
bill_country varchar(16),
bill_postcode varchar(16),
bill_to_first_name varchar(255),
bill_to_last_name varchar(255),
date_placed date not null default current_date(),
deliv_to_first_name varchar(255),
deliv_to_last_name varchar(255),
deliv_address1 varchar(255),
deliv_address2 varchar(255),
deliv_city varchar(255),
deliv_country varchar(16),
deliv_postcode varchar(16),
order_status varchar(64),
total_price decimal(18, 2) not null,
primary key (id),
FAMILY f1 (id, bill_address1, bill_address2, bill_city, bill_country, bill_postcode, bill_to_first_name, bill_to_last_name, date_placed, deliv_to_first_name, deliv_to_last_name, deliv_address1, deliv_address2, deliv_city, deliv_country, deliv_postcode, order_status),
family f2 (total_price)
);
The sequence of SQL operations is to first read a single row and then update the same row but use different columns. This will be done concurrently in separate sessions to visualize the effect.
The sequence of operations will be something like the following, for transaction T1 and T2 (time flows vertically):
Time | T1 | T2 |
1 | begin; | begin; |
2 | select id,status from purchase_order1 where id=1; | |
3 | (reads 1,'PLACED') | select id,total_price from purchase_order1 where id=1; |
4 | update purchase_order1 set status = 'CONFIRMED' where id = 1; | (reads 1,100.00) |
5 | update purchase_order1 set total_price = total_price + 5 where id = 1; | |
6 | (blocked on T1) | |
7 | commit; | |
8 | commit; --- "ERROR: restart transaction.." | |
9 | (rollback) |
The rollback for T2 is expected (yet undesired) behavior since we have contending read and write operations on the same row, even if the transactions write back to different columns.
Next, let's see how the same sequence works on the other table (purchase_order2
) with a separate column family for the total_price
column.
Time | T1 | T2 |
1 | begin; | begin; |
2 | select id,status from purchase_order2 where id=1; | |
3 | (reads 1,'PLACED') | select id,total_price from purchase_order2 where id=1; |
4 | update purchase_order2 set status = 'CONFIRMED' where id = 1; | (reads 1,100.00) |
5 | update purchase_order2 set total_price = total_price + 5 where id = 1; | |
6 | commit; | commit; |
As we can see, both transactions can commit which is allowed in a serializable history since we are not writing to the same key, but to disjoint columns in different key-value pairs in the underlying key-value storage.
JPA and Hibernate Considerations
There are a few considerations in terms of Hibernate entity mappings and query operations to make the best use of CockroachDB column families to reduce contention.
One is to always use projection in the reading part of a transaction. This is the default, but the problem is that all columns are read like using a star projection. This is the case for example when using the entityManager#find
method. If all columns would be read in the last example above, then it would fail with a transient SQL error.
For the demo, we are using named queries for the different order entities, as follows:
@Entity
@Table(name = "purchase_order1")
@NamedQuery(name = "Order1.findByIdForUpdateStatus",
query = "SELECT o.id,o.orderStatus,o.totalPrice FROM Order1 o WHERE o.id = ?1")
@NamedQuery(name = "Order1.findByIdForUpdatePrice",
query = "SELECT o.id,o.totalPrice,o.orderStatus FROM Order1 o WHERE o.id = ?1")
public class Order1 extends AbstractOrder {
}
.. and ..
@Entity
@Table(name = "purchase_order2")
@NamedQuery(name = "Order2.findByIdForUpdateStatus",
query = "SELECT o.id,o.orderStatus FROM Order2 o WHERE o.id = ?1")
@NamedQuery(name = "Order2.findByIdForUpdatePrice",
query = "SELECT o.id,o.totalPrice FROM Order2 o WHERE o.id = ?1")
public class Order2 extends AbstractOrder {
}
The downside with projection is that we can't use a typed query to read the persistent domain object (PDO) into its attached state, but only use scalar values. It's fine for this example though, where we just write the data again being selective of which columns to update:
entityManager
.createQuery("update " + orderType.getSimpleName()
+ " o set o.orderStatus = :status"
+ " where o.id = :id")
.setParameter("status", status)
.setParameter("id", orderId)
.executeUpdate();
// ..and..
entityManager
.createQuery("update " + orderType.getSimpleName()
+ " o set o.totalPrice = o.totalPrice + :price"
+ " where o.id = :id")
.setParameter("price", price)
.setParameter("id", orderId)
.executeUpdate();
Another method of being selective on which columns to update is to use the @DynamicInsert
and @DynamicUpdate
annotations in Hibernate:
@Entity
@Table(name = "purchase_order1")
@DynamicInsert
@DynamicUpdate
public class Order1 extends AbstractOrder {
}
This works for the update part, but not for projection.
Running the Demo
For demo cloning and building instructions, see the Github project.
Single Column Family
Get a form to create a new order:
shell curl "
http://localhost:8090/order/v1/template
" > o1.json
Submit the order form:
curl "http://localhost:8090/order/v1/" -H "Content-Type:application/json" -X POST -d "@o1.json"
Take note of generated id
value (which is 1 by default) and be prepared to rotate the order status with a commit delay of 5 sec to allow for another update to come in between:
curl "http://localhost:8090/order/v1/1/status?delay=5" -i -X PUT
Now, within 5 sec after the previous PUT, increment the price on the same order ID, which will cause a serialization conflict:
curl "http://localhost:8090/order/v1/1/price?price=5&delay=0" -i -X PUT
This request will fail with an expected 40001 error. That's fine, but let's try to avoid this contention effect.
Multiple Column Families
Submit the order form again but using a different URL:
curl "http://localhost:8090/order/v2/" -H "Content-Type:application/json" -X POST -d "@o1.json"
Let's fire the status update and delay the commit with 5 sec:
curl "http://localhost:8090/order/v2/1/status?delay=5" -i -X PUT
Now, within 5 sec, increment the price on the same order ID which will succeed:
curl "http://localhost:8090/order/v2/1/price?price=5&delay=0" -i -X PUT
That's it, no retry errors.
Conclusion
Column families are a simple technique that can help to reduce contention and improve write performance in CockroachDB. The key consideration to make this work in JPA and Hibernate is to use both projections and selective updates when writing entity state back to the database.