Application migration from PostgreSQL to CockroachDB
Step-by-step instructions for application migration from PostgreSQL to CockroachDB
Table of contents
This article provides a hands-on guide for migrating a business service component using PostgreSQL to CockroachDB. Specifically, when the application stack is roughly based on the following:
Spring Boot + Spring Data
JPA and Hibernate
PostgreSQL 9+
Getting Started
First off, this is not a complete guide for data migrations. For schema and data migration techniques, see the official Migrate from PostgreSQL guide provided by Cockroach Labs.
The focus here is instead on the application tier and what type of refactoring efforts and other key considerations there are from an application architecture standpoint during a migration.
Let's begin by describing the process in the form of a gap analysis.
Current State
You have just migrated your PostgreSQL database to CockroachDB and the new cluster runs like clockwork. Now you turn the focus at your codebase, thinking OK, what's next to make this thing work?
Future State
You have migrated both the database and the application codebase that previously used PostgreSQL to also be capable to use CockroachDB. Not necessarily at the same time but configurable at startup time using Spring profiles.
The Gap
CockroachDB is compatible with the PostgreSQL v3.0 wire protocol (pg-wire) and works with the majority of tools that also work with PostgreSQL. This includes most PostgreSQL drivers, object-relational mapping frameworks, database schema version management tools, etc.
It is, however, not a clone or derivative of the PostgreSQL codebase, but a completely separate implementation written from the ground up in Go for solving different business problems that PostgreSQL and similar single-leader database architectures were crafted for. For more background see: Why is CockroachDB compatible with PostgreSQL?
Some features in PostgreSQL are not available in CockroachDB, most notably triggers and stored procedures. Depending on your development philosophy, this may not be a huge loss since adding business logic into the database itself could be a no-no.
For triggers, there's something conceptually similar but stronger called change-data-capture (CDC). For stored procedures, there's currently no other option than to move that logic to the app tier which is probably what you would want to do anyway.
CockroachDB also runs under the serializable isolation level, meaning that applications are more subject to transient retriable errors for explicit transactions under contended workloads. Hence a transaction retry strategy is recommended, even if contention can be largely avoided by design.
Bridging the Gap
The effort level for an application-level migration depends much on the characteristics of the workload, the scope of the project and what type of dependencies there are on PostgreSQL features not available in CockroachDB.
The first order of business would be to assess which unimplemented features that are currently used in PostgreSQL. For each unsupported item, you can check the product roadmap for upcoming support or look for an alternative approach. Rewriting or refactoring application code also presents an opportunity to re-design and work through the technical debt mountain.
For tracking issues and more details, see: https://www.cockroachlabs.com/docs/stable/sql-feature-support.html#miscellaneous
Feature | Alternative |
Stored procedures | Migrate SP-level logic to app-tier |
Triggers | CDC |
Events | See SQL Feature Support |
FULLTEXT functions and indexes | See FAQ |
Drop primary key | See SQL Feature Support |
XML functions | See SQL Feature Support |
Column-level privileges | See SQL Feature Support |
XA syntax | Inbox and outbox pattern |
Creating a database from a template | See SQL Feature Support |
Dropping a single partition from a table | See SQL Feature Support |
Foreign data wrappers | See SQL Feature Support |
Migration Steps
Now let's cover some fundamental migration steps. Some of these may already be in place in your application since it's all common good practice regardless of the RDBMS used. These steps are based on a specific tech stack, as outlined in the overview.
Step 1: Check Dependencies
CockroachDB works perfectly fine with the PostgreSQL JDBC driver (pg-jdbc), so it's just a matter of picking the appropriate version.
If you are using the spring-boot-starter-parent
as your parent Maven POM (as you should), then the driver version is inherited via postgresql.version
. For Spring 2.7.6 it's 42.3.8 and for Spring 3.0.2 it's 42.5.1. In most cases, the pg-jdbc
driver is pulled in transitively so it's often not necessary to define it in your project pom.xml
.
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>${postgresql.version}</version>
</dependency>
Flyway is a great schema management tool integrated into spring-boot, meaning there's auto-configuration for it when detected on the classpath.
<dependency>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-core</artifactId>
</dependency>
Flyway is very simple to use with its naming convention on the DDL SQL files that are in plain SQL.
Yaml configuration example:
spring:
flyway:
locations: classpath:db/migration
Migration script example:
[src/main/resources/db/migration/V1_1__create.sql]
create table .. (..);
create table .. (..);
Another equally capable tool is Liquibase, which is more centered around XML configuration.
<dependency>
<groupId>org.liquibase</groupId>
<artifactId>liquibase-core</artifactId>
</dependency>
Step 2: Add Connection Details
CockroachDB uses the same JDBC driver prefix as PostgreSQL with different connection parameters depending on a secure or insecure cluster for development purposes. In this example, we are connecting to an insecure local cluster.
[src/main/resources/application.yml]
spring:
datasource:
url: jdbc:postgresql://localhost:26257/sleipner?sslmode=disable
driver-class-name: org.postgresql.Driver
username: root
password:
In this example, we are connecting to a secure Cockroach Dedicated cluster with user credentials:
spring:
datasource:
url: jdbc:postgresql://odin-gc8.gcp-europe-west4.cockroachlabs.cloud:26257/hugin?sslmode=require
driver-class-name: org.postgresql.Driver
username: admin
password: ...
In this example, we are connecting to a secure Cockroach Dedicated cluster with user credentials and the root certificate embedded in the executable JAR:
spring:
datasource:
url: jdbc:postgresql://odin-gc8.gcp-europe-west4.cockroachlabs.cloud:26257/munin?sslmode=verify-full&sslfactory=org.postgresql.ssl.SingleCertValidatingFactory&sslfactoryarg=classpath:certs/sleipner-secure.crt
driver-class-name: org.postgresql.Driver
username: root
password: ...
Step 3: Setup Connection Pooling
Connection pooling is very important for both performance and graceful JDBC connection management regardless of RDBMS. Connection establishment is an expensive operation and running through that for each short-lived transaction would add significant overhead otherwise.
For the Java platform, HikariCP is becoming the de-facto standard for client-side connection pooling, and it's also the default pooling option in Spring Boot.
Example configuration:
spring:
datasource:
url: "jdbc:postgresql://localhost:26257/sleipner?sslmode=disable"
username: root
password:
hikari:
pool-name: my-service-pool
maximum-pool-size: 48
minimum-idle: 48
In this example, we configured the two most important parameters for the connection pool (max size and min idle). The maximum pool size should be set to 4 times the total vCPU count you have for the CockroachDB cluster, divided by the number of connection pool instances. So this would map against a 12 vCPU cluster or 4 vCPUs per node, using 3 nodes and one application instance with one datasource / connection pool.
The minimum idle set to the same value makes the pool act as a fixed-sized pool, ideal for bursty workloads (this is the default Hikari behavior). The max pool size is not an exact figure since all workloads are different, but more of a ballpark number at which point it's better to start applying backpressure client-side. "Better" in terms of there's a diminishing return in having more concurrent, active open connections due to scheduling and context-switching overhead.
@Bean
@Primary
public DataSource primaryDataSource() {
return hikariDataSource();
}
@Bean
@ConfigurationProperties("spring.datasource.hikari")
public HikariDataSource hikariDataSource() {
HikariDataSource ds = dataSourceProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
ds.addDataSourceProperty("reWriteBatchedInserts", "true");
ds.setPoolName("my-service-pool");
ds.setAutoCommit(false);
ds.setMaximumPoolSize(80);
ds.setMinimumIdle(80);
return ds;
}
The reWriteBatchedInserts
property (case sensitive) enables driver-level rewriting of INSERT statements to use multi-value inserts with JDBC batch statements.
Setting auto commit to false
should be used in combination with Hibernates Environment.CONNECTION_PROVIDER_DISABLES_AUTOCOMMIT
set to true, which is a minor optimization skipping the check for auto commits. This is mostly helpful when you use a transaction strategy with explicit transaction boundaries @Transactional(propagation=REQUIRES_NEW)
.
For more details, see Connection pooling with Spring Boot and CockroachDB.
Step 4: Implement Retry Logic
Any client-server remote call (RPC) over the network can potentially fail with a transient, retriable error. As a good design principle, all remote calls should therefore have a mechanism for doing retries including to the database.
Transient errors are less likely to happen in PostgreSQL because it runs in read-committed (RC) isolation level by default, whereas CockroachDB runs in serializable (1SR). For a contended workload where overlapping concurrent reads and writes to the same keys, it's more likely to see serialization conflicts. CockroachDB is also distributed by nature and depends on semi-synchronized clocks for correctness, which under some circumstances may also force transactions to be retried.
The How to retry failed transactions post goes into detail on how retry-logic can be easily implemented in a spring boot application. The Spring Annotations for CockroachDB go into more detail on using meta-annotations to declare transaction boundaries across the codebase.
Step 5: Review JPA and Hibernate Configuration
CockroachDB Dialect
CockroachDB provides a custom Hibernate dialect that is derived from the PostgreSQL dialect.
spring:
jpa:
properties:
hibernate:
# Hibernate 5.x
dialect: org.hibernate.dialect.CockroachDB201Dialect
# Hibernate 6.x
# dialect: org.hibernate.dialect.CockroachDialect
jdbc:
lob:
non_contextual_creation: true
The non_contextual_creation
is optional but it prevents the warning about createBlob
if you find that annoying.
Primary Keys
The JPA specification offers four different primary key generation strategies:
AUTO - The persistence provider attempts to figure out the best strategy based on database dialect and key type (default).
IDENTITY - The persistence provider depends on a database-generated ID.
SEQUENCE - The persistence provider depends on a database sequence.
TABLE - Legacy method to simulate sequences (avoid).
The most performant option and ideal for data distribution are using UUID as the primary key for tables and AUTO. It also works well with batch INSERTs, which by the way are automatically disabled when using IDENTITY. SEQUENCE is not ideal since indexing on sequential primary and secondary indexes may cause range hot spots (for which hash sharding can help a bit). TABLE isn't used much these days and is considered legacy.
To consistently apply the same ID strategy for all entities, you could create an AbstractEntity
implementing Persistable
that also hosts the id column. Alternatively, move the id to the concrete subclass.
@Entity
@Table(name = "account")
class AccountEntity extends AbstractEntity<UUID> {
@Id
@Column(updatable = false, nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private UUID id;
...
}
// ---
import org.springframework.data.domain.Persistable;
@MappedSuperclass
abstract class AbstractEntity<ID> implements Persistable<ID> {
@Transient
private boolean isNew = true;
@PostPersist
@PostLoad
void markNotNew() {
this.isNew = false;
}
@Override
public boolean isNew() {
return isNew;
}
}
There are more details in this article.
Batch Statements
Using batch INSERTs with multi-value rewrites in the PostgreSQL driver will have a big impact on performance. To enable INSERT rewrites, just set the following pg-jdbc
driver property either in the connection URL or as a datasource property:
spring:
datasource:
url: jdbc:postgresql://localhost:26257/spring_boot?sslmode=disable&reWriteBatchedInserts=true
Alternatively, in the datasource bean factory method:
@Bean
@ConfigurationProperties("spring.datasource.hikari")
public HikariDataSource hikariDataSource() {
HikariDataSource ds = dataSourceProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
ds.addDataSourceProperty("reWriteBatchedInserts", "true");
return ds;
}
Lastly, you also need to enable batching in Hibernate (batch_size
> 0):
spring:
jpa:
show-sql: true
hibernate:
ddl-auto: none
properties:
hibernate:
order_updates: true
order_inserts: true
jdbc:
batch_versioned_data: true
batch_size: 128
fetch_size: 256
cache:
use_second_level_cache: false
There are more details in this article.
Step 6: Profile Activation
One quite useful approach when switching to CockroachDB is to use Spring Profiles. That way you can keep the CockroachDB connection and Hibernate dialect specifics in a separate application.yml file which is activated at startup time.
For example:
src/main/resources/
application.yml # contains common spring boot config
application-psql.yml # contains datasource config for PSQL
application-crdb.yml # contains datasource config for CockroachDB
Then its a matter of activating the proper profile at startup:
java -jar app.jar --spring.profiles.active=psql
# or
java -jar app.jar --spring.profiles.active=crdb
Next Steps
This guide only scratches the surface of data and service migration projects which can have different types of challenges. One particular challenge often arises from zero downtime requirements, which effectively requires the service to run both databases at the same time and gradually "strangle" the old databases and logic. To implement this pattern inspired by the strangler fig (below), you could for example set up separate instances and use CDC to stream writes from the current primary to the other.
Additional Resources
https://www.cockroachlabs.com/docs/stable/postgresql-compatibility.html
https://www.cockroachlabs.com/docs/stable/migration-overview.html
https://www.cockroachlabs.com/docs/stable/transaction-retry-error-reference.html
Summary
This article provides a hands-on guide for migrating a business service component from PostgreSQL to CockroachDB, focusing on application architecture and refactoring efforts. By following these steps, you can ensure a smooth transition from PostgreSQL to CockroachDB for your application.