Application migration from PostgreSQL to CockroachDB

Step-by-step instructions for application migration from PostgreSQL to CockroachDB

This article provides a hands-on guide for migrating a business service component using PostgreSQL to CockroachDB. Specifically, when the application stack is roughly based on the following:

  • Spring Boot + Spring Data

  • JPA and Hibernate

  • PostgreSQL 9+

Getting Started

First off, this is not a complete guide for data migrations. For schema and data migration techniques, see the official Migrate from PostgreSQL guide provided by Cockroach Labs.

The focus here is instead on the application tier and what type of refactoring efforts and other key considerations there are from an application architecture standpoint during a migration.

Let's begin by describing the process in the form of a gap analysis.

Current State

You have just migrated your PostgreSQL database to CockroachDB and the new cluster runs like clockwork. Now you turn your focus to the codebase, thinking: OK, what's next to make this thing work?

Future State

You have migrated both the database and the application codebase, so that the application that previously used PostgreSQL is also capable of using CockroachDB. Not necessarily both at the same time, but selectable at startup time using Spring profiles.

The Gap

CockroachDB is compatible with the PostgreSQL v3.0 wire protocol (pg-wire) and works with the majority of tools that also work with PostgreSQL. This includes most PostgreSQL drivers, object-relational mapping frameworks, database schema version management tools, etc.

It is, however, not a clone or derivative of the PostgreSQL codebase, but a completely separate implementation written from the ground up in Go, solving different business problems than those PostgreSQL and similar single-leader database architectures were crafted for. For more background, see: Why is CockroachDB compatible with PostgreSQL?

Some features in PostgreSQL are not available in CockroachDB, most notably triggers and stored procedures. Depending on your development philosophy, this may not be a huge loss since adding business logic into the database itself could be a no-no.

For triggers, there's something conceptually similar but stronger called change-data-capture (CDC). For stored procedures, there's currently no other option than to move that logic to the app tier which is probably what you would want to do anyway.
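As a sketch of the CDC alternative to triggers, a changefeed can emit row-level changes from a table to an external sink. The table name and Kafka URI below are placeholders, and changefeeds require rangefeeds to be enabled on the cluster:

```sql
-- Enable rangefeeds (one-time cluster setting)
SET CLUSTER SETTING kv.rangefeed.enabled = true;

-- Stream row-level changes from a table (placeholder name) to a Kafka sink
CREATE CHANGEFEED FOR TABLE account
  INTO 'kafka://localhost:9092'
  WITH updated, resolved;
```

Unlike a trigger, the changefeed runs asynchronously and survives node failures, which is why it is described as conceptually similar but stronger.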

CockroachDB also runs under the serializable isolation level, meaning that applications are more subject to transient retriable errors for explicit transactions under contended workloads. Hence a transaction retry strategy is recommended, even if contention can be largely avoided by design.

Bridging the Gap

The effort level for an application-level migration depends largely on the characteristics of the workload, the scope of the project, and what type of dependencies there are on PostgreSQL features not available in CockroachDB.

The first order of business is to assess which of the PostgreSQL features currently in use are not yet implemented in CockroachDB. For each unsupported item, you can check the product roadmap for upcoming support or look for an alternative approach. Rewriting or refactoring application code also presents an opportunity to re-design and work through the technical debt mountain.

For tracking issues and more details, see: https://www.cockroachlabs.com/docs/stable/sql-feature-support.html#miscellaneous

Feature | Alternative
Stored procedures | Migrate SP-level logic to app-tier
Triggers | CDC
Events | See SQL Feature Support
FULLTEXT functions and indexes | See FAQ
Drop primary key | See SQL Feature Support
XML functions | See SQL Feature Support
Column-level privileges | See SQL Feature Support
XA syntax | Inbox and outbox pattern
Creating a database from a template | See SQL Feature Support
Dropping a single partition from a table | See SQL Feature Support
Foreign data wrappers | See SQL Feature Support

Migration Steps

Now let's cover some fundamental migration steps. Some of these may already be in place in your application since it's all common good practice regardless of the RDBMS used. These steps are based on a specific tech stack, as outlined in the overview.

Step 1: Check Dependencies

CockroachDB works perfectly fine with the PostgreSQL JDBC driver (pg-jdbc), so it's just a matter of picking the appropriate version.

If you are using spring-boot-starter-parent as your parent Maven POM (as you should), the driver version is inherited via the postgresql.version property. For Spring Boot 2.7.6 it's 42.3.8, and for Spring Boot 3.0.2 it's 42.5.1. In most cases, the pg-jdbc driver is pulled in transitively, so it's often not necessary to declare it in your project pom.xml.

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>${postgresql.version}</version>
</dependency>

Flyway is a great schema management tool that is integrated into Spring Boot, meaning there's auto-configuration for it when it's detected on the classpath.

<dependency>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-core</artifactId>
</dependency>

Flyway is very simple to use, with a naming convention for versioned migration files written in plain SQL.

Yaml configuration example:

spring:
  flyway:
    locations: classpath:db/migration

Migration script example:

[src/main/resources/db/migration/V1_1__create.sql]

create table .. (..);
create table .. (..);

Another equally capable tool is Liquibase, which is more centered around XML configuration.

<dependency>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-core</artifactId>
</dependency>
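For illustration, a minimal Liquibase changelog could look like the following. The file path, changeset id/author, and table definition are hypothetical:

```xml
<!-- src/main/resources/db/changelog/db.changelog-master.xml -->
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.1.xsd">
    <changeSet id="1" author="migration">
        <sql>create table account (id uuid primary key default gen_random_uuid());</sql>
    </changeSet>
</databaseChangeLog>
```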

Step 2: Add Connection Details

CockroachDB uses the same JDBC driver prefix as PostgreSQL, with different connection parameters depending on whether the cluster is secure or insecure (the latter intended for development purposes only). In this example, we are connecting to an insecure local cluster:

[src/main/resources/application.yml]

spring:
  datasource:
    url: jdbc:postgresql://localhost:26257/sleipner?sslmode=disable
    driver-class-name: org.postgresql.Driver
    username: root
    password:

In this example, we are connecting to a secure Cockroach Dedicated cluster with user credentials:

spring:
  datasource:
    url: jdbc:postgresql://odin-gc8.gcp-europe-west4.cockroachlabs.cloud:26257/hugin?sslmode=require
    driver-class-name: org.postgresql.Driver
    username: admin
    password: ...

In this example, we are connecting to a secure Cockroach Dedicated cluster with user credentials and the root certificate embedded in the executable JAR:

spring:
  datasource:
    url: jdbc:postgresql://odin-gc8.gcp-europe-west4.cockroachlabs.cloud:26257/munin?sslmode=verify-full&sslfactory=org.postgresql.ssl.SingleCertValidatingFactory&sslfactoryarg=classpath:certs/sleipner-secure.crt
    driver-class-name: org.postgresql.Driver
    username: root
    password: ...

Step 3: Set Up Connection Pooling

Connection pooling is very important for both performance and graceful JDBC connection management, regardless of RDBMS. Connection establishment is an expensive operation, and repeating it for each short-lived transaction would otherwise add significant overhead.

For the Java platform, HikariCP is becoming the de-facto standard for client-side connection pooling, and it's also the default pooling option in Spring Boot.

Example configuration:

spring:
  datasource:
    url: "jdbc:postgresql://localhost:26257/sleipner?sslmode=disable"
    username: root
    password:
    hikari:
      pool-name: my-service-pool
      maximum-pool-size: 48
      minimum-idle: 48

In this example, we configure the two most important connection pool parameters: maximum pool size and minimum idle. A good rule of thumb is to set the maximum pool size to 4 times the total vCPU count of the CockroachDB cluster, divided by the number of connection pool instances. The value 48 above would therefore map to a 12 vCPU cluster (3 nodes with 4 vCPUs each) and a single application instance with one datasource / connection pool.

Setting minimum idle to the same value makes the pool act as a fixed-size pool, which is ideal for bursty workloads (this is also the default Hikari behavior). The max pool size is not an exact figure, since all workloads are different, but more of a ballpark number at which point it's better to start applying backpressure client-side. "Better" in the sense that there's a diminishing return in having more concurrent, active open connections due to scheduling and context-switching overhead.
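The sizing rule above is simple arithmetic. This small helper class is purely illustrative (the class and method names are made up):

```java
public class PoolSizing {

    // Rule of thumb from the text: max pool size = 4 x total cluster vCPUs,
    // divided by the number of connection pool instances.
    static int maxPoolSize(int totalVcpus, int poolInstances) {
        return Math.max(1, (4 * totalVcpus) / poolInstances);
    }

    public static void main(String[] args) {
        // 3 nodes x 4 vCPUs = 12 vCPUs, one app instance with one pool -> 48
        System.out.println(maxPoolSize(12, 1));
    }
}
```

With four application instances against the same 12 vCPU cluster, each pool would instead get 48 / 4 = 12 connections.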

    @Bean
    @ConfigurationProperties("spring.datasource")
    public DataSourceProperties dataSourceProperties() {
        return new DataSourceProperties();
    }

    @Bean
    @Primary
    public DataSource primaryDataSource() {
        return hikariDataSource();
    }

    @Bean
    @ConfigurationProperties("spring.datasource.hikari")
    public HikariDataSource hikariDataSource() {
        HikariDataSource ds = dataSourceProperties()
                .initializeDataSourceBuilder()
                .type(HikariDataSource.class)
                .build();
        ds.addDataSourceProperty("reWriteBatchedInserts", "true");
        ds.setPoolName("my-service-pool");
        ds.setAutoCommit(false);
        ds.setMaximumPoolSize(48);
        ds.setMinimumIdle(48);
        return ds;
    }

The reWriteBatchedInserts property (case sensitive) enables driver-level rewriting of INSERT statements to use multi-value inserts with JDBC batch statements.
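Conceptually, the rewrite collapses a JDBC batch of single-row INSERTs into fewer multi-value statements, roughly like this (table and columns are hypothetical):

```sql
-- Without rewriting, a batch of N items sends N single-row statements:
insert into account (id, balance) values (?, ?);

-- With reWriteBatchedInserts=true, the driver sends multi-value inserts instead:
insert into account (id, balance) values (?, ?), (?, ?), (?, ?);
```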

Setting auto-commit to false should be used in combination with Hibernate's Environment.CONNECTION_PROVIDER_DISABLES_AUTOCOMMIT set to true, which is a minor optimization that skips the auto-commit check. This is mostly helpful when you use a transaction strategy with explicit transaction boundaries, such as @Transactional(propagation = Propagation.REQUIRES_NEW).
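The Environment.CONNECTION_PROVIDER_DISABLES_AUTOCOMMIT constant maps to the hibernate.connection.provider_disables_autocommit property, which in Spring Boot can be set like this:

```yaml
spring:
  jpa:
    properties:
      hibernate:
        connection:
          provider_disables_autocommit: true
```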

For more details, see Connection pooling with Spring Boot and CockroachDB.

Step 4: Implement Retry Logic

Any client-server remote call (RPC) over the network can potentially fail with a transient, retriable error. As a good design principle, all remote calls should therefore have a mechanism for doing retries including to the database.

Transient errors are less likely in PostgreSQL because it runs at the read-committed (RC) isolation level by default, whereas CockroachDB runs at serializable (1SR). For a contended workload with overlapping concurrent reads and writes to the same keys, serialization conflicts are more likely. CockroachDB is also distributed by nature and depends on semi-synchronized clocks for correctness, which under some circumstances may also force transactions to be retried.

The How to retry failed transactions post goes into detail on how retry logic can be implemented in a Spring Boot application. The Spring Annotations for CockroachDB post goes into more detail on using meta-annotations to declare transaction boundaries across the codebase.
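As a minimal sketch of such retry logic (not the exact implementation from the posts referenced above), a helper can re-run a unit of work when it fails with SQLSTATE 40001, the code CockroachDB uses for serialization conflicts. The class and method names are made up for illustration:

```java
import java.sql.SQLException;
import java.util.function.Supplier;

public class RetryHelper {

    // Re-run the unit of work when a serialization conflict (SQLSTATE 40001)
    // is signalled anywhere in the exception cause chain.
    public static <T> T retry(Supplier<T> work, int maxAttempts) {
        RuntimeException last = new IllegalStateException("no attempts made");
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException e) {
                if (!isRetryable(e)) {
                    throw e;
                }
                last = e;
                try {
                    // Simple capped linear backoff between attempts
                    Thread.sleep(Math.min(1000L, 100L * attempt));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
        throw last;
    }

    static boolean isRetryable(Throwable e) {
        while (e != null) {
            if (e instanceof SQLException
                    && "40001".equals(((SQLException) e).getSQLState())) {
                return true;
            }
            e = e.getCause();
        }
        return false;
    }
}
```

Note that each attempt must run in a new transaction, so the supplier passed in should open and commit its own transaction (for example, by wrapping a TransactionTemplate callback) rather than being called from inside an already open one.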

Step 5: Review JPA and Hibernate Configuration

CockroachDB Dialect

CockroachDB provides a custom Hibernate dialect that is derived from the PostgreSQL dialect.

spring:
  jpa:
    properties:
      hibernate:
        # Hibernate 5.x 
        dialect: org.hibernate.dialect.CockroachDB201Dialect
        # Hibernate 6.x 
        # dialect: org.hibernate.dialect.CockroachDialect
        jdbc:
          lob:
            non_contextual_creation: true

The non_contextual_creation setting is optional, but it prevents the createBlob warning at startup if you find that annoying.

Primary Keys

The JPA specification offers four different primary key generation strategies:

  • AUTO - The persistence provider attempts to figure out the best strategy based on database dialect and key type (default).

  • IDENTITY - The persistence provider depends on a database-generated ID.

  • SEQUENCE - The persistence provider depends on a database sequence.

  • TABLE - Legacy method to simulate sequences (avoid).

The most performant option, and ideal for data distribution, is to use UUID as the primary key type together with AUTO. It also works well with batch INSERTs, which, by the way, are automatically disabled when using IDENTITY. SEQUENCE is not ideal, since indexing on sequential primary and secondary keys may cause range hot spots (which hash-sharded indexes can mitigate to a degree). TABLE isn't used much these days and is considered legacy.

To consistently apply the same ID strategy for all entities, you could create an AbstractEntity implementing Persistable that also hosts the id column. Alternatively, move the id to the concrete subclass.

@Entity
@Table(name = "account")
class AccountEntity extends AbstractEntity<UUID> {
  @Id
  @Column(updatable = false, nullable = false)
  @GeneratedValue(strategy = GenerationType.AUTO)
  private UUID id;
  ...
}

// ---

import org.springframework.data.domain.Persistable;

@MappedSuperclass
abstract class AbstractEntity<ID> implements Persistable<ID> {
    @Transient
    private boolean isNew = true;

    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }
}

There are more details in this article.

Batch Statements

Using batch INSERTs with multi-value rewrites in the PostgreSQL driver will have a big impact on performance. To enable INSERT rewrites, just set the following pg-jdbc driver property either in the connection URL or as a datasource property:

spring:
  datasource:
    url: jdbc:postgresql://localhost:26257/spring_boot?sslmode=disable&reWriteBatchedInserts=true

Alternatively, in the datasource bean factory method:

    @Bean
    @ConfigurationProperties("spring.datasource.hikari")
    public HikariDataSource hikariDataSource() {
        HikariDataSource ds = dataSourceProperties()
                .initializeDataSourceBuilder()
                .type(HikariDataSource.class)
                .build();
        ds.addDataSourceProperty("reWriteBatchedInserts", "true");
        return ds;
    }

Lastly, you also need to enable batching in Hibernate (batch_size > 0):

spring:
  jpa:
    show-sql: true
    hibernate:
      ddl-auto: none
    properties:
      hibernate:
        order_updates: true
        order_inserts: true
        jdbc:
          batch_versioned_data: true
          batch_size: 128
          fetch_size: 256
        cache:
          use_second_level_cache: false

There are more details in this article.

Step 6: Profile Activation

One quite useful approach when switching to CockroachDB is to use Spring Profiles. That way, you can keep the CockroachDB connection and Hibernate dialect specifics in a separate profile-specific YAML file that is activated at startup time.

For example:

src/main/resources/
  application.yml      # contains common spring boot config
  application-psql.yml # contains datasource config for PSQL
  application-crdb.yml # contains datasource config for CockroachDB

Then it's a matter of activating the proper profile at startup:

java -jar app.jar --spring.profiles.active=psql 
# or
java -jar app.jar --spring.profiles.active=crdb

Next Steps

This guide only scratches the surface of data and service migration projects, which can come with different types of challenges. One particular challenge often arises from zero-downtime requirements, which effectively require the service to run against both databases at the same time while gradually "strangling" the old database and logic. To implement this pattern, inspired by the strangler fig, you could for example set up separate instances and use CDC to stream writes from the current primary to the other.

Summary

This article provides a hands-on guide for migrating a business service component from PostgreSQL to CockroachDB, focusing on application architecture and refactoring efforts. By following these steps, you can ensure a smooth transition from PostgreSQL to CockroachDB for your application.
