JPA Best Practices - Identity Generators

JPA and CockroachDB - Part 4

·

6 min read

Introduction

This article is part four of a series of data access best practices when using JPA and CockroachDB. The goal is to help reduce the impact of workload contention and to optimize performance. Although most of the principles are fairly framework and database agnostic, it’s mainly targeting the following technology stack:

  • JPA 2.x and Hibernate 5.+

  • CockroachDB v22+

  • Spring Boot 2.7.+

  • Spring Data JPA

  • Spring AOP with AspectJ

Example Code

The code examples are available on Github.

Chapter 4: Identity Generators

This chapter goes into more detail around JPA/Hibernate identity generators also covered in part III of this series.

4.1 How to use ID generators

Problem

You want to know how to use different primary key generators with JPA/Hibernate in the context of CockroachDB.

Solution

The JPA specification offers four different primary key generation strategies defined in the @javax.persistence.GenerationType annotation:

  1. AUTO - The persistence provider attempts to figure out the best strategy based on database dialect and key type (default).

  2. IDENTITY - The persistence provider depends on a database-generated ID.

  3. SEQUENCE - The persistence provider depends on a database sequence.

  4. TABLE - Legacy method to simulate sequences.

One alternative to primary key generators is to assign the IDs to the entities directly, like UUIDs.

AUTO

This strategy is the default. It allows the persistence provider to choose a strategy that lands in either SEQUENCE or IDENTITY depending on database dialect and ID column data type.

Using a UUID type for example maps to Hibernate's internal UUIDv4 generator:

@Id
@Column(updatable = false, nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private UUID id;

The AUTO example is therefore equal to:

@Id
@Column(name = "id", updatable = false, insertable = true)
@GeneratedValue(generator = "UUID")
@GenericGenerator(
        name = "UUID",
        strategy = "org.hibernate.id.UUIDGenerator"
)
private UUID id;

In contrast, a numeric ID data type maps to SEQUENCE (see below for details):

@Id
@Column(updatable = false, nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private Long id;

Identity

Caution: Using this strategy with Hibernate will disable batch INSERTs which severely impacts performance.

This strategy relies on a database ID generator method to create the IDs. The method name can be provided through the Hibernate dialect's org.hibernate.dialect.identity.IdentityColumnSupport#getIdentityInsertString method, in which case it will be injected into the SQL statements. If the return value is null then Hibernate will rely on the column default defined in the schema.

create table account
(
    id             int          not null default unordered_unique_rowid(),
    balance        float        not null,
    primary key (id)
);

In CockroachDB, this could be a globally unique, ordered 64-bit integer provided via unique_rowid() or an unordered int via unordered_unique_rowid(). The latter is slightly better for key/range distribution since it doesn't depend on ordering that incurs write hotspots similar to sequences.

If the primary key is a UUID, then this strategy will not work since that maps to the pg-uuid type which is not compatible. There is still a way to use database-generated UUIDs rather than JVM generated if that would be preferred. The JDK however uses the same UUID specification (v4) that CockroachDB uses, so there's not much point in doing so. Still, here's how:

@Entity
@Table(name = "account_uuid_db")
@TypeDefs({@TypeDef(name = "crdb-uuid", typeClass = CockroachUUIDType.class)})
@DynamicInsert
@DynamicUpdate
public class DatabaseUUIDAccountEntity extends AccountEntity<UUID> {
    @Id
    @Column(updatable = false, nullable = false)
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Type(type = "crdb-uuid", parameters = @Parameter(name = "column", value = "id"))
    private UUID id;

    @Override
    public UUID getId() {
        return id;
    }
}

In the entity above, we have declared a custom UUID type for the ID column which maps to the standard UUID type. The type of implementation is quite straightforward:

public class CockroachUUIDType extends PostgresUUIDType
        implements ResultSetIdentifierConsumer, ParameterizedType {
    private String idColumnName = "id";

    @Override
    public String getName() {
        return "crdb-uuid";
    }

    @Override
    public void setParameterValues(Properties params) {
        idColumnName = params.getProperty("column");
    }

    @Override
    public UUID consumeIdentifier(ResultSet resultSet) throws IdentifierGenerationException {
        try {
            return nullSafeGet(resultSet, idColumnName, wrapperOptions());
        } catch (SQLException e) {
            throw new IdentifierGenerationException("Error converting type", e);
        }
    }

    private WrapperOptions wrapperOptions() {
        return new WrapperOptions() {
            @Override
            public boolean useStreamForLobBinding() {
                return false;
            }

            @Override
            public LobCreator getLobCreator() {
                return null;
            }

            @Override
            public SqlTypeDescriptor remapSqlTypeDescriptor(final SqlTypeDescriptor sqlTypeDescriptor) {
                return PostgresUUIDSqlTypeDescriptor.INSTANCE;
            }

            @Override
            public TimeZone getJdbcTimeZone() {
                return TimeZone.getDefault();
            }
        };
    }
}

Sequence

This strategy uses a database sequence to generate IDs. In CockroachDB, this is not recommended for optimal performance since indexing on sequential keys will cause range hotspots.

One option is to use a virtual sequence that provides unique_rowid() values, but these values are still sequential and therefore also result in hotspots. There is a feature request 87290 however to provide unordered_unique_rowid() instead which will provide for better key distribution.

Sequence strategy example:

@Id
@Column(updatable = false, nullable = false)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "account_generator")
@SequenceGenerator(name = "account_generator", sequenceName = "account_seq")
private Long id;

In this example, we are using a sequence named account_seq with an increment of 50 to match JPA defaults and also cached. This can be further tailored with the @SequenceGenerator annotation.

Example SQL for the sequence:

create sequence if not exists account_seq increment by 50 cache 10;

Example SQL when inserting a batch of 8 entities (notice BatchSize:8):

20:49:40.443 TRACE [SQL_TRACE] 
Name:, Connection:1006, Time:4, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["select nextval ('account_seq')"]
Params:[()]

The second statement:

20:49:40.448 TRACE [SQL_TRACE] 
Name:, Connection:1006, Time:5, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:8
Query:["insert into account_sequence (balance, closed, creation_time, currency, description, name, id) values (?, ?, ?, ?, ?, ?, ?)"]
Params:[(2911.81453383622,false,2022-09-12 20:49:40.435252,USD,yzytQ2Ea5FpwTNu3mpoOOfdTiSt7T6PQOtyLM8WfOx5oK-fRbEkIjkkOTSephTMA4eSXOWZQhTXsTYzgRfmEnw,fm6N_uu2YORFXqNOkG3z_K2qxtEBe0MGG9n-tPJcebU,48902),(772.3476591134778,false,2022-09-12 20:49:40.435259,USD,66_-C_R0f-e87GJEl5ZC0jtjOwKLf7IO9ueG_WvvFfQchHbewVGQJg_55W1TZdJ99jS-ZdOdx0Lagm4Xib921g,lldO4yKGhH24p0EufXXRbueScH0j9x_dwyMkLjwh2H0,48903),(812.3109117127407,false,2022-09-12 20:49:40.435259,USD,GvXDU_pZ813sykrIN687gjRBxCtpvSCEzPRgmwxskKLqeOroJuMpCCHWwjmJVxoQjmIt4SdTwNPZ3MX0bsdJag,84oBbEYmNpFPOkq4mNLSmxDbNo6Z9J5fGK4MOnVq3jQ,48904),(117.05857312158308,false,2022-09-12 20:49:40.43526,USD,XvQGd-KM32EdmoOdAUP2suEeb78GbCzbcFSNg0gWs8U-86W4EM7xPVCMQfkEDl_l-2Oij42-v2hvRiJ-KTvVUA,7C_f4orhcv_BL_Tqs5h2O0QQBrrk3ZzC45X2Sul1tvM,48905),(1062.0079187656315,false,2022-09-12 20:49:40.435261,USD,vqaG7maWqE46-yKGP1Kxezj_f0Ln_Lbm3RDOZ_On2e94TWJ7HizifPen_mfO0ag3ZSep-ebqU6A3vsaXudUlHw,tQK-JKkFm-wOKEJt90a16yTQXwi3dR-uyBVEmHNgwbA,48906),(1794.0608391535081,false,2022-09-12 20:49:40.435261,USD,KgdU0UmMVjn5gJdFa_6RHt8bbUvuUSDWbExoWsopKdQ3enFNXddksovabEa1GDx0n24B4L7-hAjTFPZYkyh-7A,FRE8gbGCRfKevZRIAKAe7Ek0BzAbQ4WqJL00-udxGYE,48907),(4679.81994020478,false,2022-09-12 20:49:40.435262,USD,Z62QaX7BKGFmTVx-mFKbpYKNOn8-aMa9pVPkMD9QikyCXHi2iAwvwHyI2XqETjvXRExnrrUH9vi0iCkEOuJBnQ,erbBpAappWiZTmvOsnmBrKYhsZzb2PRMukOfaGwAxbI,48908),(1689.8465103685626,false,2022-09-12 20:49:40.435263,USD,huFjltN3XcLsD78nggj3Glw7xEL6BWNy-4xwm4fncEHZeQNew_koiC_FHtgDCCI_MOSEnz5UBRaIhaU4Oolt8g,SaTUThr5VuaVDo5zAu__B6nhZu_9OSMM6OAd7RLL3JE,48909)]

Custom

Last but not least, we can also create our very own custom ID generator. One example of this could be to use a numeric primary key with auto-generated IDs that can also be batched. Batch inserts are critical for performance if creating entities through JPA.

Remember we can't use numeric IDs and IDENTITY which narrows the option to sequences, which again isn't great for performance due to hotspots.

So what gives then if we want the following:

  • Numeric primary keys

  • Support for JDBC batch inserts

  • Database generated keys

  • Not using a sequence

First, let's define our custom ID generator strategy:

@Id
@Column(updatable = false, nullable = false)
@GeneratedValue(generator = "custom-generator")
@GenericGenerator(name = "custom-generator",
        parameters = @Parameter(name = "batchSize", value = "64"),
        strategy = "io.roach.spring.identity.config.hibernate.CustomIDGenerator")
private Long id;

Next, let's look at the custom ID generator:

public class CustomIDGenerator implements IdentifierGenerator {
    private final Deque<Long> cachedIds = new LinkedList<>();

    private int batchSize;

    private String idQuery;

    @Override
    public void configure(Type type, Properties properties,
                          ServiceRegistry serviceRegistry) throws MappingException {
        this.batchSize = Integer.parseInt(properties.getProperty("batchSize", "32"));

        StringBuilder sb = new StringBuilder("select ");
        IntStream.rangeClosed(1, batchSize).forEach(value -> {
            sb.append("unordered_unique_rowid() as id").append(value);
            if (value < batchSize) {
                sb.append(",");
            }
        });

        this.idQuery = sb.toString();
    }

    @Override
    public Serializable generate(SharedSessionContractImplementor session, Object obj)
            throws HibernateException {
        if (cachedIds.isEmpty()) {
            Stream<Object[]> ids = session.createNativeQuery(idQuery).stream();
            ids.collect(Collectors.toList()).forEach(arr -> {
                Arrays.stream(arr).forEach(o1 -> {
                    BigInteger bi = (BigInteger) o1;
                    this.cachedIds.add(bi.longValue());
                });
            });
        }
        return cachedIds.poll();
    }
}

This strategy will fetch batches of unique, unordered IDs using the unordered_unique_rowid() method and then consume these for each ID generation method call. Ordering doesn't matter and the IDs are globally unique so caching them in each JVM instance is safe.

Table

This strategy simulates a sequence by using a custom database table. It's not used anymore, in particular when real sequences are available.

Recommendations

For best performance, use UUID primary keys and the AUTO generation type. The UUIDs will then be generated by the JVM and batch inserts are fully supported.

If you prefer a numeric primary key, consider either using sequences or the custom strategy above with unordered 64-bit integers that also support batch inserts.

Conclusion

In this article, we looked at primary key generation strategies for Hibernate and best practices for CockroachDB.