Running CockroachDB TPC-C benchmark on GKE

Running the TPC-C benchmark on a self-hosted, single-region CockroachDB cluster on GKE


Overview

This article demonstrates how to run a TPC-C 2.5K benchmark on a self-hosted, 3-node, single-region CockroachDB cluster on Google Kubernetes Engine (GKE).

The TPC-C workload is modeled around the concept of a warehouse, which serves as the throughput "knob" for measuring scalability. The warehouse count maps to a baseline data volume: a count of 2,500 translates to approximately a 200 GiB dataset.

The demo cluster uses 3x c2-standard-16 machines to match that volume, along with provisioned IOPS that follow CockroachDB's recommended production guidelines on hardware ratios. The workload is run from an internal client pod, with the option to route through either an internal or external load balancer as well.

About the TPC-C Benchmark

CockroachDB's built-in TPC-C workload is based on the official TPC-C benchmark, the industry standard for On-line Transaction Processing (OLTP) performance. It simulates an industry-agnostic business with an OLTP database that manages, sells, or distributes a product.

The TPC-C workload measures databases across two different metrics:

  • Throughput: Measured in transactions per minute (tpm), which in practical terms is the number of orders processed per minute.
  • Scale: Measured as the total number of warehouses supported. Each warehouse has a fixed data size and a maximum tpm it is allowed to sustain, so the total data size of the benchmark scales proportionally with throughput.

The CockroachDB TPC-C implementation and its schema can be found in the CockroachDB source repository.

The TPC-C workload is constructed to validate that the efficiency rate can be sustained while aiming for an increasingly higher tpmC (max throughput). Efficiency is measured explicitly: there is a limit of 12.86 tpmC per warehouse, and each warehouse holds about 200 MiB of data, so for 2,500 warehouses the maximum throughput is 2,500 × 12.86 = 32,150 tpmC.

Because TPC-C is constrained to a maximum amount of throughput per warehouse (12.86 tpmC), we often discuss TPC-C performance as the maximum number of warehouses for which a database can maintain the maximum throughput per minute (tpmC). In TPC-C, the required minimum efficiency to qualify is 85%.

To take a few examples (each rate can be reproduced with the quick calculation shown after this list):

  • 100 warehouses (200 MiB × 100) gives 1,240 tpmC (max is 1,286), an efficiency rate of 96.4% (1240 / (100 × 12.86)).

  • 1,000 warehouses (200 MiB × 1,000) gives 12,500 tpmC, an efficiency rate of 97.2% (12500 / (1000 × 12.86)).

  • 2,500 warehouses (200 MiB × 2,500) gives 30,837 tpmC, an efficiency rate of 95.9% (30837 / (2500 × 12.86)).

  • 100,000 warehouses (~20 TiB) gives 1,200,000 tpmC, an efficiency rate of 93.3% (1200000 / (100000 × 12.86)).
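Each rate above is simply tpmC divided by the theoretical maximum (warehouses × 12.86). For instance, the 2,500-warehouse case can be verified with a quick shell one-liner (using bc):

echo "scale=4; 30837 / (2500 * 12.86)" | bc

This prints .9591, i.e. the 95.9% efficiency quoted above.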

The largest published result (cockroachlabs.com/docs/v22.1/performance#be..) for CockroachDB is 1.7M tpmC with 140,000 warehouses on 81 nodes, resulting in an efficiency score of 95%.

TPC-C Test Setup

An overview of the cluster setup for a TPC-C workload of size small (2.5K warehouses) on a 3-node cluster in a single region. For more details, see Performance Benchmarking with TPC-C Small and Deploy CockroachDB with Kubernetes.

Layout:

  • Single region: europe-west1
  • Machines: c2-standard-16
  • 3 CockroachDB nodes + 1 client node
  • Secure cluster
  • Manual StatefulSet configuration

Optional:

  • External load balancer service
  • 1x client outside of k8s for driving the TPC-C workload

Setup Steps

Step 1 - Start GKE cluster

gcloud container clusters create cockroachdb --machine-type c2-standard-16 --region europe-west1 --num-nodes 1
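Note that with --region and --num-nodes 1, GKE creates one node per zone, i.e. three nodes in total for this region. A quick check once provisioning completes:

kubectl get nodes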

Step 2 - Create RBAC roles

kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<email>
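For example, the email can be taken from the active gcloud configuration rather than typed by hand (assuming you are authenticated with gcloud):

kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)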

Step 3 - Configure cluster

curl -O https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/bring-your-own-certs/cockroachdb-statefulset.yaml

Edit cockroachdb-statefulset.yaml and update:

Resource requests / limits to reflect c2-standard-16 machines (16 vCPUs, 64 GB memory), requesting slightly below the full machine to leave headroom for system pods:

    resources:
      requests:
        cpu: "15"
        memory: "55Gi"
      limits:
        cpu: "15"
        memory: "55Gi"

Add a custom pd-ssd storage class:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: gocrazy
provisioner: kubernetes.io/gce-pd
parameters:
    type: pd-ssd

Add storageClassName and change storage size to 2TiB:

volumeClaimTemplates:
- metadata:
      name: datadir
  spec:
      accessModes:
      - "ReadWriteOnce"
      storageClassName: gocrazy
      resources:
          requests:
              storage: 2Ti

pd-ssd is recommended for nodes with fewer than 32 vCPUs, and a minimum of 500 IOPS per vCPU is needed for optimal performance. For a 16-vCPU node that means a target of 16 × 500 = 8,000 IOPS; a 2 TiB pd-ssd volume provisions roughly 30 IOPS per GiB (subject to per-instance limits), comfortably above that target.


Step 4 - Create certificates

mkdir certs my-safe-directory

cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

kubectl create secret generic cockroachdb.client.root --from-file=certs

cockroach cert create-node localhost 127.0.0.1 cockroachdb-public cockroachdb-public.default cockroachdb-public.default.svc.cluster.local "*.cockroachdb" "*.cockroachdb.default" "*.cockroachdb.default.svc.cluster.local" --certs-dir=certs --ca-key=my-safe-directory/ca.key

kubectl create secret generic cockroachdb.node --from-file=certs

kubectl get secrets
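Optionally, inspect the generated certificates locally to confirm the hostnames they cover:

cockroach cert list --certs-dir=certs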

Step 5 - Initialize cluster

kubectl create -f cockroachdb-statefulset.yaml

kubectl get pods

kubectl get pv

kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --certs-dir=/cockroach/cockroach-certs

kubectl get pods

Step 6 - Create secure pod for SQL cli

kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/bring-your-own-certs/client.yaml

kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public

Create a user in the SQL shell; this user is used later to log in to the DB Console:

CREATE USER roach WITH PASSWORD '123456';
GRANT admin TO roach;

Step 7 - Access DB console (optional)

Set up port forwarding, then log in to the DB Console with the roach user created earlier:

kubectl port-forward service/cockroachdb-public 8080

open https://localhost:8080/#/overview/list

Step 8 - Add external Load Balancer (optional)

kubectl get services

kubectl expose service cockroachdb --port=26257 --target-port=26257 --name=cockroachdb-external --type=LoadBalancer

kubectl get services
(wait for the external IP to be assigned)

Example:

NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)              AGE
cockroachdb            ClusterIP      None           <none>         26257/TCP,8080/TCP   11m
cockroachdb-external   LoadBalancer   10.3.248.15    34.140.51.58   26257:30143/TCP      48s
cockroachdb-public     ClusterIP      10.3.244.188   <none>         26257/TCP,8080/TCP   11m
kubernetes             ClusterIP      10.3.240.1     <none>         443/TCP              26m

Try connecting, which should fail because the node certificate does not yet include the external IP:

cockroach sql --url "postgres://root@34.140.51.58:26257" --certs-dir=certs

ERROR: x509: certificate is valid for 127.0.0.1, not 34.140.51.58

Update certificates with external IP:

cockroach cert create-node 34.140.51.58 localhost 127.0.0.1 cockroachdb-public cockroachdb-public.default cockroachdb-public.default.svc.cluster.local "*.cockroachdb" "*.cockroachdb.default" "*.cockroachdb.default.svc.cluster.local" --certs-dir=certs --ca-key=my-safe-directory/ca.key --overwrite

kubectl delete secret cockroachdb.node --ignore-not-found

kubectl create secret generic cockroachdb.node --from-file=certs

Restart the pods so they pick up the updated node certificate.
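One way to restart them is a rolling restart of the StatefulSet (assuming the default name cockroachdb from the manifest):

kubectl rollout restart statefulset/cockroachdb

kubectl rollout status statefulset/cockroachdb

Once the pods are back, reconnect; the connection should now succeed: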

cockroach sql --url "postgres://root@34.140.51.58:26257" --certs-dir=certs

Benchmark Steps

Step 1 - Import dataset

2,500 warehouses is about 200 GiB of data; the import progress can be monitored under Jobs in the DB Console.

Use either of the following alternatives:

Option 1 - via public ip:

cockroach workload fixtures import tpcc --warehouses=2500 'postgres://root@<external-ip>:26257?sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/node.crt&sslkey=certs/node.key'

Option 2 - via client pod and public service:

kubectl exec -it cockroachdb-client-secure -- ./cockroach workload fixtures import tpcc --warehouses=2500 'postgres://root@cockroachdb-public:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key'
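To sanity-check the import, count the warehouses; the query should return 2500 (run here via the client pod):

kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public -e "SELECT count(*) FROM tpcc.warehouse;"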

Step 2 - Run TPC-C workload for 30m

Use either of the following alternatives:

Option 1 - via external load balancer:

ulimit -n 100000 && cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=30m 'postgres://root@34.140.51.58:26257?sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/node.crt&sslkey=certs/node.key'

Option 2 - via client pod and public service:

kubectl exec -it cockroachdb-client-secure -- ./cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=30m 'postgres://root@cockroachdb-public:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key'

Option 3 - via client pod directly to pods:

create an addrs file:

postgres://root@cockroachdb-0.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key
postgres://root@cockroachdb-1.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key
postgres://root@cockroachdb-2.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key

run:

kubectl exec -it cockroachdb-client-secure -- ./cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=30m $(cat addrs)

Step 3 - Review results

Example output:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
1800.1s    30837.5  95.9%    326.2    302.0    604.0    704.6    973.1   3623.9
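Reading the output: efc is tpmC relative to the theoretical maximum, i.e. 30837.5 / (2500 × 12.86) = 30837.5 / 32150 ≈ 95.9%, and the p90 latency of 604 ms sits well inside the passing criteria below.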

Benchmark passing criteria for our derivative TPC-C results:

  • p90 latency < 5 seconds
  • Efficiency rate over 95%

The official TPC-C requirements are p95 < 10s and an efficiency rate over 85%.

Cleanup Steps

Ensure the storage claims and volumes are deleted as well, since this is not automatic in GKE:

kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes,poddisruptionbudget,jobs,rolebinding,clusterrolebinding,role,clusterrole,serviceaccount -l app=cockroachdb

kubectl delete pod cockroachdb-client-secure
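Before deleting the GKE cluster, confirm that no persistent volumes or claims remain:

kubectl get pv,pvc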

gcloud container clusters delete cockroachdb --region europe-west1

Conclusion

This tutorial walked through running the TPC-C workload against a self-hosted, single-region CockroachDB cluster on GKE, from cluster setup through benchmark execution and cleanup.
