# Running CockroachDB TPC-C benchmark on GKE

# Overview

This article will demonstrate how to run a TPC-C 2.5K benchmark on a self-hosted, 3-node, 
single-region CockroachDB cluster on Google Kubernetes Engine (GKE).
  
The TPC-C workload is modeled around the concept of a warehouse which is used as the 
throughput "knob" to measure scalability. The number of warehouses used maps to a given data volume baseline where a warehouse count of 2,500 translates to approximately a 200 GiB dataset. 

The demo cluster is using 3x `c2-standard-16` machines to match that volume, along with provisioned IOPS to follow CockroachDB recommended production guidelines on [hardware ratios](https://www.cockroachlabs.com/docs/stable/recommended-production-settings.html#hardware). The workload is run through an internal client pod with an option to run through either and internal or external load-balancer as well.

## About the TPC-C Benchmark

The CockroachDB's built-in TPC-C workload is based on official TPC-C, the industry standard benchmark for On-line Transaction Processing (OLTP) performance. It simulates an industry-agnostic business with an OLTP database that manages, sells, or distributes a product.

The TPC-C workload measures databases across two different metrics:

- **Throughput**: Measured as throughput-per-minute (tpm), which in practical terms measures the number of orders processed per minute.
- **Scale**: Measured as the total number of warehouses supported. Each warehouse is of a fixed data size and has a max amount of tpm that it is allowed to support, so the total data size of the benchmark is scaled proportionally to throughput.

The CockroachDB TPC-C implementation can be found [here](https://github.com/cockroachdb/cockroach/tree/master/pkg/workload/tpcc) and the schema can be found [here](https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/tpcc/ddls.go). 

The TPC-C workload is constructed to validate that the efficiency rate can be sustained
when aiming for an increasingly higher tpmC (max throughput). Efficiency is measured in
an explicit way. There's a limit to the number of tpmC allowed per warehouse, which is 12.86. The data amount per warehouse is about 200MiB, so for 2,500 warehouses, the maximum throughput is `2500 x 12.86` tpmC, which is `32,150`.

Because TPC-C is constrained to a maximum amount of throughput per warehouse (12.86 tpmC), we often discuss TPC-C performance as the maximum number of warehouses for which a database can maintain the maximum throughput per minute (tpmC). In TPC-C, the required minimum to qualify is P95<10sec and efficiency number >85%. 

To take a few examples, assume:

- 100 warehouses at 200MiB * 100 gives 1240 tpmC (max is 1286), that's an efficiency
  rate of 96.4% or (1240/(100 * 12.86)).

- 1000 warehouses at 200MiB * 1000 gives 12,500 tpmC, that's an efficiency
  rate of 97.2% (12500/(1000 * 12.86)).

- 2500 warehouses at 200MiB * 2500 gives 30,837 tpmC, that's an efficiency
  rate of 95.9% (30837/(2500 * 12.86)).

- 100,000 warehouses at ~20TiB gives 1,200,000 tpmC, that's an efficiency
  rate of 93.3%.

The largest [published result (https://www.cockroachlabs.com/docs/v22.1/performance#benchmarks-used) for CockroachDB is 1.7M tpmC with 140,000 warehouses on 81 nodes, resulting in an efficiency score of 95%.

# TPC-C Test Setup

Overview of the cluster setup for a TPC-C workload size small (2.5K warehouses) on a 3 node cluster in a single region. For more details, see
[Performance Benchmarking with TPC-C Small](https://www.cockroachlabs.com/docs/v22.1/performance-benchmarking-with-tpcc-small). 
and also 
[Deploy CockroachDB with Kubernetes](https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-with-kubernetes.html). 

Layout:

- Single region: europe-west-1
- Machines: c2-standard-16
- 3 CockroachDB nodes + 1 client node
- Secure cluster
- Manual StatefulSet configuration

Optional:

- External load balancer service
- 1x client outside of k8s for controlling the tpcc workload

## Setup Steps

### Step 1 - Start GKE cluster

    gcloud container clusters create cockroachdb --machine-type c2-standard-16 --region europe-west1 --num-nodes 1

### Step 2 - Create RBAC roles

    kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<email>

### Step 3 - Configure cluster

    curl -O https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/bring-your-own-certs/cockroachdb-statefulset.yaml

Edit `cockroachdb-statefulset.yaml` and update:

Resource requests / limits to reflect `c2-standard-16` machines:

        resources:
          requests:
            cpu: "15"
            memory: "55Gi"
          limits:
            cpu: "15"
            memory: "55Gi"

Add a custom `pd-ssd` storage class:

    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
        name: gocrazy
    provisioner: kubernetes.io/gce-pd
    parameters:
        type: pd-ssd

Add `storageClassName` and change storage size to 2TiB:

    volumeClaimTemplates:
    - metadata:
          name: datadir
      spec:
          accessModes:
          - "ReadWriteOnce"
          storageClassName: gocrazy
          resources:
              requests:
                  storage: 2Ti

pd-ssd is recommended for pods < 32 vCPU and a minimum of 500 IOPS per vCPU is needed for optimal performance.

See also:

- https://www.cockroachlabs.com/docs/v22.1/recommended-production-settings.html#disk-i-o
- https://cloud.google.com/compute/docs/disks

### Step 4 - Create certificates

    mkdir certs my-safe-directory

    cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

    cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

    kubectl create secret generic cockroachdb.client.root --from-file=certs

    cockroach cert create-node localhost 127.0.0.1 cockroachdb-public cockroachdb-public.default cockroachdb-public.default.svc.cluster.local "*.cockroachdb" "*.cockroachdb.default" "*.cockroachdb.default.svc.cluster.local" --certs-dir=certs --ca-key=my-safe-directory/ca.key

    kubectl create secret generic cockroachdb.node --from-file=certs

    kubectl get secrets

### Step 5 - Initialize cluster

    kubectl create -f cockroachdb-statefulset.yaml

    kubectl get pods
    
    kubectl get pv

    kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --certs-dir=/cockroach/cockroach-certs

    kubectl get pods

### Step 6 - Create secure pod for SQL cli

    kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/bring-your-own-certs/client.yaml

    kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public

Create user in CLI:

    CREATE USER roach WITH PASSWORD '123456'; -- maintains the rank..
    GRANT admin to roach;

### Step 7 - Access DB console (optional)

Setup port forwarding:

    kubectl port-forward service/cockroachdb-public 8080

    open https://localhost:8080/#/overview/list

### Step 8 - Add external Load Balancer (optional)

    kubectl get services

    kubectl expose service cockroachdb --port=26257 --target-port=26257 --name=cockroachdb-external --type=LoadBalancer

    kubectl get services
    (wait for external ip)

Example:

    NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)              AGE
    cockroachdb            ClusterIP      None           <none>         26257/TCP,8080/TCP   11m
    cockroachdb-external   LoadBalancer   10.3.248.15    34.140.51.58   26257:30143/TCP      48s
    cockroachdb-public     ClusterIP      10.3.244.188   <none>         26257/TCP,8080/TCP   11m
    kubernetes             ClusterIP      10.3.240.1     <none>         443/TCP              26m

Try connecting which should fail:

    cockroach sql --url "postgres://root@34.140.51.58:26257" --certs-dir=certs

    ERROR: x509: certificate is valid for 127.0.0.1, not 34.140.51.58

Update certificates with external IP:

    cockroach cert create-node 34.140.51.58 localhost 127.0.0.1 cockroachdb-public cockroachdb-public.default cockroachdb-public.default.svc.cluster.local "*.cockroachdb" "*.cockroachdb.default" "*.cockroachdb.default.svc.cluster.local" --certs-dir=certs --ca-key=my-safe-directory/ca.key --overwrite

    kubectl delete secret cockroachdb.node --ignore-not-found

    kubectl create secret generic cockroachdb.node --from-file=certs

Restart pods and reconnect with success:

    cockroach sql --url "postgres://root@34.140.51.58:26257" --certs-dir=certs

## Benchmark Steps

### Step 1 - Import dataset 

2,500 warehouses is about 200GiB of data - see jobs in db console.

Use either alternative:

Option 1 - via public ip:

    cockroach workload fixtures import tpcc --warehouses=2500 'postgres://root@<external-ip>:26257?sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/node.crt&sslkey=certs/node.key'

Option 2 - via client pod and public service:

    kubectl exec -it cockroachdb-client-secure -- ./cockroach workload fixtures import tpcc --warehouses=2500 'postgres://root@cockroachdb-public:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key'

### Step 2 - Run TPC-C workload for 30m

Use either alternative:

Option 1 - via external lb:

    ulimit -n 100000 && cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=15m 'postgres://root@34.140.51.58:26257?sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/node.crt&sslkey=certs/node.key'

Option 2 - via client pod and public service:

    kubectl exec -it cockroachdb-client-secure -- ./cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=30m 'postgres://root@cockroachdb-public:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key'

Option 3 - via client pod directly to pods:

create an addrs file:

    postgres://root@cockroachdb-0.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key postgres://root@cockroachdb-1.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key postgres://root@cockroachdb-2.cockroachdb.default.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=/cockroach-certs/ca.crt&sslcert=/cockroach-certs/client.root.crt&sslkey=/cockroach-certs/client.root.key

run:

    kubectl exec -it cockroachdb-client-secure -- ./cockroach workload run tpcc --tolerate-errors --warehouses=2500 --ramp=1m --duration=30m $(cat addrs)

### Step 3 - Review results

Ex:

    _elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
    1800.1s    30837.5  95.9%    326.2    302.0    604.0    704.6    973.1   3623.9

Benchmark passing criteria for our derivative TPC-C results:

- P90 Latency < 5 Seconds
- Efficiency rate over 95%.

TPC-C requirements are P95<10s and efficiency rate over 85%.

## Cleanup Steps

Ensure the storage claims are deleted as well since it's not automatic in GKE.

    kubectl delete pods,statefulsets,services,poddisruptionbudget,jobs,rolebinding,clusterrolebinding,role,clusterrole,serviceaccount -l app=cockroachdb

    kubectl delete pod cockroachdb-client-secure

    gcloud container clusters delete cockroachdb --region europe-west1

# Conclusion

This was a tutorial of running the TPC-C workload in CockroachDB on a self-hosted GKE cluster. 

# Related Resources

- https://www.cockroachlabs.com/guides/2022-cloud-report/
- https://www.cockroachlabs.com/docs/v22.1/kubernetes-performance.html
- https://www.cockroachlabs.com/docs/v22.1/operate-cockroachdb-kubernetes.html
- https://www.cockroachlabs.com/docs/v22.1/recommended-production-settings.html
- https://cloud.google.com/compute/docs/disks
- https://www.cockroachlabs.com/docs/v22.1/performance-benchmarking-with-tpcc-small
- https://www.cockroachlabs.com/docs/v22.1/performance#scale
- http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf

