CockroachDB is a geo-distributed SQL database purpose-built from the ground up for high scalability, fault tolerance, cloud neutrality and usability for developers and operators. It also offers the highest SQL standard for transactional integrity - serializable isolation.
The term geo-distribution is to emphasise its capability to break out of the low-latency, stable networking assumptions of a single data center or single region deployment. CockroachDB clusters can span the globe and still offer one logical database towards applications with intact semantics and guarantees. No more need for manual sharding.
One major influence on performance, when nodes need to perform some level of coordination, is network latency. There are numerous mechanisms at work in CockroachDB to mitigate the effects of high cross-link latencies. For performance and to ensure safety and liveness in volatile and ephemeral hosting environments.
One key ingredient for high performance and linear scalability in CockroachDB is the ability to distribute workload both vertically and horizontally and thereby leverage the aggregate compute/IO capacity of a cluster. This is mainly achieved by the database itself but application designers can help by using some best practices around schema design and query patterns and load balance traffic across the cluster. In general terms, it's about avoiding hotspots from forming, avoiding contention if possible and reducing large table scans.
CockroachDB uses a distributed SQL execution engine at its core, which means many things:
The SQL layer is parallelized and pushes processors close to the proximity of data.
The SQL optimizer is purpose-built with latency as a cost factor and locality awareness.
The transaction layer uses a sophisticated pipelining and parallel commit protocol to reduce round trips to the theoretical minimum for consensus.
Backup and restore are locality-aware.
The vectorized execution engine provides good performance for a wide range of queries.
Load-based splitting and rebalancing heuristics help to balance load across a cluster of machines.
Scaling reads is generally considered to be slightly easier than scaling writes. In a global or multi-region deployment topology, there are a couple of useful patterns including global tables, regional-by-row table localities and follower reads.
A global table means that all voting range replicas reside on nodes in the primary region, and non-voting replicas in remote regions to service consistent reads. The database automatically adjusts the replication factor (RF) to ensure there are range replicas for these tables in each configured region. It also uses something called non-blocking transactions in combination with non-voting replicas to provide low-latency global reads, also during workload contention. This concept is useful if you have a table which has a low volume of writes but high volumes of reads from different regions and the reads must be authoritative.
alter database test primary region "eu-north-1"; alter database test add region "eu-west-3"; alter database test add region "us-east-1"; create table postal_codes ( id int primary key, code string ); ALTER TABLE postal_codes SET LOCALITY GLOBAL;
Regional by Row
Regional by row is a table locality in which the home region is defined at the row level in a table. In contrast to regional tables, where all rows in a table have the same home region.
alter database test primary region "eu-north-1"; alter database test add region "eu-west-3"; alter database test add region "us-east-1"; CREATE TABLE users ( user_id INT NOT NULL, name STRING NULL, postal_code int NULL, PRIMARY KEY (user_id ASC) ); ALTER TABLE users SET LOCALITY regional by row; insert into users (user_id, crdb_region) select no, 'eu-north-1' from generate_series(1, 100) no; insert into users (user_id, crdb_region) select no, 'eu-west-3' from generate_series(101, 200) no; insert into users (user_id, crdb_region) select no, 'us-east-1' from generate_series(201, 300) no;
Follower reads are akin to Content Delivery Networks (CDN) by not having to chase the leaseholder for a given range that can potentially be located in another part of the world. Instead, the closest replica to a gateway node (receiving the request) can service the read with some staleness. There are two variants of follower reads called exact staleness and bounded staleness reads.
On the surface, follower reads may appear similar to global tables but the latter works quite differently through non-voting replicas and non-blocking transactions.
Follower reads are useful for more ad-hoc SQL queries for both partitioned and unpartitioned tables, where reads are allowed to be non-authoritative (potentially stale). Global tables are always authoritative (no staleness bounds) but pay for that in higher write latency.
The choice between follower reads and global tables should be driven by staleness requirements, read vs write volumes and survival goals. In other words, if the decision to write something is based on a read, and the value read must be authoritative, then a global table is a better choice. On the other hand, if write performance is a priority and a staleness window is acceptable, then follower-reads are better.
-- per statement: SELECT ID FROM USERS AS OF SYSTEM TIME follower_read_timestamp() WHERE id=1; -- alt via session var: BEGIN; SET TRANSACTION AS OF SYSTEM TIME follower_read_timestamp(); SELECT .. COMMIT;
Enabling the Next Generation of Multi-Region Applications with CockroachDB
CockroachDB: The Resilient Geo-Distributed SQL Database
CockroachDB is a geo-distributed SQL database designed for scalability, fault tolerance, cloud neutrality, and usability. It offers distributed SQL execution and concepts like global tables, and follower reads to help balance read-heavy load across a cluster of machines. When deciding between follower reads and global tables, factors such as staleness requirements, read vs write volumes, and survival goals should be taken into consideration. Global tables are better for authoritative reads, while follower reads are better for write performance with an acceptable staleness window.