<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Kai Niemi's Blog]]></title><description><![CDATA[Kai Niemi's Blog]]></description><link>https://blog.cloudneutral.se</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 14:59:16 GMT</lastBuildDate><atom:link href="https://blog.cloudneutral.se/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Handling Null Values Efficiently with Spring Named Parameter Binding]]></title><description><![CDATA[Introduction
Using Spring’s NamedParameterJdbcTemplate and binding null values with a SQL type code and type name may lead to the pgJDBC driver performing costly metadata queries to infer internal types. This is by JDBC specification and pgJDBC desig...]]></description><link>https://blog.cloudneutral.se/handling-null-values-efficiently-with-spring-named-parameter-binding</link><guid isPermaLink="true">https://blog.cloudneutral.se/handling-null-values-efficiently-with-spring-named-parameter-binding</guid><category><![CDATA[JDBC]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Thu, 27 Nov 2025 08:50:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Yz2yEUfijmo/upload/9e6a667bb95d3846ad5d0fbec3ef583f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Using Spring’s <code>NamedParameterJdbcTemplate</code> and binding <code>null</code> values with a SQL type code and type name may lead to the <a target="_blank" href="https://github.com/pgjdbc/pgjdbc">pgJDBC</a> driver performing costly metadata queries to infer internal types. This behavior follows the JDBC specification and pgJDBC’s design, but it is worth knowing about because the performance cost can easily go unnoticed. This post provides a few tips on how to avoid such queries entirely.</p>
<h1 id="heading-problem">Problem</h1>
<p>Any application or framework that uses <code>PreparedStatement.setNull(int,int,String)</code> is a candidate for these metadata queries. The <a target="_blank" href="https://docs.oracle.com/en/java/javase/25/docs/api/java.sql/java/sql/PreparedStatement.html#setNull(int,int,java.lang.String)">javadoc</a> states:</p>
<blockquote>
<p>void setNull(int parameterIndex, int sqlType, <a target="_blank" href="https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/String.html">String</a> typeName) throws <a target="_blank" href="https://docs.oracle.com/en/java/javase/25/docs/api/java.sql/java/sql/SQLException.html">SQLException</a></p>
<p>Sets the designated parameter to SQL <code>NULL</code>. This version of the method <code>setNull</code> should be used for user-defined types and REF type parameters. Examples of user-defined types include: STRUCT, DISTINCT, JAVA_OBJECT, and named array types.</p>
<p><strong>Note:</strong> To be portable, applications must give the SQL type code and the fully-qualified SQL type name when specifying a NULL user-defined or REF parameter. In the case of a user-defined type the name is the type name of the parameter itself. For a REF parameter, the name is the type name of the referenced type. If a JDBC driver does not need the type code or type name information, it may ignore it. Although it is intended for user-defined and Ref parameters, this method may be used to set a null parameter of any JDBC type. If the parameter does not have a user-defined or REF type, the given typeName is ignored.</p>
</blockquote>
<p>The SHOULD phrasing means you can use it for any column type, including primitives and JSONB, for example. The pgJDBC implementation typically attempts to resolve the internal type when both the <code>sqlType</code> and <code>typeName</code> are specified. In the case of <code>ARRAY</code>, it also attempts to resolve the array element type, which is another, even more involved metadata query. This repeats for every parameter bind operation, so it can noticeably amplify write latency in the worst case.</p>
<p>A metadata query can look like:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> pg_type.oid, typname
 <span class="hljs-keyword">FROM</span> pg_catalog.pg_type <span class="hljs-keyword">LEFT
   JOIN</span> (<span class="hljs-keyword">SELECT</span> ns.oid <span class="hljs-keyword">AS</span> nspoid, ns.nspname, r.r
 <span class="hljs-keyword">FROM</span> pg_namespace <span class="hljs-keyword">AS</span> ns
   <span class="hljs-keyword">JOIN</span> (<span class="hljs-keyword">SELECT</span> s.r, (current_schemas(_))[s.r] <span class="hljs-keyword">AS</span> nspname
 <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">ROWS</span>
 <span class="hljs-keyword">FROM</span> (generate_series(_, array_upper(current_schemas(_), _))) <span class="hljs-keyword">AS</span> s (r)) <span class="hljs-keyword">AS</span> r <span class="hljs-keyword">USING</span> (nspname)) <span class="hljs-keyword">AS</span> sp
    <span class="hljs-keyword">ON</span> sp.nspoid = typnamespace
   <span class="hljs-keyword">WHERE</span> typname = _
 <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> sp.r, pg_type.oid <span class="hljs-keyword">DESC</span>
 <span class="hljs-keyword">LIMIT</span> _
</code></pre>
<p>To demonstrate this, the following code sample passes <code>null</code> for the “price” and “description” columns. If <code>null</code> values are passed along with a column type constant and a type name, then the above query is sent by the driver to resolve the internal PG type.</p>
<pre><code class="lang-java">MapSqlParameterSource namedParameters = <span class="hljs-keyword">new</span> MapSqlParameterSource()
        .addValue(<span class="hljs-string">"id"</span>, UUID.randomUUID())
        .addValue(<span class="hljs-string">"inventory"</span>, <span class="hljs-number">99</span>)
        .addValue(<span class="hljs-string">"name"</span>, <span class="hljs-string">"product-x99"</span>)
        .addValue(<span class="hljs-string">"sku"</span>, <span class="hljs-string">"X-99"</span>)
        .addValue(<span class="hljs-string">"price"</span>, <span class="hljs-keyword">null</span>, Types.DECIMAL, <span class="hljs-string">"decimal"</span>)
        .addValue(<span class="hljs-string">"description"</span>, <span class="hljs-keyword">null</span>, Types.OTHER, <span class="hljs-string">"jsonb"</span>);

<span class="hljs-keyword">new</span> NamedParameterJdbcTemplate(jdbcTemplate)
   .update(<span class="hljs-string">"insert into product (id, inventory, name, description, price, sku) "</span> +
           <span class="hljs-string">"values (:id,:inventory,:name,:description,:price,:sku)"</span>, namedParameters);
</code></pre>
<h1 id="heading-solution">Solution</h1>
<ul>
<li><p>Prefer indexed placeholders <code>(?,?,..)</code> over named parameter placeholders, thus avoiding the <code>NamedParameterJdbcTemplate</code> problem entirely</p>
</li>
<li><p>Omit specifying the column type name. For example:</p>
<ul>
<li><p><code>addValue("price", null)</code></p>
</li>
<li><p><code>addValue("description", null)</code></p>
</li>
</ul>
</li>
<li><p>If still binding a <code>NULL</code> value with a type name, use another type constant like <code>Types.NULL</code>:</p>
<ul>
<li><p><code>addValue("price", null, Types.NULL, "decimal")</code></p>
</li>
<li><p><code>addValue("description", null, Types.NULL, "jsonb")</code></p>
</li>
</ul>
</li>
<li><p>Set the system property <code>-Dspring.jdbc.getParameterType.ignore=true</code> (<a target="_blank" href="https://github.com/spring-projects/spring-framework/blob/2641b5d783aac7c76acb8db86%5B%E2%80%A6%5Dn/java/org/springframework/jdbc/core/StatementCreatorUtils.java">source</a>) to avoid similar situations with ARRAYs</p>
</li>
</ul>
<h1 id="heading-how-to-verify">How to verify</h1>
<p>Set the logger levels for the pgJDBC and Spring JDBC core packages to <code>TRACE</code>.</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">logger</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"org.postgresql"</span> <span class="hljs-attr">level</span>=<span class="hljs-string">"TRACE"</span>/&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">logger</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"org.springframework.jdbc.core"</span> <span class="hljs-attr">level</span>=<span class="hljs-string">"TRACE"</span>/&gt;</span>
</code></pre>
<p>The log output will be verbose, but watch for any <code>pg_catalog</code> queries that precede your actual INSERT or UPDATE statements:</p>
<pre><code class="lang-pgsql"><span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]   simple <span class="hljs-keyword">execute</span>, <span class="hljs-keyword">handler</span>=org.postgresql.jdbc.PgStatement$StatementResultHandler@<span class="hljs-number">5</span>c134052, maxRows=<span class="hljs-number">0</span>, fetchSize=<span class="hljs-number">0</span>, flags=<span class="hljs-number">17</span>
<span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]  FE=&gt; Parse(stmt=<span class="hljs-keyword">null</span>,query="SELECT pg_type.oid, typname   FROM pg_catalog.pg_type   LEFT   JOIN (select ns.oid as nspoid, ns.nspname, r.r           from pg_namespace as ns           join ( select s.r, (current_schemas(false))[s.r] as nspname                    from generate_series(1, array_upper(current_schemas(false), 1)) as s(r) ) as r          using ( nspname )        ) as sp     ON sp.nspoid = typnamespace  WHERE typname = $1  ORDER BY sp.r, pg_type.oid DESC LIMIT 1",<span class="hljs-keyword">oids</span>={<span class="hljs-number">1043</span>})
<span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]  FE=&gt; Bind(stmt=<span class="hljs-keyword">null</span>,portal=<span class="hljs-keyword">null</span>,<span class="hljs-meta">$1</span>=&lt;(<span class="hljs-string">'jsonb'</span>)&gt;,<span class="hljs-keyword">type</span>=<span class="hljs-type">VARCHAR</span>)
<span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]  FE=&gt; Describe(portal=<span class="hljs-keyword">null</span>)
<span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]  FE=&gt; <span class="hljs-keyword">Execute</span>(portal=<span class="hljs-keyword">null</span>,<span class="hljs-keyword">limit</span>=<span class="hljs-number">0</span>)
<span class="hljs-number">2025</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span> <span class="hljs-number">12</span>:<span class="hljs-number">33</span>:<span class="hljs-number">06.157</span> TRACE [o.postgresql.core.v3.QueryExecutorImpl]  FE=&gt; Sync
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Efficiently handling <code>null</code> values in Spring's <code>NamedParameterJdbcTemplate</code> is essential for optimizing database interactions and minimizing unnecessary overhead. This involves preferring indexed placeholders, omitting column type names, and configuring a system property to avoid costly metadata queries.</p>
]]></content:encoded></item><item><title><![CDATA[Multi-active systems]]></title><description><![CDATA[Some business drivers that could justify adopting a multi-active and multi-region deployment strategy:

Securing business continuity in the event of regional data centre disruptions

Deliver a good customer experience worldwide

Deliver business adap...]]></description><link>https://blog.cloudneutral.se/multi-active-systems</link><guid isPermaLink="true">https://blog.cloudneutral.se/multi-active-systems</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[high availability]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Mon, 31 Jul 2023 07:15:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/YkjVsetNN9s/upload/415367df88debc6995c0e5a93bb35b9f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Some business drivers that could justify adopting a <strong>multi-active</strong> and <strong>multi-region</strong> deployment strategy:</p>
<ul>
<li><p>Securing business continuity in the event of regional data centre disruptions</p>
</li>
<li><p>Deliver a good customer experience worldwide</p>
</li>
<li><p>Deliver business adaptability and scalability to different market needs/volumes</p>
</li>
<li><p>Compliance with data locality/placement regulations</p>
</li>
<li><p>Sustainable operational costs as the business grows</p>
</li>
</ul>
<p>Multi-active systems are a prerequisite to effectively adopting a multi-region strategy. Let's find out how.</p>
<h2 id="heading-multi-active-systems">Multi-Active Systems</h2>
<p>A <strong>multi-active</strong> system is capable of operating and serving online traffic simultaneously from multiple active data centres and regions. With that come characteristics that will satisfy the goals stated above without adding too much infrastructure/app complexity and cost overhead.</p>
<p>Multi-active systems run live in multiple data centres in different regions all the time, and workloads are dynamically shared across these data centres. A multi-active system processes requests simultaneously for either domestic or global markets without any assumptions about locality, traffic affinity or replication delays.</p>
<p>There are no actual concepts of traffic failover or failback. Instead, failures and disruptions are handled transparently through regional and/or global load balancing and traffic rerouting. If one failure domain (data centre or region) begins to fall behind due to disruptions, parts of its workload can move to other domains transparently. If one domain is completely offline, all its work is rebalanced to the remaining domains.</p>
<p>Using three failure domains allows one to go completely dark, while still allowing systems to make forward progress through consensus decisions. Using five failure domains allows two domains to go dark, and so on. This allows for systems to be both highly available and always consistent and correct.</p>
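<p>The arithmetic behind this can be sketched in SQL. This is only an illustration that assumes majority-based consensus, where more than half of the failure domains must remain reachable; the column aliases are made up for the example:</p>
<pre><code class="lang-sql">-- Sketch: how many failure domains can go dark while a majority survives.
-- Assumption: decisions need a majority (quorum) of the n failure domains.
SELECT n             AS failure_domains,
       div(n, 2) + 1 AS quorum_size,
       div(n - 1, 2) AS tolerated_failures
FROM generate_series(3, 7, 2) AS s (n);
</code></pre>
<p>For three, five and seven failure domains this gives one, two and three tolerated failures respectively.</p>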
<p>A multi-active system is not without certain challenges:</p>
<ul>
<li><p>Needs a fit-for-purpose solution to manage state and replication (data distribution at a global scale is a difficult problem), one that can:</p>
<ul>
<li><p>Protect rule invariants, even during contention and failures</p>
</li>
<li><p>Ensure effectively-once outcomes of event processing</p>
</li>
</ul>
</li>
<li><p>Higher service latency due to mandatory cross-datacenter coordination</p>
</li>
<li><p>Existing design assumptions on single DC/region deployment</p>
</li>
<li><p>Constraints around auxiliary system integrations</p>
</li>
<li><p>Breaking current assumptions on system design</p>
</li>
</ul>
<p>To contain complexity, most of these challenges are preferably pushed down to the resource tier - the database - instead of being managed in the app tier. CockroachDB is one such system with a wide range of multi-region deployment options and first-class support for crafting <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/multi-active-availability">multi-active</a> systems.</p>
<h2 id="heading-failover-based-systems">Failover-based Systems</h2>
<p>To contrast with multi-active systems, let's quickly look at the main predecessor: singly-homed, failover-based systems. A singly-homed system is crafted to operate and serve online traffic from a single data centre or at most a single region. It is crafted that way in terms of design choices, technology selections, network assumptions and supporting infrastructure components.</p>
<p>In the event of a primary domain disruption or disaster, a singly-homed system may fail over traffic to an alternative secondary data centre. After the disruption has cleared, it may "fail back" traffic to the original primary domain.</p>
<p>This type of setup has many limitations:</p>
<ul>
<li><p>Unable to scale horizontally beyond a single data centre/region</p>
</li>
<li><p>Unable to load-balance traffic freely across multiple active data centres</p>
</li>
<li><p>Dependent on standby, underutilized resources, increasing TCO (300% capacity for steady state)</p>
</li>
<li><p>Must use asynchronous replication for availability and performance, with the risk of data loss</p>
</li>
<li><p>Long recovery times after failures</p>
</li>
<li><p>Complex and error-prone failover protocols with manual checkpoints/sign-offs</p>
</li>
<li><p>Unclear when and if a standby system can resume traffic from a safe point</p>
</li>
<li><p>Difficult and risky to test and verify that the protocol works</p>
</li>
</ul>
<h2 id="heading-disaster-recovery-spectrum">Disaster Recovery Spectrum</h2>
<p>The main objective of a <strong>disaster recovery</strong> plan is to minimize the time it takes to recover from a severe disruption event and reduce the amount of data loss and other business impacts.</p>
<p>The spectrum of disaster recovery solutions typically ranges from <strong>offline backups</strong> to full-blown <strong>multi-region deployments</strong>, also with backups.</p>
<ul>
<li><p><strong>Backups</strong> - Data is frequently backed up and sent off-site or to cloud storage.</p>
<ul>
<li><p>The recovery time objective (RTO) is governed by the time it takes to restore the database to a new setup</p>
</li>
<li><p>The recovery point objective (RPO) is governed by the frequency of incremental and full backups</p>
</li>
</ul>
</li>
<li><p><strong>Cold Standby</strong> - A minimally provisioned environment with the ability to take over core services from a failed primary data centre.</p>
<ul>
<li><p>Higher TCO due to under-utilized standby capacity</p>
</li>
<li><p>RTO is governed by how fast a switchover can be made to the secondary</p>
</li>
<li><p>RPO is governed by the async replication delay from the primary to the secondary</p>
</li>
</ul>
</li>
<li><p><strong>Warm Standby</strong> - A fully provisioned environment with the ability to take over a failed primary data centre.</p>
<ul>
<li><p>Higher TCO due to excessive amounts of under-utilized standby capacity</p>
</li>
<li><p>Could serve certain read-only traffic at the same time as the primary</p>
</li>
<li><p>RTO and RPO are quite similar to a cold standby, only a bit lower due to higher readiness</p>
</li>
</ul>
</li>
<li><p><strong>Multi-Active</strong> - Each deployment site serves production traffic simultaneously.</p>
<ul>
<li><p>All data centres serve traffic at the same time for the entire keyspace</p>
</li>
<li><p>There is no actual notion of fail-over or fail-back; failures and recovery are handled transparently towards the app tier</p>
</li>
<li><p>RTO is governed by how quickly an isolated or crashed node can drop its authority over reads and writes to local data (typically a few seconds)</p>
</li>
<li><p>RPO is zero due to consensus-based replication</p>
</li>
</ul>
</li>
</ul>
<p>Multi-active systems stand out from most fail-over-based models in terms of cost and complexity reduction. They are far more resilient against different categories of disruptions, but not immune to disasters. If a multi-active system loses a majority of its failure domains (like 2 zones in a 3-zone region) or if some operator error corrupts a database, then the music stops. Therefore, backups are still commonly used alongside multi-active systems, adding a safety harness for recovery.</p>
<p>Combined with multiple regions, the tolerable blast radius is extended to cover most conditions and you can also improve customer experience for a global market.</p>
<h1 id="heading-multi-region-deployments">Multi-region deployments</h1>
<p>One data centre is a single point of failure, similar to a single region. If that data centre/region goes offline for a longer period without any recovery option, it may have a severe impact on the business and the company's reputation.</p>
<p>Adding two or more data centres to a single region will increase the <em>blast radius</em> the system can tolerate and decrease the likelihood of severe, long-lasting service disruptions due to a single DC outage.</p>
<p>Deploying a system (as in many services/components working in concert) across multiple, geo-separated regions extends the tolerable blast radius even further. Single-region assumptions cannot, however, be transferred to this new ecosystem due to how we traditionally manage state and consistency. Leveraging multi-region effectively requires a multi-active system architecture. Not exclusively, but it's very much a state/database undertaking that needs a fit-for-purpose solution like CockroachDB.</p>
<h1 id="heading-summary">Summary</h1>
<p>This article discusses the advantages of adopting a multi-region deployment strategy for businesses, focusing on multi-active systems. These systems operate simultaneously across multiple data centres, providing increased resiliency and adaptability to market needs. The article also contrasts multi-active systems with traditional failover-based systems and examines the disaster recovery spectrum, including backups, cold standby, warm standby, and multi-active solutions. Ultimately, multi-active systems offer significant benefits in terms of cost and complexity reduction, while still requiring backups to ensure data safety.</p>
]]></content:encoded></item><item><title><![CDATA[User defined composite types]]></title><description><![CDATA[In a previous article, we look at creating a simple distributed user-defined function (UDF) in CockroachDB. In this article, we'll revisit UDFs in the form of user-defined composite types, introduced in CockroachDB v23.1.
Introduction
A composite typ...]]></description><link>https://blog.cloudneutral.se/user-defined-composite-types</link><guid isPermaLink="true">https://blog.cloudneutral.se/user-defined-composite-types</guid><category><![CDATA[SQL]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Fri, 30 Jun 2023 12:13:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/NeTPASr-bmQ/upload/6e73320de6148c3e639c1e4f5e3629cd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a <a target="_blank" href="https://blog.cloudneutral.se/user-defined-functions-in-cockroachdb">previous article</a>, we looked at creating a simple distributed user-defined function (UDF) in CockroachDB. In this article, we'll revisit UDFs in the form of user-defined composite types, introduced in <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/create-type.html">CockroachDB v23.1</a>.</p>
<h1 id="heading-introduction">Introduction</h1>
<p>A composite type is simply a type composed of other types. In the following example, we are creating a composite money type. The money type is the combination of an amount, currency code and <em>monetary</em> type:</p>
<ul>
<li><p>The amount is a decimal with fractions matching the currency.</p>
</li>
<li><p>The currency is a 3-letter ISO 4217 code.</p>
</li>
<li><p>The monetary type is an arbitrary tag for denoting the type of money. For example:</p>
<ul>
<li><p>RM for real money</p>
</li>
<li><p>FM for funny money</p>
</li>
</ul>
</li>
</ul>
<p>On top of the type, we'll also add a few UDFs for money arithmetic. Ideally, these functions should only be allowed when operands use the same currency and monetary type. For example, you want to prevent adding 10 USD to 15 SEK, or real money to funny money. There is, however, no way to enforce these rules in the DB itself.</p>
<p>Let's begin with the money type:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TYPE</span> money_type <span class="hljs-keyword">AS</span> (amount <span class="hljs-built_in">decimal</span>, currency_code <span class="hljs-built_in">char</span> (<span class="hljs-number">3</span>), monetary_type <span class="hljs-built_in">char</span> (<span class="hljs-number">2</span>));
</code></pre>
<p>Next, create a few UDFs for money operations:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_amount(x money_type) <span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">decimal</span> IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
   <span class="hljs-keyword">select</span> ((x).amount)::<span class="hljs-built_in">decimal</span>
$$;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_currency(x money_type) <span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">char</span>(<span class="hljs-number">3</span>) IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
   <span class="hljs-keyword">select</span> ((x).currency_code)::<span class="hljs-built_in">char</span>(<span class="hljs-number">3</span>)
$$;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_monetary_type(x money_type) <span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">char</span>(<span class="hljs-number">2</span>) IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
   <span class="hljs-keyword">select</span> ((x).monetary_type)::<span class="hljs-built_in">char</span>(<span class="hljs-number">2</span>)
$$;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> to_money(x <span class="hljs-keyword">string</span>) <span class="hljs-keyword">RETURNS</span> money_type IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
   <span class="hljs-keyword">select</span> (
    split_part($<span class="hljs-number">1</span>,<span class="hljs-string">' '</span>,<span class="hljs-number">1</span>)::<span class="hljs-built_in">decimal</span>,
    split_part($<span class="hljs-number">1</span>,<span class="hljs-string">' '</span>,<span class="hljs-number">2</span>),
    split_part($<span class="hljs-number">1</span>,<span class="hljs-string">' '</span>,<span class="hljs-number">3</span>)
    )::money_type
$$;
</code></pre>
<p>Next, let's add a few money arithmetic UDFs:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_add(<span class="hljs-keyword">IN</span> m money_type, <span class="hljs-keyword">IN</span> addend <span class="hljs-built_in">decimal</span>) <span class="hljs-keyword">RETURNS</span> money_type IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
<span class="hljs-keyword">select</span> (
    (m).amount + addend,
    (m).currency_code,
    (m).monetary_type
    )
$$;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_mult(<span class="hljs-keyword">IN</span> m money_type, <span class="hljs-keyword">IN</span> multiplier <span class="hljs-built_in">decimal</span>) <span class="hljs-keyword">RETURNS</span> money_type IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
<span class="hljs-keyword">select</span> (
    (m).amount * multiplier,
    (m).currency_code,
    (m).monetary_type
    )
$$;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> money_div(<span class="hljs-keyword">IN</span> m money_type, <span class="hljs-keyword">IN</span> divisor <span class="hljs-built_in">decimal</span>) <span class="hljs-keyword">RETURNS</span> money_type IMMUTABLE LEAKPROOF <span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span> <span class="hljs-keyword">AS</span> $$
<span class="hljs-keyword">select</span> (
    (m).amount / divisor,
    (m).currency_code,
    (m).monetary_type
    )
$$;
</code></pre>
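<p>As a quick sanity check, the functions defined above can be called directly. This is a sketch that has not been run against a cluster, so no output is shown; the literal values are arbitrary:</p>
<pre><code class="lang-sql">-- Parse a string into a money_type value and read the individual fields back
SELECT money_amount(to_money('100.00 SEK RM')),
       money_currency(to_money('100.00 SEK RM')),
       money_monetary_type(to_money('100.00 SEK RM'));

-- Add 25.00 to the amount; currency and monetary type are carried over unchanged
SELECT money_add(to_money('100.00 SEK RM'), 25.00);

-- Nothing stops a caller from mixing currencies: the addend is a plain
-- decimal, so the function cannot know that it originated from a SEK amount
SELECT money_add(to_money('10.00 USD RM'), money_amount(to_money('15.00 SEK RM')));
</code></pre>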
<p>That's about it. Now let's put the money type to use and see how things work. Create an account table holding a cached balance using the money type:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
(
    <span class="hljs-keyword">id</span>             <span class="hljs-keyword">uuid</span>        <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> gen_random_uuid(),
    city           <span class="hljs-keyword">string</span>      <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    balance        money_type  <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    <span class="hljs-keyword">name</span>           <span class="hljs-keyword">string</span>(<span class="hljs-number">128</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    description    <span class="hljs-keyword">string</span>(<span class="hljs-number">256</span>) <span class="hljs-literal">null</span>,
    closed         <span class="hljs-built_in">boolean</span>     <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> <span class="hljs-literal">false</span>,
    allow_negative <span class="hljs-built_in">integer</span>     <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> <span class="hljs-number">0</span>,
    updated_at     timestamptz <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> clock_timestamp(),

    primary <span class="hljs-keyword">key</span> (<span class="hljs-keyword">id</span>)
);
</code></pre>
<p>Let's strengthen the integrity a bit with these CHECK constraints (totally optional):</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
    <span class="hljs-keyword">add</span> <span class="hljs-keyword">constraint</span> check_account_allow_negative <span class="hljs-keyword">check</span> (allow_negative <span class="hljs-keyword">between</span> <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> <span class="hljs-number">1</span>);
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
    <span class="hljs-keyword">add</span> <span class="hljs-keyword">constraint</span> check_account_positive_balance <span class="hljs-keyword">check</span> ((balance).amount * <span class="hljs-keyword">abs</span>(allow_negative - <span class="hljs-number">1</span>) &gt;= <span class="hljs-number">0</span>);
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
    <span class="hljs-keyword">add</span> <span class="hljs-keyword">constraint</span> check_account_currency <span class="hljs-keyword">check</span> ((balance).currency_code <span class="hljs-keyword">in</span> (<span class="hljs-string">'SEK'</span>, <span class="hljs-string">'USD'</span>, <span class="hljs-string">'GBP'</span>, <span class="hljs-string">'EUR'</span>));
</code></pre>
<p>As you can see, different accounts may or may not accept a negative balance depending on the <code>allow_negative</code> flag. We could also have used an enum type for the currency.</p>
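<p>To make the arithmetic in <code>check_account_positive_balance</code> concrete, here is a minimal Java sketch that mirrors the expression <code>(balance).amount * abs(allow_negative - 1) &gt;= 0</code>. The class and method names are hypothetical and exist only for illustration; the database constraint remains the source of truth.</p>
<pre><code class="lang-java">import java.math.BigDecimal;

// Hypothetical helper mirroring the CHECK constraint expression above.
class BalanceCheck {
    // allowNegative = 0: the factor is 1, so a negative amount gives a negative product (rejected).
    // allowNegative = 1: the factor is 0, so the product is always zero (accepted).
    static boolean satisfiesCheck(BigDecimal amount, int allowNegative) {
        BigDecimal factor = BigDecimal.valueOf(Math.abs(allowNegative - 1));
        // signum() is -1 only for a negative product, which is when the check would fail
        return amount.multiply(factor).signum() != -1;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesCheck(new BigDecimal("-50.00"), 0)); // false
        System.out.println(satisfiesCheck(new BigDecimal("-50.00"), 1)); // true
        System.out.println(satisfiesCheck(new BigDecimal("10.00"), 0));  // true
    }
}
</code></pre>
<p>Multiplying by <code>abs(allow_negative - 1)</code> effectively switches the check off for accounts where the flag is 1, which is why a single constraint can cover both kinds of account.</p>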
<p>Add some data:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">account</span> (<span class="hljs-keyword">id</span>, city, balance, <span class="hljs-keyword">name</span>, allow_negative)
<span class="hljs-keyword">VALUES</span> (<span class="hljs-string">'10000000-0000-0000-0000-000000000000'</span>, <span class="hljs-string">'stockholm'</span>, to_money(<span class="hljs-string">'100.00 SEK RM'</span>), <span class="hljs-string">'test:1'</span>, <span class="hljs-number">0</span>),
       (<span class="hljs-string">'20000000-0000-0000-0000-000000000000'</span>, <span class="hljs-string">'stockholm'</span>, to_money(<span class="hljs-string">'200.00 SEK RM'</span>), <span class="hljs-string">'test:2'</span>, <span class="hljs-number">1</span>),
       (<span class="hljs-string">'30000000-0000-0000-0000-000000000000'</span>, <span class="hljs-string">'new york'</span>, to_money(<span class="hljs-string">'300.00 USD PM'</span>), <span class="hljs-string">'test:3'</span>, <span class="hljs-number">0</span>),
       (<span class="hljs-string">'40000000-0000-0000-0000-000000000000'</span>, <span class="hljs-string">'new york'</span>, to_money(<span class="hljs-string">'400.00 USD PM'</span>), <span class="hljs-string">'test:4'</span>, <span class="hljs-number">1</span>);
</code></pre>
<p>Let's see how this looks:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">id</span>,city,balance,
       money_amount(balance),
       money_currency(balance),
       money_monetary_type(balance)
<span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span>;
                   id                  |   city    |     balance     | money_amount | money_currency | money_monetary_type
<span class="hljs-comment">---------------------------------------+-----------+-----------------+--------------+----------------+----------------------</span>
  10000000-0000-0000-0000-000000000000 | stockholm | (100.00,SEK,RM) |       100.00 | SEK            | RM
  20000000-0000-0000-0000-000000000000 | stockholm | (200.00,SEK,RM) |       200.00 | SEK            | RM
  30000000-0000-0000-0000-000000000000 | new york  | (300.00,USD,PM) |       300.00 | USD            | PM
  40000000-0000-0000-0000-000000000000 | new york  | (400.00,USD,PM) |       400.00 | USD            | PM
(4 rows)

Time: 14ms total (execution 14ms / network 0ms)
</code></pre>
<p>Let's execute an aggregation query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">sum</span>(money_amount(balance)) balance, 
       money_currency(balance) currency 
<span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">group</span> <span class="hljs-keyword">by</span> balance,currency;

  balance | currency
<span class="hljs-comment">----------+-----------</span>
   100.00 | SEK
   200.00 | SEK
   300.00 | USD
   400.00 | USD
(4 rows)

Time: 2ms total (execution 2ms / network 0ms)
</code></pre>
<p>Updating a money value can be done using one of the arithmetic functions:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">set</span> balance=money_add(balance,<span class="hljs-number">-90.00</span>) <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span>=<span class="hljs-string">'10000000-0000-0000-0000-000000000000'</span>;
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">set</span> balance=money_add(balance,<span class="hljs-number">-100.00</span>) <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span>=<span class="hljs-string">'20000000-0000-0000-0000-000000000000'</span>;
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article provides an example of how to create a user-defined composite type in CockroachDB v23.1, specifically a money type composed of an amount, currency code, and monetary type. It also explains how to use UDFs for money operations, create a table to hold a cached balance, and add CHECK constraints to strengthen the integrity.</p>
]]></content:encoded></item><item><title><![CDATA[Transaction timeouts in CockroachDB]]></title><description><![CDATA[In a previous article series on Spring Data JPA and CockroachDB, we look into different methods to avoid lengthy transaction execution times. Until recently, however, there's not been any way to specify the transaction execution timeout in CockroachD...]]></description><link>https://blog.cloudneutral.se/transaction-timeouts-in-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/transaction-timeouts-in-cockroachdb</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Fri, 30 Jun 2023 12:09:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cvw2Zx86IaQ/upload/7502df72d8b048cc95889e9f1b5644aa.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a previous article series on Spring Data JPA and CockroachDB, we looked into <a target="_blank" href="https://blog.cloudneutral.se/jpa-best-practices-explicit-and-implicit-transactions#heading-11-how-to-limit-transaction-lifetime">different methods</a> to avoid lengthy transaction execution times. Until recently, however, there was no way to specify an execution timeout for a whole transaction in CockroachDB, only at the statement level.</p>
<p>This changed in CockroachDB v23.1, which introduced a <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/set-vars.html">new session variable</a> for transaction timeouts, unsurprisingly called <code>transaction_timeout</code>:</p>
<blockquote>
<p>New in v23.1: Aborts an explicit transaction when it runs longer than the configured duration. Stored in milliseconds; can be expressed in milliseconds or as an INTERVAL.</p>
</blockquote>
<h1 id="heading-overview">Overview</h1>
<p>Transaction timeouts are helpful if you need to set a fixed upper limit for how long to wait for an explicit transaction to complete. If a transaction is not completed within that timeframe, it's aborted and you then know that any provisional writes did not take effect.</p>
<p>In contrast, if you just wait for an arbitrary amount of time and then interrupt the calling thread, then you have an ambiguous result where you can't tell if an operation took place or not since the commit could have been completed or rolled back just before the cancellation. Ambiguous results for non-idempotent operations are typically not a good thing for safety.</p>
<p>Now let's see how to hook up transaction timeouts in a fully transparent way using Spring's <code>@Transactional</code> annotation and AspectJ, similar to how we can deal with transaction retries.</p>
<h1 id="heading-source-code">Source Code</h1>
<p>The code for this article is available on GitHub.</p>
<h1 id="heading-aop-timeout-solution">AOP Timeout Solution</h1>
<p>We are going to set the session variables using AOP and AspectJ. AOP is a core concept in Spring.</p>
<p>A small recap on basic AOP terminology:</p>
<ul>
<li><p><strong>Aspect</strong> - An orthogonal cross-cutting concern that you wrap in a contained module or <em>aspect</em>. Like retries, logging, security or in our case setting session variables.</p>
</li>
<li><p><strong>Joinpoint</strong> - A point in the application code where the aspect can be plugged in, such as a method execution or the handling of an exception.</p>
</li>
<li><p><strong>Pointcut</strong> - One or more join points where advice should be executed, often using pointcut expressions.</p>
</li>
<li><p><strong>Advice</strong> - The action to be performed before, after or around method execution, akin to an <em>interceptor</em>.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616528951532/nb4XvVYdT.png?auto=compress,format&amp;format=webp" alt="Screenshot 2021-03-23 at 20.48.13.png" /></p>
<p>To set the session variables, we create a <code>TransactionAttributesAspect</code> with an around-advice:</p>
<pre><code class="lang-java"><span class="hljs-keyword">import</span> org.aspectj.lang.ProceedingJoinPoint;
<span class="hljs-keyword">import</span> org.aspectj.lang.annotation.Around;
<span class="hljs-keyword">import</span> org.aspectj.lang.annotation.Aspect;
<span class="hljs-keyword">import</span> org.aspectj.lang.annotation.Pointcut;
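<span class="hljs-keyword">import</span> org.springframework.beans.factory.annotation.Autowired;
<span class="hljs-keyword">import</span> org.springframework.core.Ordered;
<span class="hljs-keyword">import</span> org.springframework.core.annotation.Order;
<span class="hljs-keyword">import</span> org.springframework.jdbc.core.JdbcTemplate;
<span class="hljs-keyword">import</span> org.springframework.transaction.TransactionDefinition;
<span class="hljs-keyword">import</span> org.springframework.transaction.annotation.Transactional;
<span class="hljs-keyword">import</span> org.springframework.transaction.support.TransactionSynchronizationManager;
<span class="hljs-keyword">import</span> org.springframework.util.Assert;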

<span class="hljs-meta">@Aspect</span>
<span class="hljs-meta">@Order(Ordered.LOWEST_PRECEDENCE - 2)</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TransactionAttributesAspect</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> JdbcTemplate jdbcTemplate;

    <span class="hljs-meta">@Pointcut("execution(public * *(..)) "
            + "&amp;&amp; @annotation(transactional)")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">anyTransactionalOperation</span><span class="hljs-params">(Transactional transactional)</span> </span>{
    }

    <span class="hljs-meta">@Around(value = "anyTransactionalOperation(transactional)", argNames = "pjp,transactional")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Object <span class="hljs-title">doAroundTransactionalMethod</span><span class="hljs-params">(ProceedingJoinPoint pjp, Transactional transactional)</span> <span class="hljs-keyword">throws</span> Throwable </span>{
        Assert.isTrue(TransactionSynchronizationManager.isActualTransactionActive(), <span class="hljs-string">"Explicit transaction required"</span>);
        applyVariables(transactional);
        <span class="hljs-keyword">return</span> pjp.proceed();
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">applyVariables</span><span class="hljs-params">(Transactional transactional)</span> </span>{
        <span class="hljs-keyword">if</span> (transactional.timeout() != TransactionDefinition.TIMEOUT_DEFAULT) {
            jdbcTemplate.update(<span class="hljs-string">"SET transaction_timeout=?"</span>, transactional.timeout() * <span class="hljs-number">1000</span>);
        }

        <span class="hljs-keyword">if</span> (transactional.readOnly()) {
            jdbcTemplate.execute(<span class="hljs-string">"SET transaction_read_only=true"</span>);
        }
    }
}
</code></pre>
<p>This <em>weaves</em> in the <code>doAroundTransactionalMethod</code> advice at runtime on all public methods annotated with Spring's <code>@Transactional</code> annotation. This is pretty much what the pointcut expression says:</p>
<pre><code class="lang-apache">@<span class="hljs-attribute">Pointcut</span>(<span class="hljs-string">"execution(public * *(..)) &amp;&amp; @annotation(transactional)"</span>)
</code></pre>
<p>Lastly, we look at the annotation properties and use a JDBC template to set the appropriate variables while assuming there's an open transaction in scope.</p>
<pre><code class="lang-java"><span class="hljs-keyword">if</span> (transactional.timeout() != TransactionDefinition.TIMEOUT_DEFAULT) {
   jdbcTemplate.update(<span class="hljs-string">"SET transaction_timeout=?"</span>, transactional.timeout() * <span class="hljs-number">1000</span>);
}
</code></pre>
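<p>The multiplication by 1000 is there because Spring's <code>timeout</code> attribute is expressed in seconds, while <code>transaction_timeout</code> is stored in milliseconds. As a minimal sketch, the conversion can be isolated in a small helper without any Spring or JDBC dependencies. The class name is hypothetical, and the constant simply mirrors the value of <code>TransactionDefinition.TIMEOUT_DEFAULT</code> (-1):</p>
<pre><code class="lang-java">import java.time.Duration;
import java.util.OptionalLong;

// Hypothetical helper: converts a @Transactional timeout (seconds) to milliseconds.
class TimeoutConversion {
    // Same value as Spring's TransactionDefinition.TIMEOUT_DEFAULT
    static final int TIMEOUT_DEFAULT = -1;

    // Returns an empty result when no timeout was configured on the annotation
    static OptionalLong toTimeoutMillis(int timeoutSeconds) {
        if (timeoutSeconds == TIMEOUT_DEFAULT) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Duration.ofSeconds(timeoutSeconds).toMillis());
    }

    public static void main(String[] args) {
        System.out.println(toTimeoutMillis(5));  // OptionalLong[5000]
        System.out.println(toTimeoutMillis(-1)); // OptionalLong.empty
    }
}
</code></pre>
<p>An empty result corresponds to the branch in the aspect where no <code>SET transaction_timeout</code> statement is issued at all.</p>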
<h1 id="heading-testing-timeouts">Testing Timeouts</h1>
<p>To test this in action, let's create a simple service and a few repositories:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> OrderRepository orderRepository;

    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> ProductRepository productRepository;

    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW, readOnly = true)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Product <span class="hljs-title">findProduct</span><span class="hljs-params">(String sku)</span> </span>{
        <span class="hljs-keyword">return</span> productRepository.findBySku(sku)
                .orElseThrow(() -&gt; <span class="hljs-keyword">new</span> ObjectRetrievalFailureException(Product.class, sku));
    }

    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW, timeout = 5)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">placeOrderWithTimeout</span><span class="hljs-params">(Order order, <span class="hljs-keyword">long</span> delayMillis)</span> </span>{
        placeOrderAndUpdateInventory(order);

        <span class="hljs-keyword">try</span> {
            logger.info(<span class="hljs-string">"Entering sleep for "</span> + delayMillis);
            Thread.sleep(delayMillis);
        } <span class="hljs-keyword">catch</span> (InterruptedException e) {
            Thread.currentThread().interrupt();
        } <span class="hljs-keyword">finally</span> {
            logger.info(<span class="hljs-string">"Exited sleep for "</span> + delayMillis);
        }
    }

    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">placeOrderWithoutTimeout</span><span class="hljs-params">(Order order)</span> </span>{
        placeOrderAndUpdateInventory(order);
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">placeOrderAndUpdateInventory</span><span class="hljs-params">(Order order)</span> </span>{
        Assert.isTrue(!TransactionSynchronizationManager.isCurrentTransactionReadOnly(), <span class="hljs-string">"Read-only"</span>);
        Assert.isTrue(TransactionSynchronizationManager.isActualTransactionActive(), <span class="hljs-string">"No tx"</span>);

        <span class="hljs-comment">// Update product inventories</span>
        order.getOrderItems().forEach(orderItem -&gt; {
            Product product = orderItem.getProduct();
            product.addInventoryQuantity(-orderItem.getQuantity());
            productRepository.save(product); <span class="hljs-comment">// product is in detached state</span>
        });
        order.setStatus(ShipmentStatus.confirmed);

        orderRepository.save(order);
    }
}
</code></pre>
<p>In the <code>placeOrderWithTimeout</code> method, there's a fake delay that can last longer than the configured timeout to trigger an abort. Let's verify this in an integration test:</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TimeoutsTest</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">AbstractIntegrationTest</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> OrderService orderService;

    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> TestSetup testSetup;

    <span class="hljs-meta">@BeforeAll</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">setupTest</span><span class="hljs-params">()</span> </span>{
        testSetup.setupTestData();
    }

    <span class="hljs-meta">@org</span>.junit.jupiter.api.Order(<span class="hljs-number">1</span>)
    <span class="hljs-meta">@Test</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">whenCreatingOrderWithTimeoutThatExpires_thenExpectRollback</span><span class="hljs-params">()</span> </span>{
        Product p1 = orderService.findProduct(<span class="hljs-string">"p1"</span>);
        <span class="hljs-keyword">int</span> inventory = p1.getInventory();

        JpaSystemException ex = Assertions.assertThrows(JpaSystemException.class, () -&gt; {
            orderService.placeOrderWithTimeout(Order.builder()
                            .andOrderItem()
                            .withProduct(p1)
                            .withQuantity(<span class="hljs-number">1</span>)
                            .withUnitPrice(p1.getPrice())
                            .then()
                            .build(),
                    <span class="hljs-number">7000</span>);
        });

        Assertions.assertEquals(<span class="hljs-string">"transaction timeout expired"</span>, ex.getMessage());
        Assertions.assertEquals(inventory, orderService.findProduct(<span class="hljs-string">"p1"</span>).getInventory());

        logger.info(<span class="hljs-string">"Exception thrown"</span>, ex);
    }

    <span class="hljs-meta">@org</span>.junit.jupiter.api.Order(<span class="hljs-number">2</span>)
    <span class="hljs-meta">@Test</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">whenCreatingOrderWithTimeout_thenExpectCommit</span><span class="hljs-params">()</span> </span>{
        Product p1 = orderService.findProduct(<span class="hljs-string">"p1"</span>);
        <span class="hljs-keyword">int</span> inventory = p1.getInventory();

        orderService.placeOrderWithTimeout(Order.builder()
                        .andOrderItem()
                        .withProduct(p1)
                        .withQuantity(<span class="hljs-number">1</span>)
                        .withUnitPrice(p1.getPrice())
                        .then()
                        .build(),
                <span class="hljs-number">2000</span>);

        Assertions.assertEquals(inventory - <span class="hljs-number">1</span>, orderService.findProduct(<span class="hljs-string">"p1"</span>).getInventory());
    }

    <span class="hljs-meta">@org</span>.junit.jupiter.api.Order(<span class="hljs-number">3</span>)
    <span class="hljs-meta">@Test</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">whenCreatingOrderWithoutTimeout_thenExpectCommit</span><span class="hljs-params">()</span> </span>{
        Product p1 = orderService.findProduct(<span class="hljs-string">"p1"</span>);
        <span class="hljs-keyword">int</span> inventory = p1.getInventory();

        orderService.placeOrderWithoutTimeout(Order.builder()
                .andOrderItem()
                .withProduct(p1)
                .withQuantity(<span class="hljs-number">1</span>)
                .withUnitPrice(p1.getPrice())
                .then()
                .build());

        Assertions.assertEquals(inventory - <span class="hljs-number">1</span>, orderService.findProduct(<span class="hljs-string">"p1"</span>).getInventory());
    }
}
</code></pre>
<p>In this example, if the transaction times out, a <code>JpaSystemException</code> is thrown.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article explains how to use the new <code>transaction_timeout</code> session variable in CockroachDB v23.1 to set a fixed upper limit for how long to wait for an explicit transaction to complete. It also provides an example of a service and repositories to test the timeout in an integration test, which verifies that when the timeout expires, the transaction is rolled back and the inventory remains unchanged.</p>
]]></content:encoded></item><item><title><![CDATA[One-Phase Commit Transaction Strategy]]></title><description><![CDATA[Introduction
A commonly adopted transaction strategy can be described as the best-efforts one-phase-commit (1PC) pattern. It's different from a global XA/2PC protocol where an external transaction manager ensures that all transaction properties are m...]]></description><link>https://blog.cloudneutral.se/one-phase-commit-transaction-strategy</link><guid isPermaLink="true">https://blog.cloudneutral.se/one-phase-commit-transaction-strategy</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Sun, 30 Apr 2023 18:29:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cpCZqA1OTQk/upload/b307848758c50abdd7090ff8f1ae6a2f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>A commonly adopted transaction strategy can be described as the best-efforts one-phase-commit (1PC) pattern. It's different from a global XA/2PC protocol where an external transaction manager ensures that all transaction properties are maintained across the involved transactional resources (database, queue, etc).</p>
<p>The basic idea behind 1PC is to delay the commit in a transaction as late as possible, so that the only things that can go wrong at that point are infrastructure failures, which are rare. All business processing failures are caught before the commit happens.</p>
<p>It is a relaxation of ACID properties spanning multiple transactional resources. That also means there's a certain risk for system inconsistency in a worst-case scenario, which can be mitigated if the processing is idempotent. For this strategy to work as safely as possible, idempotency is key.</p>
<p>This concept is also described in full detail in Dr David Syer's article <a target="_blank" href="https://www.infoworld.com/article/2077963/distributed-transactions-in-spring--with-and-without-xa.html">Distributed transactions in Spring, with and without XA</a> from 2009.</p>
<h1 id="heading-scenarios">Scenarios</h1>
<p>To illustrate 1PC let's review a couple of examples.</p>
<h2 id="heading-database-and-message-broker">Database and Message Broker</h2>
<p>Consider a typical <a target="_blank" href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/MessagingAdapter.html">service activator</a> scenario where there's a database and a JMS message queue involved. It includes a write to the database and an acknowledgement to the broker of receiving a message. Both these operations are independent, as in there's no atomicity.</p>
<blockquote>
<p>This scenario also maps to Kafka, which uses consumer offsets and message retention rather than ephemeral message acknowledgements (where a queued message is removed once it has been acknowledged).</p>
</blockquote>
<ol>
<li><p>Start messaging transaction (broker delivers a message)</p>
</li>
<li><p>Receive message</p>
</li>
<li><p>Start database transaction</p>
</li>
<li><p>Update database</p>
</li>
<li><p>Commit database transaction</p>
</li>
<li><p>Commit messaging transaction (ack is sent to broker upon which the message is removed from destination)</p>
</li>
</ol>
<p>The exact order of the first four steps is not important, as long as the message is received before the database is updated and each transaction starts before its corresponding resource is used.</p>
<p>The listener below illustrates the dual-write problem:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RegistrationConsumer</span> </span>{
    <span class="hljs-meta">@JmsListener(destination = "${active-mq.topic}", containerFactory = "jmsListenerContainerFactory")</span>
    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">receiveMessage</span><span class="hljs-params">(RegistrationEvent event)</span> </span>{
        registrationRepository.save(toEntity(event));
    }
}
</code></pre>
<p>The following order is therefore just as valid:</p>
<ol>
<li><p>Start messaging transaction</p>
</li>
<li><p>Start database transaction</p>
</li>
<li><p>Receive message</p>
</li>
<li><p>Update database</p>
</li>
<li><p>Commit database transaction</p>
</li>
<li><p>Commit messaging transaction</p>
</li>
</ol>
<p>It's important that the last two steps (5 and 6) happen in that order and come last. It's better to surface business processing violations (bad input, rule violations, constraint violations etc) before sending things to be made permanent. When flushing database operations before acknowledging the message, there's less chance of both systems going out of sync.</p>
<p>An out-of-sync condition could be that the database transaction commits, but the message broker ack fails. Or the other way around, the commit fails and the ack succeeds. In either case, it will result in double-processing the same event, and if the database writes are non-idempotent you end up with multiple side effects.</p>
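<p>One common way to make a redelivered message harmless is to remember which message IDs have already been processed and skip duplicates. The following is a minimal in-memory sketch of that idea, with hypothetical class and method names. In a real system the processed IDs would more likely be stored in the database, in the same transaction as the write (an assumption, not something the listener above does):</p>
<pre><code class="lang-java">import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer that ignores messages it has already processed.
class IdempotentConsumer {
    private final Set&lt;String&gt; processedIds = new HashSet&lt;&gt;();
    private int sideEffects;

    // Returns true if the message was processed, false if it was skipped as a duplicate
    boolean receive(String messageId) {
        if (!processedIds.add(messageId)) {
            return false; // already seen, e.g. redelivered after a failed ack
        }
        sideEffects++; // stands in for the database write
        return true;
    }

    int sideEffects() {
        return sideEffects;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        consumer.receive("msg-1");
        consumer.receive("msg-1"); // redelivery of the same message
        System.out.println(consumer.sideEffects()); // 1
    }
}
</code></pre>
<p>With this kind of deduplication in place, double-processing the same event still happens at the transport level but produces a single side effect.</p>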
<p>With object-relational mappers (ORMs), most database actions take place during the commit phase due to the first-level cache. It's at that point where the JPA provider (Hibernate) performs update optimizations (collapsing) and determines what SQL statements to send to the database. It's also the phase where the database may raise data model constraint violations to preserve integrity.</p>
<p>Things that can go wrong in the messaging transaction are network and process failures with the broker, which are less likely to occur.</p>
<h2 id="heading-database-and-remote-api-call">Database and Remote API Call</h2>
<p>Consider a typical service boundary scenario where there's a database and a foreign API service involved. It includes a write to the database and an API call to the remote endpoint, which in turn creates some side-effect. Both these operations are independent, as in there's no atomicity.</p>
<ol>
<li><p>Start database transaction</p>
</li>
<li><p>Update database</p>
</li>
<li><p>Send a POST request to the remote API</p>
</li>
<li><p>Commit database transaction</p>
</li>
</ol>
<p>First off, it's not advisable to invoke remote calls from within a transaction context. Therefore, the minimum ask would be to reorder the steps as follows:</p>
<ol>
<li><p>Send a POST request to the remote API</p>
</li>
<li><p>Start database transaction</p>
</li>
<li><p>Update database</p>
</li>
<li><p>Commit database transaction</p>
</li>
</ol>
<p>Better, but there's still a potential issue here if the database transaction fails. In that case, when you retry the entire operation there will be another POST request sent to the endpoint. If that endpoint is non-idempotent you may end up with multiple side effects. Essentially this is the same problem as in the first example, called non-atomic dual-writes.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TransferService</span> </span>{
    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">createTransfer</span><span class="hljs-params">(TransferEntity entity)</span> </span>{
        ResponseEntity&lt;String&gt; response
                = <span class="hljs-keyword">new</span> RestTemplate().postForEntity(<span class="hljs-string">"https://api.bank.com"</span>,
                        toRequestPayload(entity), String.class);
        <span class="hljs-keyword">if</span> (!response.getStatusCode().is2xxSuccessful()) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Disturbance!"</span>);
        }
    }
}
</code></pre>
<p>However, in this scenario, if the endpoint is idempotent (invoking many times is the same as invoking once) then it doesn't matter how many times you retry it. It will only have one single side effect.</p>
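<p>One way an endpoint can achieve this is by accepting a client-supplied idempotency key and returning the stored result for any repeated key. The sketch below is an in-memory stand-in for such a remote API, not the API of any real service; the class, method and key names are all hypothetical:</p>
<pre><code class="lang-java">import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for a remote endpoint that honors idempotency keys.
class IdempotentEndpoint {
    private final Map&lt;String, String&gt; responsesByKey = new ConcurrentHashMap&lt;&gt;();
    private final AtomicInteger sideEffects = new AtomicInteger();

    // The side effect runs only the first time a given key is seen;
    // later calls with the same key return the stored response.
    String post(String idempotencyKey, String payload) {
        return responsesByKey.computeIfAbsent(idempotencyKey, key -&gt; {
            int receiptNumber = sideEffects.incrementAndGet();
            return "receipt-" + receiptNumber + ":" + payload;
        });
    }

    int sideEffectCount() {
        return sideEffects.get();
    }

    public static void main(String[] args) {
        IdempotentEndpoint api = new IdempotentEndpoint();
        String first = api.post("transfer-42", "100.00 SEK");
        String retry = api.post("transfer-42", "100.00 SEK"); // retried after a failed commit
        System.out.println(first.equals(retry));   // true
        System.out.println(api.sideEffectCount()); // 1
    }
}
</code></pre>
<p>The caller can then retry the whole operation after a failed database transaction, as long as it reuses the same key for the same logical request.</p>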
<h2 id="heading-cockroachdb-and-xa">CockroachDB and XA</h2>
<p>Currently CockroachDB doesn't support the XA protocol but there's a tracking <a target="_blank" href="https://github.com/cockroachdb/cockroach/issues/22329">issue</a> for it. The good thing is that XA-distributed transactions are not strictly needed to support the above scenarios. There are plenty of alternative options, such as:</p>
<ul>
<li><p>Saga pattern:</p>
<ul>
<li><p>A decomposed version of 2PC where involved services implement participation and compensation methods as part of an agreement protocol, using either an orchestrated or a choreographed approach.</p>
</li>
<li><p>The practical use is between disparate services (not between databases and/or brokers)</p>
</li>
<li><p>Quite complicated to implement and test and reduces understandability</p>
</li>
</ul>
</li>
<li><p>Outbox pattern:</p>
<ul>
<li><p>Domain events are written to the database as part of the local transaction.</p>
</li>
<li><p>Domain events are published downstream after the commit point using CDC</p>
</li>
<li><p>Avoids the non-atomic dual write problem.</p>
</li>
<li><p>The practical use is between disparate services</p>
</li>
</ul>
</li>
<li><p>Inbox pattern:</p>
<ul>
<li><p>Incoming messages are stored in the database and then CDC is used to publish or self-subscribe to the messages</p>
</li>
<li><p>Offloads the message broker and adds retention</p>
</li>
<li><p>The practical use is between disparate services</p>
</li>
</ul>
</li>
<li><p>1PC with idempotency</p>
<ul>
<li>As described in this article</li>
</ul>
</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article outlined a commonly adopted transaction strategy described as the best-efforts one-phase-commit (1PC) pattern. It offers a simple, low-effort alternative to XA, provided that operations are idempotent.</p>
]]></content:encoded></item><item><title><![CDATA[Testing Serializable Isolation in CockroachDB]]></title><description><![CDATA[As a follow-up to A Basic Guide to Transaction Isolation, this article will reproduce a handful of interesting tests described on the PostgreSQL SSI page. Only this time for CockroachDB.
Another great resource to illustrate the behaviour of serializa...]]></description><link>https://blog.cloudneutral.se/testing-serializable-isolation-in-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/testing-serializable-isolation-in-cockroachdb</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Sun, 30 Apr 2023 10:06:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/DU2MYqq2KIQ/upload/27eacb2498115dbedcdb0e3d42279e77.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As a follow-up to <a target="_blank" href="https://blog.cloudneutral.se/a-basic-guide-to-transaction-isolation">A Basic Guide to Transaction Isolation</a>, this article will reproduce a handful of interesting tests described on the PostgreSQL <a target="_blank" href="https://wiki.postgresql.org/wiki/SSI">SSI</a> page. Only this time for CockroachDB.</p>
<p>Another great resource to illustrate the behaviour of serializable is Martin Kleppmann's <a target="_blank" href="https://github.com/ept/hermitage">Hermitage</a> project and the CockroachDB <a target="_blank" href="https://github.com/ept/hermitage/blob/master/cockroachdb.md">contribution</a>. It goes through a rich set of anomalies ranging from dirty writes to write skew on disjoint and predicate reads.</p>
<h1 id="heading-examples">Examples</h1>
<p>These tests were executed using the following CockroachDB version:</p>
<pre><code class="lang-bash">$ cockroach version
Build Tag:        v22.2.8
Build Time:       2023/04/17 13:22:08
Distribution:     CCL
Platform:         linux amd64 (x86_64-pc-linux-gnu)
Go Version:       go1.19.6
C Compiler:       gcc 6.5.0
Build Commit ID:  9a7c644e565b21d29db26a0a82524a00809d0a8c
Build Type:       release
</code></pre>
<p>First, create a test database:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE database test"</span>
</code></pre>
<p>Next, start three separate shell windows representing transactions T1, T2 and T3. Whenever there's a "-- T1" comment for a SQL statement, run that statement in the designated console session.</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost --database <span class="hljs-built_in">test</span> // <span class="hljs-keyword">for</span> T1
cockroach sql --insecure --host=localhost --database <span class="hljs-built_in">test</span> // <span class="hljs-keyword">for</span> T2
cockroach sql --insecure --host=localhost --database <span class="hljs-built_in">test</span> // <span class="hljs-keyword">for</span> T3
</code></pre>
<h2 id="heading-black-and-white-marbles">Black and White Marbles</h2>
<p>This is a test for Write Skew (A5B), which serializable isolation prevents. <a target="_blank" href="https://www.cockroachlabs.com/blog/what-write-skew-looks-like/">Write Skew</a> occurs when two concurrent transactions each read an overlapping set of rows and then write to disjoint rows based on what they read, so that neither sees the other's update.</p>
<p>Schema setup:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> marbles (
  <span class="hljs-keyword">id</span>    <span class="hljs-built_in">bigint</span>      <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> primary <span class="hljs-keyword">key</span>,
  color <span class="hljs-built_in">varchar</span>(<span class="hljs-number">25</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
);

<span class="hljs-keyword">delete</span> <span class="hljs-keyword">from</span> marbles <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles <span class="hljs-keyword">values</span>
  (<span class="hljs-number">1</span>,<span class="hljs-string">'black'</span>),
  (<span class="hljs-number">2</span>,<span class="hljs-string">'black'</span>),
  (<span class="hljs-number">3</span>,<span class="hljs-string">'black'</span>),
  (<span class="hljs-number">4</span>,<span class="hljs-string">'black'</span>),
  (<span class="hljs-number">5</span>,<span class="hljs-string">'black'</span>),
  (<span class="hljs-number">6</span>,<span class="hljs-string">'white'</span>),
  (<span class="hljs-number">7</span>,<span class="hljs-string">'white'</span>),
  (<span class="hljs-number">8</span>,<span class="hljs-string">'white'</span>),
  (<span class="hljs-number">9</span>,<span class="hljs-string">'white'</span>),
  (<span class="hljs-number">10</span>,<span class="hljs-string">'white'</span>);
</code></pre>
<p>Note: The <code>set transaction isolation level serializable</code> part is redundant for CockroachDB since it's the default (and only level supported).</p>
<p>Observe that CockroachDB serializable prevents this anomaly:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">update</span> marbles <span class="hljs-keyword">set</span> color = <span class="hljs-string">'black'</span> <span class="hljs-keyword">where</span> color = <span class="hljs-string">'white'</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">update</span> marbles <span class="hljs-keyword">set</span> color = <span class="hljs-string">'white'</span> <span class="hljs-keyword">where</span> color = <span class="hljs-string">'black'</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T1. First commit wins.</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T2. ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: committed value on key /Table/137/1/6/0): "sql txn" meta={id=f8ee6d8c key=/Table/137/1/1/0 pri=0.00066739 epo=0 ts=1682691951.358336984,2 min=1682691934.561490359,0 seq=5} lock=true stat=PENDING rts=1682691934.561490359,0 wto=false gul=1682691935.061490359,0</span>
SQLSTATE: 40001
HINT: See: https://www.cockroachlabs.com/docs/v22.2/transaction-retry-error-reference.html<span class="hljs-comment">#retry_serializable</span>
</code></pre>
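<p>An error with SQLSTATE 40001 means the transaction is safe to run again from the start. As a minimal sketch of how a client might react, the following retries a transaction body with exponential backoff. The <code>SerializationFailure</code> class and the <code>run_with_retry</code> helper are hypothetical names for illustration, not part of any driver. A real client would inspect the SQLSTATE of the error raised by its driver instead.</p>
<pre><code class="lang-python">import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run txn_body, retrying from the start on a serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # retries exhausted, surface the error to the caller
            # Exponential backoff with jitter before the next attempt.
            sleep(base_delay * (2 ** attempt) * random.random())


# Usage: a body that fails twice with 40001 and then commits.
calls = []


def flaky_transfer():
    calls.append(1)
    if len(calls) in (1, 2):
        raise SerializationFailure("restart transaction")
    return "committed"


result = run_with_retry(flaky_transfer, sleep=lambda seconds: None)
print(result, "after", len(calls), "attempts")
</code></pre>
<p>The body must be safe to repeat: every attempt starts a fresh transaction and re-reads its inputs rather than reusing values seen by the failed attempt.</p>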
<p>All the colours must match:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> marbles <span class="hljs-keyword">order</span> <span class="hljs-keyword">by</span> <span class="hljs-keyword">id</span>;
  id | color
<span class="hljs-comment">-----+--------</span>
   1 | black
   2 | black
   3 | black
   4 | black
   5 | black
   6 | black
   7 | black
   8 | black
   9 | black
  10 | black
(10 rows)
</code></pre>
<p>Under SNAPSHOT isolation (which CockroachDB doesn't provide), write skew would not be prevented and the result would look like this instead:</p>
<pre><code class="lang-sql">+<span class="hljs-comment">----+-------+</span>
| id | color |
+<span class="hljs-comment">----+-------+</span>
|  1 | white |
|  2 | white |
|  3 | white |
|  4 | white |
|  5 | white |
|  6 | black |
|  7 | black |
|  8 | black |
|  9 | black |
| 10 | black |
+<span class="hljs-comment">----+-------+</span>
<span class="hljs-comment">-- (10 rows)</span>
</code></pre>
<p>The colours have been flipped due to write skew, which is expected under concurrent execution with snapshot isolation (or read committed).</p>
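<p>The flipped outcome can be reproduced with a small in-memory model. This is only a sketch under simplified assumptions: a "snapshot" is a plain dictionary copy, and both transactions are allowed to commit because their write sets touch different rows. The function names are made up for the example.</p>
<pre><code class="lang-python">def initial_marbles():
    # Ids 1-5 are black and 6-10 are white, as in the schema setup.
    return {i: "black" if i in range(1, 6) else "white" for i in range(1, 11)}


def recolor(snapshot, old, new):
    """Return the write set of: update marbles set color=new where color=old."""
    return {i: new for i, color in snapshot.items() if color == old}


def run_snapshot():
    # Both transactions read the same starting snapshot, and their write
    # sets touch disjoint rows, so neither write conflicts with the other.
    start = initial_marbles()
    t1_writes = recolor(start, "white", "black")
    t2_writes = recolor(start, "black", "white")
    assert not set(t1_writes).intersection(t2_writes)
    return {**start, **t1_writes, **t2_writes}


def run_serial():
    # T1 commits first; T2 then runs against the state T1 left behind.
    state = initial_marbles()
    state.update(recolor(state, "white", "black"))
    state.update(recolor(state, "black", "white"))
    return state


print(sorted(set(run_snapshot().values())))  # ['black', 'white']
print(sorted(set(run_serial().values())))    # ['white']
</code></pre>
<p>Any serial order leaves a single colour. In the CockroachDB run above the table ends up all black because T2 was aborted and not retried; had T2 been re-run after T1, the table would be all white.</p>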
<h2 id="heading-red-green-and-blue-marbles">Red, Yellow and Blue Marbles</h2>
<p>This example is similar to the previous one, only this time involving three transactions.</p>
<p>Setup schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> marbles (
  <span class="hljs-keyword">id</span>    <span class="hljs-built_in">bigint</span>      <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> primary <span class="hljs-keyword">key</span>,
  color <span class="hljs-built_in">varchar</span>(<span class="hljs-number">25</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
);

<span class="hljs-keyword">delete</span> <span class="hljs-keyword">from</span> marbles <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">1</span>,<span class="hljs-string">'red'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">2</span>,<span class="hljs-string">'red'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">3</span>,<span class="hljs-string">'red'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">4</span>,<span class="hljs-string">'yellow'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">5</span>,<span class="hljs-string">'yellow'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">6</span>,<span class="hljs-string">'yellow'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">7</span>,<span class="hljs-string">'blue'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">8</span>,<span class="hljs-string">'blue'</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> marbles (<span class="hljs-keyword">id</span>,color) <span class="hljs-keyword">values</span> (<span class="hljs-number">9</span>,<span class="hljs-string">'blue'</span>);
</code></pre>
<p>Again, CockroachDB <code>SERIALIZABLE</code> isolation prevents Write Skew (A5B):</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">SERIALIZABLE</span> ; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">update</span> marbles <span class="hljs-keyword">set</span> color = <span class="hljs-string">'yellow'</span> <span class="hljs-keyword">where</span> color = <span class="hljs-string">'red'</span>; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">SERIALIZABLE</span> ; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">update</span> marbles <span class="hljs-keyword">set</span> color = <span class="hljs-string">'blue'</span> <span class="hljs-keyword">where</span> color = <span class="hljs-string">'yellow'</span>; <span class="hljs-comment">-- T2. Blocks on T1 intents.</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">SERIALIZABLE</span> ; <span class="hljs-comment">-- T3</span>
<span class="hljs-keyword">update</span> marbles <span class="hljs-keyword">set</span> color = <span class="hljs-string">'red'</span> <span class="hljs-keyword">where</span> color = <span class="hljs-string">'blue'</span>; <span class="hljs-comment">-- T3. Blocks.</span>

<span class="hljs-keyword">commit</span>; <span class="hljs-comment">--T1</span>
<span class="hljs-comment">-- T2 unblocks (6 rows affected)</span>
<span class="hljs-comment">-- T3 unblocks (3 rows affected)</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T3</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T2. ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict</span>
</code></pre>
<p>The correct outcome (only yellow and red):</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> marbles;
+<span class="hljs-comment">----+--------+</span>
| id | color  |
+<span class="hljs-comment">----+--------+</span>
|  1 | yellow |
|  2 | yellow |
|  3 | yellow |
|  4 | yellow |
|  5 | yellow |
|  6 | yellow |
|  7 | red    |
|  8 | red    |
|  9 | red    |
+<span class="hljs-comment">----+--------+</span>
(9 rows)
</code></pre>
<h2 id="heading-intersecting-data">Intersecting Data</h2>
<p>Two concurrent transactions read data, and each uses it to update the range read by the other.</p>
<p>Setup schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> tab (
  <span class="hljs-keyword">id</span> <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
  <span class="hljs-keyword">value</span> <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
);

<span class="hljs-keyword">delete</span> <span class="hljs-keyword">from</span> tab <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> tab <span class="hljs-keyword">VALUES</span>
(<span class="hljs-number">1</span>, <span class="hljs-number">10</span>), (<span class="hljs-number">1</span>, <span class="hljs-number">20</span>), (<span class="hljs-number">2</span>, <span class="hljs-number">100</span>), (<span class="hljs-number">2</span>, <span class="hljs-number">200</span>);
</code></pre>
<p>Observe that CockroachDB guarantees a result equivalent to a serial execution:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">SUM</span>(<span class="hljs-keyword">value</span>) <span class="hljs-keyword">FROM</span> tab <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">1</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> tab <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">2</span>, <span class="hljs-number">30</span>); <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">SUM</span>(<span class="hljs-keyword">value</span>) <span class="hljs-keyword">FROM</span> tab <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">2</span>; <span class="hljs-comment">-- T2 (blocks)</span>

<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T1 (unblocks T2)</span>

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> tab <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">1</span>, <span class="hljs-number">330</span>); <span class="hljs-comment">-- T2</span>

<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T2</span>

<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> tab;
</code></pre>
<p>Yields:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> tab;
  id | value
<span class="hljs-comment">-----+--------</span>
   1 |    10
   1 |    20
   2 |   100
   2 |   200
   2 |    30
   1 |   330
(6 rows)
</code></pre>
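<p>To see why this result counts as serializable, it helps to compute what each serial order would insert and compare that with what two isolated snapshots would produce. The sketch below models the table as a list of tuples. The helper names are illustrative only.</p>
<pre><code class="lang-python">ROWS = [(1, 10), (1, 20), (2, 100), (2, 200)]


def total(rows, key):
    return sum(value for row_id, value in rows if row_id == key)


def t1(read_from):
    # T1 sums id = 1 and inserts the result under id = 2.
    return (2, total(read_from, 1))


def t2(read_from):
    # T2 sums id = 2 and inserts the result under id = 1.
    return (1, total(read_from, 2))


def serial(first, second):
    rows = list(ROWS)
    rows.append(first(rows))
    rows.append(second(rows))  # sees the row inserted by the first
    return rows[-2:]


def snapshot():
    # Both transactions read the original rows and neither sees the other.
    return [t1(ROWS), t2(ROWS)]


print("T1 then T2:", serial(t1, t2))  # [(2, 30), (1, 330)]
print("T2 then T1:", serial(t2, t1))  # [(1, 300), (2, 330)]
print("snapshot:  ", snapshot())      # [(2, 30), (1, 300)]
</code></pre>
<p>The rows inserted in the CockroachDB run, (2, 30) and (1, 330), match the order T1 then T2. The snapshot outcome matches neither serial order, which is what makes it an anomaly.</p>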
<h2 id="heading-overdraft-protection">Overdraft Protection</h2>
<p>Here we will protect the invariant that the total of all accounts must exceed the amount requested.</p>
<p>Schema setup:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> <span class="hljs-keyword">account</span>
  (
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">25</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    <span class="hljs-keyword">type</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">25</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    balance <span class="hljs-built_in">NUMERIC</span>(<span class="hljs-number">19</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,

    primary <span class="hljs-keyword">key</span> (<span class="hljs-keyword">name</span>, <span class="hljs-keyword">type</span>)
  );

<span class="hljs-keyword">delete</span> <span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">values</span>
  (<span class="hljs-string">'alice'</span>,<span class="hljs-string">'saving'</span>, <span class="hljs-number">500</span>),
  (<span class="hljs-string">'alice'</span>,<span class="hljs-string">'checking'</span>, <span class="hljs-number">500</span>);
</code></pre>
<p>Let's try to overdraw the accounts under serializable isolation:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">select</span> <span class="hljs-keyword">type</span>, balance <span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">where</span> <span class="hljs-keyword">name</span> = <span class="hljs-string">'alice'</span>; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">select</span> <span class="hljs-keyword">type</span>, balance <span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">where</span> <span class="hljs-keyword">name</span> = <span class="hljs-string">'alice'</span>; <span class="hljs-comment">-- T2</span>

<span class="hljs-keyword">update</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">set</span> balance = balance - <span class="hljs-number">900.00</span> <span class="hljs-keyword">where</span> <span class="hljs-keyword">name</span> = <span class="hljs-string">'alice'</span> <span class="hljs-keyword">and</span> <span class="hljs-keyword">type</span> = <span class="hljs-string">'saving'</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">update</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">set</span> balance = balance - <span class="hljs-number">900.00</span> <span class="hljs-keyword">where</span> <span class="hljs-keyword">name</span> = <span class="hljs-string">'alice'</span> <span class="hljs-keyword">and</span> <span class="hljs-keyword">type</span> = <span class="hljs-string">'checking'</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T2 ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict:</span>
</code></pre>
<p>This yields the following, where the invariant holds:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> <span class="hljs-keyword">account</span>;
  name  |   type   | balance
<span class="hljs-comment">--------+----------+----------</span>
  alice | checking |  500.00
  alice | saving   | -400.00
(2 rows)
</code></pre>
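<p>The same scenario can be modelled in a few lines to show what is at stake. This is a sketch with whole currency units and an application-side check that the article's SQL leaves implicit; <code>can_withdraw</code> and <code>withdraw</code> are hypothetical helpers. It assumes the application re-runs T2 after the retry error.</p>
<pre><code class="lang-python">def can_withdraw(balances, amount):
    # The bank's rule: the combined balance must cover the amount,
    # i.e. amount lies in 0..total inclusive (whole currency units here).
    return amount in range(sum(balances.values()) + 1)


def withdraw(seen, account, amount):
    """Return the new balance for account, or None if the rule rejects it."""
    if not can_withdraw(seen, amount):
        return None
    return seen[account] - amount


start = {"saving": 500, "checking": 500}

# Snapshot: T1 and T2 both validate against the starting balances.
snap = dict(start)
snap["saving"] = withdraw(start, "saving", 900)
snap["checking"] = withdraw(start, "checking", 900)

# Serializable: T1 commits; T2 is aborted and, when re-run, sees T1's write.
committed = dict(start)
committed["saving"] = withdraw(committed, "saving", 900)
retry = withdraw(committed, "checking", 900)

print("snapshot total:", sum(snap.values()))  # -800
print("serializable total:", sum(committed.values()), "T2 retry:", retry)  # 100, None
</code></pre>
<p>With both withdrawals validated against the same stale balances the combined total drops to -800. Under serializable isolation the re-run of T2 sees a combined balance of 100 and the check rejects the second withdrawal.</p>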
<h2 id="heading-deposit-report">Deposit Report</h2>
<p>Setup schema (before every test run):</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> control
  (
    deposit_no <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
  );
<span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> receipt
  (
    receipt_no <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> PRIMARY <span class="hljs-keyword">KEY</span> <span class="hljs-keyword">DEFAULT</span> unique_rowid(),
    deposit_no <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    payee <span class="hljs-built_in">text</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    amount <span class="hljs-built_in">numeric</span>(<span class="hljs-number">19</span>,<span class="hljs-number">2</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
  );

<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">from</span> control <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;
<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">from</span> receipt <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> control <span class="hljs-keyword">values</span> (<span class="hljs-number">1</span>);

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> receipt
  (deposit_no, payee, amount)
  <span class="hljs-keyword">values</span> ((<span class="hljs-keyword">select</span> deposit_no <span class="hljs-keyword">from</span> control), <span class="hljs-string">'Crosby'</span>, <span class="hljs-number">100.00</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> receipt
  (deposit_no, payee, amount)
  <span class="hljs-keyword">values</span> ((<span class="hljs-keyword">select</span> deposit_no <span class="hljs-keyword">from</span> control), <span class="hljs-string">'Stills'</span>, <span class="hljs-number">200.00</span>);
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> receipt
  (deposit_no, payee, amount)
  <span class="hljs-keyword">values</span> ((<span class="hljs-keyword">select</span> deposit_no <span class="hljs-keyword">from</span> control), <span class="hljs-string">'Nash'</span>, <span class="hljs-number">300.00</span>);
</code></pre>
<p>Test sequence:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> receipt (deposit_no, payee, amount) <span class="hljs-keyword">values</span> ( (<span class="hljs-keyword">select</span> deposit_no <span class="hljs-keyword">from</span> control), <span class="hljs-string">'Young'</span>, <span class="hljs-number">100.00</span> ); <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> receipt; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T2   </span>
<span class="hljs-keyword">select</span> deposit_no <span class="hljs-keyword">from</span> control; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">update</span> control <span class="hljs-keyword">set</span> deposit_no = <span class="hljs-number">2</span> <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T2</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T3   </span>
<span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> receipt <span class="hljs-keyword">where</span> deposit_no = <span class="hljs-number">1</span>; <span class="hljs-comment">-- T3. Blocks on T1</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-comment">-- T3 unblocks</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T3</span>
</code></pre>
<p>Yields:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> receipt;
      receipt_no     | deposit_no | payee  | amount
<span class="hljs-comment">---------------------+------------+--------+---------</span>
  860561294810873858 |          1 | Crosby | 100.00
  860561295115354114 |          1 | Stills | 200.00
  860561295382970370 |          1 | Nash   | 300.00
  860561358736326657 |          1 | Young  | 100.00
(4 rows)
</code></pre>
<h2 id="heading-rollover">Rollover</h2>
<p>This example was created to show that PostgreSQL can roll back read-only transactions to prevent serialization conflicts. It won't happen in CockroachDB.</p>
<p>Schema setup:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> rollover (
  <span class="hljs-keyword">id</span> <span class="hljs-built_in">int</span> primary <span class="hljs-keyword">key</span>, 
  n <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
  );
<span class="hljs-keyword">delete</span> <span class="hljs-keyword">from</span> rollover <span class="hljs-keyword">where</span> <span class="hljs-number">1</span>=<span class="hljs-number">1</span>;
<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> rollover <span class="hljs-keyword">values</span> (<span class="hljs-number">1</span>,<span class="hljs-number">100</span>), (<span class="hljs-number">2</span>,<span class="hljs-number">10</span>);
</code></pre>
<p>Financial transaction under serializable isolation:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">update</span> rollover
  <span class="hljs-keyword">set</span> n = n + (<span class="hljs-keyword">select</span> n <span class="hljs-keyword">from</span> rollover <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">2</span>)
  <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">1</span>; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">serializable</span> ; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">update</span> rollover <span class="hljs-keyword">set</span> n = n + <span class="hljs-number">1</span> <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">2</span>; <span class="hljs-comment">-- T2 - blocks on T1</span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">--T2 </span>

<span class="hljs-keyword">begin</span>; <span class="hljs-keyword">set</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">isolation</span> <span class="hljs-keyword">level</span> <span class="hljs-keyword">snapshot</span> ; <span class="hljs-comment">-- T3</span>
<span class="hljs-keyword">select</span> <span class="hljs-keyword">count</span>(*) <span class="hljs-keyword">from</span> rollover; <span class="hljs-comment">-- T3 - blocks on T1         </span>
<span class="hljs-keyword">commit</span>; <span class="hljs-comment">-- T1</span>

<span class="hljs-keyword">select</span> n <span class="hljs-keyword">from</span> rollover <span class="hljs-keyword">where</span> <span class="hljs-keyword">id</span> <span class="hljs-keyword">in</span> (<span class="hljs-number">1</span>,<span class="hljs-number">2</span>); <span class="hljs-comment">-- T3</span>
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article showcased a few examples outlined in the PostgreSQL <a target="_blank" href="https://wiki.postgresql.org/wiki/SSI#Deposit_Report">SSI</a> description page. It highlights some runtime differences between PostgreSQL SSI and CockroachDB serializable.</p>
]]></content:encoded></item><item><title><![CDATA[A Basic Guide to Transaction Isolation]]></title><description><![CDATA[ACID transactions are implemented differently in databases and provide different runtime characteristics towards applications. It's mainly manifested in terms of when different operations are blocked from proceeding or a transaction is forced to retr...]]></description><link>https://blog.cloudneutral.se/a-basic-guide-to-transaction-isolation</link><guid isPermaLink="true">https://blog.cloudneutral.se/a-basic-guide-to-transaction-isolation</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Sun, 30 Apr 2023 09:54:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/yJBMbQgw9mg/upload/697055beef93adce6dad0a5b0d0dbd67.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>ACID transactions are implemented differently in databases and provide different runtime characteristics towards applications. It's mainly manifested in terms of when different operations are blocked from proceeding or a transaction is forced to retry. That is, if the isolation level is indeed serializable, which is not always the case. Not all databases provide true ACID guarantees and that presents a problem if you are dependent on it.</p>
<p>The "I" in <a target="_blank" href="https://en.wikipedia.org/wiki/ACID">ACID</a> stands for isolation. Read strictly, that means serializable isolation, so a database that formally claims to support ACID needs to provide the highest isolation level in the SQL standard: serializable. Serializable isolation guarantees that even though transactions may execute in parallel, the result is the same as if they had executed one at a time, without any concurrency.</p>
<p>Executing transactions serially would lead to the same result, but it would also destroy any performance aspirations. Concurrent execution is a must-have. One way to look at it is that it's a magic show hosted by the database, giving the illusion to clients they are the exclusive users of the database, completely free from interference from others.</p>
<p><a target="_blank" href="https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/"><img src="https://blog.acolyer.org/wp-content/uploads/2016/02/isolation-levels-table.png" alt /></a></p>
<p>(image from: <a target="_blank" href="https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/">https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/</a>)</p>
<p>Isolation levels are, however, confusing and ambiguous, in particular for distributed databases where you don't have a single time source. Not only are they difficult to understand, they can also mean different things in different databases. Serializable in Oracle, for example, actually means snapshot isolation (which is weaker), and Repeatable Read in PostgreSQL also means snapshot isolation (which is stronger). Snapshot isolation permits write skew (A5B), which Repeatable Read does not. Then we have Oracle Read Consistency, which is like Read Committed, only stronger by advancing the transaction timestamp for each SQL statement.</p>
<p>This ambiguity presents a real challenge for application developers and architects. They are tasked with figuring out when a given isolation level is sufficient for correct execution. It also makes it more difficult to think in terms of portability between databases when the behaviour is different. One piece of advice: unless you are 100% sure which anomalies your business rule invariants are exposed to, go for a higher level of isolation.</p>
<p>Related Resources:</p>
<ul>
<li><p><a target="_blank" href="http://www.bailis.org/blog/understanding-weak-isolation-is-a-serious-problem/">http://www.bailis.org/blog/understanding-weak-isolation-is-a-serious-problem/</a></p>
</li>
<li><p><a target="_blank" href="http://www.bailis.org/blog/when-is-acid-acid-rarely/">http://www.bailis.org/blog/when-is-acid-acid-rarely/</a></p>
</li>
<li><p><a target="_blank" href="http://martin.kleppmann.com/2014/11/25/hermitage-testing-the-i-in-acid.html">http://martin.kleppmann.com/2014/11/25/hermitage-testing-the-i-in-acid.html</a></p>
</li>
<li><p><a target="_blank" href="https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/">https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/</a></p>
</li>
</ul>
<p>The goal of transaction isolation is to find a good balance between safety and performance for concurrent transactions. A database should allow concurrent access to data while still being safe, meaning that concurrent operations that happen to interleave, should not observe intermediate state, overwrite other transaction writes or violate invariants guarded by constraints. It's the database being liberal and conservative at the same time.</p>
<p>A higher isolation level reduces and even eliminates most known read/write conflict anomalies, at the expense of performance and rollbacks on contended operations. Performance is increased and transient errors are reduced by lowering the isolation level, effectively requiring less coordination and planning effort by the database to guarantee safe, concurrent execution. It depends on the database implementation though, and in some cases, the difference in performance is small for non-contending operations.</p>
<p>The main downside of lowering isolation is that applications become more exposed to read-write phenomena (anomalies) that may cause data loss or corruption in the worst case. These types of errors are quite difficult to track down and test for.</p>
<h2 id="heading-read-and-write-anomalies">Read and Write Anomalies</h2>
<p>The lowest isolation level is Read Uncommitted (RU), which basically means that all (or most) bets are off. It allows dirty reads (P1), where transaction T1 is allowed to read transaction T2's writes that haven't been committed yet. Read Uncommitted must still prohibit dirty writes (P0), where T1 would modify T2's write before T2 has committed.</p>
<p>The highest ACID isolation level is serializability which means transactions are not exposed to any read/write anomalies. A client can safely read and write without having to worry about other transactions possibly performing the same operations. The database will guarantee that no client will ever observe any inconsistent state and that all invariants will be preserved at commit.</p>
<p>In between you have all the rest. Anomalies can either be permitted or prevented by using ANSI SQL isolation levels, or something even higher like strict serializability or linearizability (external consistency).</p>
<p>Common anomalies include:</p>
<ul>
<li><p>Dirty write (P0)</p>
</li>
<li><p>Dirty read (P1)</p>
</li>
<li><p>Fuzzy read (P2)</p>
</li>
<li><p>Phantom (P3)</p>
</li>
<li><p>Strict Phantom (A3)</p>
</li>
<li><p>Lost update (P4)</p>
</li>
<li><p>Cursor lost update (P4C)</p>
</li>
<li><p>Read Skew (A5A)</p>
</li>
<li><p>Write Skew (A5B)</p>
</li>
</ul>
<p>Surprisingly enough, the default isolation level in most modern databases is read committed (RC). It is a fundamentally unsafe isolation level exposed to lost updates (P4) and more. Still, many applications are using it and seem to work fine most of the time.</p>
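<p>To make the lost update (P4) concrete, here is a minimal, single-threaded replay of the interleaving in plain Java. It is only a sketch: the class name, the in-memory "table" and the deposit amounts are made up for illustration, and no database is involved.</p>
<pre><code class="lang-java">// Deterministic replay of a lost update (P4). Illustrative only, no database involved.
final class LostUpdateDemo {
    // A one-row "table" holding an account balance.
    static int balance;

    // Two read-modify-write transactions whose reads both happen before
    // either write, an interleaving that weak isolation levels may permit.
    static int interleaved(int initial, int depositA, int depositB) {
        balance = initial;
        int readByT1 = balance;          // T1 reads
        int readByT2 = balance;          // T2 reads the same committed value
        balance = readByT1 + depositA;   // T1 writes and commits
        balance = readByT2 + depositB;   // T2 overwrites T1's update and commits
        return balance;
    }

    // The same two transactions executed one after the other.
    static int serial(int initial, int depositA, int depositB) {
        balance = initial;
        balance = balance + depositA;    // T1 runs to completion
        balance = balance + depositB;    // T2 then sees T1's write
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleaved(100, 10, 20)); // 120
        System.out.println("serial:      " + serial(100, 10, 20));      // 130
    }
}
</code></pre>
<p>The interleaved run ends at 120 instead of 130: T1's deposit is silently lost, and nothing in the application fails.</p>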
<p>But how can you be sure you will not be the next Bitcoin exchange or e-commerce site that gets <a target="_blank" href="http://www.bailis.org/papers/acidrain-sigmod2017.pdf">exploited</a> by weak isolation? Trying to navigate through these things is not far from trying to beat classic Minesweeper.</p>
<p><img src="https://minesweeper.online/img/homepage/expert.png" alt="Minesweeper Online" /></p>
<h2 id="heading-isolation-levels-in-databases">Isolation Levels in Databases</h2>
<p>Modern lock-free <a target="_blank" href="https://en.wikipedia.org/wiki/Multiversion_concurrency_control">MVCC</a> databases (and others) like Oracle and PostgreSQL default to Read Committed (RC). As MVCC databases, they also support <a target="_blank" href="https://en.wikipedia.org/wiki/Snapshot_isolation">snapshot isolation</a> (SI) which is a slightly weaker model than serializable.</p>
<p>SI does not use locking, which is sort of the point with MVCC. Instead, every transaction operates on an isolated snapshot of committed data, and its writes are not visible to other transactions until it commits.</p>
<p>SI sorts in somewhere between read committed and serializable (Berenson and Adya). It prevents P4 (lost update) by applying a first committer wins policy and like Repeatable Read (RR) it prohibits P0, P1 and P2. It prevents a special version of P3 called A3 (Phantom) that RR allows, but allows A5B (write skew) that RR prevents. Write skew is when two concurrent transactions are writing based on reading a data set which overlaps what the other is writing.</p>
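<p>Write skew is easier to see in a small replay. The sketch below uses the well-known "doctors on call" illustration as an assumption (it is not taken from this article): two transactions each take one doctor off call after checking that two doctors are on call. Under snapshot isolation both checks run against the same snapshot and the writes touch different rows, so a first-committer-wins rule has nothing to reject.</p>
<pre><code class="lang-java">// Deterministic replay of write skew (A5B). Illustrative only, no database involved.
final class WriteSkewDemo {
    // Counts how many doctors are currently on call.
    static int onCall(boolean[] doctors) {
        int count = 0;
        for (boolean d : doctors) {
            if (d) {
                count++;
            }
        }
        return count;
    }

    // Both transactions read the same snapshot and write disjoint rows.
    static boolean[] underSnapshotIsolation() {
        boolean[] committed = {true, true};       // both doctors on call
        boolean[] snapshotT1 = committed.clone(); // T1 reads its snapshot
        boolean[] snapshotT2 = committed.clone(); // T2 reads the same state
        if (onCall(snapshotT1) == 2) {
            committed[0] = false;                 // T1 writes row 0 only
        }
        if (onCall(snapshotT2) == 2) {
            committed[1] = false;                 // T2 writes row 1 only, no write conflict
        }
        return committed;
    }

    // The same transactions executed one at a time.
    static boolean[] serially() {
        boolean[] committed = {true, true};
        if (onCall(committed) == 2) {
            committed[0] = false;                 // T1 commits first
        }
        if (onCall(committed) == 2) {
            committed[1] = false;                 // never runs: T2 sees T1's write
        }
        return committed;
    }

    public static void main(String[] args) {
        System.out.println("SI:     " + onCall(underSnapshotIsolation()) + " on call"); // 0
        System.out.println("serial: " + onCall(serially()) + " on call");               // 1
    }
}
</code></pre>
<p>The snapshot run leaves nobody on call, an outcome that neither serial order of the two transactions can produce. That is what makes it a serializability violation.</p>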
<p>PostgreSQL (since 9.1) implements serializable isolation on top of SI, called serializable snapshot isolation or <a target="_blank" href="https://wiki.postgresql.org/wiki/SSI">SSI</a>. It prevents A5B (write skew) by forcing conflicts through either promoting reads to writes or by analyzing dependency cycles in transactions.</p>
<p>If you are not already confused at this point, then congratulations. These conditions and more are outlined in far more detail in the <a target="_blank" href="https://arxiv.org/ftp/cs/papers/0701/0701157.pdf">A Critique of ANSI SQL Isolation Levels</a> paper.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682847982319/561b5cbd-3d3d-4ca9-8fa6-f8893e85bdbb.png" alt class="image--center mx-auto" /></p>
<p>CockroachDB only implements <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/demo-serializable.html">serializable isolation</a>, which narrows down the options. It gives peace of mind if you are concerned about read/write anomalies.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>ACID transaction isolation levels are ambiguous and tricky to grok. Most modern databases implement transaction isolation differently and often have weak defaults where applications need to opt-in for higher isolation. In CockroachDB, the only choice is serializable which is the highest level in the SQL standard.</p>
]]></content:encoded></item><item><title><![CDATA[Entity Control Boundary in Spring Boot Apps]]></title><description><![CDATA[Introduction
In a previous article, we looked at an architectural pattern named entity-control-boundary (ECB) mapped to Spring meta-annotations for transaction management and retries. That post didn't go very deep into this architectural pattern, whi...]]></description><link>https://blog.cloudneutral.se/entity-control-boundary-in-spring-boot-apps</link><guid isPermaLink="true">https://blog.cloudneutral.se/entity-control-boundary-in-spring-boot-apps</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[retries]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Sun, 30 Apr 2023 09:18:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/LNqRdBdv7ow/upload/0ae2bb26e3ca25b6fad02314a3e14120.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In a previous <a target="_blank" href="https://blog.cloudneutral.se/spring-annotations-for-cockroachdb#heading-transaction-retries">article</a>, we looked at an architectural pattern named entity-control-boundary (ECB) mapped to Spring meta-annotations for transaction management and retries. That post didn't go very deep into this architectural pattern, which is the purpose of this article.</p>
<p><a target="_blank" href="https://en.wikipedia.org/wiki/Entity-control-boundary">ECB</a> is an architecture pattern originally coined in <strong>Ivar Jacobson's</strong> use-case-driven object-oriented software engineering (OOSE) method published in 1992. In other words, it dates way back in time yet it's not super well-known.</p>
<p>This pattern fits really well into organising transaction boundaries in application code and you don't need to go all-in on all the fun stuff like UML, waterfall or unified process to use it. It's really straightforward and mainly serves a documentative and declarative purpose.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682844548278/6866399c-e739-4e1c-bcb6-e83e4ce21bff.jpeg" alt class="image--center mx-auto" /></p>
<p>The ECB pattern is centred around defining clear responsibilities and interactions between different categories of classes. It can be broken down into four elements of a robustness diagram: <strong>Actor</strong>, <strong>Boundary</strong>, <strong>Control</strong> and <strong>Entity</strong>.</p>
<blockquote>
<p>The following robustness constraints apply:</p>
<ul>
<li><p><strong>Actors</strong> may only know and communicate with boundaries.</p>
</li>
<li><p><strong>Boundaries</strong> may communicate with actors and controls only.</p>
</li>
<li><p><strong>Controls</strong> may know and communicate with boundaries and entities, and if needed other controls.</p>
</li>
<li><p><strong>Entities</strong> may only know about other entities but could communicate also with controls.</p>
</li>
</ul>
</blockquote>
<p><em>Source:</em> <a target="_blank" href="https://en.wikipedia.org/wiki/Entity-control-boundary"><em>https://en.wikipedia.org/wiki/Entity-control-boundary</em></a></p>
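<p>The quoted constraints can be written down as a small lookup, which is handy for an architecture test. This is a sketch only: the <code>EcbRules</code> class and its <code>mayCommunicate</code> method are hypothetical names, and the rules are a direct reading of the four bullets above.</p>
<pre><code class="lang-java">// Encodes the four robustness constraints quoted above. Names are hypothetical.
final class EcbRules {
    enum Role { ACTOR, BOUNDARY, CONTROL, ENTITY }

    // Returns true if an element of role 'from' may communicate with one of role 'to'.
    static boolean mayCommunicate(Role from, Role to) {
        switch (from) {
            case ACTOR:
                return to == Role.BOUNDARY;                       // boundaries only
            case BOUNDARY:
                return to == Role.ACTOR || to == Role.CONTROL;    // actors and controls only
            case CONTROL:
                return to != Role.ACTOR;                          // boundaries, entities, other controls
            case ENTITY:
                return to == Role.ENTITY || to == Role.CONTROL;   // other entities, and controls
            default:
                return false;
        }
    }
}
</code></pre>
<p>For example, <code>mayCommunicate(Role.ACTOR, Role.ENTITY)</code> is false: an actor has to go through a boundary.</p>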
<p>In other words, there could be dependencies and interactions like this in a single service:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682845090127/a9ebdc64-5594-442b-8849-cd33f4380433.jpeg" alt class="image--center mx-auto" /></p>
<p>ECB brings structure and low-effort consistency to transaction boundary demarcation, simply by using transaction attributes or, preferably, dedicated meta- or stereotype annotations. Without this level of structure, boundaries tend to become blurry, which may result in hard-to-find errors and system inconsistencies.</p>
<h1 id="heading-definitions">Definitions</h1>
<p>Let's map the ECB concept to concrete architectural elements (namespaces and annotations) that you typically see in a Spring Boot application.</p>
<h2 id="heading-boundary">Boundary</h2>
<p>A <strong>boundary</strong> is coarse-grained and exposes functionality towards users or other systems (actors). It is typically implemented as a web controller or business service facade. It should be thin and delegate business processing to more fine-grained control services, if applicable. It acts both as a remoting and transaction boundary.</p>
<p>A boundary should never be invoked from within a transaction context. It means that only a boundary is allowed to create new transactions. To that end, boundaries must have the <code>REQUIRES_NEW</code> transaction attribute on their public, transactional methods. This propagation attribute will always create a new transaction and suspend any existing one.</p>
<p>Transaction suspension via <code>REQUIRES_NEW</code> and nested transactions via <code>NESTED</code> are different things. Nested transactions allow for a rollback to the beginning of the sub-transaction while keeping the transactional state of the outer transaction. Nested transactions are expressed using <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/savepoint.html">savepoints</a>. Unfortunately, savepoints are not fully supported in all Java application frameworks. They are available in Spring Boot, although not with JPA. If you are using something other than JPA, savepoints open up a few more opportunities. For the transaction boundary type discussed here, however, we don't use nested transactions (via savepoints) but just regular unnested local transactions.</p>
<h3 id="heading-characteristics">Characteristics</h3>
<p>Key characteristics for a transaction and remoting boundary:</p>
<ul>
<li><p>Independent of other service facades or web controllers.</p>
</li>
<li><p>Granularity is more coarse-grained than a service.</p>
</li>
<li><p>The layer that exposes functionality outside of the business tier.</p>
</li>
<li><p>The only layer that is accessible from an external client (typically via a web API).</p>
</li>
<li><p>Methods are preferably <strong>idempotent</strong> for client convenience.</p>
</li>
<li><p>Never invoked within a transaction context.</p>
</li>
</ul>
<h3 id="heading-solution">Solution</h3>
<p>Typical implementation elements in Spring Boot:</p>
<ul>
<li><p>Can be a Business Facade, Web Controller or Service Activator (Message Listener) where:</p>
<ul>
<li><p>A business facade uses <code>@Service</code></p>
</li>
<li><p>A controller uses <code>@RestController</code></p>
</li>
<li><p>A service activator uses <code>@Service</code></p>
</li>
</ul>
</li>
<li><p>Implements simple business logic or delegates to services (Control) or even repositories (Entity).</p>
</li>
<li><p>Always uses transaction demarcation <code>REQUIRES_NEW</code> since it's a boundary.</p>
<ul>
<li><code>@Transactional(propagation = Propagation.REQUIRES_NEW)</code></li>
</ul>
</li>
</ul>
<h3 id="heading-conventions">Conventions</h3>
<ul>
<li><p>Represents the remoting entry point (when it's a web <code>@Controller</code>).</p>
</li>
<li><p>An interface or class with thin, coarse-grained methods.</p>
</li>
<li><p>Should be located in a dedicated <code>boundary</code> or <code>service</code> namespace.</p>
</li>
<li><p>Should use the documentative meta-annotation to emphasise its architectural role.</p>
</li>
<li><p>The business interface should be named after business concepts.</p>
</li>
</ul>
<p>Example of a boundary meta-annotation (annotation describing or grouping other annotations). Notice that it incorporates the Spring <code>@Transactional</code> annotation with propagation <code>REQUIRES_NEW</code>.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Inherited</span>
<span class="hljs-meta">@Documented</span>
<span class="hljs-meta">@Retention(RetentionPolicy.RUNTIME)</span>
<span class="hljs-meta">@Target({ElementType.TYPE, ElementType.METHOD})</span>
<span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
<span class="hljs-keyword">public</span> <span class="hljs-meta">@interface</span> Boundary {
}
</code></pre>
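<p>How a framework can discover the transaction attribute behind a meta-annotation is sketched below using only JDK reflection. The <code>@Tx</code> annotation is a hypothetical stand-in for Spring's <code>@Transactional</code>, and the one-level lookup is a simplification: Spring's own annotation resolution is more general than this.</p>
<pre><code class="lang-java">import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class MetaAnnotationDemo {
    // Hypothetical stand-in for Spring's @Transactional, to keep the sketch JDK-only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.ANNOTATION_TYPE, ElementType.METHOD})
    @interface Tx {
        String propagation();
    }

    // The meta-annotation: it carries @Tx so annotated methods inherit its attributes.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @Tx(propagation = "REQUIRES_NEW")
    @interface Boundary {
    }

    static class TransferFacade {
        @Boundary
        public void submit() {
        }
    }

    // Looks for @Tx directly on the method, then one level up on the
    // annotations present on the method.
    static String propagationOf(Method method) {
        Tx direct = method.getAnnotation(Tx.class);
        if (direct != null) {
            return direct.propagation();
        }
        for (Annotation a : method.getAnnotations()) {
            Tx meta = a.annotationType().getAnnotation(Tx.class);
            if (meta != null) {
                return meta.propagation();
            }
        }
        return "NONE";
    }

    static String demo() {
        try {
            return propagationOf(TransferFacade.class.getMethod("submit"));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
</code></pre>
<p>Here <code>demo()</code> resolves <code>REQUIRES_NEW</code> for <code>submit()</code> even though the method only carries <code>@Boundary</code>, which is why the meta-annotation can double as documentation and configuration.</p>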
<p>Boundary service facade example:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TransactionServiceImpl</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">TransactionService</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> AccountRepository accountRepository;

    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> TransactionRepository transactionRepository;

    <span class="hljs-meta">@Override</span>
    <span class="hljs-meta">@Boundary</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Transaction <span class="hljs-title">submitTransferRequest</span><span class="hljs-params">(TransferRequest request)</span> </span>{
        <span class="hljs-keyword">if</span> (!TransactionSynchronizationManager.isActualTransactionActive()) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"No transaction context"</span>);
        }
    }
}
</code></pre>
<p>Boundary web controller example:</p>
<pre><code class="lang-java"><span class="hljs-meta">@RestController</span>
<span class="hljs-meta">@RequestMapping(value = "/api/transaction")</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TransactionController</span> </span>{
    <span class="hljs-meta">@GetMapping</span>
    <span class="hljs-meta">@Boundary</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> PagedModel&lt;TransactionModel&gt; <span class="hljs-title">listTransactions</span><span class="hljs-params">(<span class="hljs-meta">@PageableDefault(size = 5)</span> Pageable page)</span> </span>{
        <span class="hljs-keyword">return</span> pagedTransactionResourceAssembler
                .toModel(bankService.find(page), transactionResourceAssembler);
    }
}
</code></pre>
<p>Boundary service activator example:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">KafkaChangeFeedConsumer</span> </span>{
    <span class="hljs-meta">@KafkaListener(topics = TOPIC_ACCOUNTS, containerFactory = "accountListenerContainerFactory")</span>
    <span class="hljs-meta">@Boundary</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">accountChanged</span><span class="hljs-params">(<span class="hljs-meta">@Payload</span> AccountPayload event,
                               <span class="hljs-meta">@Header(KafkaHeaders.RECEIVED_PARTITION)</span> <span class="hljs-keyword">int</span> partition,
                               <span class="hljs-meta">@Header(KafkaHeaders.OFFSET)</span> <span class="hljs-keyword">int</span> offset)</span> </span>{
    }
}
</code></pre>
<h2 id="heading-control">Control</h2>
<p>A <strong>control</strong> service is a fine-grained realization of activities or sub-processes. It's where business functionality is implemented. It must always be invoked within the context of a transaction and is not allowed to create new transactions. To that end, it must have the <code>MANDATORY</code> transaction attribute. The same policy applies to repository interfaces or classes that perform persistence logic. A repository is not allowed to create a new transaction.</p>
<p>A control service that is just a thin delegation layer between a boundary and repository contract (like Spring Data repository) adds no real value. In that case, to reduce boilerplate code, consider accessing repository resources directly from boundaries, effectively collapsing the boundary and service into one artefact.</p>
<h3 id="heading-characteristics-1">Characteristics</h3>
<ul>
<li><p>Services should be independent of other services.</p>
</li>
<li><p>The granularity is finer than a boundary.</p>
</li>
<li><p>Services are not available or visible outside of the business tier.</p>
</li>
<li><p>Methods should be idempotent and always be invoked from a transactional context.</p>
</li>
</ul>
<h3 id="heading-solution-1">Solution</h3>
<ul>
<li><p>Can be a business service that implements business logic.</p>
</li>
<li><p>Not allowed to start new transactions.</p>
</li>
<li><p>Use <code>MANDATORY</code> transaction propagation attribute.</p>
<ul>
<li><code>@Transactional(propagation = Propagation.MANDATORY)</code></li>
</ul>
</li>
</ul>
<h3 id="heading-conventions-1">Conventions</h3>
<ul>
<li><p>Interface or class with fine-grained methods and PDOs.</p>
</li>
<li><p>Should be located in a dedicated <code>..service</code> package.</p>
</li>
<li><p>Should use a documentative meta-annotation to emphasise its architectural role.</p>
</li>
<li><p>The business interface should be named after business concepts.</p>
</li>
</ul>
<p>Example of a control service meta-annotation. Notice that it incorporates the Spring <code>@Transactional</code> annotation with propagation <code>MANDATORY</code>.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Inherited</span>
<span class="hljs-meta">@Documented</span>
<span class="hljs-meta">@Retention(RetentionPolicy.RUNTIME)</span>
<span class="hljs-meta">@Target({ElementType.TYPE, ElementType.METHOD})</span>
<span class="hljs-meta">@Transactional(propagation = Propagation.MANDATORY)</span>
<span class="hljs-keyword">public</span> <span class="hljs-meta">@interface</span> Control {
}
</code></pre>
<p>A control service example:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">DefaultTransactionService</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">TransactionService</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> AccountRepository accountRepository;

    <span class="hljs-meta">@Override</span>
    <span class="hljs-meta">@Control</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Transaction <span class="hljs-title">createTransaction</span><span class="hljs-params">(UUID id, TransactionForm transactionForm)</span> </span>{
        Assert.isTrue(TransactionSynchronizationManager.isActualTransactionActive(), <span class="hljs-string">"Expected transaction"</span>);
    }
}
</code></pre>
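<p>The two propagation rules used so far, <code>REQUIRES_NEW</code> for boundaries and <code>MANDATORY</code> for controls, can be modelled in a few lines of plain Java. This is a toy model under stated assumptions, not how Spring implements it: the <code>PropagationModel</code> class is hypothetical, transactions are just names, and a static field replaces the thread-bound state a real transaction manager keeps.</p>
<pre><code class="lang-java">// Toy model of REQUIRES_NEW and MANDATORY propagation. Not Spring's implementation.
final class PropagationModel {
    // The active transaction, or null. A real transaction manager binds this
    // to the current thread; a plain static field keeps the sketch short.
    private static String current;
    private static int counter;

    static String currentTransaction() {
        return current;
    }

    // REQUIRES_NEW: suspend whatever is active, run in a fresh transaction,
    // then resume the suspended one.
    static void requiresNew(Runnable work) {
        String suspended = current;
        counter++;
        current = "tx-" + counter;
        try {
            work.run();
        } finally {
            current = suspended;
        }
    }

    // MANDATORY: join the active transaction, or fail if there is none.
    static void mandatory(Runnable work) {
        if (current == null) {
            throw new IllegalStateException("No existing transaction found");
        }
        work.run();
    }
}
</code></pre>
<p>A control invoked through <code>mandatory</code> inside a boundary's <code>requiresNew</code> sees the boundary's transaction, while calling it with no transaction active fails fast. That fail-fast behaviour is the point of marking control services <code>MANDATORY</code>.</p>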
<h2 id="heading-entity">Entity</h2>
<p>Entities are a static model representation of the application state mapped against a database. Usually through some ORM technology such as JPA and Hibernate. Entities must never be visible outside the system boundaries or JVM, but can optionally have DTO or value object/model representations. In most cases, DTOs add little value to hide implementation detail and protect internal entities. One exception could be <a target="_blank" href="https://docs.spring.io/spring-hateoas/docs/current/reference/html/#migrate-to-1.0.changes.representation-models">representation models</a> in Spring HATEOAS that add hypermedia controls on top of domain entities (you can use <code>EntityModel</code> also).</p>
<p>In terms of ECB, the entity element is simply represented by JPA entities and Spring Data repositories with the <code>@Repository</code> annotation. There's not much more to it than emphasising the architectural role.</p>
<h1 id="heading-transaction-retries">Transaction Retries</h1>
<p>Now that we are familiar with ECB, let's wrap things up by also adding the capability to retry transient SQL errors. When a SQL error with the state code <code>40001</code> is encountered, it's typically safe to retry the local transaction from a database point of view. If the retried business facade method and its descendants are non-idempotent, then some precautions may be needed to avoid multiple side effects (again, strive for idempotency).</p>
<p>The simplest approach is to use <a target="_blank" href="https://docs.spring.io/spring-batch/docs/current/reference/html/retry.html">Spring Retry</a> with a custom exception classifier and exponential backoff. How this is done is outlined in more detail in this <a target="_blank" href="https://blog.cloudneutral.se/spring-retry-with-cockroachdb">post</a>.</p>
<p>A brief example of a retriable boundary for completeness (notice the <code>@Retryable</code>):</p>
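<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span> </span>{
    <span class="hljs-meta">@Boundary</span>
    <span class="hljs-meta">@Retryable</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Order <span class="hljs-title">updateOrderStatus</span><span class="hljs-params">(Long orderId, ShipmentStatus status, BigDecimal amount)</span> </span>{
        <span class="hljs-comment">// Call DB and maybe do other idempotent stuff</span>
        <span class="hljs-keyword">return</span> order;
    }
}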
</code></pre>
<p>You can also push the <code>@Retryable</code> annotation to <code>@Boundary</code> which then automatically adds the retry capability to all annotated methods.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Inherited</span>
<span class="hljs-meta">@Documented</span>
<span class="hljs-meta">@Retention(RetentionPolicy.RUNTIME)</span>
<span class="hljs-meta">@Target({ElementType.TYPE, ElementType.METHOD})</span>
<span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
<span class="hljs-meta">@Retryable(exceptionExpression = "@cockroachExceptionClassifier.shouldRetry(#root)", maxAttempts = 5, backoff = @Backoff(maxDelay = 15_000, multiplier = 1.5))</span>
<span class="hljs-keyword">public</span> <span class="hljs-meta">@interface</span> Boundary {
}
</code></pre>
<p>The exception classifier for completeness:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CockroachExceptionClassifier</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Logger logger = LoggerFactory.getLogger(getClass());

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> String SERIALIZATION_FAILURE = <span class="hljs-string">"40001"</span>;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">shouldRetry</span><span class="hljs-params">(Throwable ex)</span> </span>{
        <span class="hljs-keyword">if</span> (ex == <span class="hljs-keyword">null</span>) {
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
        }
        Throwable throwable = NestedExceptionUtils.getMostSpecificCause(ex);
        <span class="hljs-keyword">if</span> (throwable <span class="hljs-keyword">instanceof</span> SQLException) {
            <span class="hljs-keyword">return</span> shouldRetry((SQLException) throwable);
        }
        logger.warn(<span class="hljs-string">"Non-transient exception {}"</span>, ex.getClass());
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">shouldRetry</span><span class="hljs-params">(SQLException ex)</span> </span>{
        <span class="hljs-keyword">if</span> (SERIALIZATION_FAILURE.equals(ex.getSQLState())) {
            logger.warn(<span class="hljs-string">"Transient SQL exception detected : sql state [{}], message [{}]"</span>,
                    ex.getSQLState(), ex.toString());
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">true</span>;
        }
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
    }
}
</code></pre>
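<p>What the retry configuration amounts to can be sketched as a plain Java loop without any framework. The sketch assumes an initial delay of 1000 ms (which should be the <code>@Backoff</code> default, but check the Spring Retry version you use) and reuses the 1.5 multiplier and 15 second cap from the annotation above. <code>RetryLoop</code> and <code>SqlTask</code> are hypothetical names.</p>
<pre><code class="lang-java">import java.sql.SQLException;

// Framework-free sketch of retrying on SQL state 40001 with exponential backoff.
final class RetryLoop {
    static final String SERIALIZATION_FAILURE = "40001";

    // Hypothetical unit of work that runs one local transaction.
    interface SqlTask {
        String run() throws SQLException;
    }

    // Delay before the retry that follows failed attempt number 'attempt' (1-based):
    // initial * multiplier^(attempt - 1), capped at maxDelay.
    static long delayMillis(int attempt, long initial, double multiplier, long maxDelay) {
        double delay = initial * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, maxDelay);
    }

    // Runs the task up to maxAttempts times, retrying only on serialization failures.
    static String callWithRetry(SqlTask task, int maxAttempts, long initialDelayMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.run();
            } catch (SQLException ex) {
                boolean transientError = SERIALIZATION_FAILURE.equals(ex.getSQLState());
                if (!transientError || attempt == maxAttempts) {
                    throw ex; // non-transient, or out of attempts
                }
                Thread.sleep(delayMillis(attempt, initialDelayMillis, 1.5, 15_000));
            }
        }
    }
}
</code></pre>
<p>With these numbers the waits between attempts are 1000, 1500, 2250 and 3375 ms. The important property is that every retry re-runs the whole boundary method in a new transaction, which is why idempotency matters.</p>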
<h1 id="heading-general-guidelines">General Guidelines</h1>
<p>A few general guidelines for transaction management.</p>
<h2 id="heading-avoid-remote-calls">Avoid Remote Calls</h2>
<p>Avoid remote calls to external resources from within a database transaction context. You may end up locking up resources for a long time in case of network communication problems or issues with the target endpoint. You are also exposed to the challenge of dual writes, where one part succeeds and the other part fails leaving the system in an inconsistent state (typically addressed with the outbox pattern).</p>
<h2 id="heading-read-only-implicit-transactions">Read-Only Implicit Transactions</h2>
<p>If you are not performing any writes, then consider using read-only, implicit transactions. The <code>readOnly</code> attribute in <code>@Transactional</code> serves as a hint to the transaction manager that the operation is read-only. The JPA provider may then perform certain optimizations.</p>
<p>Non-transactional read-only (implicit transactions) methods can use <code>SUPPORTS</code> propagation. This works as long as the <code>autoCommit</code> flag is left enabled in the data source, which is the default for JDBC connections and for HikariCP.</p>
<pre><code class="lang-java">HikariDataSource ds = properties
        .initializeDataSourceBuilder()
        .type(HikariDataSource.class)
        .build();
ds.setAutoCommit(<span class="hljs-keyword">true</span>); <span class="hljs-comment">// true is the default, setting it to false makes all transactions explicit</span>
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article describes the ECB architecture pattern to enhance transaction robustness in Spring Boot apps. Database transactions must always be started by boundaries and nowhere else. A boundary is typically a web controller or business service facade.</p>
<ul>
<li><p>Boundaries use <code>REQUIRES_NEW</code> propagation.</p>
</li>
<li><p>Control services and repositories use <code>MANDATORY</code> propagation.</p>
</li>
<li><p>Non-transactional read-only methods can use <code>SUPPORTS</code> propagation.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Working with BLOBs in CockroachDB]]></title><description><![CDATA[Introduction
Databases aren't great for storing binary large objects, aka BLOBs. By large meaning several MBs of size. If that is needed, then it's likely much more performant to just use the filesystem or cloud storage and only store references in t...]]></description><link>https://blog.cloudneutral.se/working-with-blobs-in-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/working-with-blobs-in-cockroachdb</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[jpa]]></category><category><![CDATA[hibernate]]></category><category><![CDATA[Springboot]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Thu, 27 Apr 2023 11:51:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/u3pRViUI2oU/upload/3340bc1e70d599305faac4570dc168f2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Databases aren't great for storing binary large objects, aka BLOBs. Here, <em>large</em> means several megabytes in size. If you need to store objects of that size, it's likely much more performant to use the filesystem or cloud storage and store only references in the database for structure.</p>
<p>Smaller objects are typically fine to store in the database. To that end, this article will demonstrate how to manage BLOBs using JPA and Hibernate along with CockroachDB. In CockroachDB, the BLOB type is an alias for the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/bytes.html">BYTES</a> data type. As mentioned in the docs, it's recommended to keep values under 1 MB to ensure adequate performance. Above that threshold, <a target="_blank" href="https://www.cockroachlabs.com/docs/v22.2/architecture/storage-layer#write-amplification">write amplification</a> and other considerations may cause significant performance degradation.</p>
<h1 id="heading-mapping-blobs-in-jpa">Mapping BLOBs in JPA</h1>
<p>When using Hibernate, you typically use the <a target="_blank" href="https://jakarta.ee/specifications/persistence/2.2/apidocs/javax/persistence/lob">@Lob</a> annotation and <a target="_blank" href="https://docs.oracle.com/javase/7/docs/api/java/sql/Blob.html">java.sql.Blob</a> which maps to the SQL BLOB data type.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Entity</span>
<span class="hljs-meta">@Table(name = "attachment")</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Attachment</span> </span>{   
    <span class="hljs-meta">@Column(name = "content")</span>
    <span class="hljs-meta">@Basic(fetch = FetchType.LAZY)</span>
    <span class="hljs-meta">@Lob</span>
    <span class="hljs-keyword">private</span> Blob content;
    ...
}
</code></pre>
<p>You can also use a <code>byte[]</code> array or a String, but it's generally more performant to use a streaming approach using the Blob type.</p>
<h1 id="heading-using-blob-mappings">Using BLOB Mappings</h1>
<p>You need to use Hibernate’s <em>BlobProxy</em> class to create a <em>Blob</em>. As you can see in the example below, it's pretty straightforward:</p>
<pre><code class="lang-java"><span class="hljs-comment">// Stores a BLOB represented by the inputStream</span>
Blob content = BlobProxy.generateProxy(inputStream, contentLength);

Attachment attachment = <span class="hljs-keyword">new</span> Attachment();
attachment.setContent(content);
attachment.setName(name);
attachment.setDescription(description);

attachmentRepository.save(attachment);
</code></pre>
<p>That's about it, very simple.</p>
<p>To read the BLOB back again, it is recommended to use the streaming approach:</p>
<pre><code class="lang-java"><span class="hljs-comment">// Lookup attachment by ID and stream the blob to the outputStream</span>
Attachment attachment = attachmentRepository.getReferenceById(id);
<span class="hljs-keyword">try</span> (InputStream in = <span class="hljs-keyword">new</span> BufferedInputStream(
        attachment.getContent().getBinaryStream())) {
    FileCopyUtils.copy(in, outputStream);
} <span class="hljs-keyword">catch</span> (SQLException | IOException e) {
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> DataRetrievalFailureException(<span class="hljs-string">"Error reading attachment data"</span>, e);
}
</code></pre>
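<p>The same streaming idea can be tried without Hibernate or a database. The following is a minimal sketch that assumes only the JDK: <code>SerialBlob</code> stands in for the database-backed <code>Blob</code>, and the names <code>BlobStreamingSketch</code> and <code>stream</code> are made up for illustration. It copies the content through <code>InputStream.transferTo</code> (Java 9+) instead of first reading everything into a second byte array.</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

public class BlobStreamingSketch {

    // Copies the blob content to the target stream and returns the number of bytes written.
    // The blob's stream is closed when the copy completes or fails.
    public static long stream(Blob blob, OutputStream out) throws SQLException, IOException {
        try (InputStream in = blob.getBinaryStream()) {
            return in.transferTo(out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an in-memory SerialBlob replaces the Blob an entity would expose.
        byte[] payload = "hello blob".getBytes(StandardCharsets.UTF_8);
        Blob blob = new SerialBlob(payload);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = stream(blob, out);
        System.out.println("Copied " + copied + " of " + blob.length() + " bytes");
    }
}
```

<p>In a real application the target would be a file or HTTP response stream rather than a <code>ByteArrayOutputStream</code>, which is used here only to keep the sketch self-contained.</p>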
<p>If you provide a REST endpoint for downloading BLOB attachments, it could look like this when implemented using Spring Boot:</p>
<pre><code class="lang-java"><span class="hljs-meta">@GetMapping("/download/{id}")</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> ResponseEntity&lt;StreamingResponseBody&gt; <span class="hljs-title">downloadAttachment</span><span class="hljs-params">(<span class="hljs-meta">@PathVariable("id")</span> Long id)</span> </span>{
    Attachment attachment = attachmentService.findById(id);
    StreamingResponseBody responseBody =
            outputStream -&gt; attachmentService.streamAttachment(attachment, outputStream);
    <span class="hljs-keyword">return</span> ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_TYPE, attachment.getContentType())
            .header(HttpHeaders.CONTENT_DISPOSITION, <span class="hljs-string">"inline"</span>)
            .header(<span class="hljs-string">"Cache-Control"</span>, <span class="hljs-string">"no-cache, no-store, must-revalidate"</span>)
            .header(<span class="hljs-string">"Pragma"</span>, <span class="hljs-string">"no-cache"</span>)
            .header(<span class="hljs-string">"Expires"</span>, <span class="hljs-string">"0"</span>)
            .body(responseBody);
}
</code></pre>
<p><a target="_blank" href="https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/servlet/mvc/method/annotation/StreamingResponseBody.html">StreamingResponseBody</a> is used for asynchronous request processing where the application can write directly to the response OutputStream.</p>
<h1 id="heading-demo-project">Demo Project</h1>
<p>This <a target="_blank" href="https://github.com/kai-niemi/roach-spring-boot-v3/tree/main/spring-boot-blob">demo</a> project is a runnable Spring Boot application that provides a REST API for querying and uploading attachments with BLOB content.</p>
<p>To build:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> git@github.com:kai-niemi/roach-spring-boot-v3
<span class="hljs-built_in">cd</span> roach-spring-boot-v3
chmod +x mvnw
./mvnw clean install
</code></pre>
<p>To run:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE database spring_boot_demo"</span>

java -jar spring-boot-blob/target/spring-boot-blob.jar
</code></pre>
<p>Check that the service is up at <code>http://localhost:8090</code>.</p>
<p>Upload an image file using cURL:</p>
<pre><code class="lang-bash">curl http://localhost:8090/attachment/form \
-H <span class="hljs-string">"Content-Type: multipart/form-data"</span> \
-v \
-F <span class="hljs-string">"content=@spring-boot-blob/src/test/resources/test.jpg"</span> \
-F <span class="hljs-string">"fileName=test.jpg"</span> \
-F <span class="hljs-string">"description=test.jpg"</span>
</code></pre>
<h2 id="heading-source-code">Source Code</h2>
<p>The code for this article is available on <a target="_blank" href="https://github.com/kai-niemi/roach-spring-boot-v3/tree/main/spring-boot-blob">GitHub</a>.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article provides a guide on how to manage binary large objects (BLOBs) using JPA and Hibernate with CockroachDB. It demonstrates the @Lob annotation, java.sql.Blob type, and a runnable Spring Boot example.</p>
]]></content:encoded></item><item><title><![CDATA[Defining Quality Attributes]]></title><description><![CDATA[Introduction
Quality attributes are synonymous with non-functional requirements, as in the properties and characteristics of a software system that is not directly related to some functional aspect. Quality attributes are what ultimately define a sys...]]></description><link>https://blog.cloudneutral.se/defining-quality-attributes</link><guid isPermaLink="true">https://blog.cloudneutral.se/defining-quality-attributes</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Thu, 27 Apr 2023 07:37:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/T6fDN60bMWY/upload/591069619e49da47b034b51705a8ad6a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Quality attributes are synonymous with non-functional requirements, as in the properties and characteristics of a software system that are not directly related to some functional aspect. Quality attributes are what ultimately define a system's runtime and evolutionary characteristics. These attributes indicate the software architecture style of the system and how different implementation mechanisms support these qualities, including the database.</p>
<p>In this article, we'll take a look at some of these attributes and more specifically how they can be defined, quantified and addressed when using the unique capabilities of <a target="_blank" href="https://www.cockroachlabs.com/">CockroachDB</a>.</p>
<p>Commonly used quality attributes for software systems (there are many more):</p>
<ul>
<li><p><strong>Scalability</strong> - A system’s ability to scale with increasing load and business complexity.</p>
</li>
<li><p><strong>Reliability</strong> - A measure of a system's ability to detect and recover from failures and deliver correct, consistent and reliable results.</p>
</li>
<li><p><strong>Performance</strong> - The unit of time it takes to execute an operation, usually measured in response time, transaction time and throughput (work per time unit).</p>
</li>
<li><p><strong>Availability</strong> - A measure of the ability of a system to function in a state of serious service or infrastructure degradation.</p>
</li>
<li><p><strong>Evolvability</strong> - A system’s ability to make changes with low cost and small client/user impact.</p>
</li>
<li><p><strong>Maintainability</strong> - A system’s ability to be diagnosed and repaired after an error occurs.</p>
</li>
<li><p><strong>Interoperability</strong> - How the system interacts with other subsystems or foreign services.</p>
</li>
<li><p><strong>Visibility</strong> - A system’s support for debugging and real-time monitoring.</p>
</li>
<li><p><strong>Security</strong> - A system’s ability to support security controls including access controls, encryption, data isolation, secure information processing and auditing for compliance.</p>
</li>
</ul>
<p>Functional attributes in a software system are useless without quality, and non-functional attributes are useless without relevant purpose and meaning for a business. It's a task for architects and developers to map the problem domain against a solution domain and deliver solutions that meet both functional and non-functional requirements. The problem domain describes the <em>what &amp; why</em> and the solution describes the <em>how</em>.</p>
<p>The re-usable artifacts are typically software design guidelines and principles that both guide the development of new software components and help slow the deterioration of software over time due to change. Change is a given for any software component, and that's where the real cost sits in the lifecycle of software systems, not so much in the initial development effort. The architectural decisions made early on in a product's life cycle to support things like evolvability, maintainability and low cost of change are very important for the total cost of ownership.</p>
<p>There's always a balance between the amount of time invested in quality attributes against getting a new product or feature out on the market. Things need to be prioritized like anything else and the best way is to ask the business stakeholders what is most important to deliver.</p>
<h1 id="heading-qualifying-quality-attributes">Qualifying Quality Attributes</h1>
<p>Defining clear, measurable and quantifiable requirements is an art form, one that is both business-oriented and technical in nature. To get good at it you need to ask questions, and a lot of them.</p>
<p>How can we define, measure and communicate the importance of abstract things like <em>evolvability</em>? Let's give it a try by listing a few questions for each listed quality attribute. Because the database is a critical infrastructure component for any software system, let's also see how each quality attribute can be addressed from a database point of view. Not just any database but CockroachDB.</p>
<h2 id="heading-scalability">Scalability</h2>
<blockquote>
<p>A system’s ability to scale with increasing load and business complexity.</p>
</blockquote>
<p>Scalability describes the ability of a system to cope with increased load, most often measured in latency percentiles and throughput. When the load increases on a system, it's relevant to observe how many more resources are needed to maintain the same level of performance. When a business grows in terms of increased customer demands, new markets or acquisitions, scalability also describes the ability to adapt systems to that new reality without having to undergo major refactoring or redesign efforts.</p>
<p>Questions:</p>
<ul>
<li><p>What are the data volumes and what’s the expected growth over time?</p>
</li>
<li><p>What is the impact of traffic, data or customer base growing by 10x?</p>
</li>
<li><p>What level of scalability is relevant, local, regional or global?</p>
</li>
<li><p>How important is it to deliver a consistent customer experience globally?</p>
</li>
<li><p>What does the traffic load look like for steady state, peak, extended peak and stress?</p>
</li>
<li><p>Does the system need to auto-adjust to increased/decreased spike demands?</p>
</li>
<li><p>Is data archiving to offline systems needed?</p>
</li>
</ul>
<p>Solutions:</p>
<p>CockroachDB is a geo-distributed SQL database designed to scale horizontally by adding more nodes to a cluster, increasing compute and IO capacity. Given the transactional properties and consistency guarantees, it also enables crafting <strong>multi-active</strong> systems, where it's relevant to distinguish between response times and service times.</p>
<p>Service time is the time it takes to process a synchronous request entering a service's boundary and preparing a response. Response time is service time plus the time it takes to transport traffic over the network including queuing delays. For a cluster spanning for example SA-EU-AP, we can drastically reduce response times by servicing requests close to where the data is stored.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682579576158/6cea033a-05fc-41ed-8103-63e69e2ec0a8.png" alt class="image--center mx-auto" /></p>
<p>Multi-active systems add the ability to control both service and response time through physical network topologies, data locality and replication policies. Depending on how nodes and replicas are arranged, both latency and fault tolerance or survival goals can also be controlled.</p>
<p>This provides the equivalent of a content-delivery network (<a target="_blank" href="https://en.wikipedia.org/wiki/Content_delivery_network">CDN</a>) for transactional data, logically spanning the entire globe. No matter which (edge) node you interact with, it will provide a consistent and accurate result.</p>
<h2 id="heading-reliability">Reliability</h2>
<blockquote>
<p>A measure of the system's ability to detect and recover from failures and deliver correct, consistent and reliable results.</p>
</blockquote>
<p>Reliability describes the ability of a system to operate correctly (do the right thing) at the desired level of performance, both under heavy concurrent workloads and in the event of infrastructure failures like partial and full zone or region failures. It’s more difficult to measure than scalability but can be defined in different ways.</p>
<p>For instance:</p>
<ul>
<li><p>Not lose or corrupt data because of infrastructure failures (no partial commits)</p>
</li>
<li><p>Not lose or corrupt data because of concurrency anomalies (ex: lost updates, phantom reads, read/write skew)</p>
</li>
<li><p>Not provide stale data where authoritative data is expected</p>
</li>
<li><p>Prevent diverging histories of state and throwing away committed writes when healing from network partitions</p>
</li>
<li><p>Not breach correctness rules or invariants (ex: negative balance in accounting)</p>
</li>
<li><p>Tolerate human mistakes and errors</p>
</li>
</ul>
<p>Questions:</p>
<ul>
<li><p>What is the anatomy of a typical business transaction?</p>
<ul>
<li><p>How does it get triggered?</p>
</li>
<li><p>What is the average duration?</p>
</li>
<li><p>How much information needs to be scanned vs returned?</p>
</li>
<li><p>What are the expected success and failure outcomes?</p>
</li>
</ul>
</li>
<li><p>How are key business rule invariants to be protected?</p>
</li>
<li><p>Must reads always be authoritative, or can they be potentially stale?</p>
</li>
<li><p>How are business transactions spanning multiple services handled?</p>
</li>
<li><p>How are transient transaction errors handled?</p>
</li>
<li><p>How are timeouts/indeterminate outcomes handled?</p>
</li>
<li><p>How is service idempotency implemented?</p>
</li>
</ul>
<p>Solutions:</p>
<p>One feature of multi-active systems is the ability to operate simultaneously from multiple geographies with sustained throughput and transactional integrity, both during steady state and during disruptions or even disasters in other regions. In a 3-region CockroachDB deployment, one entire region can have an outage without affecting forward progress in the other two.</p>
<h2 id="heading-performance">Performance</h2>
<blockquote>
<p><em>The unit of time it takes to execute an operation, usually measured in response time or transaction time and throughput (work per time unit).</em></p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>How is customer experience affected by high response times and low throughput?</p>
</li>
<li><p>How many concurrent active sessions/users will connect to the system?</p>
</li>
<li><p>Are there specific performance and throughput targets (service level indicators) on a per-use case basis?</p>
</li>
<li><p>What is the ratio between reads and writes?</p>
</li>
<li><p>Can reads be served as potentially stale or always authoritative?</p>
</li>
<li><p>Will there be a caching tier to scale out reads, in that case, how is the cache invalidated and kept in sync?</p>
</li>
</ul>
<p>Defining performance requirements should ideally be context or journey specific. Most larger systems are composed of different customer journeys such as registration, login, deposit, withdrawal, pay and so on. Not all journeys have the same NFRs and may touch different services, hence it makes sense to define the performance goals based on that rather than at individual service level.</p>
<p>For example, for journey X:</p>
<ul>
<li><p>Raw data size: 4TB</p>
</li>
<li><p>Active connections: 400 to 500</p>
</li>
<li><p>Actual users: 7M</p>
</li>
<li><p>Active users: 500k</p>
</li>
<li><p>Reads must be authoritative</p>
</li>
<li><p>Sustained throughput under 60min, qualified with:</p>
<ul>
<li>5,000 business transactions per sec, equivalent to 15K QPS, at P99 &lt; 120ms, read ratio 75%</li>
</ul>
</li>
<li><p>Peak throughput under 30min, qualified with:</p>
<ul>
<li>7,000 business transactions per sec, equivalent to 21K QPS, at P95 &lt; 150ms, read ratio 65%</li>
</ul>
</li>
<li><p>Extended peak throughput under 15min, qualified with:</p>
<ul>
<li>10,000 business transactions per sec, equivalent to 30K QPS, at P95 &lt; 300ms, read ratio 65%</li>
</ul>
</li>
</ul>
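<p>Targets written this way can be checked mechanically against measured latencies. The following is a rough sketch, not a benchmark tool. The factor of 3 statements per business transaction is an assumption derived from the listed ratio of 15K QPS to 5,000 transactions per second, the nearest-rank percentile method is one of several possible definitions, and the sample latencies and all names are invented for illustration.</p>

```java
import java.util.Arrays;

public class JourneyTargetSketch {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static long percentile(long[] latenciesMs, int p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // integer ceil(p / 100 * n)
        return sorted[Math.max(rank, 1) - 1];
    }

    // Queries per second implied by a business transaction rate.
    public static long qps(long transactionsPerSec, int statementsPerTransaction) {
        return transactionsPerSec * statementsPerTransaction;
    }

    // True if the p-th percentile latency is strictly below the limit.
    public static boolean meetsTarget(long[] latenciesMs, int p, long limitMs) {
        return percentile(latenciesMs, p) < limitMs;
    }

    public static void main(String[] args) {
        // Hypothetical latency samples in milliseconds.
        long[] samples = {80, 95, 110, 70, 130, 90, 85, 100, 105, 60};

        System.out.println("QPS at 5,000 tx/s: " + qps(5_000, 3));
        System.out.println("P99: " + percentile(samples, 99) + " ms");
        System.out.println("Meets P99 < 120 ms: " + meetsTarget(samples, 99, 120));
    }
}
```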
<p>Solutions:</p>
<p>CockroachDB delivers predictable <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance.html">response times and throughput</a> for different workload scales. When optimizing performance characteristics, it is typically a matter of finding opportunities in:</p>
<ol>
<li><p>Application workload patterns</p>
</li>
<li><p>Schema design</p>
</li>
<li><p>Cluster hardware capacity and utilization</p>
</li>
<li><p>Replica and leaseholder placement</p>
</li>
</ol>
<p>Most opportunities are outlined in <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance-best-practices-overview.html">SQL performance best practices</a>. A workload should be evenly distributed across all machines of a cluster (no hotspots) which happens automatically given a few schema design and load balancing considerations. By using different multi-region <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/multiregion-overview.html">capabilities</a>, the coordination between nodes over the network can also be minimized, drastically improving both read and write performance.</p>
<hr />
<p>Let's wrap up with the other quality attributes by just highlighting a few relevant questions.</p>
<h2 id="heading-availability">Availability</h2>
<blockquote>
<p>A measure of the ability of a system to function in a state of serious service or infrastructure degradation.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>What level of redundancy will the system have (any accepted SPOFs)?</p>
</li>
<li><p>What type of infrastructure failures must the system survive (cloud, region, zone, rack or server)?</p>
</li>
<li><p>What’s the business impact of degraded or denied forward progress?</p>
</li>
<li><p>Will the system continue to function on partial infrastructure failure?</p>
</li>
<li><p>Are there any specific RTO and RPO metrics?</p>
</li>
<li><p>What are the requirements for backup and restore (MTTR)?</p>
</li>
</ul>
<h2 id="heading-evolvability">Evolvability</h2>
<blockquote>
<p>A system’s ability to make changes with low cost and small client impact.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>What's the structure and process around the development and deployment pipeline?</p>
</li>
<li><p>Are downtime windows allowed for production deployments?</p>
</li>
<li><p>Is a pre-production deployment environment needed?</p>
</li>
<li><p>What are the key factors that impact time-to-value in new business initiatives/improvements?</p>
</li>
<li><p>How does changing/adding functionality impact existing functionality?</p>
</li>
</ul>
<h2 id="heading-maintainability">Maintainability</h2>
<blockquote>
<p>A system’s ability to be repaired after an error occurs.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>What design principles are applied to reduce maintenance efforts?</p>
</li>
<li><p>How much QA/OPS effort is needed to verify and deploy the system?</p>
</li>
</ul>
<h2 id="heading-interoperability">Interoperability</h2>
<blockquote>
<p>How the system interacts with other subsystems or foreign services.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>How does data flow into the system and what is the output?</p>
</li>
<li><p>Is the system classified as online, nearline or offline?</p>
</li>
<li><p>What are the major infrastructure components involved (database, message broker)?</p>
</li>
<li><p>Is the system classified as a system of record or a system of access?</p>
</li>
<li><p>Is the interaction model a typical request/response based model or an event-driven, async model?</p>
</li>
</ul>
<h2 id="heading-visibility">Visibility</h2>
<blockquote>
<p>A system’s support for debugging and real-time monitoring.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>How is the system monitored, and how are alerts acted on?</p>
</li>
<li><p>How can problems quickly be identified and corrected?</p>
</li>
</ul>
<h2 id="heading-security">Security</h2>
<blockquote>
<p>A system’s ability to support security controls including access controls, encryption, data isolation, secure information processing and auditing for compliance.</p>
</blockquote>
<p>Questions:</p>
<ul>
<li><p>Will the system run in a PCI or equivalent regulated environment?</p>
</li>
<li><p>Will the system handle PII data?</p>
</li>
<li><p>What security mechanisms does the system require on ingress and egress channels, or data protection?</p>
</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article discusses non-functional requirements, also known as quality attributes, which are properties and characteristics of a software system that are not directly related to its functional aspects. It looks at how these requirements can be defined, quantified, and addressed when using the capabilities of CockroachDB.</p>
]]></content:encoded></item><item><title><![CDATA[Parallel Query Execution in CockroachDB]]></title><description><![CDATA[This article provides an example of increasing large query performance by using client-side parallel query execution.
Introduction
CockroachDB uses parallelism in many parts of its architecture to deliver high-scale distributed SQL execution. For exa...]]></description><link>https://blog.cloudneutral.se/parallel-query-execution-in-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/parallel-query-execution-in-cockroachdb</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[SQL]]></category><category><![CDATA[parallelism]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Wed, 26 Apr 2023 14:11:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/dKPosnvpE2Q/upload/df33bc5dde11d5a56f223616cdb43b3c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article provides an example of increasing large query performance by using client-side parallel query execution.</p>
<h1 id="heading-introduction">Introduction</h1>
<p>CockroachDB uses parallelism in many parts of its architecture to deliver high-scale distributed SQL execution. For example, to improve write performance, it uses a <a target="_blank" href="https://www.cockroachlabs.com/blog/parallel-commits/"><strong>parallel atomic commit protocol</strong></a> designed to cut the commit latency of a transaction from two roundtrips of consensus to one. When combined with <a target="_blank" href="https://www.cockroachlabs.com/blog/transaction-pipelining/"><strong>transaction pipelining</strong></a>, where write intents are replicated from leaseholders in parallel rather than sequentially, all waiting happens in the end at commit time, thereby drastically reducing latencies for multi-statement write transactions.</p>
<p><a target="_blank" href="https://www.cockroachlabs.com/blog/parallel-commits/"><img src="https://d33wubrfki0l68.cloudfront.net/eb4d9db154cd8c228985b2226e1f191583581fb7/36ea0/wp-content/uploads/2019/04/transaction_pipelining_new-order.png" alt="Graph: Inter-Node RTT vs TPC-C New-Order Transaction Latency" /></a></p>
<p>To improve read performance in multi-region high-latency deployments, the cost-based optimizer performs what's referred to as a <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/cost-based-optimizer.html"><strong>locality-optimized search</strong></a>. The optimizer may begin to scan for rows in the gateway node's local region and fan out to remote regions in parallel, but only if the local region did not satisfy the query. The remote lookup (performed in parallel) result is returned to the gateway once received without having to wait for completion. This increases read performance in multi-region deployments since results can be returned from wherever they are first found, without waiting for the completion of all lookups.</p>
<p>Last but not least, CockroachDB also uses <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/vectorized-execution.html">vectorized SQL query execution</a>, designed to process batches of columnar data instead of just a single row at a time. In the longer term, this will also make use of vectorized CPU (SIMD) instructions.</p>
<p>Parallelism is well exploited in the algorithms and mechanisms that CockroachDB uses. This works well for both larger and smaller statements that don't scan large volumes of data, which is typically something you'd want to avoid doing anyway in an OLTP database.</p>
<p>Now to the purpose of this article, what can the client do to take this even further?</p>
<h1 id="heading-client-side-parallelism">Client-side Parallelism</h1>
<p>The CockroachDB database (and SQL for that matter) does a decent job of hiding the implementation details from clients through all abstraction layers. One of the primary tasks of a SQL database is to provide the illusion to clients that they are the sole users, free to read and write any piece of information without interference from others. In reality, the environment is highly concurrent and parallelized, which in practical terms means that the database is allowed to reorder concurrent transactions as long as the result is the same as if they had executed one at a time (serially), without any concurrency. This is the definition of <a target="_blank" href="https://en.wikipedia.org/wiki/Isolation_(database_systems)">SERIALIZABLE</a> transaction isolation.</p>
<p>A SQL database is designed to be highly capable of accepting queries from multiple application instances and threads in parallel. In a typical request-response, thread-bound execution model you get a connection from the pool, send a single or multi-statement transaction, await its completion and close the connection (recycled to the pool). While this gives a level of parallelism in terms of multiple application-level threads, it doesn't help that much for larger scans beyond what the database offers.</p>
<p>What if you want to take things a step further in terms of parallel execution and involve the client? For example, by first running a very large query that scans hundreds of thousands of rows to compute an aggregated sum in the database and then doing the equivalent client side by decomposing the query into smaller blocks run in parallel. Let's find out if it makes any difference.</p>
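<p>As a rough idea of what client-side decomposition can look like, here is a minimal sketch that fans out one partial computation per key on a fixed thread pool and collects the partial sums. It assumes only the JDK. The class name, the method name and the stand-in lambda are hypothetical. In a real client the lambda would run one smaller SQL query per key over its own pooled connection.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToLongFunction;

public class ParallelSumSketch {

    // Runs one partial query per key concurrently and returns the partial sum for each key.
    public static Map<String, Long> sumPerKey(List<String> keys,
                                              ToLongFunction<String> partialQuery,
                                              int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            Map<String, Long> result = new ConcurrentHashMap<>();
            CompletableFuture<?>[] futures = keys.stream()
                    .map(key -> CompletableFuture.runAsync(
                            () -> result.put(key, partialQuery.applyAsLong(key)), pool))
                    .toArray(CompletableFuture[]::new);
            // Block until every partial query has completed (or one has failed).
            CompletableFuture.allOf(futures).join();
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumption: this lambda stands in for a per-key aggregate query
        // and returns a made-up number instead of touching a database.
        ToLongFunction<String> fakeQuery = key -> key.length() * 1_000L;

        Map<String, Long> sums = sumPerKey(List.of("SE", "US", "DE"), fakeQuery, 3);
        sums.forEach((key, sum) -> System.out.println(key + " = " + sum));
    }
}
```

<p>The pool size bounds how many queries are in flight at once, so in practice it should not exceed the number of connections the client is allowed to hold.</p>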
<h1 id="heading-example-use-case">Example Use Case</h1>
<p>Assume we have a simple <code>product</code> table holding an inventory column.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> product
(
    <span class="hljs-keyword">id</span>        <span class="hljs-keyword">uuid</span>           <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> gen_random_uuid(),
    <span class="hljs-keyword">version</span>   <span class="hljs-built_in">int</span>            <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> <span class="hljs-number">0</span>,
    inventory <span class="hljs-built_in">int</span>            <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    <span class="hljs-keyword">name</span>      <span class="hljs-built_in">varchar</span>(<span class="hljs-number">128</span>)   <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    price     <span class="hljs-built_in">numeric</span>(<span class="hljs-number">19</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    sku       <span class="hljs-built_in">varchar</span>(<span class="hljs-number">128</span>)   <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">unique</span>,
    country   <span class="hljs-built_in">varchar</span>(<span class="hljs-number">128</span>)   <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,

    primary <span class="hljs-keyword">key</span> (<span class="hljs-keyword">id</span>)
);
</code></pre>
<p>Next, we add a covering index on the country and insert a large number of products:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">INDEX</span> <span class="hljs-keyword">ON</span> product (country) STORING (inventory,<span class="hljs-keyword">name</span>,price);

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> product (inventory,<span class="hljs-keyword">name</span>,price,sku,country)
<span class="hljs-keyword">select</span> <span class="hljs-number">10</span> + random() * <span class="hljs-number">50</span>,
       <span class="hljs-keyword">md5</span>(random()::<span class="hljs-built_in">text</span>),
       <span class="hljs-number">500.00</span> + random() * <span class="hljs-number">500.00</span>,
       gen_random_uuid()::<span class="hljs-built_in">text</span>,
       <span class="hljs-string">'US'</span>
<span class="hljs-keyword">from</span> generate_series(<span class="hljs-number">1</span>, <span class="hljs-number">500000</span>) <span class="hljs-keyword">as</span> i;
<span class="hljs-comment">-- Repeat insert for 9 more countries, in total 5M rows</span>
</code></pre>
<h2 id="heading-composed-query">Composed Query</h2>
<p>Let's run a single composed query to get the total inventory sum grouped by country:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">sum</span>(p.inventory), p.country <span class="hljs-keyword">from</span> product p <span class="hljs-keyword">group</span> <span class="hljs-keyword">by</span> p.country;
</code></pre>
<p>Gives:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">sum</span>(p.inventory), p.country <span class="hljs-keyword">from</span> product p <span class="hljs-keyword">group</span> <span class="hljs-keyword">by</span> p.country;
    sum    | country
<span class="hljs-comment">-----------+----------</span>
  17251976 | BE
  17253042 | DE
  17234287 | DK
  17253539 | ES
  17229425 | FI
  17250751 | FR
  17247093 | NO
  17257296 | SE
  17237964 | UK
  17261461 | US
(10 rows)


Time: 4.083s total (execution 4.083s / network 0.000s)
</code></pre>
<p>This query still runs fairly fast for a total row count of 5M. Let's look at the <code>explain</code> plan to confirm that it performs a full scan:</p>
<pre><code class="lang-bash">explain analyze select sum(p.inventory), p.country from product p group by p.country;
                                                            info
-----------------------------------------------------------------------------------------------------------------------------
  planning time: 421µs
  execution time: 4.1s
  distribution: full
  vectorized: <span class="hljs-literal">true</span>
  rows <span class="hljs-built_in">read</span> from KV: 5,000,000 (466 MiB, 47 gRPC calls)
  cumulative time spent <span class="hljs-keyword">in</span> KV: 3.8s
  maximum memory usage: 10 MiB
  network usage: 0 B (0 messages)
  regions: europe-west1

  • group (streaming)
  │ nodes: n1
  │ regions: europe-west1
  │ actual row count: 10
  │ estimated row count: 10
  │ group by: country
  │ ordered: +country
  │
  └── • scan
        nodes: n1
        regions: europe-west1
        actual row count: 5,000,000
        KV time: 3.8s
        KV contention time: 0µs
        KV rows <span class="hljs-built_in">read</span>: 5,000,000
        KV bytes <span class="hljs-built_in">read</span>: 466 MiB
        KV gRPC calls: 47
        estimated max memory allocated: 10 MiB
        estimated row count: 7,036,818 (100% of the table; stats collected 5 days ago; using stats forecast <span class="hljs-keyword">for</span> 5 days ago)
        table: product@product_country_idx
        spans: FULL SCAN
(31 rows)


Time: 4.091s total (execution 4.090s / network 0.000s)
</code></pre>
<h2 id="heading-decomposed-parallel-queries">Decomposed Parallel Queries</h2>
<p>Let's decompose the single query into multiple ones and run them in parallel, then combine the results in the end. We refactor the query by removing the GROUP BY and filtering on the indexed country column instead. Effectively the GROUP BY operator is moved client side.</p>
<p>Example of a single country query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">sum</span>(p1_0.inventory) <span class="hljs-keyword">from</span> product p1_0 <span class="hljs-keyword">where</span> p1_0.country=<span class="hljs-string">'US'</span>;
    sum
<span class="hljs-comment">------------</span>
  17261461
(1 row)

Time: 231ms total (execution 231ms / network 0ms)
</code></pre>
<p>Let's also check the execution plan:</p>
<pre><code class="lang-bash">explain analyze select sum(p1_0.inventory) from product p1_0 <span class="hljs-built_in">where</span> p1_0.country=<span class="hljs-string">'US'</span>;
                                                           info
---------------------------------------------------------------------------------------------------------------------------
  planning time: 535µs
  execution time: 248ms
  distribution: full
  vectorized: <span class="hljs-literal">true</span>
  rows <span class="hljs-built_in">read</span> from KV: 500,000 (47 MiB, 5 gRPC calls)
  cumulative time spent <span class="hljs-keyword">in</span> KV: 225ms
  maximum memory usage: 10 MiB
  network usage: 0 B (0 messages)
  regions: europe-west1

  • group (scalar)
  │ nodes: n1
  │ regions: europe-west1
  │ actual row count: 1
  │ estimated row count: 1
  │
  └── • scan
        nodes: n1
        regions: europe-west1
        actual row count: 500,000
        KV time: 225ms
        KV contention time: 0µs
        KV rows <span class="hljs-built_in">read</span>: 500,000
        KV bytes <span class="hljs-built_in">read</span>: 47 MiB
        KV gRPC calls: 5
        estimated max memory allocated: 10 MiB
        estimated row count: 696,645 (9.9% of the table; stats collected 5 days ago; using stats forecast <span class="hljs-keyword">for</span> 5 days ago)
        table: product@product_country_idx
        spans: [/<span class="hljs-string">'US'</span> - /<span class="hljs-string">'US'</span>]
(29 rows)


Time: 249ms total (execution 249ms / network 0ms)
</code></pre>
<p>The estimated row count is about 10% of the table which sounds about right since we inserted 500K rows per country.</p>
<p>Now we apply a parallel fork and join operation at the client side. This means we fire ten concurrent threads with the individual queries and then await completion before proceeding. After that, the results are joined together.</p>
<p>For this example, we'll use Spring Data and a JPA query:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Query("select sum(p.inventory) from Product p where p.country = :country")</span>
<span class="hljs-function">Integer <span class="hljs-title">sumInventory</span><span class="hljs-params">(<span class="hljs-meta">@Param("country")</span> String country)</span></span>;
</code></pre>
<p>First queue up the workers, one for each country:</p>
<pre><code class="lang-java">List&lt;Callable&lt;Pair&lt;String, Integer&gt;&gt;&gt; tasks = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
StringUtils.commaDelimitedListToSet(<span class="hljs-string">"SE,UK,DK,NO,ES,US,FI,FR,BE,DE"</span>).forEach(country -&gt;
        tasks.add(() -&gt; Pair.of(country, productRepository.sumInventory(country))));
</code></pre>
<p>Next, we unleash the workers to run in parallel while blocking until completion (or cancellation by timeout):</p>
<pre><code class="lang-java">ConcurrencyUtils.runConcurrentlyAndWait(tasks, <span class="hljs-number">10</span>, TimeUnit.MINUTES, sums::add);
</code></pre>
<p>This utility method makes use of Java's <code>CompletableFuture</code>, which was introduced way back in <a target="_blank" href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/CompletableFuture.html">Java 8</a>. It's like a Swiss army knife for asynchronous computations using parallel decomposition constructs. Tasks are decomposed into steps that can be forked and joined in different stages to a final result. It's a very elegantly designed API.</p>
<p>In this example, we are just using a small subset of it to run our query tasks in parallel and join the results. It also adds cancellation, in case queries go rogue and run for too long. Cancellation is not a built-in part of <code>CompletableFuture</code>, so there's a small trick in there to add that.</p>
<p>It's also using a <strong>bounded thread pool</strong>, which means that no matter how many tasks are queued, it will only run a limited number of concurrent threads by applying backpressure to the client code queuing up tasks. This is more lenient on thread scheduling since the client will be blocking anyway.</p>
<pre><code class="lang-java">ScheduledExecutorService cancellationService
        = Executors.newSingleThreadScheduledExecutor();

ExecutorService executor = boundedThreadPool();

List&lt;CompletableFuture&lt;Void&gt;&gt; allFutures = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();

<span class="hljs-keyword">final</span> Instant expiryTime = Instant.now().plus(timeout, timeUnit.toChronoUnit());

tasks.forEach(callable -&gt; {
    allFutures.add(CompletableFuture.supplyAsync(() -&gt; {
                <span class="hljs-keyword">if</span> (Instant.now().isAfter(expiryTime)) {
                    logger.warn(<span class="hljs-string">"Task scheduled after expiration time: "</span> + expiryTime);
                    <span class="hljs-keyword">return</span> <span class="hljs-keyword">null</span>;
                }
                Future&lt;V&gt; future = executor.submit(callable);
                <span class="hljs-keyword">long</span> cancellationTime = Duration.between(Instant.now(), expiryTime).toMillis();
                cancellationService.schedule(() -&gt; future.cancel(<span class="hljs-keyword">true</span>), cancellationTime, TimeUnit.MILLISECONDS);
                <span class="hljs-keyword">try</span> {
                    <span class="hljs-keyword">return</span> future.get();
                } <span class="hljs-keyword">catch</span> (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(e);
                } <span class="hljs-keyword">catch</span> (ExecutionException e) {
                    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(e.getCause());
                }
            }, executor)
            .thenAccept(completionFunction)
            .exceptionally(throwableFunction)
    );
});

CompletableFuture.allOf(
        allFutures.toArray(<span class="hljs-keyword">new</span> CompletableFuture[]{})).join();

executor.shutdownNow();
cancellationService.shutdownNow();
</code></pre>
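<p>The <code>boundedThreadPool()</code> factory is not shown above. As a rough sketch of what such a pool could look like, the following uses a fixed number of worker threads, a bounded queue and the JDK's caller-runs policy for backpressure. The class name, the parameters and the choice of rejection policy are assumptions made for illustration and may differ from the linked source code:</p>
<pre><code class="lang-java">import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPools {
    // Hypothetical helper: at most `threads` workers and `queueSize` waiting tasks.
    public static ExecutorService boundedThreadPool(int threads, int queueSize) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue&lt;&gt;(queueSize),
                // When the queue is full, the submitting thread runs the task itself.
                // That slows down the producer (backpressure) instead of rejecting work.
                new ThreadPoolExecutor.CallerRunsPolicy());
        // Let idle workers exit so an unused pool does not hold on to threads.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
</code></pre>
<p>With ten countries, something like <code>BoundedPools.boundedThreadPool(10, 100)</code> would let all ten queries run at once while still capping the thread count if the task list grows.</p>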
<p>Once all the query sums are gathered we simply add them up client side using a stream API aggregator:</p>
<pre><code class="lang-java">sums.stream().mapToInt(Pair::getSecond).sum()
</code></pre>
<p>OK, the result then? Here's the log output:</p>
<pre><code class="lang-bash">09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> UK is 17237964
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> US is 17261461
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> DK is 17234287
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> SE is 17257296
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> ES is 17253539
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> NO is 17247093
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> FR is 17250751
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> DE is 17253042
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> FI is 17229425
09:21:53.253  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Inventory sum <span class="hljs-keyword">for</span> BE is 17251976
09:21:53.254  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Total inventory sum is 172476834
09:21:53.254  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Verified inventory sum is 172476834
09:21:53.254  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Parallel execution time: PT1.1578745S
09:21:53.254  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Serial execution time: PT2.9943538S
09:21:53.254  INFO [i.r.s.p.ParallelApplication$<span class="hljs-variable">$SpringCGLIB</span>$<span class="hljs-variable">$0</span>] Execution time diff: 259%
</code></pre>
<p>In this simple example, the parallel run finishes in about 1.16 seconds versus 2.99 seconds for the serial run, which makes it roughly 2.6 times faster, simply by decomposing the query and running the parts independently.</p>
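<p>The ratio can be reproduced from the two durations in the log output. A minimal sketch (the class and method names are made up for illustration):</p>
<pre><code class="lang-java">import java.time.Duration;

public class Speedup {
    // Ratio of serial to parallel wall-clock time; 2.0 means "twice as fast".
    public static double speedup(Duration serial, Duration parallel) {
        return (double) serial.toNanos() / parallel.toNanos();
    }

    public static void main(String[] args) {
        // ISO-8601 durations copied from the log lines above.
        Duration parallel = Duration.parse("PT1.1578745S");
        Duration serial = Duration.parse("PT2.9943538S");
        System.out.println("Speedup factor: " + speedup(serial, parallel));
    }
}
</code></pre>
<p>The factor comes out at about 2.59, which corresponds to the 259% reported in the log.</p>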
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article explains how CockroachDB uses parallelism to improve read and write performance, and how client-side parallel query execution can be used to further increase large query performance. An example use case is provided to illustrate how this works, using Spring Data and a JPA query to run a parallel fork and join operation at the client side with a bounded thread pool and a cancellation service.</p>
<p>The source code for the article is available on <a target="_blank" href="https://github.com/kai-niemi/roach-spring-boot-v3/tree/main/spring-boot-parallel">GitHub</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Enhancing Global Read Performance in CockroachDB]]></title><description><![CDATA[Introduction
CockroachDB is a geo-distributed SQL database purpose-built from the ground up for high scalability, fault tolerance, cloud neutrality and usability for developers and operators. It also offers the highest SQL standard for transactional ...]]></description><link>https://blog.cloudneutral.se/enhancing-global-read-performance-in-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/enhancing-global-read-performance-in-cockroachdb</guid><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Wed, 26 Apr 2023 14:08:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TrhLCn1abMU/upload/ca7ecb460872a536d27f365ebe03e700.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>CockroachDB is a geo-distributed SQL database purpose-built from the ground up for high scalability, fault tolerance, cloud neutrality and usability for developers and operators. It also offers the strongest isolation level defined by the SQL standard for transactional integrity: serializable isolation.</p>
<p>The term geo-distribution emphasises its capability to break out of the low-latency, stable networking assumptions of a single data center or single region deployment. CockroachDB clusters can span the globe and still offer one logical database towards applications with intact semantics and guarantees. No more need for manual sharding.</p>
<p>One major influence on performance, when nodes need to perform some level of coordination, is network latency. There are numerous mechanisms at work in CockroachDB to mitigate the effects of high cross-link latencies, both for performance and to ensure safety and liveness in volatile and ephemeral hosting environments.</p>
<p>One key ingredient for high performance and linear scalability in CockroachDB is the ability to distribute workload both vertically and horizontally and thereby leverage the aggregate compute/IO capacity of a cluster. This is mainly achieved by the database itself, but application designers can help by following best practices around schema design and query patterns and by load balancing traffic across the cluster. In general terms, it's about preventing hotspots from forming, avoiding contention where possible and reducing large table scans.</p>
<h1 id="heading-techniques">Techniques</h1>
<p>CockroachDB uses a distributed SQL execution engine at its core, which means many things:</p>
<ul>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/architecture/sql-layer.html">The SQL layer</a> is parallelized and pushes processors close to the data.</p>
</li>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/cost-based-optimizer.html">The SQL optimizer</a> is purpose-built with latency as a cost factor and locality awareness.</p>
</li>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/architecture/transaction-layer.html#transaction-pipelining">The transaction layer</a> uses a sophisticated pipelining and parallel commit protocol to reduce round trips to the theoretical minimum for consensus.</p>
</li>
<li><p>Backup and restore are <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/take-and-restore-locality-aware-backups.html">locality-aware</a>.</p>
</li>
<li><p>The <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/vectorized-execution.html">vectorized execution engine</a> provides good performance for a wide range of queries.</p>
</li>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/load-based-splitting.html">Load-based splitting</a> and rebalancing heuristics help to balance load across a cluster of machines.</p>
</li>
</ul>
<p>Scaling reads is generally considered to be slightly easier than scaling writes. In a global or multi-region deployment topology, there are a couple of useful patterns including global tables, regional-by-row table localities and follower reads.</p>
<h2 id="heading-global-tables">Global Tables</h2>
<p>A global table means that all voting range replicas reside on nodes in the <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/multiregion-overview#database-regions">primary region</a>, with non-voting replicas in remote regions to service consistent reads. The database automatically adjusts the replication factor (RF) to ensure there are range replicas for these tables in each configured region. It also uses something called <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/architecture/transaction-layer#non-blocking-transactions">non-blocking transactions</a> in combination with non-voting replicas to provide low-latency global reads, even under workload contention. This concept is useful if you have a table with a low volume of writes but high volumes of reads from different regions, and the reads must be authoritative.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> primary region <span class="hljs-string">"eu-north-1"</span>;
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> <span class="hljs-keyword">add</span> region <span class="hljs-string">"eu-west-3"</span>;
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> <span class="hljs-keyword">add</span> region <span class="hljs-string">"us-east-1"</span>;

<span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> postal_codes
(
    <span class="hljs-keyword">id</span>   <span class="hljs-built_in">int</span> primary <span class="hljs-keyword">key</span>,
    code <span class="hljs-keyword">string</span>
);

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> postal_codes <span class="hljs-keyword">SET</span> LOCALITY <span class="hljs-keyword">GLOBAL</span>;
</code></pre>
<h2 id="heading-regional-by-row">Regional by Row</h2>
<p>Regional by row is a table locality in which the home region is defined at the row level in a table. This is in contrast to regional tables, where all rows in a table have the same home region.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> primary region <span class="hljs-string">"eu-north-1"</span>;
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> <span class="hljs-keyword">add</span> region <span class="hljs-string">"eu-west-3"</span>;
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">database</span> <span class="hljs-keyword">test</span> <span class="hljs-keyword">add</span> region <span class="hljs-string">"us-east-1"</span>;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span>
(
    user_id     <span class="hljs-built_in">INT</span>    <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    <span class="hljs-keyword">name</span>        <span class="hljs-keyword">STRING</span> <span class="hljs-literal">NULL</span>,
    postal_code <span class="hljs-built_in">int</span>    <span class="hljs-literal">NULL</span>,

    PRIMARY <span class="hljs-keyword">KEY</span> (user_id <span class="hljs-keyword">ASC</span>)
);

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">SET</span> LOCALITY regional <span class="hljs-keyword">by</span> <span class="hljs-keyword">row</span>;

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> <span class="hljs-keyword">users</span> (user_id, crdb_region)
<span class="hljs-keyword">select</span> <span class="hljs-keyword">no</span>, <span class="hljs-string">'eu-north-1'</span> <span class="hljs-keyword">from</span> generate_series(<span class="hljs-number">1</span>, <span class="hljs-number">100</span>) <span class="hljs-keyword">no</span>;

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> <span class="hljs-keyword">users</span> (user_id, crdb_region)
<span class="hljs-keyword">select</span> <span class="hljs-keyword">no</span>, <span class="hljs-string">'eu-west-3'</span> <span class="hljs-keyword">from</span> generate_series(<span class="hljs-number">101</span>, <span class="hljs-number">200</span>) <span class="hljs-keyword">no</span>;

<span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> <span class="hljs-keyword">users</span> (user_id, crdb_region)
<span class="hljs-keyword">select</span> <span class="hljs-keyword">no</span>, <span class="hljs-string">'us-east-1'</span> <span class="hljs-keyword">from</span> generate_series(<span class="hljs-number">201</span>, <span class="hljs-number">300</span>) <span class="hljs-keyword">no</span>;
</code></pre>
<h2 id="heading-follower-reads">Follower Reads</h2>
<p>Follower reads are akin to <a target="_blank" href="https://www.cockroachlabs.com/blog/follower-reads/">Content Delivery Networks (CDN)</a> in that a read does not have to chase the leaseholder for a given range, which can potentially be located in another part of the world. Instead, the replica closest to the gateway node (the node receiving the request) can service the read with some staleness. There are two variants of follower reads: <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/follower-reads">exact staleness and bounded staleness</a> reads.</p>
<p>On the surface, follower reads may appear similar to global tables but the latter works quite <a target="_blank" href="https://www.cockroachlabs.com/docs/v21.2/architecture/transaction-layer#how-non-blocking-transactions-work">differently</a> through non-voting replicas and non-blocking transactions.</p>
<p>Follower reads are useful for more ad-hoc SQL queries for both partitioned and unpartitioned tables, where reads are allowed to be non-authoritative (potentially stale). Global tables are always authoritative (no staleness bounds) but pay for that in higher write latency.</p>
<p>The choice between follower reads and global tables should be driven by staleness requirements, read vs write volumes and survival goals. In other words, if the decision to write something is based on a read, and the value read must be authoritative, then a global table is a better choice. On the other hand, if write performance is a priority and a staleness window is acceptable, then follower-reads are better.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- per statement:</span>
<span class="hljs-keyword">SELECT</span> user_id <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">AS</span> <span class="hljs-keyword">OF</span> <span class="hljs-keyword">SYSTEM</span> <span class="hljs-built_in">TIME</span> follower_read_timestamp() <span class="hljs-keyword">WHERE</span> user_id=<span class="hljs-number">1</span>;

<span class="hljs-comment">-- alternatively, per transaction:</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">SET</span> <span class="hljs-keyword">TRANSACTION</span> <span class="hljs-keyword">AS</span> <span class="hljs-keyword">OF</span> <span class="hljs-keyword">SYSTEM</span> <span class="hljs-built_in">TIME</span> follower_read_timestamp();
<span class="hljs-keyword">SELECT</span> ..
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<h1 id="heading-further-reading">Further Reading</h1>
<ul>
<li><p>Enabling the Next Generation of Multi-Region Applications with CockroachDB<br />  <a target="_blank" href="https://dl.acm.org/doi/10.1145/3514221.3526053">https://dl.acm.org/doi/10.1145/3514221.3526053</a></p>
</li>
<li><p>CockroachDB: The Resilient Geo-Distributed SQL Database<br />  <a target="_blank" href="https://dl.acm.org/doi/10.1145/3318464.3386134">https://dl.acm.org/doi/10.1145/3318464.3386134</a></p>
</li>
</ul>
<h1 id="heading-summary">Summary</h1>
<p>CockroachDB is a geo-distributed SQL database designed for scalability, fault tolerance, cloud neutrality, and usability. It offers distributed SQL execution and concepts like global tables and follower reads to help balance read-heavy load across a cluster of machines. When deciding between follower reads and global tables, factors such as staleness requirements, read vs write volumes, and survival goals should be taken into consideration. Global tables are better for authoritative reads, while follower reads are better for write performance with an acceptable staleness window.</p>
]]></content:encoded></item><item><title><![CDATA[Spring Retry with CockroachDB]]></title><description><![CDATA[Spring Retry is a small library for retrying failed method invocations of a transient nature. Typically when interacting with another service over the network, a message broker or database.
In this tutorial, we'll look at using spring retry for seria...]]></description><link>https://blog.cloudneutral.se/spring-retry-with-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/spring-retry-with-cockroachdb</guid><category><![CDATA[Springboot]]></category><category><![CDATA[cockroachdb]]></category><category><![CDATA[retries]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Thu, 13 Apr 2023 13:20:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1681388060765/5b51edb5-b0e2-4b20-b7a9-58698735a988.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Spring Retry is a small library for retrying failed method invocations of a transient nature. Typically when interacting with another service over the network, a message broker or database.</p>
<p>In this tutorial, we'll look at using Spring Retry for serialization conflict errors denoted by the SQL state code <code>40001</code>.</p>
<h1 id="heading-maven-setup">Maven Setup</h1>
<p>To use Spring Retry, you need to add the Spring Retry and Spring AOP dependencies to your <code>pom.xml</code>.</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.retry<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-retry<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>2.0.1<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-aspects<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>5.3.10<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
</code></pre>
<h1 id="heading-configuration">Configuration</h1>
<p>To enable Spring Retry in an application, add the <code>@EnableRetry</code> annotation to any of the <code>@Configuration</code> classes:</p>
<pre><code class="lang-java"><span class="hljs-meta">@EnableRetry</span>
<span class="hljs-meta">@Configuration</span>
<span class="hljs-meta">@SpringBootApplication</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyApplication</span> </span>{
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">main</span><span class="hljs-params">(String[] args)</span> </span>{
        <span class="hljs-keyword">new</span> SpringApplicationBuilder(MyApplication.class)
                .run(args);
    }
}
</code></pre>
<h1 id="heading-example-service">Example Service</h1>
<p>Using Spring Retry is as simple as adding the <code>@Retryable</code> annotation to the methods to be retried:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span> </span>{
    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
    <span class="hljs-meta">@Retryable</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Order <span class="hljs-title">updateOrderStatus</span><span class="hljs-params">(Long orderId,
ShipmentStatus status, BigDecimal amount)</span> </span>{
        Order order = ...;
        <span class="hljs-keyword">return</span> order;
    }
}
</code></pre>
<p>In our case, however, we want to be more specific on what type of exceptions qualify for a retry and also tailor the backoff policy to use an exponentially increasing delay with jitter.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">OrderService</span> </span>{
    <span class="hljs-meta">@Transactional(propagation = Propagation.REQUIRES_NEW)</span>
    <span class="hljs-meta">@Retryable(exceptionExpression = "@exceptionClassifier.shouldRetry(#root)",
            maxAttempts = 5,
            backoff = @Backoff(maxDelay = 15_000, multiplier = 1.5))</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Order <span class="hljs-title">updateOrderStatus</span><span class="hljs-params">(Long orderId,
ShipmentStatus status, BigDecimal amount)</span> </span>{
        Order order = ...;
        <span class="hljs-keyword">return</span> order;
    }
}
</code></pre>
<p>The <code>backoff</code> annotation parameters define a policy that results in <code>ExponentialRandomBackOffPolicy</code> being used at runtime.</p>
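<p>To get a feel for the delays such a policy produces, here is a minimal plain-Java sketch that does not use any Spring Retry classes. It assumes an initial delay of 1000 ms, which the annotation above does not set explicitly, and it models jitter as a random factor between 1 and the multiplier. Both are assumptions for illustration, and the class and method names are hypothetical.</p>
<pre><code class="lang-java">import java.util.Random;

class BackoffSketch {
    // Delay before retry number n (n = 0 for the first retry), capped at maxDelayMillis.
    static long baseDelay(long initialMillis, double multiplier, long maxDelayMillis, int n) {
        double delay = initialMillis * Math.pow(multiplier, n);
        return Math.min(maxDelayMillis, Math.round(delay));
    }

    // Jitter (assumed model): stretch the base delay by a random factor in [1, multiplier).
    static long withJitter(long baseMillis, double multiplier, Random random) {
        double factor = 1 + random.nextDouble() * (multiplier - 1);
        return Math.round(baseMillis * factor);
    }

    public static void main(String[] args) {
        Random random = new Random();
        // maxAttempts = 5 means at most 4 waits between attempts.
        for (int n = 0; n != 4; n++) {
            long base = baseDelay(1000, 1.5, 15_000, n);
            System.out.println("retry " + (n + 1) + ": base=" + base
                    + " ms, jittered=" + withJitter(base, 1.5, random) + " ms");
        }
    }
}
</code></pre>
<p>Under these assumptions the base delays are 1000, 1500, 2250 and 3375 ms, so the 15 second cap is never reached within five attempts.</p>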
<p>Next, let's look at the exception classifier:</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CockroachExceptionClassifier</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Logger logger = LoggerFactory.getLogger(getClass());

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> String SERIALIZATION_FAILURE = <span class="hljs-string">"40001"</span>;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">shouldRetry</span><span class="hljs-params">(Throwable ex)</span> </span>{
        <span class="hljs-keyword">if</span> (ex == <span class="hljs-keyword">null</span>) {
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
        }
        Throwable throwable = NestedExceptionUtils.getMostSpecificCause(ex);
        <span class="hljs-keyword">if</span> (throwable <span class="hljs-keyword">instanceof</span> SQLException) {
            <span class="hljs-keyword">return</span> shouldRetry((SQLException) throwable);
        }
        logger.warn(<span class="hljs-string">"Non-transient exception {}"</span>, ex.getClass());
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">shouldRetry</span><span class="hljs-params">(SQLException ex)</span> </span>{
        <span class="hljs-keyword">if</span> (SERIALIZATION_FAILURE.equals(ex.getSQLState())) {
            logger.warn(<span class="hljs-string">"Transient SQL exception detected : sql state [{}], message [{}]"</span>,
                    ex.getSQLState(), ex.toString());
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">true</span>;
        }
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
    }
}
</code></pre>
<p>We also add the classifier bean to the configuration:</p>
<pre><code class="lang-java">
    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> CockroachExceptionClassifier <span class="hljs-title">exceptionClassifier</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> CockroachExceptionClassifier();
    }
</code></pre>
<p>The <code>shouldRetry</code> method simply checks whether the most specific cause is a <code>SQLException</code> and, if so, whether it carries the state code <code>40001</code>.</p>
<p>We could qualify exceptions with other state codes, but then there are no guarantees against duplicate side effects when the transaction is retried. For example, consider a transaction that involves multiple INSERTs where the COMMIT succeeds but the reply is lost in transit back to the client. That failure wouldn't surface as state code 40001 but more likely as a broken connection error code, and a retry would apply the INSERTs a second time.</p>
<p>To be safe, only retry on the state code <code>40001</code> and nothing else, unless you are sure about the side effects of your SQL transactions and it's considered safe (or the operations are idempotent).</p>
<h1 id="heading-demo-project">Demo Project</h1>
<p><a target="_blank" href="https://github.com/kai-niemi/roach-retry-examples">Roach Retry</a> is a project that provides runnable examples of different transaction retry strategies for Spring Boot and the JavaEE stack. It includes Spring Retry along with a simpler AOP-driven approach and JavaEE interceptors for old-style stateless session beans.</p>
<h2 id="heading-step-1-startup">Step 1: Startup</h2>
<p>Create the database:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE database roach_retry"</span>
</code></pre>
<p>Build the app:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> spring-retry
../mvnw clean install
</code></pre>
<p>Run the app:</p>
<pre><code class="lang-bash">java -jar target/roach-retry.jar
</code></pre>
<p>Then open another shell window so you have at least two windows. In any of the shells, check that the service is up and connected to the database:</p>
<pre><code class="lang-bash">curl --verbose http://localhost:8090/api
</code></pre>
<h2 id="heading-step-2-get-order-request-form">Step 2: Get Order Request Form</h2>
<p>Print an order form template that we will use to create orders:</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order/template &gt; form.json
</code></pre>
<h2 id="heading-step-3-submit-order-form">Step 3: Submit Order Form</h2>
<p>Create at least one purchase order:</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order -H <span class="hljs-string">"Content-Type:application/json"</span> -X POST -d <span class="hljs-string">"@form.json"</span>
</code></pre>
<h2 id="heading-step-4-produce-a-readwrite-conflict">Step 4: Produce a Read/Write Conflict</h2>
<p>We assume that there is now an existing order with ID 1 and status <code>PLACED</code>. We will read that order and change its status to something else from two concurrent sessions. This is known as a read-write or unrepeatable-read conflict, which is prevented by serializable isolation. As a result, there will be a SQL exception and a rollback.</p>
<p>When this happens, the retry mechanism will kick in and retry the failed transaction. It will then succeed because the two transactions no longer conflict once one of them has committed.</p>
<p>To observe this predictably we'll use two separate sessions with a controllable delay between the read and write operations.</p>
<p>Overview of the SQL operations executed (what the service will execute):</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BEGIN</span>; <span class="hljs-comment">-- T1</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> purchase_order <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">1</span>; <span class="hljs-comment">-- T1 </span>
<span class="hljs-comment">-- T1: Assert that status is `PLACED`</span>
<span class="hljs-comment">-- T1: Suspend for 15s  </span>
<span class="hljs-keyword">BEGIN</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> purchase_order <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">1</span>; <span class="hljs-comment">-- T2</span>
<span class="hljs-comment">-- Assert that status is still `PLACED`</span>
<span class="hljs-keyword">UPDATE</span> purchase_order <span class="hljs-keyword">SET</span> order_status=<span class="hljs-string">'PAID'</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">1</span>; <span class="hljs-comment">-- T2 </span>
<span class="hljs-keyword">COMMIT</span>; <span class="hljs-comment">-- T2 (OK)</span>
<span class="hljs-keyword">UPDATE</span> purchase_order <span class="hljs-keyword">SET</span> order_status=<span class="hljs-string">'CONFIRMED'</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">1</span>; <span class="hljs-comment">-- T1 (ERROR!)</span>
<span class="hljs-keyword">ROLLBACK</span>; <span class="hljs-comment">-- T1</span>
</code></pre>
<p>Now prepare the two separate shell windows so you can run the commands concurrently.</p>
<p>First, check that the order with ID 1 exists and has the status <code>PLACED</code> (or anything other than <code>CONFIRMED</code>).</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order/1
</code></pre>
<p>Now let's run the first transaction (T1) where there is a simulated 15-sec delay before the commit (you can increase/decrease the time):</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order/1?status=CONFIRMED\&amp;delay=15000 -i -X PUT
</code></pre>
<p>In less than 15 sec and before T1 commits, run the second transaction (T2) from another session which doesn't wait and succeeds with a commit:</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order/1?status=PAID -i -X PUT
</code></pre>
<p>At this point, T1 has no other choice than to roll back, and that will trigger a retry:</p>
<pre><code class="lang-bash">ERROR: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write <span class="hljs-keyword">for</span> key /Table/109/1/12/0 at timestamp 1669990868.355588000,0 too old; wrote at 1669990868.778375000,3: <span class="hljs-string">"sql txn"</span> meta={id=92409d02 key=/Table/109/1/12/0 pri=0.03022202 epo=0 ts=1669990868.778375000,3 min=1669990868.355588000,0 seq=0} lock=<span class="hljs-literal">true</span> <span class="hljs-built_in">stat</span>=PENDING rts=1669990868.355588000,0 wto=<span class="hljs-literal">false</span> gul=1669990868.855588000,0
</code></pre>
<p>The retry mechanism will catch that SQL exception, back off for a few hundred millis and then retry until it eventually succeeds (1 attempt).</p>
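<p>Conceptually, what happens to T1 can be sketched as a plain retry loop. This is not how Spring Retry is implemented internally; it is a simplified illustration with a fixed delay, and the <code>Work</code> interface and <code>retry</code> method are hypothetical names standing in for the annotated service method and the retry interceptor.</p>
<pre><code class="lang-java">import java.sql.SQLException;

class RetryLoopSketch {
    // Hypothetical callback type standing in for the transactional business method.
    interface Work {
        String run() throws SQLException;
    }

    static String retry(Work work, int maxAttempts, long delayMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException ex) {
                boolean transientError = "40001".equals(ex.getSQLState());
                if (!transientError || attempt == maxAttempts) {
                    throw ex; // not retryable, or out of attempts
                }
                Thread.sleep(delayMillis); // back off before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                // First attempt: simulate the serialization failure seen by T1.
                throw new SQLException("restart transaction", "40001");
            }
            return "CONFIRMED";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
</code></pre>
<p>The first call fails with state code 40001, the loop waits and the second call succeeds, which mirrors the single retry observed in the demo.</p>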
<p>The expected outcome is a <code>200 OK</code> returned to both client sessions. The final order status must be CONFIRMED since the request from client 1 (T1) was retried and eventually committed, thereby overwriting T2.</p>
<pre><code class="lang-bash">curl http://localhost:8090/api/order/1
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this tutorial, we explore using Spring Retry, a library for retrying failed method invocations of a transient nature, to handle serialization conflict errors denoted by the SQL state code 40001. We cover how to set up Maven, configure Spring Retry, create a sample service, and demonstrate a retry scenario using a demo project.</p>
]]></content:encoded></item><item><title><![CDATA[Create a Ledger Utilizing CockroachDB - Part III - Architecture]]></title><description><![CDATA[In the third part of a series about RoachBank, a full-stack, financial accounting ledger running on CockroachDB, we will look into the design features and architectural mechanisms used.
Problem Statement
Let's begin by describing what the service doe...]]></description><link>https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-iii-architecture</link><guid isPermaLink="true">https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-iii-architecture</guid><category><![CDATA[distributed ledger]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Wed, 12 Apr 2023 16:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Grk4L0ZJeAU/upload/28976333bcc9b5270c148d083b8a0f81.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the third part of a series about RoachBank, a full-stack, financial accounting ledger running on CockroachDB, we will look into the design features and architectural mechanisms used.</p>
<h1 id="heading-problem-statement">Problem Statement</h1>
<p>Let's begin by describing what the service does by using a problem statement. A problem statement specifies the system requirements at a high level. Input from business or product owners is critical in composing this statement.</p>
<p>The main characteristics include:</p>
<ul>
<li><p>Uses business domain language.</p>
</li>
<li><p>Has clear sentences without jargon.</p>
</li>
<li><p>Describes the project scope.</p>
</li>
<li><p>Specifies the context of the business capability.</p>
</li>
<li><p>Specifies the users/actors of the system.</p>
</li>
<li><p>Specifies known business and technical constraints that are important to consider.</p>
</li>
<li><p>Could serve as the foundation for identifying candidate domain objects (picking nouns).</p>
</li>
</ul>
<p><strong>Example:</strong></p>
<blockquote>
<p>The business requires an accounting system to keep track of monetary transactions. Users of the system are internal components that need to manage financial transactions between accounts.</p>
<p>The system must keep track of monetary accounts, transaction history on those accounts and account balances. Each account is associated with an account owner and a base currency. A transaction is the outcome of moving funds between different accounts. A transaction contains several account legs that may involve accounts with different currencies.</p>
<p>The safety mechanism used is the double-entry bookkeeping principle where each transaction must have a zero balance sum of all legs with the same currency. The system must also be able to produce reports of account activities and transactions for external auditors.</p>
</blockquote>
<p>Problem statements are <strong>very useful</strong> when creating new components and for understanding the purpose and meaning of existing ones, such as this hypothetical accounting ledger.</p>
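<p>The double-entry rule in the example statement translates directly into a check that can be expressed in a few lines. The following is a minimal sketch, not code from the ledger itself; the <code>Leg</code> class and <code>isBalanced</code> method are hypothetical names chosen for illustration.</p>
<pre><code class="lang-java">import java.math.BigDecimal;

class DoubleEntrySketch {
    // One leg of a transaction: an amount posted to an account in a given currency.
    static final class Leg {
        final String account;
        final String currency;
        final BigDecimal amount;

        Leg(String account, String currency, String amount) {
            this.account = account;
            this.currency = currency;
            this.amount = new BigDecimal(amount);
        }
    }

    // A transaction is balanced when the legs of every currency sum to zero.
    static boolean isBalanced(Leg[] legs) {
        for (Leg leg : legs) {
            BigDecimal sum = BigDecimal.ZERO;
            for (Leg other : legs) {
                if (other.currency.equals(leg.currency)) {
                    sum = sum.add(other.amount);
                }
            }
            if (sum.signum() != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Leg[] transfer = {
            new Leg("A", "USD", "-50.00"),
            new Leg("B", "USD", "50.00"),
            new Leg("C", "SEK", "-100.00"),
            new Leg("D", "SEK", "100.00")
        };
        System.out.println(isBalanced(transfer)); // prints true
    }
}
</code></pre>
<p>A transaction that debits 50.00 USD from one account and credits 50.00 EUR to another would be rejected by this check, since neither currency sums to zero.</p>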
<h1 id="heading-architectural-mechanisms">Architectural Mechanisms</h1>
<p>Let's move on from the problem domain to the solution domain. Software architectures can be visualized using UML diagrams or the <a target="_blank" href="https://c4model.com/">C4 model</a>, which I find quite useful. But what if you don't have diagrams or don't fancy drawing them?</p>
<p>One approach is to analyze what key architectural mechanisms are needed to implement all the features. Mechanisms are abstractions so when refining these into usable components or tools, you effectively do technology selection appropriate for the business domain (and organisation).</p>
<p>An architectural mechanism represents a common solution to a frequently encountered architectural problem that is not specific to a project or business domain. Quite similar to design patterns.</p>
<p>Implementation mechanisms are typically selected from a technology baseline within a tech organisation. For instance, when <em>persistence</em> is needed (analysis mechanism) and ACID properties are required, then an RDBMS (design mechanism) should be used where CockroachDB (implementation mechanism) is a great choice.</p>
<p>Key architectural mechanisms and realizations in this accounting ledger:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Analysis</strong></td><td><strong>Design &amp; Implementation</strong></td><td><strong>Characteristics &amp; Constraints</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Persistence</td><td>RDBMS / CockroachDB &amp; PostgreSQL</td><td>CockroachDB is used for all persistence needs. PostgreSQL support is also available for reference.</td></tr>
<tr>
<td>Data Access</td><td>ORM / JPA and Hibernate</td><td>Hibernate via Spring Data JPA and Spring Data JDBC for reference. JDBC as default. These modes can be switched for performance comparison.</td></tr>
<tr>
<td>Transaction Management</td><td>Local Transactions (no XA)</td><td>The system uses a local transaction manager for accessing transactional resources such as the database. For database access, a local JPA transaction manager is used. Serializable transaction isolation is required. For eventing, the system uses a mix of Kafka listeners and CDC webhook endpoints for visualization, both with at least once guarantee.</td></tr>
<tr>
<td>Interoperability</td><td>Hypermedia API, Websockets, Kafka consumer and webhook endpoint for CockroachDB CDC sinks</td><td>Spring HATEOAS using HAL+json. Streaming Text Oriented Messaging Protocol (STOMP). Kafka consumer/publisher of CDC events.</td></tr>
<tr>
<td>Frontend</td><td>Web UI / Thymeleaf template framework, CSS and JQuery, Bootstrap</td><td>For visualisation of account activities and system liveness during partial infrastructure disruptions.</td></tr>
<tr>
<td>Observability</td><td>Logging / Pull-based HTTP queries</td><td>SLF4J + Logback, Spring Actuators, Prometheus endpoint and TTDDYY proxy for JDBC logging</td></tr>
<tr>
<td>Caching</td><td>HTTP-level / Client side use of cache headers</td><td>HTTP cache headers in REST API. Local Spring cache for heavy reporting queries.</td></tr>
<tr>
<td>Resource Management</td><td>Connection Pooling</td><td>HikariCP data source.</td></tr>
<tr>
<td>Scheduling</td><td>Application Level</td><td>Spring built-in cron task scheduling (non-clustered)</td></tr>
<tr>
<td>Versioning</td><td>Database Versioning</td><td>Flyway</td></tr>
<tr>
<td>Inversion of Control</td><td>Application IOC</td><td>Spring Boot and AOP aspects for retryable transactions. JDBC driver retries (default)</td></tr>
<tr>
<td>Deployment</td><td>Container / Spring Boot</td><td>Spring Boot self-contained executable JAR</td></tr>
<tr>
<td>Load Balancing</td><td>L4 against the database, L7 against service API / HAProxy</td><td>Client-to-service HTTP load balancing is optional. Service to DB load balancing via HAProxy</td></tr>
<tr>
<td>Build</td><td>Convention over configuration</td><td>Maven 3+, JDK 17 (LTS) at source and target level</td></tr>
</tbody>
</table>
</div><h1 id="heading-design-overview">Design Overview</h1>
<p>The ledger is based on a common Spring Boot microservice stack using Spring Boot, Spring Data JDBC/JPA, Spring Hateoas, HikariCP, Flyway and more. Kafka as CDC sink is optional for driving account balance push events to the web front-end. The default is just to send the push events after each successful transaction (with an AOP after-advice).</p>
<p>There are two distinct data access implementations; JPA via Hibernate and plain JDBC. Both are included in a single self-contained executable JAR artifact with an embedded Jetty servlet container. It's possible to configure the retry strategy, data access strategy and more through Spring profiles.</p>
<p>It connects to either a CockroachDB cluster or PostgreSQL. When using PostgreSQL, some features are disabled such as follower reads and geo-partitioning.</p>
<p>The bank client issues concurrent requests towards the service API endpoint which in turn reads and writes to the database. When a transfer request is processed, the outcome is a permanent record in history (the ledger) and balance updates on the affected accounts. These balance updates are also pushed to the frontends via websocket STOMP events for visualization.</p>
<h2 id="heading-architecture-diagram">Architecture Diagram</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677154527165/007d4e80-f96b-4cd6-afac-acc4567dc674.png" alt /></p>
<h2 id="heading-entity-model">Entity Model</h2>
<p>The system uses the following entity model for double-entry bookkeeping of monetary transaction history.</p>
<ul>
<li><p><strong>account</strong> - Accounts with a derived balance from the sum of transactions</p>
</li>
<li><p><strong>transaction</strong> - Owning entity for balanced multi-legged monetary transactions</p>
</li>
<li><p><strong>transaction_item</strong> - Association table between transaction and account representing a single leg with a running account balance</p>
</li>
<li><p><strong>region</strong> - Static information about deployment regions</p>
</li>
<li><p><strong>outbox</strong> - Optional table for showcasing outbox pattern</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678955173027/69e01307-184c-44b1-bdcc-8d174f813347.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-main-sql-files">Main SQL files</h2>
<p>Flyway is used to set up the DB schema and account plan during startup time. The schema is not geo-partitioned by default.</p>
<h2 id="heading-transaction-workflow">Transaction Workflow</h2>
<p>Each monetary transaction creates a transaction record (1) and one leg (2) for each account update and also updates the cached balance on each account (3). A CHECK constraint ensures that balances don't end up negative unless allowed for that account (using <code>allow_negative</code> column).</p>
<p>The <code>UPDATE .. FROM</code> (below) with array unnesting is a workaround for the lack of batch updates over the wire. The pgJDBC driver doesn't batch UPDATE statements, only INSERTs up to a given limit using SQL rewrites (aka multi-value inserts).</p>
<p>In the default workflow, the initial balance check on the accounts is redundant since the invariant check is done by looking at the rows affected on the final UPDATE. An UPDATE also takes an implicit lock in the reading part in CockroachDB (configurable) which will reduce retries. In summary, there are no reads involved in the default workflow (not counting the internal read that is part of each UPDATE).</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- (1) header</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">transaction</span> (<span class="hljs-keyword">id</span>,city,balance,currency,<span class="hljs-keyword">name</span>,..);
<span class="hljs-comment">-- (2) for each leg (batch)</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> transaction_item (city,transaction_id,..);
<span class="hljs-comment">-- (3) for each account (batch)</span>
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">SET</span> balance = account.balance + data_table.balance, <span class="hljs-keyword">updated</span>=clock_timestamp()
<span class="hljs-keyword">FROM</span> (<span class="hljs-keyword">select</span> <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> <span class="hljs-keyword">id</span>, <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> balance) <span class="hljs-keyword">as</span> data_table
<span class="hljs-keyword">WHERE</span> account.id=data_table.id
  <span class="hljs-keyword">AND</span> account.closed=<span class="hljs-literal">false</span>
  <span class="hljs-keyword">AND</span> (account.balance + data_table.balance) * <span class="hljs-keyword">abs</span>(account.allow_negative<span class="hljs-number">-1</span>) &gt;= <span class="hljs-number">0</span>
</code></pre>
<p>The actual CHECK constraints:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
    <span class="hljs-keyword">add</span> <span class="hljs-keyword">constraint</span> check_account_allow_negative <span class="hljs-keyword">check</span> (allow_negative <span class="hljs-keyword">between</span> <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> <span class="hljs-number">1</span>);
<span class="hljs-keyword">alter</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">account</span>
    <span class="hljs-keyword">add</span> <span class="hljs-keyword">constraint</span> check_account_positive_balance <span class="hljs-keyword">check</span> (balance * <span class="hljs-keyword">abs</span>(allow_negative - <span class="hljs-number">1</span>) &gt;= <span class="hljs-number">0</span>);
</code></pre>
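<p>The predicate shared by the <code>UPDATE</code> and the balance CHECK constraint is compact but easy to misread. When <code>allow_negative</code> is 1 the factor <code>abs(allow_negative - 1)</code> becomes 0, so the product is always 0 and the check passes. When it is 0 the factor is 1 and the resulting balance itself must be non-negative. Here is a small sketch of the same arithmetic in Java; the class and method names are hypothetical and not part of the ledger code.</p>
<pre><code class="lang-java">import java.math.BigDecimal;

class BalanceInvariantSketch {
    // Mirrors (balance + delta) * abs(allow_negative - 1) >= 0 from the SQL above.
    static boolean accepts(BigDecimal balance, BigDecimal delta, int allowNegative) {
        BigDecimal factor = BigDecimal.valueOf(Math.abs(allowNegative - 1));
        BigDecimal product = balance.add(delta).multiply(factor);
        return product.signum() != -1;
    }

    public static void main(String[] args) {
        BigDecimal balance = new BigDecimal("100.00");
        BigDecimal debit = new BigDecimal("-150.00");
        System.out.println(accepts(balance, debit, 0)); // false: overdraft not allowed
        System.out.println(accepts(balance, debit, 1)); // true: overdraft allowed
    }
}
</code></pre>
<p>In the <code>UPDATE</code>, a row that fails this predicate is simply not matched, which is why the number of affected rows can serve as the invariant check.</p>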
<h1 id="heading-transaction-workflow-alternative">Transaction Workflow (alternative)</h1>
<p>The default workflow yields low contention. There's an alternative workflow designed to provoke more contention to visualize the effects of transient rollback errors and retries. The alternative workflow will perform an initial balance check and optionally use select-for-update (SFU) locks. Without the SFUs, the chance of contention is high. The main benefit of this workflow is that the running balance of the accounts can be stored on the transaction legs.</p>
<p>It's enabled by starting the server with the <code>--roachbank.updateRunningBalance=true</code> option.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- (1) initial query for all involved accounts (lock is optional)</span>
<span class="hljs-keyword">SELECT</span> .. <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> <span class="hljs-keyword">IN</span> (..) <span class="hljs-keyword">AND</span> city <span class="hljs-keyword">IN</span> (..) <span class="hljs-comment">/* FOR UPDATE */</span>;
<span class="hljs-comment">-- (2) header </span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">transaction</span> (<span class="hljs-keyword">id</span>,city,balance,currency,<span class="hljs-keyword">name</span>,..);
<span class="hljs-comment">-- (3) for each leg (notice running_balance)</span>
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> transaction_item (city,transaction_id,running_balance,..);
<span class="hljs-comment">-- (4) for each account (batch)</span>
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">SET</span> balance = account.balance + data_table.balance, <span class="hljs-keyword">updated</span>=clock_timestamp()
<span class="hljs-keyword">FROM</span> (<span class="hljs-keyword">select</span> <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> <span class="hljs-keyword">id</span>, <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> balance) <span class="hljs-keyword">as</span> data_table
<span class="hljs-keyword">WHERE</span> account.id=data_table.id
  <span class="hljs-keyword">AND</span> account.closed=<span class="hljs-literal">false</span>
  <span class="hljs-keyword">AND</span> (account.balance + data_table.balance) * <span class="hljs-keyword">abs</span>(account.allow_negative<span class="hljs-number">-1</span>) &gt;= <span class="hljs-number">0</span>
</code></pre>
<h2 id="heading-transaction-retry-strategy">Transaction Retry Strategy</h2>
<p>Any database running with serializable isolation (such as CockroachDB) is exposed to transient SQL errors on contended workloads. These errors are tagged with SQL state 40001 and can be safely retried.</p>
<p>The ledger has three main strategies for performing retries:</p>
<ul>
<li><p>Client-side retries using a Spring/AspectJ AOP "around advice" with exponential backoff.</p>
</li>
<li><p>JDBC driver level retries using the CockroachDB JDBC Driver</p>
</li>
<li><p>No retries where transient SQL errors propagate to the client.</p>
</li>
</ul>
<p>The default is client-side retries.</p>
<p>For more details on pros/cons with these different retry strategies, see:</p>
<ul>
<li><p><a target="_blank" href="https://github.com/cockroachlabs-field/cockroachdb-jdbc">https://github.com/cockroachlabs-field/cockroachdb-jdbc</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/cockroachlabs-field/spring-data-cockroachdb">https://github.com/cockroachlabs-field/spring-data-cockroachdb</a></p>
</li>
</ul>
<h2 id="heading-apis">APIs</h2>
<p>The ledger provides two main interfaces:</p>
<ul>
<li><p>A hypermedia/REST API for request/response-based interactions based on Spring HATEOAS. The shell client (based on Spring Shell) interacts with the ledger through this API.</p>
</li>
<li><p>A WebSocket Streaming API for reactive front-ends, driven via CockroachDB CDC (optional) or synthetic events.</p>
</li>
</ul>
<p>The Hypermedia API is used to view data, create accounts and generate monetary transactions. A typical HTTP client follows the hyperlinks provided by the API to be guided through different workflows, such as placing a monetary transaction or browsing through pages of account details. As with any REST API, following hyperlinks is optional. A client can also bind directly to the resource URIs, with tight coupling as a result. The semantics of the endpoints are tied to the link relations rather than the opaque URIs.</p>
<ul>
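<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Hypertext_Application_Language">https://en.wikipedia.org/wiki/Hypertext_Application_Language</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/HATEOAS">https://en.wikipedia.org/wiki/HATEOAS</a></p>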
</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Roach Bank is a financial accounting ledger demo running on CockroachDB and PostgreSQL. It uses an entity model for double-entry bookkeeping and provides two distinct data access implementations. This article discusses an alternative transaction workflow with a balance check and retry strategy for databases running in serializable mode. The retry strategy is handled via Spring/CGLIB proxies with exponential backoff.</p>
]]></content:encoded></item><item><title><![CDATA[Create a Ledger Utilizing CockroachDB - Part II - Deployment]]></title><description><![CDATA[In this second part of a series about RoachBank, a full-stack financial accounting ledger running on CockroachDB, we will look at how to deploy the bank against a global, multi-regional CockroachDB cluster.
Cloud Deployment
The ledger provides a few ...]]></description><link>https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-ii-deployment</link><guid isPermaLink="true">https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-ii-deployment</guid><category><![CDATA[distributed ledger]]></category><category><![CDATA[JDBC]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Tue, 11 Apr 2023 16:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/aFdTdWYXFd0/upload/8c33e4e301aa4f6e0c4a6b5397c0ff90.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this second part of a series about RoachBank, a full-stack financial accounting ledger running on <a target="_blank" href="https://www.cockroachlabs.com/">CockroachDB</a>, we will look at how to deploy the bank against a global, multi-regional CockroachDB cluster.</p>
<h1 id="heading-cloud-deployment">Cloud Deployment</h1>
<p>The ledger provides a few convenience scripts for deploying to AWS, GCE and Azure using an internal tool called <code>roachprod</code>. This tool is free to use but at your own risk.</p>
<p>The provided scripts will do the following:</p>
<ul>
<li><p>Provision a single-region or multi-region CockroachDB cluster</p>
</li>
<li><p>Deploy HAProxy on all client nodes</p>
</li>
<li><p>Deploy bank server and client JAR on all client nodes</p>
</li>
<li><p>Start the bank server on all client nodes</p>
</li>
<li><p>Enable regional-by-row and global tables for multi-region (if needed)</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><a target="_blank" href="https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachprod">Roachprod</a> - a Cockroach Labs internal tool for provisioning AWS/GCE/Azure VM clusters</p>
</li>
<li><p>You will need the AWS/GCE/AZ client SDK and an account.</p>
</li>
</ul>
<h1 id="heading-deployment-scripts">Deployment Scripts</h1>
<h2 id="heading-aws-deployment">AWS Deployment</h2>
<p>The <code>$basedir/deploy/aws</code> folder contains a few scripts for provisioning different cluster sizes in different regions. Let's look at the <code>aws-multiregion-eu.sh</code> which is a multi-region configuration spanning <code>eu-west-1</code>, <code>eu-west-2</code> and <code>eu-central-1</code>.</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment"># Script for setting up a multi-region Roach Bank cluster using roachprod in either AWS or GCE.</span>

<span class="hljs-comment"># Configuration</span>
<span class="hljs-comment">########################</span>

title=<span class="hljs-string">"CockroachDB 3-region EU deployment"</span>
<span class="hljs-comment"># CRDB release version</span>
releaseversion=<span class="hljs-string">"v22.2.5"</span>
<span class="hljs-comment"># Number of node instances in total including clients</span>
nodes=<span class="hljs-string">"12"</span>
<span class="hljs-comment"># Nodes hosting CRDB</span>
crdbnodes=<span class="hljs-string">"1-9"</span>
<span class="hljs-comment"># Array of client nodes (must match size of regions)</span>
clients=(10 11 12)
<span class="hljs-comment"># Array of regions localities (must match zone names)</span>
regions=(<span class="hljs-string">'eu-west-1'</span> <span class="hljs-string">'eu-west-2'</span> <span class="hljs-string">'eu-central-1'</span>)
<span class="hljs-comment"># AWS/GCE cloud (aws|gce)</span>
cloud=<span class="hljs-string">"aws"</span>
<span class="hljs-comment"># AWS/GCE region zones (must align with nodes count)</span>
zones=<span class="hljs-string">"\
eu-west-1a,\
eu-west-1b,\
eu-west-1c,\
eu-west-2a,\
eu-west-2b,\
eu-west-2c,\
eu-central-1a,\
eu-central-1b,\
eu-central-1c,\
eu-west-1a,\
eu-west-2a,\
eu-central-1a"</span>
<span class="hljs-comment"># AWS/GCE machine types</span>
machinetypes=<span class="hljs-string">"c5d.4xlarge"</span>

<span class="hljs-comment"># DO NOT EDIT BELOW THIS LINE</span>
<span class="hljs-comment">#############################</span>

functionsdir=<span class="hljs-string">"../common"</span>

<span class="hljs-built_in">source</span> <span class="hljs-string">"<span class="hljs-variable">${functionsdir}</span>/core_functions.sh"</span>

main.sh
</code></pre>
<p>When the script completes, you will have a 12-instance cluster provisioned in AWS, in which nodes 1-9 run CockroachDB and nodes 10, 11 and 12 host the bank application stack, including HAProxy.</p>
<p>The topology looks something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678975751246/5afc9f4d-3882-40ba-819c-80132d286984.png" alt class="image--center mx-auto" /></p>
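<p>The comments in the configuration section state that the <code>clients</code> array must match the size of <code>regions</code> and that the <code>zones</code> list must align with the node count. A small pre-flight check along these lines could catch a mismatch before anything is provisioned. This is a sketch only and not part of the provided scripts: the function name <code>check_layout</code> is an assumption, and the values are copied from the EU configuration above.</p>
<pre><code class="lang-bash">#!/bin/bash
# Hypothetical pre-flight check; not part of the roach-bank deploy scripts.

# Arguments: total node count, comma-separated zones, client count, region count.
# Prints "ok" and returns 0 when the layout is consistent,
# otherwise prints the problem and returns 1.
check_layout() {
  local nodes="$1" zones="$2" client_count="$3" region_count="$4"

  # Count the zones by counting the commas that separate them.
  local commas="${zones//[^,]/}"
  local zone_count=$(( ${#commas} + 1 ))

  if [ "$zone_count" -ne "$nodes" ]; then
    echo "zones lists $zone_count entries but nodes is $nodes"
    return 1
  fi
  if [ "$client_count" -ne "$region_count" ]; then
    echo "clients has $client_count entries but regions has $region_count"
    return 1
  fi
  echo "ok"
}

# Values mirroring the EU configuration shown above.
nodes="12"
clients=(10 11 12)
regions=('eu-west-1' 'eu-west-2' 'eu-central-1')
zones="eu-west-1a,eu-west-1b,eu-west-1c,eu-west-2a,eu-west-2b,eu-west-2c,eu-central-1a,eu-central-1b,eu-central-1c,eu-west-1a,eu-west-2a,eu-central-1a"

check_layout "$nodes" "$zones" "${#clients[@]}" "${#regions[@]}"
</code></pre>
<p>With the values above, the zone list has 12 entries for 12 nodes and there is one client per region, so the check passes.</p>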
<p>The setup script is interactive and each step will ask for confirmation. It's launched by this simple command:</p>
<pre><code class="lang-bash">./aws-multiregion-eu.sh
</code></pre>
<p>After the steps are completed, a page should open automatically in your default browser, along with the service landing page showing the account boxes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678980480695/3cedf381-de3b-4497-a13f-9203a6d81794.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-gce-deployment">GCE Deployment</h2>
<p>For GCE, the process is similar, just with different regions and instance types.</p>
<p>For example:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> deploy/gce
chmod +x *.sh
./gce-multiregion-eu.sh
</code></pre>
<h2 id="heading-azure-deployment">Azure Deployment</h2>
<p>The same goes for Azure, although it only has a single-region provisioning script.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> deploy/azure
chmod +x *.sh
./azure-singleregion.sh
</code></pre>
<h1 id="heading-operating-in-multi-region">Operating in Multi-Region</h1>
<p>When the ledger is deployed in a multi-regional topology (like US-EU-APAC), the accounts and transactions need to be pinned/domiciled to each region for best performance.</p>
<p>This is done by using the regional-by-row table locality in CockroachDB. There's an explicit step in the setup script that executes the SQL statements below. This provides low read and write latencies in each region.</p>
<p>For the AWS multi-region example:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">DATABASE</span> roach_bank PRIMARY REGION <span class="hljs-string">"eu-central-1"</span>;
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">DATABASE</span> roach_bank <span class="hljs-keyword">ADD</span> REGION <span class="hljs-string">"eu-west-1"</span>;
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">DATABASE</span> roach_bank <span class="hljs-keyword">ADD</span> REGION <span class="hljs-string">"eu-west-2"</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> region <span class="hljs-keyword">SET</span> locality <span class="hljs-keyword">GLOBAL</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> region crdb_internal_region <span class="hljs-keyword">AS</span> (
    <span class="hljs-keyword">CASE</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'dublin'</span>,<span class="hljs-string">'belfast'</span>,<span class="hljs-string">'liverpool'</span>,<span class="hljs-string">'manchester'</span>,<span class="hljs-string">'glasgow'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-1'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'london'</span>,<span class="hljs-string">'birmingham'</span>,<span class="hljs-string">'leeds'</span>,<span class="hljs-string">'amsterdam'</span>,<span class="hljs-string">'rotterdam'</span>,<span class="hljs-string">'antwerp'</span>,<span class="hljs-string">'hague'</span>,<span class="hljs-string">'ghent'</span>,<span class="hljs-string">'brussels'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-2'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'berlin'</span>,<span class="hljs-string">'hamburg'</span>,<span class="hljs-string">'munich'</span>,<span class="hljs-string">'frankfurt'</span>,<span class="hljs-string">'dusseldorf'</span>,<span class="hljs-string">'leipzig'</span>,<span class="hljs-string">'dortmund'</span>,<span class="hljs-string">'essen'</span>,<span class="hljs-string">'stuttgart'</span>,<span class="hljs-string">'stockholm'</span>,<span class="hljs-string">'copenhagen'</span>,<span class="hljs-string">'helsinki'</span>,<span class="hljs-string">'oslo'</span>,<span class="hljs-string">'riga'</span>,<span class="hljs-string">'tallinn'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">ELSE</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">END</span>
    ) <span class="hljs-keyword">STORED</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">account</span> <span class="hljs-keyword">SET</span> LOCALITY REGIONAL <span class="hljs-keyword">BY</span> <span class="hljs-keyword">ROW</span> <span class="hljs-keyword">AS</span> region;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> region crdb_internal_region <span class="hljs-keyword">AS</span> (
    <span class="hljs-keyword">CASE</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'dublin'</span>,<span class="hljs-string">'belfast'</span>,<span class="hljs-string">'liverpool'</span>,<span class="hljs-string">'manchester'</span>,<span class="hljs-string">'glasgow'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-1'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'london'</span>,<span class="hljs-string">'birmingham'</span>,<span class="hljs-string">'leeds'</span>,<span class="hljs-string">'amsterdam'</span>,<span class="hljs-string">'rotterdam'</span>,<span class="hljs-string">'antwerp'</span>,<span class="hljs-string">'hague'</span>,<span class="hljs-string">'ghent'</span>,<span class="hljs-string">'brussels'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-2'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'berlin'</span>,<span class="hljs-string">'hamburg'</span>,<span class="hljs-string">'munich'</span>,<span class="hljs-string">'frankfurt'</span>,<span class="hljs-string">'dusseldorf'</span>,<span class="hljs-string">'leipzig'</span>,<span class="hljs-string">'dortmund'</span>,<span class="hljs-string">'essen'</span>,<span class="hljs-string">'stuttgart'</span>,<span class="hljs-string">'stockholm'</span>,<span class="hljs-string">'copenhagen'</span>,<span class="hljs-string">'helsinki'</span>,<span class="hljs-string">'oslo'</span>,<span class="hljs-string">'riga'</span>,<span class="hljs-string">'tallinn'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">ELSE</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">END</span>
    ) <span class="hljs-keyword">STORED</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">transaction</span> <span class="hljs-keyword">SET</span> LOCALITY REGIONAL <span class="hljs-keyword">BY</span> <span class="hljs-keyword">ROW</span> <span class="hljs-keyword">AS</span> region;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> transaction_item <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> region crdb_internal_region <span class="hljs-keyword">AS</span> (
    <span class="hljs-keyword">CASE</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'dublin'</span>,<span class="hljs-string">'belfast'</span>,<span class="hljs-string">'liverpool'</span>,<span class="hljs-string">'manchester'</span>,<span class="hljs-string">'glasgow'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-1'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'london'</span>,<span class="hljs-string">'birmingham'</span>,<span class="hljs-string">'leeds'</span>,<span class="hljs-string">'amsterdam'</span>,<span class="hljs-string">'rotterdam'</span>,<span class="hljs-string">'antwerp'</span>,<span class="hljs-string">'hague'</span>,<span class="hljs-string">'ghent'</span>,<span class="hljs-string">'brussels'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-west-2'</span>
        <span class="hljs-keyword">WHEN</span> city <span class="hljs-keyword">IN</span> (<span class="hljs-string">'berlin'</span>,<span class="hljs-string">'hamburg'</span>,<span class="hljs-string">'munich'</span>,<span class="hljs-string">'frankfurt'</span>,<span class="hljs-string">'dusseldorf'</span>,<span class="hljs-string">'leipzig'</span>,<span class="hljs-string">'dortmund'</span>,<span class="hljs-string">'essen'</span>,<span class="hljs-string">'stuttgart'</span>,<span class="hljs-string">'stockholm'</span>,<span class="hljs-string">'copenhagen'</span>,<span class="hljs-string">'helsinki'</span>,<span class="hljs-string">'oslo'</span>,<span class="hljs-string">'riga'</span>,<span class="hljs-string">'tallinn'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">ELSE</span> <span class="hljs-string">'eu-central-1'</span>
        <span class="hljs-keyword">END</span>
    ) <span class="hljs-keyword">STORED</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> transaction_item <span class="hljs-keyword">SET</span> LOCALITY REGIONAL <span class="hljs-keyword">BY</span> <span class="hljs-keyword">ROW</span> <span class="hljs-keyword">AS</span> region;
</code></pre>
<p>When transactions are issued against accounts in these different cities, the read and write operations will be constrained to the home regions. For example, monetary transactions involving accounts in "stockholm" and "helsinki" will be serviced only by the 3 nodes in the region <code>eu-central-1</code>. Read operations will have local latency and write operations will need only a single roundtrip to the next closest region.</p>
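<p>The computed <code>region</code> column is essentially a lookup from city to home region. A rough bash sketch of the same mapping follows, for illustration only: the function name <code>home_region</code> is an assumption, and in the real deployment the mapping lives in the SQL above.</p>
<pre><code class="lang-bash">#!/bin/bash
# Illustrative sketch; home_region is a hypothetical helper that mirrors
# the SQL CASE expression used for the computed region column.
home_region() {
  case "$1" in
    dublin|belfast|liverpool|manchester|glasgow)
      echo "eu-west-1" ;;
    london|birmingham|leeds|amsterdam|rotterdam|antwerp|hague|ghent|brussels)
      echo "eu-west-2" ;;
    *)
      # The remaining listed cities and the ELSE branch all map here.
      echo "eu-central-1" ;;
  esac
}

home_region stockholm
home_region helsinki
home_region dublin
</code></pre>
<p>Both stockholm and helsinki resolve to <code>eu-central-1</code>, which is why a transfer between accounts in those two cities stays within that region, while dublin resolves to <code>eu-west-1</code>.</p>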
<p>To limit the amount of data and the overhead of replicating across regions, you can optionally disable the non-voting replicas with <a target="_blank" href="https://www.cockroachlabs.com/docs/v22.2/alter-database#placement">placement restrictions</a>. No replicas are then placed outside of the home regions, at the cost of higher latency for follower reads in the other regions.</p>
<p>The tradeoff is that you give up regional survival. To combine regional survival and data domiciling with regional-by-row, you can use super-regions, which are covered in more detail in this <a target="_blank" href="https://blog.cloudneutral.se/data-domiciling-using-super-regions-in-cockroachdb">post</a>.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SET</span> enable_multiregion_placement_policy=<span class="hljs-keyword">on</span>;
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">DATABASE</span> roach_bank PLACEMENT <span class="hljs-keyword">RESTRICTED</span>;
</code></pre>
<h2 id="heading-running-a-global-workload">Running a Global Workload</h2>
<p>First, SSH to the first client machine which in the AWS example is node (aka instance) 10.</p>
<pre><code class="lang-bash">roachprod run <span class="hljs-variable">$CLUSTER</span>:10
</code></pre>
<p>Next, start the bank client and type <code>connect</code>. It should print something like this:</p>
<pre><code class="lang-apache">
                                             <span class="hljs-attribute">C</span> O C K R O A C H D B
──▄──▄────▄▀        <span class="hljs-attribute">___</span>                __     ___            __
───▀▄─█─▄▀▄▄▄      / <span class="hljs-attribute">_</span> \___  ___ _____/ /    / _ )___ ____  / /__
▄██▄████▄██▄▀█▄   / , <span class="hljs-attribute">_</span>/ _ \/ _ `/ __/ _ \  / _  / _ `/ _ \/  '_/
─▀▀─█▀█▀▄▀███▀   /<span class="hljs-attribute">_</span>/|_|\___/\_,_/\__/_//_/ /____/\_,_/_//_/_/\_\
──▄▄▀─█──▀▄▄     <span class="hljs-attribute">bank</span>-client (v<span class="hljs-number">2</span>.<span class="hljs-number">0</span>.<span class="hljs-number">1</span>.BUILD-SNAPSHOT) powered by Spring Boot (v<span class="hljs-number">3</span>.<span class="hljs-number">0</span>.<span class="hljs-number">4</span>)
                 <span class="hljs-attribute">Active</span> profiles: <span class="hljs-variable">${spring.profiles.active}</span>
<span class="hljs-attribute">15</span>:<span class="hljs-number">30</span>:<span class="hljs-number">37</span>.<span class="hljs-number">219</span>  INFO<span class="hljs-meta"> [main] [io.roach.bank.client.ClientApplication] Starting ClientApplication v2.0.1.BUILD-SNAPSHOT using Java 17.0.6 with PID 10267 (/home/ubuntu/bank-client.jar started by ubuntu in /home/ubuntu)
15:30:37.220  INFO [main] [io.roach.bank.client.ClientApplication] No active profile set, falling back to 1 default profile: "default"
15:30:38.414  INFO [main] [io.roach.bank.client.ClientApplication] Started ClientApplication in 1.564 seconds (process running for 2.095)
disconnected:$ connect
15:30:42.949  INFO [main] [io.roach.bank.client.command.Connect] Connecting to http://localhost:8090/api..
15:30:43.084  INFO [main] [io.roach.bank.client.command.Connect] Welcome to text-only Roach Bank. You are in a dark, cold lobby.
15:30:43.084  INFO [main] [io.roach.bank.client.command.Connect] Type help for commands.
localhost:$</span>
</code></pre>
<p>Next, let's run some account transfers across the cities in the local region. First, we need to verify that the local gateway region is <code>eu-west-1</code>:</p>
<pre><code class="lang-apache"><span class="hljs-attribute">localhost</span>:$ gateway-region
<span class="hljs-attribute">eu</span>-west-<span class="hljs-number">1</span>
</code></pre>
<p>Then start the transfers:</p>
<pre><code class="lang-apache"><span class="hljs-attribute">localhost</span>:$ transfer --regions eu-west-<span class="hljs-number">1</span>
</code></pre>
<p>If you look in the browser tab pointing at the regional bank service, you should see some effects on the accounts in that region. If you are not sure about the URL, use <code>roachprod ip</code>:</p>
<pre><code class="lang-apache"><span class="hljs-attribute">roachprod</span> ip $CLUSTER:<span class="hljs-number">10</span>-<span class="hljs-number">12</span> --external
</code></pre>
<p>Pick the first IP, append port 8090 and open it in the browser. You should see:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678980981467/eb5648f9-7555-49c8-bb36-c9df82e0559e.png" alt class="image--center mx-auto" /></p>
<p><strong>Note:</strong> For simplicity, the push events that update the balances are not broadcast across regions, so you can only see effects at a regional level.</p>
<p>The <strong>transfer</strong> command runs with a very low volume by default, but it can be ramped up with more concurrent threads and a larger selection of accounts to avoid contention. The low amount range reduces the risk of ending up with a negative balance, which causes aborts.</p>
<pre><code class="lang-apache"><span class="hljs-attribute">transfer</span> --regions eu-west-<span class="hljs-number">1</span> --concurrency <span class="hljs-number">10</span> --limit <span class="hljs-number">1000</span> --amount <span class="hljs-number">0</span>.<span class="hljs-number">01</span>-<span class="hljs-number">0</span>.<span class="hljs-number">15</span>
</code></pre>
<p>To run a 100% read-based workload, we can use the <strong>balance</strong> command. This will start ten concurrent workers per city in the given region and run point lookups.</p>
<pre><code class="lang-apache"><span class="hljs-attribute">balance</span> --regions eu-west-<span class="hljs-number">1</span> --followerReads --concurrency <span class="hljs-number">10</span> --limit <span class="hljs-number">1000</span>
</code></pre>
<p>Lastly, repeat the steps above for client nodes 11 and 12 so you end up with 3 concurrent clients and servers, one pair per region.</p>
<pre><code class="lang-bash">roachprod run <span class="hljs-variable">$CLUSTER</span>:11
..
</code></pre>
<p>Once the workloads run at full speed, you should see metrics picking up in the DB Console.</p>
<p>As we can see in the hardware dashboard, the vCPU utilization starts reaching the 50% threshold. Using these 16 vCPU VMs, we get around 40K QPS with a P99 latency of less than 2ms. Keep in mind that the cluster stretches across 3 regions in the EU.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678982190686/30a10724-284a-4345-86e3-b8e69ddcc944.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-summary">Summary</h1>
<p>This article provides instructions on how to deploy the RoachBank accounting ledger demo on a multi-regional CockroachDB cluster, including instructions for deploying on AWS, GCE and Azure, setting up regional-by-row and global tables, and using the <code>roachprod</code> tool. Prerequisites include the AWS, GCE or Azure client SDK and an account.</p>
]]></content:encoded></item><item><title><![CDATA[Create a Ledger Utilizing CockroachDB - Part I - Introduction]]></title><description><![CDATA[This is the first part of a series about RoachBank, a full-stack financial accounting ledger demo running on both CockroachDB and PostgreSQL. It's designed to demonstrate the safety and liveness properties of a globally deployed, system-of-record typ...]]></description><link>https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-i-introduction</link><guid isPermaLink="true">https://blog.cloudneutral.se/create-a-ledger-utilizing-cockroachdb-part-i-introduction</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[Springboot]]></category><category><![CDATA[distributed ledger]]></category><category><![CDATA[accounting]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Mon, 10 Apr 2023 16:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/MyEZ0ASmJ7c/upload/7cd7ca1a0d05d05981755b67f98c149f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the first part of a series about RoachBank, a full-stack financial accounting ledger demo running on both <a target="_blank" href="https://www.cockroachlabs.com/">CockroachDB</a> and PostgreSQL. It's designed to demonstrate the safety and liveness properties of a globally deployed, system-of-record type of financial workload.</p>
<h1 id="heading-introduction">Introduction</h1>
<p>The concept behind the ledger is to move funds between monetary accounts using balanced, multi-legged transactions, at a high frequency. As a financial system, correctness is defined as conserving money at all times and providing an audit trail of monetary transactions performed towards the accounts. Put simply, when externally observing the system, the total account balance must be constant at all times. Funds are simply moved between different accounts using balanced transactions.</p>
<p>This is visualized by the service using a single page to display accounts as rectangles with their current balance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677677220488/19c33276-2830-4f90-92d1-4c6188621139.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-key-invariants">Key Invariants</h2>
<p>There are a few business rule invariants that must hold at all times, regardless of observer and of activities such as infrastructure failure (nodes crashing) or conflicting operations when concurrently updating the same accounts.</p>
<ul>
<li><p>The total balance of all accounts must be constant.</p>
</li>
<li><p>User accounts (account types that disallow a negative balance) must never have a negative balance.</p>
</li>
<li><p>An audit trail of all transactions must be stored from which the account balances can be derived.</p>
</li>
</ul>
<p>The system must refuse forward progress if an operation would result in any of these invariants being compromised. For example, if a variation of the total balances is observed at any given time, then money has either been "invented" or "destroyed".</p>
<p>These invariants are safeguarded by ACID guarantees and real serializable transactions. CockroachDB runs transactions at serializable isolation by default, while PostgreSQL defaults to read committed but can be raised to serializable, which it implements as serializable snapshot isolation (SSI).</p>
<h2 id="heading-double-entry-bookkeeping">Double-entry Bookkeeping</h2>
<p>To satisfy the audit trail requirement, the ledger follows the <a target="_blank" href="https://en.wikipedia.org/wiki/Double-entry_bookkeeping">double-entry bookkeeping</a> principle. This principle was originally formalized and published by the Italian mathematician <strong>Luca Pacioli</strong> during the 15th century.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/Pacioli.jpg/220px-Pacioli.jpg" alt="Portrait of Luca Pacioli" /></p>
<p>It involves making at least two account entries for every transaction: a debit in one account and a corresponding credit in another account. The sum of all debits must equal the sum of all credits, providing a simple method for error detection. Real accounting doesn't use negative numbers, but for simplicity, this ledger does (it's not about modelling the true complexity of accounting).</p>
<p>A positive value means increasing the value (credit), and a negative value means decreasing the value (debit). A transaction is considered balanced when the sum of the legs with the same currency equals zero.</p>
<p>In the following example, four different accounts are involved and the legs sum to zero.</p>
<pre><code class="lang-c">Account | Credit(+) | Debit(-) |
A         <span class="hljs-number">100</span>               
B                     <span class="hljs-number">-50</span>
C          <span class="hljs-number">25</span>
D*                    <span class="hljs-number">-25</span> \
                           <span class="hljs-number">-75</span> (coalesced)
D*                    <span class="hljs-number">-50</span> /
------------------------------------------
Σ         <span class="hljs-number">125</span>    +   <span class="hljs-number">-125</span> = <span class="hljs-number">0</span>
</code></pre>
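<p>The balancing rule can be sketched as a small bash function. This is an illustrative sketch only, not code from the ledger: the function name <code>is_balanced</code>, the <code>account:currency:amount</code> leg format and the use of whole-number amounts are assumptions made for the example, and USD is an assumed currency since the table above does not name one. It needs bash 4 or later for associative arrays.</p>
<pre><code class="lang-bash">#!/bin/bash
# Illustrative sketch only; is_balanced and the leg format are assumptions.

# Usage: is_balanced LEG...   where each LEG is "account:currency:amount".
# Amounts are whole numbers: positive for credit, negative for debit.
is_balanced() {
  local -A totals=()
  local leg rest currency amount
  for leg in "$@"; do
    rest="${leg#*:}"          # drop the account name
    currency="${rest%%:*}"    # text before the next colon
    amount="${rest#*:}"       # text after it
    totals[$currency]=$(( ${totals[$currency]:-0} + amount ))
  done
  # Every currency must net to zero for the transaction to be balanced.
  for currency in "${!totals[@]}"; do
    if [ "${totals[$currency]}" -ne 0 ]; then
      echo "unbalanced"
      return 1
    fi
  done
  echo "balanced"
}

# The four-account example from the table above.
is_balanced "A:USD:100" "B:USD:-50" "C:USD:25" "D:USD:-25" "D:USD:-50"
</code></pre>
<p>Run as written, the call prints <code>balanced</code>. Changing any single amount makes it print <code>unbalanced</code> and return a non-zero status, which is the signal to reject the transaction.</p>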
<h2 id="heading-building">Building</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p><strong>Java 17</strong></p>
<ul>
<li><p><a target="_blank" href="https://openjdk.org/projects/jdk/17/">https://openjdk.org/projects/jdk/17/</a></p>
</li>
<li><p><a target="_blank" href="https://www.oracle.com/java/technologies/downloads/#java17">https://www.oracle.com/java/technologies/downloads/#java17</a></p>
</li>
</ul>
</li>
<li><p><strong>Maven 3+</strong></p>
<ul>
<li><a target="_blank" href="https://maven.apache.org/">https://maven.apache.org/</a></li>
</ul>
</li>
</ul>
<p>The service is built with <a target="_blank" href="https://maven.apache.org/download.cgi">Maven 3.1+</a>. The Maven wrapper (<code>mvnw</code>) is included, so a local Maven installation is optional. All third-party dependencies are available in public Maven repositories except for the CockroachDB JDBC driver, which is available in GitHub Packages (you only need a GitHub account).</p>
<p>These dependencies are available in GitHub <a target="_blank" href="https://maven.pkg.github.com/cockroachlabs-field">packages</a>:</p>
<pre><code class="lang-java">&lt;dependency&gt;
    &lt;groupId&gt;io.cockroachdb.jdbc&lt;/groupId&gt;
    &lt;artifactId&gt;cockroachdb-jdbc-driver&lt;/artifactId&gt;
    &lt;version&gt;<span class="hljs-number">1.0</span>.<span class="hljs-number">0</span>&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
    &lt;groupId&gt;io.cockroachdb&lt;/groupId&gt;
    &lt;artifactId&gt;spring-data-cockroachdb&lt;/artifactId&gt;
    &lt;version&gt;<span class="hljs-number">1.0</span>.<span class="hljs-number">0</span>&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
<p>To allow Maven to use this repository, add a <code>github</code> profile (or similar) to your Maven <code>settings.xml</code> file, located at <code>${user.home}/.m2/settings.xml</code>.</p>
<p>An example is provided below:</p>
<pre><code class="lang-apache"><span class="hljs-section">&lt;?xml version=<span class="hljs-string">"1.0"</span> encoding=<span class="hljs-string">"UTF-8"</span>?&gt;</span>
<span class="hljs-section">&lt;settings xmlns=<span class="hljs-string">"http://maven.apache.org/SETTINGS/1.2.0"</span>
          xmlns:xsi=<span class="hljs-string">"http://www.w3.org/2001/XMLSchema-instance"</span>
          xsi:schemaLocation=<span class="hljs-string">"http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd"</span>&gt;</span>
  <span class="hljs-section">&lt;servers&gt;</span>
    <span class="hljs-section">&lt;server&gt;</span>
        <span class="hljs-section">&lt;id&gt;</span><span class="hljs-attribute">github</span>&lt;/id&gt;
        <span class="hljs-section">&lt;username&gt;</span><span class="hljs-attribute">your</span>-github-id&lt;/username&gt;
        <span class="hljs-section">&lt;password&gt;</span><span class="hljs-attribute">your</span>-personal-access-token&lt;/password&gt;
    <span class="hljs-section">&lt;/server&gt;</span>
  <span class="hljs-section">&lt;/servers&gt;</span>

  <span class="hljs-section">&lt;mirrors&gt;</span>
    <span class="hljs-section">&lt;!-- default setting --&gt;</span>
    <span class="hljs-section">&lt;mirror&gt;</span>
      <span class="hljs-section">&lt;id&gt;</span><span class="hljs-attribute">maven</span>-default-http-blocker&lt;/id&gt;
      <span class="hljs-section">&lt;mirrorOf&gt;</span><span class="hljs-attribute">external</span>:http:*&lt;/mirrorOf&gt;
      <span class="hljs-section">&lt;name&gt;</span><span class="hljs-attribute">Pseudo</span> repository to mirror external repositories initially using HTTP.&lt;/name&gt;
      <span class="hljs-section">&lt;url&gt;</span><span class="hljs-attribute">http</span>://<span class="hljs-number">0.0.0.0</span>/&lt;/url&gt;
      <span class="hljs-section">&lt;blocked&gt;</span><span class="hljs-attribute">true</span>&lt;/blocked&gt;
    <span class="hljs-section">&lt;/mirror&gt;</span>
  <span class="hljs-section">&lt;/mirrors&gt;</span>

  <span class="hljs-section">&lt;profiles&gt;</span>
      <span class="hljs-section">&lt;profile&gt;</span>
        <span class="hljs-section">&lt;id&gt;</span><span class="hljs-attribute">github</span>&lt;/id&gt;
        <span class="hljs-section">&lt;repositories&gt;</span>
            <span class="hljs-section">&lt;repository&gt;</span>
                <span class="hljs-section">&lt;id&gt;</span><span class="hljs-attribute">central</span>&lt;/id&gt;
                <span class="hljs-section">&lt;url&gt;</span><span class="hljs-attribute">https</span>://repo<span class="hljs-number">1</span>.maven.org/maven<span class="hljs-number">2</span>&lt;/url&gt;
            <span class="hljs-section">&lt;/repository&gt;</span>
            <span class="hljs-section">&lt;repository&gt;</span>
                <span class="hljs-section">&lt;id&gt;</span><span class="hljs-attribute">github</span>&lt;/id&gt;
                <span class="hljs-section">&lt;url&gt;</span><span class="hljs-attribute">https</span>://maven.pkg.github.com/cockroachlabs-field/*&lt;/url&gt;
                <span class="hljs-section">&lt;snapshots&gt;</span>
                    <span class="hljs-section">&lt;enabled&gt;</span><span class="hljs-attribute">true</span>&lt;/enabled&gt;
                <span class="hljs-section">&lt;/snapshots&gt;</span>
                <span class="hljs-section">&lt;releases&gt;</span>
                    <span class="hljs-section">&lt;enabled&gt;</span><span class="hljs-attribute">true</span>&lt;/enabled&gt;
                <span class="hljs-section">&lt;/releases&gt;</span>
            <span class="hljs-section">&lt;/repository&gt;</span>
        <span class="hljs-section">&lt;/repositories&gt;</span>
      <span class="hljs-section">&lt;/profile&gt;</span>
  <span class="hljs-section">&lt;/profiles&gt;</span>

  <span class="hljs-section">&lt;activeProfiles&gt;</span>
    <span class="hljs-section">&lt;activeProfile&gt;</span><span class="hljs-attribute">github</span>&lt;/activeProfile&gt;
  <span class="hljs-section">&lt;/activeProfiles&gt;</span>
<span class="hljs-section">&lt;/settings&gt;</span>
</code></pre>
<h3 id="heading-clone-the-project">Clone the project</h3>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> git@github.com:kai-niemi/roach-bank.git
</code></pre>
<h3 id="heading-build-the-executable-jars">Build the executable jars</h3>
<p>Using installed Maven:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> roach-bank 
chmod +x mvnw 
mvn clean install
</code></pre>
<p>Using the Maven wrapper (where you need to specify settings.xml):</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> roach-bank 
chmod +x mvnw 
./mvnw clean install -s &lt;path-to&gt;/settings.xml
</code></pre>
<h2 id="heading-local-deployment">Local Deployment</h2>
<blockquote>
<p>This assumes you already have a local CockroachDB cluster running.</p>
</blockquote>
<p>First, create the database:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE database roach_bank"</span>
</code></pre>
<p>Then start the server:</p>
<pre><code class="lang-bash">java -jar bank-server/target/bank-server.jar
</code></pre>
<p>Then start the client:</p>
<pre><code class="lang-bash">java -jar bank-client/target/bank-client.jar
</code></pre>
<p>The client is used to issue business transactions to the server's REST API. The client and server could be on separate hosts with an L7 load balancer in between, but for convenience, the client connects to localhost by default.</p>
<h1 id="heading-next-steps">Next Steps</h1>
<p>In the second part of this series, we'll cover how to run the bank against a multi-regional cloud deployment. The third part goes into design details and the technology stack.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>RoachBank is a full-stack financial accounting ledger demo running on both CockroachDB and PostgreSQL. It follows the double-entry bookkeeping principle and is designed to demonstrate the safety and liveness properties of a globally deployed, system-of-record type of financial workload. This article provides instructions on how to set up and run the demo, as well as details on the technology stack and design.</p>
]]></content:encoded></item><item><title><![CDATA[CockroachDB JDBC Driver: Part III - Bulk Rewrites]]></title><description><![CDATA[The CockroachDB JDBC driver wraps the PostgreSQL driver and offers performance optimizations that are transparent towards applications.
Article series on the JDBC driver:

https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver

Introduction
This...]]></description><link>https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-iii-bulk-rewrites</link><guid isPermaLink="true">https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-iii-bulk-rewrites</guid><category><![CDATA[cockroachdb]]></category><category><![CDATA[JDBC]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Sat, 08 Apr 2023 16:03:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/xguB6_aHXG4/upload/5cff2a91b60a773a844622b9c35bd84e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <a target="_blank" href="https://github.com/cloudneutral/cockroachdb-jdbc">CockroachDB JDBC</a> driver wraps the PostgreSQL driver and offers performance optimizations that are transparent towards applications.</p>
<p>Article series on the JDBC driver:</p>
<ul>
<li><a target="_blank" href="https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver">https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver</a></li>
</ul>
<h1 id="heading-introduction">Introduction</h1>
<p>This article highlights one specific optimization feature: batch DML rewrites for bulk operations using a FROM clause with array unnesting. It's a mouthful, but conceptually it's a transparent rewrite, at the driver level, of batch DML updates like:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> product <span class="hljs-keyword">SET</span> inventory=?, price=? <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">1</span>
<span class="hljs-keyword">UPDATE</span> product <span class="hljs-keyword">SET</span> inventory=?, price=? <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">2</span>
<span class="hljs-keyword">UPDATE</span> product <span class="hljs-keyword">SET</span> inventory=?, price=? <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">3</span>
...
</code></pre>
<p>These can be collapsed into a single UPDATE statement using FROM:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> product <span class="hljs-keyword">SET</span> inventory=data_table.new_inventory, price=data_table.new_price 
<span class="hljs-keyword">FROM</span> (<span class="hljs-keyword">select</span> <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> <span class="hljs-keyword">id</span>, <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> new_inventory, <span class="hljs-keyword">unnest</span>(?) <span class="hljs-keyword">as</span> new_price) <span class="hljs-keyword">as</span> data_table <span class="hljs-keyword">WHERE</span> product.id=data_table.id
</code></pre>
<p>This can dramatically improve performance for bulk operations that send a series of non-aggregated UPDATEs to independent rows. The technique is not limited to UPDATEs: INSERT and UPSERT statements can be rewritten the same way. For those, it may also improve performance because it allows batch sizes larger than 128, which is the limit in the pgJDBC driver for rewriting INSERTs.</p>
<p>Consider the following example schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">exists</span> product
(
    <span class="hljs-keyword">id</span>        <span class="hljs-keyword">uuid</span>           <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">default</span> gen_random_uuid(),
    inventory <span class="hljs-built_in">int</span>            <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    <span class="hljs-keyword">name</span>      <span class="hljs-built_in">varchar</span>(<span class="hljs-number">128</span>)   <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    price     <span class="hljs-built_in">numeric</span>(<span class="hljs-number">19</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
    sku       <span class="hljs-built_in">varchar</span>(<span class="hljs-number">128</span>)   <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span> <span class="hljs-keyword">unique</span>,
    primary <span class="hljs-keyword">key</span> (<span class="hljs-keyword">id</span>)
);
</code></pre>
<p>Next, let's add a few rows:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> product (inventory,<span class="hljs-keyword">name</span>,price,sku)
<span class="hljs-keyword">values</span> (<span class="hljs-number">10</span>, <span class="hljs-string">'A'</span>, <span class="hljs-number">12.50</span>, gen_random_uuid()),
       (<span class="hljs-number">10</span>, <span class="hljs-string">'B'</span>, <span class="hljs-number">13.50</span>, gen_random_uuid()),
       (<span class="hljs-number">10</span>, <span class="hljs-string">'C'</span>, <span class="hljs-number">14.50</span>, gen_random_uuid()),
       (<span class="hljs-number">10</span>, <span class="hljs-string">'D'</span>, <span class="hljs-number">15.50</span>, gen_random_uuid());

<span class="hljs-keyword">select</span> inventory,<span class="hljs-keyword">name</span>,price,sku <span class="hljs-keyword">from</span> product <span class="hljs-keyword">order</span> <span class="hljs-keyword">by</span> <span class="hljs-keyword">name</span>;
</code></pre>
<p>The result has the same shape as that of the next statement, the only difference being that the rows come from temporary data built out of arrays:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-number">10</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>]) <span class="hljs-keyword">as</span> inventory,
       <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-string">'A'</span>, <span class="hljs-string">'B'</span>, <span class="hljs-string">'C'</span>, <span class="hljs-string">'D'</span>]) <span class="hljs-keyword">as</span> <span class="hljs-keyword">name</span>,
       <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-number">12.50</span>, <span class="hljs-number">13.50</span>, <span class="hljs-number">14.50</span>, <span class="hljs-number">15.50</span>]) <span class="hljs-keyword">as</span> price,
       <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[gen_random_uuid(),
                    gen_random_uuid(),
                    gen_random_uuid(),
                    gen_random_uuid()]) <span class="hljs-keyword">as</span> sku
<span class="hljs-keyword">order</span> <span class="hljs-keyword">by</span> <span class="hljs-keyword">name</span>;
</code></pre>
<p>The <code>unnest</code> <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/functions-and-operators.html#set-returning-functions">function</a> uses arrays to generate a temporary table, with each array representing a column. Now, let's flip this over to an INSERT INTO .. SELECT statement:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> product (inventory,<span class="hljs-keyword">name</span>,price,sku)
(<span class="hljs-keyword">select</span> <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-number">10</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>]) <span class="hljs-keyword">as</span> inventory,
        <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-string">'A'</span>, <span class="hljs-string">'B'</span>, <span class="hljs-string">'C'</span>, <span class="hljs-string">'D'</span>]) <span class="hljs-keyword">as</span> <span class="hljs-keyword">name</span>,
        <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[<span class="hljs-number">12.50</span>, <span class="hljs-number">13.50</span>, <span class="hljs-number">14.50</span>, <span class="hljs-number">15.50</span>]) <span class="hljs-keyword">as</span> price,
        <span class="hljs-keyword">unnest</span>(<span class="hljs-built_in">ARRAY</span>[gen_random_uuid(),
                     gen_random_uuid(),
                     gen_random_uuid(),
                     gen_random_uuid()]) <span class="hljs-keyword">as</span> sku);
</code></pre>
<p>That will create four new products out of the contents of the arrays. The same idea expressed as a JDBC prepared statement, with the arrays passed as bind parameters:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(
        <span class="hljs-string">"INSERT INTO product(id,inventory,price,name,sku)"</span>
                + <span class="hljs-string">" select"</span>
                + <span class="hljs-string">"  unnest(?) as id,"</span>
                + <span class="hljs-string">"  unnest(?) as inventory, unnest(?) as price,"</span>
                + <span class="hljs-string">"  unnest(?) as name, unnest(?) as sku"</span>)) {
    <span class="hljs-comment">// chunks is a segmented stream of products</span>
    chunks.forEach(chunk -&gt; {
        List&lt;Integer&gt; qty = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
        List&lt;BigDecimal&gt; price = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
        List&lt;UUID&gt; ids = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
        List&lt;String&gt; name = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
        List&lt;String&gt; sku = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();

        chunk.forEach(product -&gt; {
            ids.add(product.getId());
            qty.add(product.getInventory());
            price.add(product.getPrice());
            name.add(product.getName());
            sku.add(product.getSku());
        });

        <span class="hljs-keyword">try</span> {
            ps.setArray(<span class="hljs-number">1</span>, ps.getConnection().createArrayOf(<span class="hljs-string">"UUID"</span>, ids.toArray()));
            ps.setArray(<span class="hljs-number">2</span>, ps.getConnection().createArrayOf(<span class="hljs-string">"BIGINT"</span>, qty.toArray()));
            ps.setArray(<span class="hljs-number">3</span>, ps.getConnection().createArrayOf(<span class="hljs-string">"DECIMAL"</span>, price.toArray()));
            ps.setArray(<span class="hljs-number">4</span>, ps.getConnection().createArrayOf(<span class="hljs-string">"VARCHAR"</span>, name.toArray()));
            ps.setArray(<span class="hljs-number">5</span>, ps.getConnection().createArrayOf(<span class="hljs-string">"VARCHAR"</span>, sku.toArray()));

            ps.executeLargeUpdate();
        } <span class="hljs-keyword">catch</span> (SQLException e) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(e);
        }
    });
}
</code></pre>
<p>This is functionally equivalent to using <code>addBatch()</code> and <code>executeLargeBatch()</code>, with the important difference that the batch API is only rewritten into multi-row statements when the pgJDBC driver's <code>reWriteBatchedInserts</code> property is set to true.</p>
<p>From a performance standpoint, both approaches are equivalent up to a batch size of <code>128</code>, which is the <a target="_blank" href="https://github.com/pgjdbc/pgjdbc/blob/REL42.5.0/pgjdbc/src/main/java/org/postgresql/jdbc/PgPreparedStatement.java#L1726">hardcoded</a> limit in the pgJDBC driver. Batches larger than that are not rewritten into a single statement, so the array unnesting approach is more performant beyond this limit. Depending on the workload, you can go up to a batch size of around 16-32K before performance starts to level out again.</p>
<p>The hardcoded limit of 128 may be appropriate for PostgreSQL, but CockroachDB is compatible with PostgreSQL only at the wire protocol level and can leverage much larger bulk statement sizes. Until this limit is removed or made configurable in pgJDBC, there are only two options:</p>
<ul>
<li><p>Modify the pgJDBC driver and build a custom library. This is quite straightforward but requires a separate forked version to be maintained.</p>
</li>
<li><p>Rewrite INSERTs, UPSERTs and UPDATEs with unnesting of arrays.</p>
</li>
</ul>
<h1 id="heading-bulk-inserts">Bulk Inserts</h1>
<p>To recap, here's an UPSERT example that hits the pgJDBC driver's limit of 128 on INSERT rewrites.</p>
<pre><code class="lang-java">List&lt;Product&gt; products = Arrays.asList(
        Product.builder().withName(<span class="hljs-string">"A"</span>).withInventory(<span class="hljs-number">1</span>).withPrice(<span class="hljs-keyword">new</span> BigDecimal(<span class="hljs-string">"10.15"</span>)).build(),
        Product.builder().withName(<span class="hljs-string">"B"</span>).withInventory(<span class="hljs-number">2</span>).withPrice(<span class="hljs-keyword">new</span> BigDecimal(<span class="hljs-string">"11.15"</span>)).build(),
        Product.builder().withName(<span class="hljs-string">"C"</span>).withInventory(<span class="hljs-number">3</span>).withPrice(<span class="hljs-keyword">new</span> BigDecimal(<span class="hljs-string">"12.15"</span>)).build()
        <span class="hljs-comment">// .. etc to several 1000s</span>
);

Stream&lt;List&lt;Product&gt;&gt; chunks = chunkedStream(products.stream(), <span class="hljs-number">128</span>);

dataSource.addDataSourceProperty(<span class="hljs-string">"reWriteBatchedInserts"</span>,<span class="hljs-keyword">true</span>);

<span class="hljs-keyword">try</span> (Connection connection = dataSource.getConnection()) {
    connection.setAutoCommit(<span class="hljs-keyword">true</span>);

    chunks.forEach(chunk -&gt; {
        <span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(
                <span class="hljs-string">"INSERT INTO product (id,inventory,price,name,sku) values (?,?,?,?,?) ON CONFLICT (id) DO NOTHING"</span>)) {
            <span class="hljs-keyword">for</span> (Product product : chunk) {
                ps.setObject(<span class="hljs-number">1</span>, product.getId());
                ps.setObject(<span class="hljs-number">2</span>, product.getInventory());
                ps.setObject(<span class="hljs-number">3</span>, product.getPrice());
                ps.setObject(<span class="hljs-number">4</span>, product.getName());
                ps.setObject(<span class="hljs-number">5</span>, product.getSku());
                ps.addBatch();
            }
            ps.executeBatch();
        } <span class="hljs-keyword">catch</span> (SQLException ex) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(ex);
        }
    });
}
</code></pre>
<p>For completeness, here is the <code>chunkedStream</code> method, which slices a stream into chunks of a fixed size:</p>
<pre><code class="lang-java">    <span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> &lt;T&gt; Stream&lt;List&lt;T&gt;&gt; chunkedStream(Stream&lt;T&gt; stream, <span class="hljs-keyword">int</span> chunkSize) {
        AtomicInteger idx = <span class="hljs-keyword">new</span> AtomicInteger();
        <span class="hljs-keyword">return</span> stream.collect(Collectors.groupingBy(x -&gt; idx.getAndIncrement() / chunkSize)).values().stream();
    }
</code></pre>
<p>This UPSERT statement is executed using implicit transactions, and it's fairly fast with the <code>reWriteBatchedInserts</code> property enabled. The pgJDBC rewrite feature works for both regular <code>INSERT</code>s and <code>INSERT .. ON CONFLICT</code>, aka <code>UPSERT</code>s.</p>
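<p>The <code>chunkedStream</code> helper shown earlier relies on the iteration order of the map produced by <code>groupingBy</code>, which the collector does not specify. If chunk order matters, a minimal alternative sketch is to slice a list by index. The <code>Chunks</code> class and its <code>chunked</code> method are illustrative names, not part of the original project:</p>
<pre><code class="lang-java">import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
final class Chunks {
    private Chunks() {
    }

    // Splits the list into consecutive chunks of at most chunkSize elements,
    // preserving the original element order.
    static &lt;T&gt; List&lt;List&lt;T&gt;&gt; chunked(List&lt;T&gt; items, int chunkSize) {
        if (chunkSize &lt;= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List&lt;List&lt;T&gt;&gt; chunks = new ArrayList&lt;&gt;();
        for (int from = 0; from &lt; items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // Copy the subList view so each chunk is independent of the source list
            chunks.add(new ArrayList&lt;&gt;(items.subList(from, to)));
        }
        return chunks;
    }
}
</code></pre>
<p>Calling <code>Chunks.chunked(products, 128).stream()</code> would then yield the same kind of <code>Stream&lt;List&lt;Product&gt;&gt;</code> that the example above consumes.</p>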
<h1 id="heading-bulk-updates">Bulk Updates</h1>
<p>Given the previous example, it's fair to assume UPDATEs work in the same way:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(
    <span class="hljs-string">"UPDATE product SET inventory=?, price=? WHERE id=?"</span>)) {

    chunk.forEach(product -&gt; {
        <span class="hljs-keyword">try</span> {
            ps.setInt(<span class="hljs-number">1</span>, product.getInventory());
            ps.setBigDecimal(<span class="hljs-number">2</span>, product.getPrice());
            ps.setObject(<span class="hljs-number">3</span>, product.getId());
            ps.addBatch();
        } <span class="hljs-keyword">catch</span> (SQLException e) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(e);
        }
    });
    ps.executeLargeBatch();  
} <span class="hljs-keyword">catch</span> (SQLException ex) {
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(ex);
}
</code></pre>
<p>However, no batching is done here at all at the JDBC driver level: the pgJDBC rewrite applies to INSERTs only, so each UPDATE is still sent as a separate statement. To apply individual row UPDATEs in bulk, you can use arrays again:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(
        <span class="hljs-string">"UPDATE product SET inventory=data_table.new_inventory, price=data_table.new_price "</span>
                + <span class="hljs-string">"FROM (select "</span>
                + <span class="hljs-string">"unnest(?) as id, "</span>
                + <span class="hljs-string">"unnest(?) as new_inventory, "</span>
                + <span class="hljs-string">"unnest(?) as new_price) as data_table "</span>
                + <span class="hljs-string">"WHERE product.id=data_table.id"</span>)) {
    List&lt;Integer&gt; qty = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
    List&lt;BigDecimal&gt; price = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();
    List&lt;UUID&gt; ids = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;();

    chunk.forEach(product -&gt; {
        qty.add(product.addInventoryQuantity(<span class="hljs-number">1</span>));
        price.add(product.getPrice().add(<span class="hljs-keyword">new</span> BigDecimal(<span class="hljs-string">"1.00"</span>)));
        ids.add(product.getId());
    });

    ps.setArray(<span class="hljs-number">1</span>, ps.getConnection()
            .createArrayOf(<span class="hljs-string">"UUID"</span>, ids.toArray()));
    ps.setArray(<span class="hljs-number">2</span>, ps.getConnection()
            .createArrayOf(<span class="hljs-string">"BIGINT"</span>, qty.toArray()));
    ps.setArray(<span class="hljs-number">3</span>, ps.getConnection()
            .createArrayOf(<span class="hljs-string">"DECIMAL"</span>, price.toArray()));

    ps.executeLargeUpdate(); 
} <span class="hljs-keyword">catch</span> (SQLException e) {
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(e);
}
</code></pre>
<p>The performance improvement with this approach can be dramatic, because an entire chunk of row updates is sent as a single statement. The only problem is that it's rather clunky and requires code refactoring.</p>
<p>The CockroachDB JDBC driver can, however, rewrite bulk DML operations on behalf of applications, which makes the optimization transparent. See the GitHub repository <a target="_blank" href="https://github.com/cloudneutral/cockroachdb-jdbc">https://github.com/cloudneutral/cockroachdb-jdbc</a> for more details.</p>
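<p>To give a feel for what such a rewrite involves, here is a minimal sketch that derives the array-based statement from a table name, a key column and the columns to update. This is not the driver's actual implementation. The <code>UnnestUpdateSql</code> class is a hypothetical name, and the sketch assumes trusted, unquoted identifiers:</p>
<pre><code class="lang-java">import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the CockroachDB JDBC driver's implementation.
final class UnnestUpdateSql {
    private UnnestUpdateSql() {
    }

    // Builds a single UPDATE .. FROM statement that takes one array parameter
    // for the key column, followed by one array parameter per updated column.
    // Identifiers are concatenated as-is, so they must come from trusted input.
    static String build(String table, String keyColumn, List&lt;String&gt; setColumns) {
        String assignments = setColumns.stream()
                .map(c -&gt; c + "=data_table.new_" + c)
                .collect(Collectors.joining(", "));
        String projections = setColumns.stream()
                .map(c -&gt; "unnest(?) as new_" + c)
                .collect(Collectors.joining(", "));
        return "UPDATE " + table + " SET " + assignments
                + " FROM (select unnest(?) as " + keyColumn + ", " + projections + ") as data_table"
                + " WHERE " + table + "." + keyColumn + "=data_table." + keyColumn;
    }
}
</code></pre>
<p>For the <code>product</code> table, <code>UnnestUpdateSql.build("product", "id", List.of("inventory", "price"))</code> produces the same statement as the hand-written example above. The arrays are then bound in the order id, inventory, price.</p>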
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article discusses a feature of the CockroachDB JDBC driver to optimize batch updates with array unnesting, allowing for much larger batch sizes than the pgJDBC driver. It also considers the performance improvement of this approach and the code refactoring required. This approach can be used for both INSERTs and UPSERTs, as well as individual row UPDATEs.</p>
]]></content:encoded></item><item><title><![CDATA[Introduction to Spring Data CockroachDB]]></title><description><![CDATA[The Spring Data CockroachDB project aims to provide a familiar and consistent Spring-based programming model for CockroachDB as a SQL database.
The primary goal of the Spring Data project is to make it easier to build Spring-powered applications that...]]></description><link>https://blog.cloudneutral.se/introduction-to-spring-data-cockroachdb</link><guid isPermaLink="true">https://blog.cloudneutral.se/introduction-to-spring-data-cockroachdb</guid><category><![CDATA[Spring Data Jpa]]></category><category><![CDATA[Springboot]]></category><category><![CDATA[cockroachdb]]></category><category><![CDATA[JDBC]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Fri, 07 Apr 2023 16:10:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/koy6FlCCy5s/upload/410f4cac58072ded546a9c58f695480f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <a target="_blank" href="https://github.com/cloudneutral/spring-data-cockroachdb">Spring Data CockroachDB</a> project aims to provide a familiar and consistent Spring-based programming model for CockroachDB as a SQL database.</p>
<p>The primary goal of the <a target="_blank" href="https://projects.spring.io/spring-data">Spring Data</a> project is to make it easier to build Spring-powered applications that use new data access technologies such as relational databases, non-relational databases, map-reduce frameworks, and cloud-based data services.</p>
<p><a target="_blank" href="https://www.cockroachlabs.com/">CockroachDB</a> is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.</p>
<h1 id="heading-features">Features</h1>
<ul>
<li><p>Bundles the <a target="_blank" href="https://github.com/cloudneutral/cockroachdb-jdbc">CockroachDB JDBC driver</a></p>
</li>
<li><p>Meta-annotations for declaring:</p>
<ul>
<li><p>Retryable transactions</p>
</li>
<li><p>Read-only transactions</p>
</li>
<li><p>Strong and stale follower-reads</p>
</li>
<li><p>Custom session variables including timeouts</p>
</li>
</ul>
</li>
<li><p>AOP aspects for:</p>
<ul>
<li><p>Retrying transactions on serialization conflicts</p>
</li>
<li><p>Configuring session variables, like follower-reads</p>
</li>
</ul>
</li>
<li><p>Connection pool factory settings for HikariCP</p>
</li>
<li><p>Datasource proxy logging via ttddyy's datasource-proxy</p>
</li>
<li><p>Simple JDBC shell client for ad-hoc queries and testing</p>
</li>
</ul>
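<p>To illustrate what retrying on serialization conflicts means in practice, here is a minimal hand-rolled sketch in plain JDBC terms. It is not the library's retry aspect, which does this declaratively. The <code>SerializationRetry</code> class and its parameters are hypothetical. The sketch assumes the action wraps a whole transaction that can safely be re-run, and that conflicts surface as a <code>SQLException</code> with SQLSTATE <code>40001</code>:</p>
<pre><code class="lang-java">import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical sketch of client-side retries; not the library's retry aspect.
final class SerializationRetry {
    // SQLSTATE reported for serialization failures that are safe to retry
    static final String SERIALIZATION_FAILURE = "40001";

    private SerializationRetry() {
    }

    // Runs the action, retrying with exponential backoff while it fails with
    // SQLSTATE 40001, for at most maxAttempts attempts in total.
    static &lt;T&gt; T execute(Callable&lt;T&gt; action, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts &lt; 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (SQLException e) {
                boolean retryable = SERIALIZATION_FAILURE.equals(e.getSQLState());
                if (!retryable || attempt &gt;= maxAttempts) {
                    // Non-retryable error or attempts exhausted: give up
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2; // double the wait before the next attempt
            }
        }
    }
}
</code></pre>
<p>The action passed in should begin, execute and commit the entire transaction, since a retry has to start over from the first statement.</p>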
<h1 id="heading-getting-started">Getting Started</h1>
<p>Here is a quick teaser of an application using Spring Data JPA Repositories in Java:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Repository</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">AccountRepository</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">JpaRepository</span>&lt;<span class="hljs-title">Account</span>, <span class="hljs-title">UUID</span>&gt; </span>{
    <span class="hljs-function">Optional&lt;Account&gt; <span class="hljs-title">findByName</span><span class="hljs-params">(String name)</span></span>;

    <span class="hljs-meta">@Query(value = "select a.balance "
            + "from Account a "
            + "where a.id = ?1")</span>
    <span class="hljs-function">BigDecimal <span class="hljs-title">findBalanceById</span><span class="hljs-params">(UUID id)</span></span>;

    <span class="hljs-meta">@Query(value = "select a.balance "
            + "from account a AS OF SYSTEM TIME follower_read_timestamp() "
            + "where a.id = ?1", nativeQuery = true)</span>
    <span class="hljs-function">BigDecimal <span class="hljs-title">findBalanceSnapshotById</span><span class="hljs-params">(UUID id)</span></span>;

    <span class="hljs-meta">@Query(value = "select a "
            + "from Account a "
            + "where a.id in (?1)")</span>
    <span class="hljs-meta">@Lock(LockModeType.PESSIMISTIC_READ)</span>
    <span class="hljs-function">List&lt;Account&gt; <span class="hljs-title">findAllForUpdate</span><span class="hljs-params">(Set&lt;UUID&gt; ids)</span></span>;
}

<span class="hljs-meta">@Service</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AccountService</span> </span>{
    <span class="hljs-meta">@Autowired</span>
    <span class="hljs-keyword">private</span> AccountRepository accountRepository;

    <span class="hljs-meta">@NotTransactional</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">create</span><span class="hljs-params">(Account account)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.save(account);
    }

    <span class="hljs-meta">@NotTransactional</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">findByName</span><span class="hljs-params">(String name)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.findByName(name)
                .orElseThrow(() -&gt; <span class="hljs-keyword">new</span> ObjectRetrievalFailureException(Account.class, name));
    }

    <span class="hljs-meta">@NotTransactional</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">findById</span><span class="hljs-params">(UUID id)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.findById(id).orElseThrow(() -&gt; <span class="hljs-keyword">new</span> ObjectRetrievalFailureException(Account.class, id));
    }

    <span class="hljs-meta">@TransactionBoundary</span>
    <span class="hljs-meta">@Retryable</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">update</span><span class="hljs-params">(Account account)</span> </span>{
        Account accountProxy = accountRepository.getReferenceById(account.getId());
        accountProxy.setName(account.getName());
        accountProxy.setDescription(account.getDescription());
        accountProxy.setBalance(account.getBalance());
        accountProxy.setClosed(account.isClosed());
        <span class="hljs-keyword">return</span> accountRepository.save(accountProxy);
    }

    <span class="hljs-meta">@NotTransactional</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> BigDecimal <span class="hljs-title">getBalance</span><span class="hljs-params">(UUID id)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.findBalanceById(id);
    }

    <span class="hljs-meta">@TransactionBoundary(timeTravel = @TimeTravel(mode = TimeTravelMode.FOLLOWER_READ), readOnly = true)</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> BigDecimal <span class="hljs-title">getBalanceSnapshot_Explicit</span><span class="hljs-params">(UUID id)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.findBalanceById(id);
    }

    <span class="hljs-meta">@NotTransactional</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> BigDecimal <span class="hljs-title">getBalanceSnapshot_Implicit</span><span class="hljs-params">(UUID id)</span> </span>{
        <span class="hljs-keyword">return</span> accountRepository.findBalanceSnapshotById(id);
    }

    <span class="hljs-meta">@TransactionBoundary</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">delete</span><span class="hljs-params">(UUID id)</span> </span>{
        accountRepository.deleteById(id);
    }

    <span class="hljs-meta">@TransactionBoundary</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">deleteAll</span><span class="hljs-params">()</span> </span>{
        accountRepository.deleteAll();
    }
}

<span class="hljs-meta">@Configuration</span>
<span class="hljs-meta">@EnableTransactionManagement(order = AdvisorOrder.TRANSACTION_ADVISOR)</span>
<span class="hljs-meta">@EnableJpaRepositories(basePackages = {"org.acme.bank"})</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">BankApplication</span> </span>{
    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> TransactionRetryAspect <span class="hljs-title">retryAspect</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> TransactionRetryAspect();
    }

    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> TransactionBoundaryAspect <span class="hljs-title">transactionBoundaryAspect</span><span class="hljs-params">(JdbcTemplate jdbcTemplate)</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> TransactionBoundaryAspect(jdbcTemplate);
    }
}
</code></pre>
<h2 id="heading-maven-configuration">Maven configuration</h2>
<p>Add this dependency to your <code>pom.xml</code> file:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>io.cockroachdb<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-data-cockroachdb<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>{version}<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
</code></pre>
<p>Then add the Maven repository to your <code>pom.xml</code> file (alternatively in Maven's <a target="_blank" href="https://maven.apache.org/settings.html">settings.xml</a>):</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">repository</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">id</span>&gt;</span>cockroachdb-jdbc<span class="hljs-tag">&lt;/<span class="hljs-name">id</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">name</span>&gt;</span>Maven Packages<span class="hljs-tag">&lt;/<span class="hljs-name">name</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">url</span>&gt;</span>https://maven.pkg.github.com/cloudneutral/cockroachdb-jdbc<span class="hljs-tag">&lt;/<span class="hljs-name">url</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">snapshots</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">enabled</span>&gt;</span>true<span class="hljs-tag">&lt;/<span class="hljs-name">enabled</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">snapshots</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">repository</span>&gt;</span>
</code></pre>
<p>Finally, you need to authenticate to GitHub Packages by creating a personal access token (classic) that includes the <code>read:packages</code> scope. For more information, see <a target="_blank" href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-apache-maven-registry#authenticating-to-github-packages">Authenticating to GitHub Packages</a>.</p>
<p>Add your personal access token to the servers section in your <a target="_blank" href="https://maven.apache.org/settings.html">settings.xml</a>. The server <code>id</code> must match the repository <code>id</code> declared above:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">server</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">id</span>&gt;</span>cockroachdb-jdbc<span class="hljs-tag">&lt;/<span class="hljs-name">id</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">username</span>&gt;</span>your-github-name<span class="hljs-tag">&lt;/<span class="hljs-name">username</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">password</span>&gt;</span>your-access-token<span class="hljs-tag">&lt;/<span class="hljs-name">password</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">server</span>&gt;</span>
</code></pre>
<p>Now you should be able to build your project with Spring Data CockroachDB as a dependency:</p>
<pre><code class="lang-shell">mvn clean install
</code></pre>
<p>Alternatively, you can just clone the repository and build it locally using <code>mvn install</code>.</p>
<h2 id="heading-modules">Modules</h2>
<p>There are several modules in this project:</p>
<h3 id="heading-spring-data-cockroachdb">spring-data-cockroachdb</h3>
<p>Provides a <a target="_blank" href="https://projects.spring.io/spring-data">Spring Data</a> module for CockroachDB. It bundles the CockroachDB JDBC driver, connection pooling support via <a target="_blank" href="https://github.com/brettwooldridge/HikariCP">Hikari</a>, and meta-annotations and AOP aspects for client-side retry logic as an alternative to JDBC driver-level retries.</p>
<h3 id="heading-spring-data-cockroachdb-shell">spring-data-cockroachdb-shell</h3>
<p>An interactive Spring Shell client for ad-hoc SQL queries and for introspecting CockroachDB settings and metadata using the CockroachDB JDBC driver.</p>
<h3 id="heading-spring-data-cockroachdb-distribution">spring-data-cockroachdb-distribution</h3>
<p>Distribution packaging of runnable artifacts, including the shell client and JDBC driver, in <code>tar.gz</code> format. Activated via a Maven profile; see the build section further down on this page.</p>
<h3 id="heading-spring-data-cockroachdb-it">spring-data-cockroachdb-it</h3>
<p>Integration and functional test harness. Activated via a Maven profile; see the build section further down on this page.</p>
<h2 id="heading-building-from-source">Building from Source</h2>
<p>Spring Data CockroachDB requires Java 17 (or later) LTS.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>JDK17+ LTS for building (OpenJDK compatible)</p>
</li>
<li><p>Maven 3+ (optional, the Maven wrapper <code>mvnw</code> is included)</p>
</li>
</ul>
<p>If you want to build with the regular <code>mvn</code> command, you will need <a target="_blank" href="https://maven.apache.org/run-maven/index.html">Maven v3.x</a> or above.</p>
<p>Install the JDK (Linux):</p>
<pre><code class="lang-bash">sudo apt-get -qq install -y openjdk-17-jdk
</code></pre>
<p>Install the JDK (macOS):</p>
<pre><code class="lang-bash">brew install openjdk@17
</code></pre>
<h3 id="heading-dependencies">Dependencies</h3>
<p>This project depends on the <a target="_blank" href="https://github.com/cockroachlabs-field/cockroachdb-jdbc">CockroachDB JDBC driver</a> whose artifacts are available in <a target="_blank" href="https://github.com/orgs/cockroachlabs-field/packages?repo_name=cockroachdb-jdbc">GitHub Packages</a>.</p>
<h3 id="heading-clone-the-project">Clone the project</h3>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> git@github.com:cloudneutral/spring-data-cockroachdb.git
<span class="hljs-built_in">cd</span> spring-data-cockroachdb
</code></pre>
<h3 id="heading-build-the-project">Build the project</h3>
<pre><code class="lang-bash">chmod +x mvnw
./mvnw clean install
</code></pre>
<p>If you want to build with the regular <code>mvn</code> command, you will need <a target="_blank" href="https://maven.apache.org/run-maven/index.html">Maven v3.5.0</a> or above.</p>
<h3 id="heading-build-the-distribution">Build the distribution</h3>
<pre><code class="lang-bash">./mvnw -P distribution clean install
</code></pre>
<p>The distribution tar.gz is now found in <code>spring-data-cockroachdb-distribution/target</code>.</p>
<h2 id="heading-run-integration-tests">Run Integration Tests</h2>
<p>The integration tests will run through a series of contended workloads to exercise the retry mechanism and other JDBC driver features.</p>
<p>First, start a <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/start-a-local-cluster.html">local</a> CockroachDB node or cluster and create the database:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE database spring_data_test"</span>
</code></pre>
<p>Then activate the integration test Maven profile:</p>
<pre><code class="lang-bash">./mvnw -P it clean install
</code></pre>
<p>See the <a target="_blank" href="pom.xml">pom.xml</a> file for changing the database URL and other settings (under the <code>it</code> profile).</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The Spring Data CockroachDB project offers a Spring-based programming model for CockroachDB, a distributed SQL database. It simplifies building Spring-powered applications with new data access technologies and includes features like bundling the CockroachDB JDBC driver, meta-annotations for transactions, connection pooling support, and a shell client for ad-hoc queries. The project requires Java 17 or later and can be built using Maven.</p>
]]></content:encoded></item><item><title><![CDATA[CockroachDB JDBC Driver: Part II - Design and implementation Details]]></title><description><![CDATA[In this second article, we'll take a closer look at the design and implementation of the custom-made CockroachDB JDBC driver. See Part I for an introduction to the driver.
Article series on the JDBC driver:

https://blog.cloudneutral.se/series/cockro...]]></description><link>https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-ii-design-and-implementation-details</link><guid isPermaLink="true">https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-ii-design-and-implementation-details</guid><category><![CDATA[JDBC]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Thu, 06 Apr 2023 16:06:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GkinCd2enIY/upload/bbf9b737637ed79523feb509da0d2084.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this second article, we'll take a closer look at the design and implementation of the custom-made CockroachDB JDBC driver. See Part I for an introduction to the driver.</p>
<p>Article series on the JDBC driver:</p>
<ul>
<li><a target="_blank" href="https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver">https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver</a></li>
</ul>
<h1 id="heading-overview">Overview</h1>
<p>The <a target="_blank" href="https://github.com/cloudneutral/cockroachdb-jdbc">CockroachDB JDBC driver</a> wraps the PostgreSQL JDBC driver (<a target="_blank" href="https://jdbc.postgresql.org/">pgjdbc</a>) which must also be on the app's classpath. There are no other dependencies besides <a target="_blank" href="https://www.slf4j.org/">SLF4J</a> for which any supported logging framework can be used.</p>
<p>The driver accepts a unique URL prefix, <code>jdbc:cockroachdb</code>, to distinguish itself from <code>jdbc:postgresql</code>. When the driver is asked to open a connection (typically by the connection pool), it forwards the call to pgJDBC and then wraps the returned connection in a CockroachDB connection proxy with a custom interceptor (invocation handler). The driver delegates all calls to the underlying pgJDBC driver and does not interact directly with the database itself at any point.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678896739021/d56e3e9d-571f-45bc-bf48-2a80157ea87d.jpeg" alt class="image--center mx-auto" /></p>
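<p>The wrapping step can be sketched with a plain JDK dynamic proxy. This is a minimal illustration of the technique, not the driver's actual code: the <code>ConnectionProxySketch</code> class and its <code>wrap</code> method are hypothetical names, and the handler only forwards calls at the point where a real interceptor would add its own behaviour.</p>
<pre><code class="lang-java">import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;

// Hypothetical sketch, not one of the driver's real classes.
class ConnectionProxySketch {
    // Wraps a delegate connection (for example one opened by pgJDBC) in a dynamic proxy.
    static Connection wrap(Connection delegate) {
        InvocationHandler handler = (proxy, method, args) -&gt; {
            try {
                // An interceptor could record or retry the call here before delegating.
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                // Unwrap so that callers see the original exception, such as a SQLException.
                throw e.getCause();
            }
        };
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class&lt;?&gt;[] {Connection.class},
                handler);
    }
}
</code></pre>
<p>Because the proxy implements <code>java.sql.Connection</code>, a connection pool or application can use the wrapped instance without knowing that an interceptor sits in between.</p>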
<h2 id="heading-internal-retries">Internal Retries</h2>
<p>One of the driver's features is <strong>internal retries</strong>, in contrast to the more common application-side (client-side) retries. The driver wraps each JDBC connection, statement and result set in a dynamic proxy with interceptors capable of detecting and retrying aborted transactions, provided that the SQL exceptions are of a qualifying type. The qualifying exception types fall into two main categories: <strong>Serialization Conflicts</strong> and <strong>Connection Errors</strong>.</p>
<h3 id="heading-serialization-conflicts">Serialization Conflicts</h3>
<p>The JDBC driver can optionally perform internal retries of transactions that failed due to serialization conflicts, denoted by the <code>40001</code> <a target="_blank" href="https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/util/PSQLState.java">state code</a>. Serialization conflict errors are safe to retry by the client, or in this case by the driver: because the transaction was rolled back, a retry cannot produce duplicate side effects.</p>
<p>This type of error is more likely to manifest in databases running with serializable transaction isolation (1SR), in particular for contended workloads subject to read/write and write/read conflicts.</p>
<p>There are, however, more limitations with driver-level retries than with application-level retries. See the implementation section further below for more details.</p>
<h3 id="heading-connection-errors">Connection Errors</h3>
<p>The JDBC driver can also perform internal retries on connection errors denoted by any of the <code>08001, 08003, 08004, 08006, 08007, 08S01 or 57P01</code> <a target="_blank" href="https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/util/PSQLState.java">state codes</a>. Connection errors during in-flight transactions are generally safe to retry, but there is a potential for duplicate side effects if the SQL operations performed are non-idempotent, such as INSERTs or UPDATEs that increment or decrement a value (<code>UPDATE x set y=y-1</code>).</p>
<p>This could happen for example if a transaction commit was successful but the response back to the client was lost due to a connection failure. In that case, the result is ambiguous and the driver can't tell if the transaction was successfully committed or rolled back.</p>
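<p>The two categories can be told apart purely by SQL state code. The following is a small sketch of such a classification, assuming the state codes listed in this article; <code>RetryableStates</code> is a hypothetical helper and not part of the driver's API.</p>
<pre><code class="lang-java">import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the state codes are the ones listed in this article.
class RetryableStates {
    static final String SERIALIZATION_FAILURE = "40001";

    static final Set&lt;String&gt; CONNECTION_ERRORS = new HashSet&lt;&gt;(Arrays.asList(
            "08001", "08003", "08004", "08006", "08007", "08S01", "57P01"));

    // True when the transaction was rolled back due to a serialization conflict.
    static boolean isSerializationConflict(SQLException e) {
        return SERIALIZATION_FAILURE.equals(e.getSQLState());
    }

    // True for connection-related failures, where the commit outcome may be ambiguous.
    static boolean isConnectionError(SQLException e) {
        return CONNECTION_ERRORS.contains(e.getSQLState());
    }
}
</code></pre>
<p>Keeping the two checks separate makes it possible to retry serialization conflicts unconditionally while treating connection errors more cautiously when the workload is not idempotent.</p>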
<h3 id="heading-retry-implementation">Retry Implementation</h3>
<p>Transaction conflicts and connection errors can surface at read, write and commit time which means there are retry interceptors wrapped around the following JDBC API artefacts:</p>
<ul>
<li><p><code>java.sql.Connection</code></p>
<ul>
<li><p>implemented by <code>CockroachConnection</code></p>
</li>
<li><p>proxied by <code>ConnectionRetryInterceptor</code>, retries on <code>commit()</code></p>
</li>
</ul>
</li>
<li><p><code>java.sql.Statement</code></p>
<ul>
<li><p>implemented by <code>CockroachStatement</code></p>
</li>
<li><p>proxied by <code>StatementRetryInterceptor</code>, retries on write operations</p>
</li>
</ul>
</li>
<li><p><code>java.sql.PreparedStatement</code></p>
<ul>
<li><p>implemented by <code>CockroachPreparedStatement</code></p>
</li>
<li><p>proxied by <code>PreparedStatementRetryInterceptor</code>, retries on write operations</p>
</li>
</ul>
</li>
<li><p><code>java.sql.ResultSet</code></p>
<ul>
<li><p>implemented by <code>CockroachResultSet</code></p>
</li>
<li><p>proxied by <code>ResultSetRetryInterceptor</code>, retries on read operations</p>
</li>
</ul>
</li>
</ul>
<p>Retries are possible by recording most JDBC operations during an explicit transaction (autoCommit set to false). If a transaction is aborted due to a transient error it will be rolled back and the connection is closed. The recorded operations are then repeated on a new connection delegate while comparing the results against the initial transaction attempt.</p>
<p>If the results observed by the application client differ in any way (determined by SHA-256 checksums), the driver must give up the retry attempt to preserve a serializable outcome for the application, which is still blocked waiting for the operation to complete.</p>
<p>To illustrate:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (Connection connection 
        = DriverManager.getConnection(<span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable"</span>)) {
  <span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(<span class="hljs-string">"update table set x = ? where id = ?"</span>)) {
        ps.setObject(<span class="hljs-number">1</span>, x);
        ps.setObject(<span class="hljs-number">2</span>, y);
        ps.executeUpdate();
  }
}
</code></pre>
<p>In this example, assume the <code>executeUpdate()</code> method throws a <code>SQLException</code> with state code <code>40001</code>. This exception is caught by the retry interceptor which will roll back and close the current connection, then repeat the recorded operations on a new connection delegate and hope for a different interleaving of other concurrent operations that allow for the transaction to complete.</p>
<p>From the perspective of the application, the <code>executeUpdate()</code> operation will block until this process is either successful or considered futile, in which case a separate SQLException is thrown with the same state code.</p>
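<p>The result comparison mentioned above can be illustrated with a checksum over the values the client observed. This is only a sketch of the idea: <code>ResultChecksumSketch</code> is a hypothetical class, and representing observed values as strings is a simplifying assumption rather than how the driver records operations.</p>
<pre><code class="lang-java">import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch; observed values are simplified to strings.
class ResultChecksumSketch {
    // Computes a SHA-256 digest over the values a client observed, in order.
    static byte[] checksum(List&lt;String&gt; observedValues) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String value : observedValues) {
                digest.update(String.valueOf(value).getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator so ["ab","c"] differs from ["a","bc"]
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // A retry may only proceed when it observes exactly what the first attempt observed.
    static boolean sameOutcome(List&lt;String&gt; firstAttempt, List&lt;String&gt; retryAttempt) {
        return MessageDigest.isEqual(checksum(firstAttempt), checksum(retryAttempt));
    }
}
</code></pre>
<p>Comparing fixed-size digests rather than full results means only the checksum of the first attempt has to be kept around while the operations are replayed.</p>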
<h3 id="heading-limitations-of-driver-level-retries">Limitations of driver-level retries</h3>
<p>By contrast, application-level retries require you to write the retry logic yourself, something like the following:</p>
<pre><code class="lang-java"><span class="hljs-keyword">int</span> numCalls=<span class="hljs-number">1</span>;
<span class="hljs-keyword">do</span> {
  <span class="hljs-keyword">try</span> {
      <span class="hljs-comment">// Must begin and commit/rollback transactions</span>
      <span class="hljs-keyword">return</span> businessService.someTransactionBoundaryOperation(); 
  } <span class="hljs-keyword">catch</span> (SQLException sqlException) { <span class="hljs-comment">// Catch r/w and commit time exceptions</span>
      <span class="hljs-comment">// 40001 is the only state code we are looking for in terms of safe retries</span>
      <span class="hljs-keyword">if</span> (PSQLState.SERIALIZATION_FAILURE.getState().equals(sqlException.getSQLState())) {
          <span class="hljs-comment">// handle by logging and waiting with an exponentially increasing delay</span>
      } <span class="hljs-keyword">else</span> {
          <span class="hljs-keyword">throw</span> sqlException; <span class="hljs-comment">// Some other error, re-throw instantly</span>
      }
  }
} <span class="hljs-keyword">while</span> (numCalls++ &lt; MAX_RETRY_ATTEMPTS); <span class="hljs-comment">// Count the attempt so the loop terminates</span>
</code></pre>
<p>This type of logic fits well into an <a target="_blank" href="https://docs.spring.io/spring-framework/docs/current/reference/html/core.html#aop">AOP aspect</a> with an <a target="_blank" href="https://docs.spring.io/spring-framework/docs/current/reference/html/core.html#aop-ataspectj-around-advice">around advice</a> (or interceptor in JavaEE), weaving in between the caller and transaction boundary (typically a <a target="_blank" href="https://en.wikipedia.org/wiki/Facade_pattern">service facade</a>, <a target="_blank" href="https://www.enterpriseintegrationpatterns.com/MessagingAdapter.html">service activator</a>, or web/API controller).</p>
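<p>The "exponentially increasing delay" in the retry loop above could be computed along these lines. This is a sketch under assumptions: <code>BackoffSketch</code>, its parameters and the added random jitter are illustrative choices that do not come from the driver or from Spring.</p>
<pre><code class="lang-java">import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for computing retry delays.
class BackoffSketch {
    // Delay before the given retry attempt (1-based): base * 2^(attempt - 1), capped at maxMillis.
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long exponential = baseMillis &lt;&lt; Math.min(attempt - 1, 20);
        return Math.min(exponential, maxMillis);
    }

    // Adds random jitter of up to 50% so that concurrent clients do not retry in lockstep.
    static long withJitter(long delayMillis) {
        return delayMillis + ThreadLocalRandom.current().nextLong(delayMillis / 2 + 1);
    }
}
</code></pre>
<p>Inside the <code>catch</code> branch for state code <code>40001</code>, the retry loop would then sleep for something like <code>BackoffSketch.withJitter(BackoffSketch.backoffMillis(numCalls, 100, 5000))</code> milliseconds before the next attempt.</p>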
<p>Application-level retries always have a higher chance of success than driver-level retries because the application logic is applied in each repeat cycle. For example, if you are checking for a negative account balance in the app code, then it may cancel out additional writes based on the value read when the operation is repeated. Neither the JDBC driver nor the database has any visibility into the application logic, which means that a retry attempt can only succeed if all previously observed outcomes are identical to the new ones.</p>
<p>Driver-level retries are therefore of narrower practical use for common read/write and write/read conflicts, where client-side retries are the preferred approach.</p>
<h2 id="heading-implicit-select-for-update-rewrites">Implicit SELECT FOR UPDATE rewrites</h2>
<p>The JDBC driver can optionally append a <code>FOR UPDATE</code> clause to qualified SELECT statements.</p>
<p>A SELECT query qualifies for a rewrite when:</p>
<ul>
<li><p>It's not part of a read-only connection</p>
</li>
<li><p>There are no aggregate functions (max, min, avg, etc.)</p>
</li>
<li><p>There are no DISTINCT or GROUP BY operators</p>
</li>
<li><p>There are no internal CockroachDB schema references</p>
</li>
</ul>
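<p>The qualification rules above can be approximated with a deliberately naive keyword check. This sketch is not the driver's implementation: <code>SelectForUpdateSketch</code> is a hypothetical class, the list of aggregate functions is incomplete, <code>crdb_internal</code> stands in for internal schema references, and plain string matching will misjudge queries that a real SQL parser would handle correctly.</p>
<pre><code class="lang-java">import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical and simplified; string matching only, no SQL parsing.
class SelectForUpdateSketch {
    // Aggregates, DISTINCT, GROUP BY, internal schema references, or an existing FOR UPDATE.
    private static final Pattern DISQUALIFIERS = Pattern.compile(
            "\\b(max|min|avg|sum|count)\\s*\\(|\\bdistinct\\b|\\bgroup\\s+by\\b"
            + "|\\bcrdb_internal\\b|\\bfor\\s+update\\b");

    static boolean qualifies(String sql, boolean readOnlyConnection) {
        String normalized = sql.trim().toLowerCase(Locale.ROOT);
        if (readOnlyConnection || !normalized.startsWith("select")) {
            return false;
        }
        return !DISQUALIFIERS.matcher(normalized).find();
    }

    // Appends the locking clause only to queries that pass the check.
    static String rewrite(String sql, boolean readOnlyConnection) {
        return qualifies(sql, readOnlyConnection) ? sql.trim() + " FOR UPDATE" : sql;
    }
}
</code></pre>
<p>With this sketch, <code>select balance from account where id=?</code> on a read-write connection would be rewritten, while a query using <code>max(balance)</code> or <code>group by</code> would be passed through unchanged.</p>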
<p>A <code>SELECT .. FOR UPDATE</code> will lock the rows returned by a selection query such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.</p>
<p>Notice that this does not eliminate the chance of serialization conflicts (which can also be due to <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/architecture/transaction-layer.html#transaction-conflicts">time uncertainty</a>) but will greatly reduce it. Combined with driver-level retries, this can eliminate the need for app-level retry logic for some workloads.</p>
<p>The following example shows a write skew (G2-item) scenario which is prevented by CockroachDB serializable isolation:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>T1</td><td>T2</td></tr>
</thead>
<tbody>
<tr>
<td>begin;</td><td>begin;</td></tr>
<tr>
<td>select * from test where id in (1,2);</td><td></td></tr>
<tr>
<td></td><td>select * from test where id in (1,2);</td></tr>
<tr>
<td>update test set value = 11 where id = 1;</td><td>(reads 10,20)</td></tr>
<tr>
<td></td><td>update test set value = 21 where id = 2;</td></tr>
<tr>
<td>commit;</td><td></td></tr>
<tr>
<td></td><td>commit; --- "ERROR: restart transaction.."</td></tr>
</tbody>
</table>
</div><p>Running the same sequence with FOR UPDATE:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>T1</td><td>T2</td></tr>
</thead>
<tbody>
<tr>
<td>begin;</td><td>begin;</td></tr>
<tr>
<td>select * from test where id in (1,2) FOR UPDATE;</td><td></td></tr>
<tr>
<td></td><td>select * from test where id in (1,2) FOR UPDATE;</td></tr>
<tr>
<td></td><td>-- blocks on T1</td></tr>
<tr>
<td>update test set value = 11 where id = 1;</td><td></td></tr>
<tr>
<td>commit;</td><td>-- unblocked, reads 11,20</td></tr>
<tr>
<td></td><td>update test set value = 21 where id = 2;</td></tr>
<tr>
<td></td><td>commit;</td></tr>
</tbody>
</table>
</div><p>The initial read in T1 locks the rows, so T2 is forced to wait for T1 to finish. Once T1 has committed, the read in T2 reflects the write of T1 rather than the values as of T2's initial read timestamp. The T2 read is effectively pushed into the future, which results in a serializable transaction ordering and allows both transactions to commit.</p>
<h2 id="heading-sequence-diagrams">Sequence Diagrams</h2>
<p>The driver concepts are illustrated with sequence diagrams using <a target="_blank" href="https://www.websequencediagrams.com">https://www.websequencediagrams.com</a>.</p>
<h3 id="heading-happy-path">Happy Path</h3>
<p>This diagram illustrates executing a single update with a happy outcome, equivalent to:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (Connection connection 
        = DriverManager.getConnection(<span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable"</span>)) {
  <span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(<span class="hljs-string">"update table set x = ? where id = ?"</span>)) {
        ps.setObject(<span class="hljs-number">1</span>, x);
        ps.setObject(<span class="hljs-number">2</span>, y);
        ps.executeUpdate();
  }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678897193780/a220ea2a-fcdb-492f-97b6-2a253d8dd71e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-unhappy-path">Unhappy Path</h3>
<p>This diagram illustrates executing the same code block with an unhappy outcome:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678897203729/31546f36-60d2-4f56-b268-e24a09ed9d81.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>This article discussed the design and implementation of the custom-made CockroachDB JDBC driver, which wraps the PostgreSQL JDBC driver and adds internal retries for serialization conflicts and connection errors. The driver can also rewrite qualified queries to use SELECT FOR UPDATE, which reduces the chance of serialization conflicts in the first place. Sequence diagrams illustrate both the happy and the unhappy path.</p>
]]></content:encoded></item><item><title><![CDATA[CockroachDB JDBC Driver: Part I - A Beginner’s Guide]]></title><description><![CDATA[Introduction
This article describes the recently released open-source JDBC driver for CockroachDB. It wraps the PostgreSQL JDBC driver (pgjdbc) which in turn communicates in the PostgreSQL native network wire (v3.0) protocol with CockroachDB.
Article...]]></description><link>https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-i-a-beginners-guide</link><guid isPermaLink="true">https://blog.cloudneutral.se/cockroachdb-jdbc-driver-part-i-a-beginners-guide</guid><category><![CDATA[JDBC]]></category><category><![CDATA[cockroachdb]]></category><dc:creator><![CDATA[Kai Niemi]]></dc:creator><pubDate>Wed, 05 Apr 2023 16:00:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/AcW1ZwD-qC0/upload/f541bf03f37eb29a22463c1b84a52d7f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>This article describes the recently released <a target="_blank" href="https://github.com/cloudneutral/cockroachdb-jdbc">open-source JDBC driver</a> for <a target="_blank" href="https://www.cockroachlabs.com/">CockroachDB</a>. It wraps the PostgreSQL JDBC driver (<a target="_blank" href="https://jdbc.postgresql.org/">pgjdbc</a>) which in turn communicates in the PostgreSQL native network wire (v3.0) protocol with CockroachDB.</p>
<p>Article series on the JDBC driver:</p>
<ul>
<li><a target="_blank" href="https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver">https://blog.cloudneutral.se/series/cockroachdb-jdbc-driver</a></li>
</ul>
<h2 id="heading-features">Features</h2>
<p>This JDBC driver adds certain features on top of pgJDBC that are relevant to CockroachDB.</p>
<ul>
<li><p>Internal retries on serialization conflicts.</p>
</li>
<li><p>Internal retries on connection errors.</p>
</li>
<li><p>Rewriting qualified SQL queries to use <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/select-for-update.html">SELECT FOR UPDATE</a> to reduce serialization conflicts.</p>
</li>
<li><p>CockroachDB-specific database metadata and version info.</p>
</li>
</ul>
<p>All these features are disabled by default, which means the driver is operating in a pass-through mode delegating all JDBC API invocations to the pgJDBC driver.</p>
<p>Enabling internal retries may reduce the need for application-level retry logic and thereby enhance compatibility with 3rd-party products that don't implement any transaction retries.</p>
<p>Enabling <code>SELECT FOR UPDATE</code> rewrites may reduce serialization conflicts from appearing in the first place and thereby reduce retries to a bare minimum or none at all, at the expense of imposing locks on every read operation.</p>
<p><code>SELECT FOR UPDATE</code> rewrites can be scoped to connection level, where all qualified <code>SELECT</code> queries are rewritten, or to transaction level, where all qualified <code>SELECT</code> queries within a given transaction are rewritten.</p>
<p>For more information about client-side retry logic, see also:</p>
<ul>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance-best-practices-overview.html#transaction-contention">Transaction Contention</a></p>
</li>
<li><p><a target="_blank" href="https://www.cockroachlabs.com/docs/stable/node-shutdown.html#connection-retry-loop">Connection Retry Loop</a></p>
</li>
</ul>
<h2 id="heading-getting-started">Getting Started</h2>
<p>Below is an example of creating a JDBC connection and executing a simple <code>SELECT</code> query in an implicit transaction (auto-commit):</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (Connection connection 
        = DriverManager.getConnection(<span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable"</span>)) {
  <span class="hljs-keyword">try</span> (Statement statement = connection.createStatement()) {
    <span class="hljs-keyword">try</span> (ResultSet rs = statement.executeQuery(<span class="hljs-string">"select version()"</span>)) {
      <span class="hljs-keyword">if</span> (rs.next()) {
        System.out.println(rs.getString(<span class="hljs-number">1</span>));
      }
    }
  }
}
</code></pre>
<p>Next is an example of executing a <code>SELECT</code> and an <code>UPDATE</code> in an explicit transaction with <code>FOR UPDATE</code> rewrites:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (Connection connection
             = DriverManager.getConnection(<span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable"</span>)) {
    connection.setAutoCommit(<span class="hljs-keyword">false</span>);

    <span class="hljs-keyword">try</span> (Statement statement = connection.createStatement()) {
        statement.execute(<span class="hljs-string">"SET implicitSelectForUpdate = true"</span>);
    }

    <span class="hljs-comment">// Will be rewritten by the driver to include suffix "FOR UPDATE"</span>
    <span class="hljs-keyword">try</span> (PreparedStatement ps = connection.prepareStatement(<span class="hljs-string">"select balance from account where id=?"</span>)) {
        ps.setLong(<span class="hljs-number">1</span>, <span class="hljs-number">100L</span>);

        <span class="hljs-keyword">try</span> (ResultSet rs = ps.executeQuery()) {
            <span class="hljs-keyword">if</span> (rs.next()) {
                BigDecimal balance = rs.getBigDecimal(<span class="hljs-number">1</span>); <span class="hljs-comment">// check</span>
                <span class="hljs-keyword">try</span> (PreparedStatement ps2 = connection.prepareStatement(<span class="hljs-string">"update account set balance = balance + ? where id=?"</span>)) {
                    ps2.setBigDecimal(<span class="hljs-number">1</span>, <span class="hljs-keyword">new</span> BigDecimal(<span class="hljs-string">"10.50"</span>));
                    ps2.setLong(<span class="hljs-number">2</span>, <span class="hljs-number">100L</span>);
                    ps2.executeUpdate(); <span class="hljs-comment">// check</span>
                }
            }
        }
    }
    connection.commit();
}
</code></pre>
<p>The same as above, but with the rewrite enabled through the connection URL so that all qualified <code>SELECT</code>s on the connection are suffixed with <code>FOR UPDATE</code>:</p>
<pre><code class="lang-java"><span class="hljs-keyword">try</span> (Connection connection
             = DriverManager.getConnection(<span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable&amp;implicitSelectForUpdate=true"</span>)) {
    connection.setAutoCommit(<span class="hljs-keyword">false</span>);
    ...
    connection.commit();
}
</code></pre>
<h2 id="heading-maven-configuration">Maven configuration</h2>
<p>Add this dependency to your <code>pom.xml</code> file:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>io.cockroachdb.jdbc<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>cockroachdb-jdbc-driver<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>{version}<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
</code></pre>
<p>Then add the Maven repository to your <code>pom.xml</code> file (alternatively in Maven's <a target="_blank" href="https://maven.apache.org/settings.html">settings.xml</a>):</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">repository</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">id</span>&gt;</span>github<span class="hljs-tag">&lt;/<span class="hljs-name">id</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">name</span>&gt;</span>Maven Packages<span class="hljs-tag">&lt;/<span class="hljs-name">name</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">url</span>&gt;</span>https://maven.pkg.github.com/cloudneutral/cockroachdb-jdbc<span class="hljs-tag">&lt;/<span class="hljs-name">url</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">snapshots</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">enabled</span>&gt;</span>true<span class="hljs-tag">&lt;/<span class="hljs-name">enabled</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">snapshots</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">repository</span>&gt;</span>
</code></pre>
<p>You need to authenticate to GitHub Packages by creating a personal access token (classic) that includes the <code>read:packages</code> scope. For more information, see <a target="_blank" href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-apache-maven-registry#authenticating-to-github-packages">Authenticating to GitHub Packages</a>.</p>
<p>Add your personal access token to the servers section in your <a target="_blank" href="https://maven.apache.org/settings.html">settings.xml</a>:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">server</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">id</span>&gt;</span>github<span class="hljs-tag">&lt;/<span class="hljs-name">id</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">username</span>&gt;</span>your-github-name<span class="hljs-tag">&lt;/<span class="hljs-name">username</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">password</span>&gt;</span>your-access-token<span class="hljs-tag">&lt;/<span class="hljs-name">password</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">server</span>&gt;</span>
</code></pre>
<p>Note that the server and repository IDs must match (the value can be something other than <code>github</code>). Now you should be able to build your project with the JDBC driver as a dependency:</p>
<pre><code class="lang-shell">mvn clean install
</code></pre>
<p>Alternatively, you can just clone the repository and build it locally using <code>mvn install</code>.</p>
<h2 id="heading-modules">Modules</h2>
<p>The JDBC driver project is a multi-module Maven project with the following components:</p>
<h3 id="heading-cockroachdb-jdbc-driver">cockroachdb-jdbc-driver</h3>
<p>The main library for the CockroachDB JDBC driver. This is all you need, and it transitively pulls in pgJDBC and SLF4J as its only dependencies.</p>
<h3 id="heading-cockroachdb-jdbc-it">cockroachdb-jdbc-it</h3>
<p>Integration tests and functional tests that are activated via Maven profiles.</p>
<h3 id="heading-cockroachdb-jdbc-demo">cockroachdb-jdbc-demo</h3>
<p>A standalone demo app to showcase the retry mechanism and other features.</p>
<h2 id="heading-supported-cockroachdb-and-jdk-versions">Supported CockroachDB and JDK Versions</h2>
<p>The driver is CockroachDB version agnostic and supports any version supported by the PostgreSQL JDBC driver v 42.5+ (pgwire protocol v3.0). It's built for Java 8 at the language source and target level but requires Java 17 LTS for building.</p>
<h2 id="heading-url-properties">URL Properties</h2>
<p>The driver uses the <code>jdbc:cockroachdb:</code> JDBC URL prefix and supports all PostgreSQL URL properties on top of that. To configure a data source to use this driver, you typically configure it for PostgreSQL and only change the URL prefix and the driver class name.</p>
<p>The general format for a JDBC URL for connecting to a CockroachDB server:</p>
<pre><code class="lang-apache"><span class="hljs-attribute">jdbc</span>:cockroachdb:[//host[:port]/][database][?property<span class="hljs-number">1</span>=value<span class="hljs-number">1</span>[&amp;property<span class="hljs-number">2</span>=value<span class="hljs-number">2</span>]...]
</code></pre>
<p>See <a target="_blank" href="https://github.com/pgjdbc/pgjdbc">pgjdbc</a> for all supported driver properties and the semantics.</p>
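<p>As a rough illustration of the URL format above, the following sketch assembles a <code>jdbc:cockroachdb:</code> URL from a host, port, database name and a map of properties. The <code>JdbcUrlBuilder</code> class and its <code>build</code> method are hypothetical helpers and not part of the driver; the property names are the ones described in this post.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the driver) that assembles a JDBC URL
// following the jdbc:cockroachdb://host:port/database?key=value format.
class JdbcUrlBuilder {
    static String build(String host, int port, String database, Map<String, String> properties) {
        StringBuilder url = new StringBuilder("jdbc:cockroachdb://")
                .append(host).append(':').append(port).append('/').append(database);
        if (!properties.isEmpty()) {
            // Properties are joined as ?key1=value1&key2=value2
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
            }
            url.append(query);
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            // The (String, String) overload is available on Java 8
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("sslmode", "disable");
        properties.put("retryTransientErrors", "true");
        properties.put("implicitSelectForUpdate", "true");
        System.out.println(build("localhost", 26257, "jdbc_test", properties));
    }
}
```

<p>A <code>LinkedHashMap</code> keeps the properties in insertion order, which makes the resulting URL predictable when it ends up in logs or configuration files.</p>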
<p>In addition, this driver has the following CockroachDB-specific properties:</p>
<h3 id="heading-retrytransienterrors">retryTransientErrors</h3>
<p>(default: false)</p>
<p>The JDBC driver will automatically retry serialization failures (40001 <a target="_blank" href="https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/util/PSQLState.java">state code</a>) at read, write or commit time. This is done by keeping track of all statements and the results during a transaction, and if the transaction is aborted due to a transient 40001 error, it will roll back and retry the recorded operations on a new connection and compare the results with the initial commit attempt. If the results are different, the driver will be forced to give up the retry attempt to preserve a serializable outcome.</p>
<p>Enable this option if you want to handle aborted transactions internally in the driver, preferably combined with select-for-update locking. Leave this option disabled if you want to handle aborted transactions in your application.</p>
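<p>If you leave the option disabled and handle aborted transactions in your application, the retry logic could look roughly like the sketch below. <code>TransactionRetrier</code> and <code>SqlCallable</code> are hypothetical names used for illustration only; the callable is assumed to run one complete transaction attempt, including commit or rollback.</p>

```java
import java.sql.SQLException;

// Hypothetical application-side retry helper, for when retryTransientErrors
// is left disabled and the application handles 40001 errors itself.
class TransactionRetrier {
    // One full transaction attempt, including commit or rollback.
    interface SqlCallable<T> {
        T call() throws SQLException;
    }

    static final String SERIALIZATION_FAILURE = "40001";

    static <T> T execute(SqlCallable<T> work, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                boolean retryable = SERIALIZATION_FAILURE.equals(e.getSQLState());
                if (!retryable || attempt >= maxAttempts) {
                    throw e; // not a serialization failure, or out of attempts
                }
                // A real application would wait (back off) here before retrying
            }
        }
    }
}
```

<p>A call such as <code>TransactionRetrier.execute(() -&gt; transferFunds(), 5)</code> (where <code>transferFunds</code> is a placeholder for your own transactional method) would re-run the whole unit of work on each serialization failure, so the work must be safe to repeat from the start.</p>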
<h3 id="heading-retryconnectionerrors">retryConnectionErrors</h3>
<p>(default: false)</p>
<p>The CockroachDB JDBC driver will automatically retry transient connection errors with SQL state <code>08001, 08003, 08004, 08006, 08007, 08S01</code> or <code>57P01</code> at read, write or commit time.</p>
<p>Applicable only when <code>retryTransientErrors</code> is also true.</p>
<p>Disable this option if you want to handle connection errors in your own application or connection pool.</p>
<p><strong>CAUTION!</strong> Retrying on errors other than serialization conflicts (i.e., anything but <code>40001</code>) may produce duplicate outcomes if the SQL statements are non-idempotent. See the <a target="_blank" href="docs/DESIGN.md">design notes</a> for more details.</p>
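<p>To make the distinction between the two error classes concrete, here is a small sketch that classifies a <code>SQLException</code> by its SQL state, using the state codes listed above. The <code>SqlStateClassifier</code> class is a hypothetical helper and does not reflect how the driver itself is implemented.</p>

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that classifies SQL exceptions by SQL state code.
class SqlStateClassifier {
    // Connection error states listed for retryConnectionErrors
    static final Set<String> CONNECTION_ERROR_STATES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "08001", "08003", "08004", "08006", "08007", "08S01", "57P01")));

    // Serialization failures are safe to retry because the transaction was rolled back
    static boolean isSerializationFailure(SQLException e) {
        return "40001".equals(e.getSQLState());
    }

    // Connection errors are ambiguous: the statement may or may not have taken effect
    static boolean isConnectionError(SQLException e) {
        String state = e.getSQLState();
        return state != null && CONNECTION_ERROR_STATES.contains(state);
    }
}
```

<p>An application that handles errors itself might retry freely on <code>isSerializationFailure</code> but retry on <code>isConnectionError</code> only when the statements involved are idempotent.</p>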
<h3 id="heading-retrylistenerclassname">retryListenerClassName</h3>
<p>(default: <code>io.cockroachdb.jdbc.retry.LoggingRetryListener</code>)</p>
<p>Name of the class that implements <code>io.cockroachdb.jdbc.retry.RetryListener</code> to be used to receive callback events when retries occur. One instance is created for each JDBC connection.</p>
<h3 id="heading-retrystrategyclassname">retryStrategyClassName</h3>
<p>(default: <code>io.cockroachdb.jdbc.retry.ExponentialBackoffRetryStrategy</code>)</p>
<p>Name of the class that implements <code>io.cockroachdb.jdbc.retry.RetryStrategy</code> to be used when the <code>retryTransientErrors</code> property is set to <code>true</code>. If this class also implements <code>io.cockroachdb.jdbc.retry.RetryListener</code>, it will receive callback events when retries happen. One instance of this class is created for each JDBC connection.</p>
<p>The default <code>ExponentialBackoffRetryStrategy</code> will use an exponentially increasing delay with jitter and a multiplier of 2 up to the limit set by <code>retryMaxBackoffTime</code>.</p>
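<p>The general idea of exponential backoff with jitter and a multiplier of 2 can be sketched as follows. This is not the driver's <code>ExponentialBackoffRetryStrategy</code>; the <code>BackoffSketch</code> class, the base delay parameter and the choice of picking a random delay in the upper half of the range are all assumptions made for illustration.</p>

```java
import java.time.Duration;
import java.util.Random;

// Illustrative sketch of exponential backoff with jitter; not the driver's code.
class BackoffSketch {
    static Duration delayForAttempt(int attempt, Duration base, Duration max, Random random) {
        // Doubling per attempt: base * 2^(attempt - 1); the shift is bounded to avoid overflow
        int exponent = Math.max(0, Math.min(attempt - 1, 30));
        long ceilingMillis = Math.min(max.toMillis(), base.toMillis() * (1L << exponent));
        // Jitter (assumed variant): pick a random delay in [ceiling / 2, ceiling]
        long half = ceilingMillis / 2;
        long jitter = (long) (random.nextDouble() * (ceilingMillis - half + 1));
        return Duration.ofMillis(half + jitter);
    }
}
```

<p>With a base delay of 100 ms and a 30 s maximum, the upper bound grows as 100 ms, 200 ms, 400 ms and so on until it reaches the cap. The jitter spreads concurrent retries apart so that contending transactions do not all wake up at the same instant.</p>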
<h3 id="heading-retrymaxattempts">retryMaxAttempts</h3>
<p>(default: 15)</p>
<p>The maximum number of retry attempts on transient failures (connection errors or serialization conflicts). If this limit is exceeded, the driver gives up on further retries and throws a SQL exception with the same state code.</p>
<h3 id="heading-retrymaxbackofftime">retryMaxBackoffTime</h3>
<p>(default: 30s)</p>
<p>Maximum exponential backoff time in the format of a duration expression (like <code>12s</code>). The duration applies to the total time for all retry attempts at transaction level.</p>
<p>Applicable only when <code>retryTransientErrors</code> is true.</p>
<h3 id="heading-implicitselectforupdate">implicitSelectForUpdate</h3>
<p>(default: false)</p>
<p>The driver will automatically append a <code>FOR UPDATE</code> clause to all qualifying <code>SELECT</code> statements within the scope of the connection. This parameter can also be set as a session variable in an explicit transaction, in which case it is scoped to that transaction.</p>
<p>The qualifying requirements include:</p>
<ul>
<li><p>Not used in a read-only connection</p>
</li>
<li><p>No time travel clause (<code>as of system time</code>)</p>
</li>
<li><p>No aggregate functions</p>
</li>
<li><p>No group by or distinct operators</p>
</li>
<li><p>Not referencing internal table schema</p>
</li>
</ul>
<p>A <code>SELECT .. FOR UPDATE</code> will lock the rows returned by a selection query such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows. It does not eliminate the chance of serialization conflicts but greatly reduces it.</p>
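<p>The qualifying rules listed above can be illustrated with a deliberately naive sketch. The driver's actual SQL inspection is not shown in this post, so everything here is an assumption: the <code>SelectForUpdateSketch</code> class, the regular expression, the short list of aggregate functions and the schema names <code>information_schema</code>, <code>pg_catalog</code> and <code>crdb_internal</code> standing in for "internal table schema".</p>

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Naive illustration of the qualifying rules; NOT the driver's implementation.
class SelectForUpdateSketch {
    // Assumed subset of disqualifying constructs, matched against lower-cased SQL
    private static final Pattern DISQUALIFIERS = Pattern.compile(
            "\\bas\\s+of\\s+system\\s+time\\b"                        // time travel clause
            + "|\\b(count|sum|avg|min|max)\\s*\\("                    // a few aggregate functions
            + "|\\bgroup\\s+by\\b|\\bdistinct\\b"                     // group by or distinct
            + "|\\b(information_schema|pg_catalog|crdb_internal)\\."  // internal schemas
            + "|\\bfor\\s+update\\b");                                // already locking

    static String rewrite(String sql, boolean readOnly) {
        String trimmed = sql.trim();
        if (trimmed.endsWith(";")) {
            // Drop a trailing semicolon so the clause is appended to the statement itself
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        String normalized = trimmed.toLowerCase(Locale.ROOT);
        boolean qualifies = !readOnly
                && normalized.startsWith("select")
                && !DISQUALIFIERS.matcher(normalized).find();
        // Non-qualifying statements are returned untouched
        return qualifies ? trimmed + " FOR UPDATE" : sql;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("SELECT balance FROM account WHERE id = 1", false));
        System.out.println(rewrite("SELECT count(*) FROM account", false));
    }
}
```

<p>A regular expression cannot reliably parse SQL (string literals and quoted identifiers would confuse it), so treat this purely as a way to visualize which statements qualify.</p>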
<h3 id="heading-usecockroachmetadata">useCockroachMetadata</h3>
<p>(default: false)</p>
<p>By default, the driver will use PostgreSQL JDBC driver metadata provided in <code>java.sql.DatabaseMetaData</code> rather than CockroachDB-specific metadata. While the latter is more correct, it causes incompatibilities with libraries that bind to PostgreSQL version details, such as Flyway and other tools.</p>
<h2 id="heading-logging">Logging</h2>
<p>This driver uses <a target="_blank" href="https://www.slf4j.org/">SLF4J</a> for logging which means it's agnostic to the logging framework used by the application. The JDBC driver module does not include any logging framework dependency transitively.</p>
<h2 id="heading-additional-examples">Additional Examples</h2>
<h3 id="heading-plain-java-example">Plain Java Example</h3>
<pre><code class="lang-java"><span class="hljs-comment">// Uses java.sql.Connection, DriverManager, ResultSet, Statement and io.cockroachdb.jdbc.CockroachDriver</span>
<span class="hljs-comment">// Load the driver class (JDBC drivers typically register with DriverManager on load)</span>
Class.forName(CockroachDriver.class.getName());

<span class="hljs-keyword">try</span> (Connection connection = DriverManager.getConnection(
        <span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable&amp;implicitSelectForUpdate=true&amp;retryTransientErrors=true"</span>)) {
  <span class="hljs-keyword">try</span> (Statement statement = connection.createStatement()) {
    <span class="hljs-keyword">try</span> (ResultSet rs = statement.executeQuery(<span class="hljs-string">"select version()"</span>)) {
      <span class="hljs-keyword">if</span> (rs.next()) {
        System.out.println(rs.getString(<span class="hljs-number">1</span>));
      }
    }
  }
}
</code></pre>
<h3 id="heading-spring-boot-example">Spring Boot Example</h3>
<p>Configure the datasource in <code>src/main/resources/application.yml</code>:</p>
<pre><code class="lang-yml"><span class="hljs-attr">spring:</span>
  <span class="hljs-attr">datasource:</span>
    <span class="hljs-attr">driver-class-name:</span> <span class="hljs-string">io.cockroachdb.jdbc.CockroachDriver</span>
    <span class="hljs-attr">url:</span> <span class="hljs-string">"jdbc:cockroachdb://localhost:26257/jdbc_test?sslmode=disable&amp;application_name=MyTestApp&amp;implicitSelectForUpdate=true&amp;retryTransientErrors=true"</span>
    <span class="hljs-attr">username:</span> <span class="hljs-string">root</span>
    <span class="hljs-attr">password:</span>
</code></pre>
<p>Optionally, configure the data source programmatically and wrap it in the <a target="_blank" href="https://github.com/jdbc-observations/datasource-proxy">datasource-proxy</a> (TTDDYY) logging proxy:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Bean</span>
<span class="hljs-meta">@Primary</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> DataSource <span class="hljs-title">dataSource</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">return</span> ProxyDataSourceBuilder
            .create(hikariDataSource())
            .traceMethods()
            .logQueryBySlf4j(SLF4JLogLevel.DEBUG, <span class="hljs-string">"io.cockroachdb.jdbc"</span>)
            .asJson()
            .multiline()
            .build();
}

<span class="hljs-meta">@Bean</span>
<span class="hljs-meta">@ConfigurationProperties("spring.datasource.hikari")</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> HikariDataSource <span class="hljs-title">hikariDataSource</span><span class="hljs-params">()</span> </span>{
    HikariDataSource ds = dataSourceProperties()
            .initializeDataSourceBuilder()
            .type(HikariDataSource.class)
            .build();
    ds.setAutoCommit(<span class="hljs-keyword">false</span>);
    ds.addDataSourceProperty(PGProperty.REWRITE_BATCHED_INSERTS.getName(), <span class="hljs-string">"true"</span>);
    ds.addDataSourceProperty(CockroachProperty.IMPLICIT_SELECT_FOR_UPDATE.getName(), <span class="hljs-string">"true"</span>);
    ds.addDataSourceProperty(CockroachProperty.RETRY_TRANSIENT_ERRORS.getName(), <span class="hljs-string">"true"</span>);
    ds.addDataSourceProperty(CockroachProperty.RETRY_MAX_ATTEMPTS.getName(), <span class="hljs-string">"5"</span>);
    ds.addDataSourceProperty(CockroachProperty.RETRY_MAX_BACKOFF_TIME.getName(), <span class="hljs-string">"10000"</span>);
    <span class="hljs-keyword">return</span> ds;
}
</code></pre>
<p>Configure <code>src/main/resources/logback-spring.xml</code> to capture all SQL statements and JDBC API calls:</p>
<pre><code class="lang-xml"><span class="hljs-meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">configuration</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">include</span> <span class="hljs-attr">resource</span>=<span class="hljs-string">"org/springframework/boot/logging/logback/defaults.xml"</span>/&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">include</span> <span class="hljs-attr">resource</span>=<span class="hljs-string">"org/springframework/boot/logging/logback/console-appender.xml"</span> /&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">logger</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"org.springframework"</span> <span class="hljs-attr">level</span>=<span class="hljs-string">"INFO"</span>/&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">logger</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"io.cockroachdb.jdbc"</span> <span class="hljs-attr">level</span>=<span class="hljs-string">"DEBUG"</span>/&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">root</span> <span class="hljs-attr">level</span>=<span class="hljs-string">"INFO"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">appender-ref</span> <span class="hljs-attr">ref</span>=<span class="hljs-string">"CONSOLE"</span>/&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">root</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">configuration</span>&gt;</span>
</code></pre>
<h2 id="heading-building">Building</h2>
<p>Building the CockroachDB JDBC driver requires a Java 17 LTS (or later) JDK, but the driver is cross-compiled to run on any platform for which there is a Java 8 runtime.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>JDK17+ LTS for building (OpenJDK compatible)</p>
</li>
<li><p>Maven 3+ (optional, embedded)</p>
</li>
</ul>
<p>If you want to build with the regular <code>mvn</code> command, you will need <a target="_blank" href="https://maven.apache.org/run-maven/index.html">Maven v3.x</a> or above.</p>
<p>Install the JDK (Linux):</p>
<pre><code class="lang-bash">sudo apt-get -qq install -y openjdk-17-jdk
</code></pre>
<p>Install the JDK (macOS):</p>
<pre><code class="lang-bash">brew install openjdk@17
</code></pre>
<h3 id="heading-clone-the-project">Clone the project</h3>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> git@github.com:cloudneutral/cockroachdb-jdbc.git
<span class="hljs-built_in">cd</span> cockroachdb-jdbc
</code></pre>
<h3 id="heading-build-the-project">Build the project</h3>
<pre><code class="lang-bash">chmod +x mvnw
./mvnw clean install
</code></pre>
<p>The JDBC driver jar is now found in <code>cockroachdb-jdbc-driver/target</code>.</p>
<h3 id="heading-run-integration-tests">Run Integration Tests</h3>
<p>The integration tests will run through a series of contended workloads to exercise the retry mechanism and other driver features.</p>
<p>First, start a <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/start-a-local-cluster.html">local</a> CockroachDB node or cluster.</p>
<p>Create the database:</p>
<pre><code class="lang-bash">cockroach sql --insecure --host=localhost -e <span class="hljs-string">"CREATE DATABASE jdbc_test"</span>
</code></pre>
<p>Then activate the integration test Maven profile:</p>
<pre><code class="lang-bash">./mvnw -P it -Dgroups=anomaly-test clean install
</code></pre>
<p>Test groups include:</p>
<ul>
<li><p>anomaly-test - Runs through a series of RW/WR/WW anomaly tests.</p>
</li>
<li><p>connection-retry-test - Runs a test with connection retries enabled.</p>
</li>
<li><p>batch-insert-test - Batch inserts load test.</p>
</li>
<li><p>batch-update-test - Batch updates load test.</p>
</li>
</ul>
<p>See the <a target="_blank" href="pom.xml">pom.xml</a> file for changing the database URL and other settings (under the <code>it</code> profile).</p>
<h2 id="heading-summary">Summary</h2>
<p>This article explains how to configure and build a new open-source JDBC driver for CockroachDB. It covers properties such as <code>retryTransientErrors</code> and <code>implicitSelectForUpdate</code>, which help reduce transient SQL exceptions on contended workloads. It also covers retry strategies, URL properties and logging settings, with examples of how to use the driver in plain Java and Spring Boot.</p>
]]></content:encoded></item></channel></rss>