Effective Strategies for Planning and Executing Data Migration Projects

Lift-and-shift vs Strangler for migrating both data and services

Introduction

In this article, we'll look at two common strategies for both data and service migrations: Lift and Shift vs Strangler. Which one works best depends on many factors, most of them driven by business requirements.

Some typical migration requirements and objectives include:

  • Zero planned downtime

    • The migration process must not require any planned downtime windows that would disrupt production traffic.
  • Option to pause and resume

    • Migration processes can run for a long time and must not interfere with other priority tasks.

    • Team focus can shift, so it must be possible to put the migration on hold.

  • Option to go back and abort up to a given point

    • Things fail, so it may be necessary to revert and cancel the process at any point before a defined confidence level is reached.
  • Ability to test and verify up to a given point

    • To reach a confidence level that things will go according to plan.
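
The pause-and-resume requirement above can be sketched as a batch copy driven by a checkpoint. This is a minimal illustration, not a prescribed implementation: the `Checkpoint` class, the `migrate_batch` function, and the in-memory dictionaries standing in for the two databases are all hypothetical, and a real migration would persist the checkpoint durably.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Highest key copied so far (hypothetical; store this durably in practice)."""
    last_key: int = -1


def migrate_batch(source: dict, target: dict, checkpoint: Checkpoint, batch_size: int) -> int:
    """Copy up to batch_size rows with keys above the checkpoint; return rows copied."""
    pending = sorted(k for k in source if k > checkpoint.last_key)[:batch_size]
    for key in pending:
        target[key] = source[key]   # plain overwrite, so repeating a batch is harmless
        checkpoint.last_key = key
    return len(pending)


# Dictionaries stand in for the old and new databases.
source = {i: f"row-{i}" for i in range(10)}
target = {}
checkpoint = Checkpoint()

migrate_batch(source, target, checkpoint, batch_size=4)   # run one batch, then pause
paused_at = checkpoint.last_key

# Later: resume from the checkpoint until nothing is left to copy.
while migrate_batch(source, target, checkpoint, batch_size=4):
    pass
```

Because each batch only advances the checkpoint after a row is written, the process can stop between batches and pick up where it left off without re-reading the whole source.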

Migration projects are complicated and involve risk assessments and mitigation steps. What could go wrong, in which steps, what's the business impact and what are the actions to take if it happens?

Risks can include:

  • Data loss, data corruption or duplication

  • Performance degradation

  • Reduced feature velocity during migration

Risk mitigations include:

  • Well-defined goals and measurable success

  • Rollback option until the point of no return

  • Repeatability of migration steps

  • Testability of migration steps

  • Observability of the process
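
To make the testability and observability points concrete, here is a rough sketch of a verification step that compares the old and new data sets and reports missing, unexpected, and altered rows. The function names and the dictionary-based tables are assumptions made for illustration. Since the tables are keyed by primary key, duplicates show up here only as unexpected extra keys.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Stable digest of one row; fields are sorted so dict ordering does not matter."""
    canonical = "|".join(f"{name}={row[name]!r}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(source: dict, target: dict) -> dict:
    """Compare two tables keyed by primary key and report discrepancies."""
    shared = source.keys() & target.keys()
    return {
        "missing": sorted(source.keys() - target.keys()),      # possible data loss
        "unexpected": sorted(target.keys() - source.keys()),   # possible duplication
        "corrupted": sorted(k for k in shared
                            if row_digest(source[k]) != row_digest(target[k])),
    }


source = {
    1: {"name": "Ada", "balance": 10},
    2: {"name": "Bob", "balance": 20},
    3: {"name": "Cy", "balance": 30},
}
target = {
    1: {"balance": 10, "name": "Ada"},   # same content, different field order
    2: {"name": "Bob", "balance": 25},   # altered value
    4: {"name": "Dee", "balance": 40},   # row that does not exist in the source
}

report = verify(source, target)
```

A report like this can be produced after every migration step, which gives a measurable signal for deciding whether to continue or roll back.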

All these requirements and goals guide towards either a lift and shift approach or a strangler approach.

  • Lift and Shift

    • Take a snapshot of the old system and load it into the new system

    • Planned downtime is needed

  • Strangler

    • Gradually migrate data and business components until the old system is drained and decommissioned

    • Very limited downtime if any

Let's break these two approaches down into pros/cons.

Lift and Shift

This approach could be composed of the following rough phases:

Phases

  1. Take a snapshot of the primary DB, bulk load to replacement DB

  2. Write to both primary and replacement DB via the change stream

  3. Switch all reads to the replacement DB but keep writing to both

  4. Switch writes to the replacement DB, turn off the change stream

    • At this point, only rolling forward is possible.
  5. Decommission primary
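
The phases above can be sketched as a small state machine. This is a simplified model under stated assumptions: the `MigratingStore` class and `Phase` names are hypothetical, dictionaries stand in for the databases, and dual writes are done synchronously in the application rather than through a change stream.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = 2          # writes reach both DBs, reads come from the primary
    READ_REPLACEMENT = 3    # reads come from the replacement, writes still reach both
    REPLACEMENT_ONLY = 4    # primary no longer written: point of no return


class MigratingStore:
    def __init__(self, primary: dict):
        self.primary = primary
        self.replacement = dict(primary)   # phase 1: snapshot and bulk load
        self.phase = Phase.DUAL_WRITE

    def write(self, key, value):
        if self.phase is not Phase.REPLACEMENT_ONLY:
            self.primary[key] = value
        self.replacement[key] = value

    def read(self, key):
        store = self.primary if self.phase is Phase.DUAL_WRITE else self.replacement
        return store[key]

    def advance(self):
        if self.phase is Phase.REPLACEMENT_ONLY:
            raise RuntimeError("already in the final phase")
        self.phase = Phase(self.phase.value + 1)

    def roll_back(self):
        # Possible only while the primary still receives every write.
        if self.phase is Phase.REPLACEMENT_ONLY:
            raise RuntimeError("past the point of no return")
        self.phase = Phase.DUAL_WRITE


store = MigratingStore({"a": 1})
store.write("b", 2)    # phase 2
store.advance()
store.write("c", 3)    # phase 3
store.advance()
store.write("d", 4)    # phase 4: only the replacement sees this write
```

The model shows why rollback stops being an option in phase 4: from that point on the primary is missing writes, so only rolling forward is safe.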

Pros

  1. Reduced migration project timeline (a bit simpler)

  2. Good tooling available (dump/export/import)

  3. One-off, completed over a short period

Cons

  1. Downtime required at snapshot/load time (step 1)

  2. Less control of the pace (has to be completed)

  3. Higher risk of things going wrong

Strangler

The strangler fig metaphor was coined by Martin Fowler. It refers to the strangler fig, a plant that seeds in the upper branches of a host tree and sends roots down to the ground. Once these roots take hold in the soil, the fig gradually envelops the host, which is eventually strangled and left to decay.

The parallel in software is to have the new system initially supported by and wrapping the existing system, gradually taking over.

The strangler approach is a typical architecture pattern for larger system rewrites as well as migrations. Sometimes these efforts include refactoring stored procedure logic out of the database and into the application tier.

Example scenarios:

  • One monolithic system to another (refactoring/redesign)

  • One monolithic system decomposed into multiple microservices (rewrite)

  • Externalizing functionality to foreign systems

  • Migrating data and mechanisms, such as stored procedures

This strangler approach to data migration can be outlined in the following phases:

Phases

  1. Route traffic for migrated data to a replacement DB through a proxy/gateway

  2. Initiate a per-customer or market migration through a change feed trigger

  3. Channel back to primary DB via change feed, signalling completion

  4. Eventually decommission primary
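
As a rough sketch of the phases above, the gateway below routes each customer to whichever database currently owns their data and migrates customers one at a time. The `MigrationGateway` class and its methods are hypothetical, dictionaries stand in for the databases, and the change feed is reduced to a direct method call. A real implementation would also have to handle writes that arrive while a customer's data is being copied.

```python
class MigrationGateway:
    """Routes each customer's requests to the DB that currently owns their data."""

    def __init__(self, primary: dict, replacement: dict):
        self.primary = primary
        self.replacement = replacement
        self.migrated = set()   # customers already served by the replacement

    def _store_for(self, customer: str) -> dict:
        return self.replacement if customer in self.migrated else self.primary

    def read(self, customer: str):
        return self._store_for(customer).get(customer)

    def write(self, customer: str, record: dict) -> None:
        self._store_for(customer)[customer] = record

    def migrate_customer(self, customer: str) -> None:
        """Copy one customer's data, then flip routing; repeating the call is harmless."""
        if customer in self.migrated:
            return
        if customer in self.primary:
            self.replacement[customer] = self.primary[customer]
        self.migrated.add(customer)   # stands in for the completion signal

    def primary_drained(self) -> bool:
        """True once every customer in the primary has been migrated."""
        return all(customer in self.migrated for customer in self.primary)


gateway = MigrationGateway(
    primary={"acme": {"plan": "basic"}, "globex": {"plan": "pro"}},
    replacement={},
)
gateway.migrate_customer("acme")
gateway.write("acme", {"plan": "pro"})       # lands in the replacement DB
gateway.write("globex", {"plan": "basic"})   # still lands in the primary DB
```

Migrating per customer or per market keeps the blast radius of each step small, and the pace can be adjusted simply by choosing when to call `migrate_customer` next.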

Pros

  1. No planned downtime windows needed

  2. Reduced risk by more control of the pace

Cons

  1. More complicated, more components

  2. Takes a longer time to complete

There's much more in the details of course, but one important distinction from lift-and-shift is that two separate instances of the service run side by side. The new one can also be implemented using a different, more modern tech stack while still preserving all external contracts. The gateway mechanism can be external or embedded into both components to reduce network traffic.

Application Migrations

Often both the application codebase and the database need to be migrated simultaneously. For the application tier, there are a few different approaches with different system and business impacts.

  • Redesign

    • Create a new project that includes all key features and alters external properties.
  • Rewrite

    • Major refactoring and new features at the same time without altering existing external properties.
  • Refactor

    • Improve a software system's internal structure without altering external properties (mainly quality attributes or non-functional requirements).

Summary

| Method | Description | System Impact | Business Impact |
| --- | --- | --- | --- |
| Redesign | Complete redesign and implementation | Internal and external properties change. The system is not in an operational state. | New features are paused. |
| Rewrite | Reimplementation of existing functionality | Mainly internal properties change. The system is not in an operational state. | Larger features are paused. |
| Refactor | Larger and smaller incremental improvements | Mainly internal properties change. System in an operational state. | Allows for new features. |

Strangler Principles

How do we go about strangling a system? The first step is to identify an isolated part of the system. The next is to implement that part in a new service while improving and evolving it. At this stage the new service is not yet used or exposed to traffic, which allows incremental development on the side without interfering with the primary system. The last step is to redirect calls to the new service; the old implementation can be left in place when decommissioning it is not worth the effort. This works quite well if the functional areas are well isolated.
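
The redirect step described above can be sketched as a thin facade in front of both systems. Everything here is illustrative: the `StranglerFacade` class, the handler functions, and the path prefixes are hypothetical, and in practice this role is often played by a proxy or API gateway rather than application code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class StranglerFacade:
    """Sends each request to the new service if its area has been moved, else to legacy."""

    def __init__(self, legacy: Handler):
        self._legacy = legacy
        self._routes: Dict[str, Handler] = {}

    def redirect(self, prefix: str, handler: Handler) -> None:
        """Start sending requests under this prefix to the new service."""
        self._routes[prefix] = handler

    def restore(self, prefix: str) -> None:
        """Undo a redirect; the legacy code is still in place, so this is cheap."""
        self._routes.pop(prefix, None)

    def handle(self, path: str) -> str:
        for prefix, handler in self._routes.items():
            if path.startswith(prefix):
                return handler(path)
        return self._legacy(path)


def legacy_app(path: str) -> str:
    return f"legacy:{path}"


def new_invoicing(path: str) -> str:
    return f"new:{path}"


facade = StranglerFacade(legacy_app)
before = facade.handle("/invoices/42")
facade.redirect("/invoices", new_invoicing)   # the new service goes live for this area
after = facade.handle("/invoices/42")
untouched = facade.handle("/orders/7")
```

Leaving the old implementation in place has a side benefit in this sketch: undoing a redirect is a single call to `restore`.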

Functionality is, however, often entangled, so moving one piece of functionality may drag its dependencies along with it. To avoid that, the moved functionality can call downstream functionality that still lives in the old system through an API. That way, yet-to-be-migrated functionality remains in use while the move stays controlled and incremental.
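
A minimal sketch of this idea, assuming a hypothetical pricing feature that has been moved while its discount lookup still lives in the old system. The function `legacy_discount` stands in for an API call into the legacy system, and the tiers and rates are made-up illustration values.

```python
from typing import Callable


def legacy_discount(customer_tier: str) -> float:
    """Stands in for an API call into the old system (not yet migrated)."""
    return {"gold": 0.10, "silver": 0.05}.get(customer_tier, 0.0)


class NewPricingService:
    """Migrated functionality; it knows the legacy lookup only as a callable."""

    def __init__(self, discount_lookup: Callable[[str], float]):
        self._discount = discount_lookup

    def final_price(self, list_price: float, customer_tier: str) -> float:
        return round(list_price * (1 - self._discount(customer_tier)), 2)


# Today: the new service delegates the discount lookup to the old system.
service = NewPricingService(legacy_discount)
gold_price = service.final_price(200.0, "gold")


# Later: once discounts are migrated too, swap in a new implementation.
def migrated_discount(customer_tier: str) -> float:
    return 0.20 if customer_tier == "gold" else 0.0


service_after = NewPricingService(migrated_discount)
gold_price_after = service_after.final_price(200.0, "gold")
```

Keeping the dependency behind a narrow interface means the downstream piece can be migrated later without touching the code that was already moved.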

Strangling Stored Procedures

Applying the strangler approach to stored procedures follows the same architectural pattern. The business logic is rewritten in a higher-level language in the application tier of the new service.
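
As a hedged illustration, suppose the old database has a stored procedure that transfers funds between two accounts (a hypothetical example, not taken from a real system). In the application tier, the same logic could look roughly like this, with SQLite standing in for the replacement database and the transaction boundary moved into application code:

```python
import sqlite3


def transfer(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> None:
    """Application-tier version of a hypothetical transfer_funds stored procedure."""
    with conn:  # commits on success, rolls back if an exception is raised
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (from_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

Once the rule lives in the application tier it can be unit tested and versioned with the rest of the service, at the cost of extra round trips to the database compared with a procedure that runs inside it.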

[Diagram: the combination of application refactoring/rewrites and stored procedure migration]

Conclusion

In this article, we looked at two classical approaches for data and service migration projects. One is lift-and-shift, where things are more or less copied over with some planned downtime. The other is a strangler approach where systems run for a longer period in parallel while the old one is gradually strangled by moving data and functions to the new platform. Both approaches have pros/cons which need to be put in a business context to make sense.