In this article, we'll look at two common strategies for data and service migrations: Lift and Shift vs Strangler. Which one works best depends on many factors, most of them driven by business requirements.
Some typical migration requirements and objectives include:
Zero planned downtime
- The migration process must not require any planned downtime windows that would disrupt production traffic.
Option to pause and resume
- Migration processes can run for a long time and must not interfere with other priority tasks.
- Team focus can shift, and the migration must then be put on hold.
Option to go back and abort up to a given point
- Things fail, so the option to revert and cancel the process may be needed until a confidence level has been reached.
Ability to test and verify up to a given point
- To reach a confidence level that things will go according to plan.
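As an illustration of the pause-and-resume requirement, here is a minimal sketch of a batch copy that records a checkpoint after every row, so a run can stop at any time and pick up where it left off. The names `Checkpoint` and `migrate_batch` are hypothetical, and plain dictionaries stand in for the old and new data stores; a real checkpoint would have to be stored durably.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical progress marker; a real one would live in a table or file."""
    last_id: int = 0


def migrate_batch(source: dict[int, str], target: dict[int, str],
                  checkpoint: Checkpoint, batch_size: int) -> int:
    """Copy up to batch_size rows with id > checkpoint.last_id; return rows copied."""
    pending = sorted(k for k in source if k > checkpoint.last_id)[:batch_size]
    for row_id in pending:
        target[row_id] = source[row_id]   # upsert, so repeating a batch is harmless
        checkpoint.last_id = row_id       # advance only after the row is written
    return len(pending)


source = {i: f"row-{i}" for i in range(1, 11)}
target: dict[int, str] = {}
checkpoint = Checkpoint()

# Run one batch, then "pause" (for example because team focus shifts).
migrate_batch(source, target, checkpoint, batch_size=4)
paused_at = checkpoint.last_id

# Resume later: keep going until a batch copies nothing.
while migrate_batch(source, target, checkpoint, batch_size=4):
    pass
```

Because each step is an upsert keyed by row id, the same batch can be repeated safely, which also helps with the repeatability and testability of migration steps.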
Migration projects are complicated and involve risk assessments and mitigation steps. What could go wrong, in which steps, what is the business impact, and what actions should be taken if it happens?
Risks can include:
Data loss, data corruption or duplication
Reduced feature velocity during migration
Risk mitigations include:
Well-defined goals and measurable success
Rollback option until the point of no return
Repeatability of migration steps
Testability of migration steps
Observability of the process
All these requirements and goals guide towards either a lift and shift approach or a strangler approach.
Lift and Shift
Take a snapshot of the old system and load it into the new system
Planned downtime is needed
Strangler
Gradually migrate data and business components until the old system is drained and decommissioned
Very limited downtime if any
Let's break these two approaches down into pros/cons.
Lift and Shift
This approach could be composed of the following rough phases:
Take a snapshot of the primary DB, bulk load to replacement DB
Write to both primary and replacement DB via the change stream
Switch all reads to the replacement DB but keep writing to both
Switch writes to the replacement DB, turn off the change stream
- At this point, only rolling forward is possible.
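The four phases above can be sketched as a small state machine. This is an illustration under simplifying assumptions, not a definitive implementation: dictionaries stand in for the two databases, the `Cutover` class and its method names are hypothetical, and writes are assumed to be paused during phase 1 (the downtime window), so the sketch does not replay changes made while the snapshot loads.

```python
from enum import IntEnum


class Phase(IntEnum):
    SNAPSHOT_LOADED = 1   # snapshot of primary bulk loaded into replacement
    DUAL_WRITE = 2        # writes reach both stores (change stream running)
    READ_SWITCHED = 3     # reads served by replacement, writes still to both
    WRITE_SWITCHED = 4    # writes go to replacement only, change stream off


class Cutover:
    """Hypothetical coordinator for a lift-and-shift cutover."""

    def __init__(self, primary: dict, replacement: dict):
        self.primary = primary
        self.replacement = replacement
        self.replacement.update(primary)      # phase 1: snapshot and bulk load
        self.phase = Phase.SNAPSHOT_LOADED

    def advance(self) -> None:
        if self.phase < Phase.WRITE_SWITCHED:
            self.phase = Phase(self.phase + 1)

    def roll_back(self) -> None:
        # The primary holds every write until phase 4, so stepping back is safe.
        if self.phase == Phase.WRITE_SWITCHED:
            raise RuntimeError("past the point of no return: roll forward only")
        self.phase = Phase(max(self.phase - 1, Phase.SNAPSHOT_LOADED))

    def write(self, key, value) -> None:
        if self.phase < Phase.WRITE_SWITCHED:
            self.primary[key] = value
        if self.phase >= Phase.DUAL_WRITE:
            self.replacement[key] = value

    def read(self, key):
        store = self.replacement if self.phase >= Phase.READ_SWITCHED else self.primary
        return store.get(key)


primary = {"a": 1}
replacement: dict = {}
cutover = Cutover(primary, replacement)

cutover.advance()             # phase 2: dual write
cutover.write("b", 2)
cutover.advance()             # phase 3: reads switched
cutover.roll_back()           # still allowed: back to phase 2
cutover.advance()
cutover.advance()             # phase 4: writes switched
cutover.write("c", 3)         # only the replacement sees this write
```

The point of the sketch is the asymmetry: every phase before the last keeps the primary complete, which is what makes rolling back possible up to that point.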
Pros:
Reduced migration project timeline (a bit simpler)
Good tooling available (dump/export/import)
One-off, completed over a short period
Cons:
Downtime required at snapshot/load time (step 1)
Less control of the pace (has to be completed)
Higher risk of things going wrong
Strangler
The strangler fig metaphor was coined by Martin Fowler. It refers to the strangler fig, which seeds in the upper branches of a host tree and sends roots down to the ground. Once rooted in the soil, the fig grows around its host, and the old tree is eventually strangled and left to decay.
The parallel in software is to have the new system initially supported by and wrapping the existing system, gradually taking over.
The strangler approach is a typical architecture pattern for larger system rewrites as well as migrations. Sometimes these efforts include taking stored procedure logic out of the database, refactoring it, and moving it to the application tier.
One monolithic system to another (refactoring/redesign)
One monolithic system decomposed into multiple microservices (rewrite)
Externalizing functionality to foreign systems
Migrating data and mechanisms, such as stored procedures
This strangler approach to data migration can be outlined in the following phases:
Route traffic for migrated data to a replacement DB through a proxy/gateway
Initiate a per-customer or market migration through a change feed trigger
Channel back to primary DB via change feed, signalling completion
Eventually decommission primary
Pros:
No planned downtime windows needed
Reduced risk by more control of the pace
Cons:
More complicated, more components
Takes a longer time to complete
There's much more in the details, of course, but one important distinction from lift-and-shift is that two separate instances of the service are running. The new one can also be implemented using a different, more modern tech stack while still preserving all external contracts. The gateway mechanism can be external or embedded into both components to reduce network traffic.
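To make the gateway idea more concrete, here is a minimal sketch of per-customer routing. `MigrationGateway` and its methods are hypothetical names, dictionaries stand in for the primary and replacement databases, and the copy step is single-threaded; a real migration would rely on the change feed to capture writes that arrive while a customer is being copied.

```python
class MigrationGateway:
    """Routes each request to the primary or replacement store by customer."""

    def __init__(self, primary: dict, replacement: dict):
        self.primary = primary
        self.replacement = replacement
        self.migrated: set[str] = set()   # customers already moved over

    def _store_for(self, customer_id: str) -> dict:
        return self.replacement if customer_id in self.migrated else self.primary

    def get(self, customer_id: str, key: str):
        return self._store_for(customer_id).get((customer_id, key))

    def put(self, customer_id: str, key: str, value) -> None:
        self._store_for(customer_id)[(customer_id, key)] = value

    def migrate_customer(self, customer_id: str) -> None:
        # Copy this customer's rows, then flip the routing for that customer.
        # Assumption: no concurrent writes for this customer during the copy.
        for (cid, key), value in list(self.primary.items()):
            if cid == customer_id:
                self.replacement[(cid, key)] = value
        self.migrated.add(customer_id)


primary = {("acme", "plan"): "basic", ("globex", "plan"): "pro"}
replacement: dict = {}
gateway = MigrationGateway(primary, replacement)

gateway.migrate_customer("acme")
gateway.put("acme", "plan", "premium")   # lands in the replacement store
gateway.put("globex", "plan", "team")    # still lands in the primary store
```

Callers only talk to the gateway, so customers can be moved one at a time and the pace can be adjusted, or paused, without any change to the external contract.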
Often both the application codebase and the database need to be migrated simultaneously. For the application tier, there are a few different approaches, each with a different system and business impact.
- Create a new project that includes all key features and alters external properties.
- Major refactoring and new features at the same time without altering existing external properties.
- Improve a software system's internal structure without altering external properties (mainly quality attributes or non-functional requirements).
Complete redesign and implementation
Internal and external properties change. The system is not in an operational state.
New features are paused.
Reimplementation of existing functionality
Mainly internal properties change. The system is not in an operational state.
Larger features are paused.
Larger and smaller incremental improvements
Mainly internal properties change. System in an operational state.
Allows for new features.
How do we go about strangling a system? The first step is to identify an isolated part of the system. The next is to implement that part in a new service while improving and evolving it. At this stage the new service is not yet used or available for traffic, which allows it to be developed incrementally on the side without interfering with the primary system. The last step is to redirect the calls to the new service while leaving the old implementation in place, since decommissioning it is not worth the effort. This works quite well if the functional areas are well-isolated.
Functionality is, however, often entangled, so moving one piece of functionality may bring its dependencies along with it. To avoid that, the moved functionality can make use of downstream functionality in the old system through an API. That way, the yet-to-be-migrated functionality is still used where needed, while things are moved over in a controlled and incremental way.
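A small sketch of that idea follows, assuming a made-up pricing domain. `LegacySystem`, `NewQuoteService`, and `tax_rate` are hypothetical names chosen for illustration. The quote logic has moved to the new service, while the tax lookup it depends on still lives in the old system and is reached through a narrow API.

```python
from typing import Protocol


class LegacyApi(Protocol):
    """The slice of the old system that the new service is allowed to call."""

    def tax_rate(self, country: str) -> float: ...


class LegacySystem:
    """Stand-in for the old system; in practice this would be a remote call."""

    def tax_rate(self, country: str) -> float:
        return {"SE": 0.25, "DE": 0.19}.get(country, 0.0)

    def quote(self, net: float, country: str) -> float:
        # Old implementation, left in place but no longer receiving traffic.
        return round(net * (1 + self.tax_rate(country)), 2)


class NewQuoteService:
    """Migrated functionality, evolved with a discount feature."""

    def __init__(self, legacy: LegacyApi):
        self.legacy = legacy

    def quote(self, net: float, country: str, discount: float = 0.0) -> float:
        rate = self.legacy.tax_rate(country)   # not migrated yet: delegate
        return round(net * (1 - discount) * (1 + rate), 2)


legacy = LegacySystem()
service = NewQuoteService(legacy)

same_as_before = service.quote(100.0, "SE")
with_discount = service.quote(100.0, "SE", discount=0.1)
```

When the tax lookup is eventually migrated, only the object passed to `NewQuoteService` has to change, which keeps each move small.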
Strangling Stored Procedures
Applying the strangler approach to stored procedures follows the same architectural pattern. The business logic is rewritten in a higher-level language in the application tier of the new service.
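One way to build confidence in such a rewrite is to run the old and new logic side by side and compare results before switching over. The sketch below is an assumption about how that could look, not a prescribed method: `legacy_calc_late_fee` is a stand-in for a call to a hypothetical stored procedure, `calc_late_fee` is its application-tier rewrite, and `shadow_call` is an illustrative helper.

```python
from decimal import Decimal
from typing import Callable

RATE_PER_DAY = Decimal("0.01")
CENT = Decimal("0.01")


def legacy_calc_late_fee(days_overdue: int, balance: Decimal) -> Decimal:
    """Stand-in for invoking the stored procedure in the old database."""
    if days_overdue <= 0:
        return Decimal("0.00")
    return (balance * RATE_PER_DAY * days_overdue).quantize(CENT)


def calc_late_fee(days_overdue: int, balance: Decimal) -> Decimal:
    """The same rule rewritten in the application tier of the new service."""
    days = max(days_overdue, 0)
    return (balance * RATE_PER_DAY * days).quantize(CENT)


def shadow_call(legacy: Callable, candidate: Callable, args: tuple,
                mismatches: list) -> Decimal:
    """Run both versions, record any difference, and keep serving the legacy result."""
    expected = legacy(*args)
    actual = candidate(*args)
    if actual != expected:
        mismatches.append((args, expected, actual))
    return expected


mismatches: list = []
samples = [(3, Decimal("200.00")), (0, Decimal("50.00")), (-2, Decimal("75.10"))]
results = [shadow_call(legacy_calc_late_fee, calc_late_fee, s, mismatches)
           for s in samples]
```

An empty `mismatches` list over representative traffic is one possible signal that calls can be redirected to the new implementation.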
Diagram showing the combination of both application refactoring/rewrites:
In this article, we looked at two classical approaches for data and service migration projects. One is lift-and-shift, where things are more or less copied over with some planned downtime. The other is a strangler approach where systems run for a longer period in parallel while the old one is gradually strangled by moving data and functions to the new platform. Both approaches have pros/cons which need to be put in a business context to make sense.