# Migration Timelines
There is no one-size-fits-all migration strategy. This guide describes a sample scenario with the goal of helping customers plan their own migration strategy and estimate costs accordingly.
## 15-Day Historical and Live Migration
Key phases:
- Setup, Planning, and Verification (Days 1-5)
- Historical backfill, Catchup, and Validation (Days 6-10)
- Final Validation, Traffic Switchover, and Teardown (Days 11-15)
### Timeline
```mermaid
%%{
init: {
    "gantt": {
        "fontSize": 20,
        "barHeight": 40,
        "sectionFontSize": 24,
        "leftPadding": 175
    }
}
}%%
gantt
    dateFormat D HH
    axisFormat Day %d
    todayMarker off
    tickInterval 1day

    section Steps
    Setup and Verification : prep, 1 00, 5d
    Clear Test Environment : milestone, clear, after prep, 0d
    Traffic Capture : traffic_capture, after clear, 6d
    Snapshot : snapshot, after clear, 1d
    Scale Up Target Cluster for Backfill : backfill_scale, 6 22, 2h
    Metadata Migration : metadata, after snapshot, 1h
    Reindex from Snapshot : rfs, after metadata, 71h
    Scale Down Target Cluster for Replay : replay_scale, after rfs, 2h
    Traffic Replay : replay, after replay_scale, 46h
    Traffic Switchover : milestone, switchover, after replay, 0d
    Validation : validation, after snapshot, 7d
    Scale Down Target Cluster : 11 00, 2h
    Teardown : teardown, 14 00, 2d
```
## Explanation of Scaling Operations
This section assumes a customer deliberately scales their target cluster for backfill and/or replay to enable a faster and/or cheaper overall migration. Without this scaling, the backfill and replay steps may take much longer, likely increasing overall cost.

This plan assumes that 6 days of captured traffic can be replayed in under 2 days so that the source and target clusters end up in sync. Consider a source cluster operating at an average of 90% CPU utilization to handle reads/writes from application code: it is improbable that a target cluster with the same scale and configuration could sustain at least 3x that request throughput to catch up in the given time. The same holds for backfill on write-heavy clusters or clusters where data has accumulated over a long period; to follow this plan, the target cluster should be scaled such that it can ingest/index all the source data in under 3 days.
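As a rough sanity check, the required throughput multiple follows directly from the capture and replay windows in the timeline above:

$$
\text{required speedup} \geq \frac{\text{capture window}}{\text{replay window}} = \frac{6~\text{days}}{2~\text{days}} = 3\times
$$

The backfill constraint is analogous: all source data must be ingested within the 3-day Reindex from Snapshot window. These constraints motivate the scaling operations below.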
- **Scale Up Target Cluster for Backfill**: Occurs after metadata migration and before reindexing. The target cluster is scaled up so the resource-intensive reindexing process completes faster (see the sketch after this list).
- **Scale Down Target Cluster for Replay**: Once reindexing is complete, the target cluster is scaled down to a size more appropriate for the traffic replay phase. It remains provisioned above normal production levels because the Replayer runs with a speedup factor greater than 1.
- **Scale Down Target Cluster**: After the validation phase, the target cluster is scaled down to its final operational size. This step ensures the cluster is right-sized for normal production workloads, balancing performance needs with cost-efficiency.
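If the target is an Amazon OpenSearch Service domain, these scaling operations can be scripted rather than performed by hand. Below is a minimal sketch using boto3's `update_domain_config`; the domain name, instance types, and node counts are hypothetical placeholders and should be replaced with values sized for your workload:

```python
import boto3

# Hypothetical helper for the scaling steps above, assuming the target is an
# Amazon OpenSearch Service domain. Domain name and instance sizes are placeholders.
client = boto3.client("opensearch")

def scale_data_nodes(domain_name: str, instance_type: str, instance_count: int) -> None:
    """Resize the data nodes of a managed OpenSearch domain."""
    client.update_domain_config(
        DomainName=domain_name,
        ClusterConfig={
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    )

# Day 6, ~22:00: scale up ahead of Reindex from Snapshot.
scale_data_nodes("target-cluster", "r6g.2xlarge.search", 12)

# Day 10: scale down for Traffic Replay; Day 11: scale down again after validation.
scale_data_nodes("target-cluster", "r6g.xlarge.search", 6)
```

Note that cluster configuration changes on a managed domain generally trigger a blue/green deployment, which is why the timeline above budgets a window (2h) for each scaling operation before the next phase begins.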
## Component Durations
This component duration breakdown is useful for identifying the cost of resources deployed during the migration process. It provides a clear overview of how long each component is active or retained, which directly impacts resource utilization and associated costs.
Note: Durations exclude weekends. If the actual timeline extends over weekends, durations (and potentially costs) will increase.
```mermaid
%%{
init: {
    "gantt": {
        "fontSize": 20,
        "barHeight": 40,
        "sectionFontSize": 24,
        "leftPadding": 175
    }
}
}%%
gantt
    dateFormat D HH
    axisFormat Day %d
    todayMarker off
    tickInterval 1day

    section Services
    Core Services Runtime (15d) : active, 1 00, 15d
    Capture Proxy Runtime (6d) : active, capture_active, 6 00, 6d
    Capture Data Retention (4d) : after capture_active, 4d
    Snapshot Runtime (1d) : active, snapshot_active, 6 00, 1d
    Snapshot Retention (9d) : after snapshot_active, 9d
    Reindex from Snapshot Runtime (3d) : active, historic_active, 7 01, 71h
    Replayer Runtime (2d) : active, replayer_active, after historic_active, 2d
    Replayer Data Retention (4d) : after replayer_active, 4d
    Target Proxy Runtime (4d) : active, after replayer_active, 4d
```
| Component | Duration |
|---|---|
| Core Services Runtime | 15d |
| Capture Proxy Runtime | 6d |
| Capture Data Retention | 4d |
| Snapshot Runtime | 1d |
| Snapshot Retention | 9d |
| Reindex from Snapshot Runtime | 3d |
| Replayer Runtime | 2d |
| Replayer Data Retention | 4d |
| Target Proxy Runtime | 4d |
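Since cost scales with these durations, a quick way to estimate total migration cost is to multiply each component's active time by an hourly rate. A minimal sketch; all rates below are hypothetical placeholders, not actual pricing:

```python
# Rough cost model for the component durations above. All hourly rates are
# hypothetical placeholders; substitute your own per-component pricing.
HOURS_PER_DAY = 24

durations_days = {
    "Core Services Runtime": 15,
    "Capture Proxy Runtime": 6,
    "Capture Data Retention": 4,
    "Snapshot Runtime": 1,
    "Snapshot Retention": 9,
    "Reindex from Snapshot Runtime": 3,
    "Replayer Runtime": 2,
    "Replayer Data Retention": 4,
    "Target Proxy Runtime": 4,
}

# Placeholder rate: $1.00/hour for every component; replace with real rates.
hourly_rates_usd = {component: 1.00 for component in durations_days}

total = 0.0
for component, days in durations_days.items():
    cost = days * HOURS_PER_DAY * hourly_rates_usd[component]
    total += cost
    print(f"{component}: {days}d -> ${cost:,.2f}")
print(f"Estimated total: ${total:,.2f}")
```

Remember that retention components (captured traffic, snapshots, replayer output) are priced on storage rather than compute, so their effective hourly rates will typically be much lower than those of running services.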