Clone Plugin Background and Internals

The current version is at https://github.com/facebook/mysql-5.6/wiki/Clone-Plugin-Background-and-Internals

User manual

Worklogs:

User-facing terms used throughout the worklogs, code, and this Wiki:

  • Donor: the MySQL server instance that is being cloned
  • Client: the MySQL server instance that is cloning from a donor
  • Local clone: both the donor and the client are located on the same machine
  • Remote clone: the donor and the client are on different network-connected machines
  • In-place clone: the client instance discards its current data and replaces it with the cloned data

Clone API

Concepts

  • Locator: a non-persistent ID of the data snapshot being cloned or applied, which is specific to each storage engine. It may optionally contain the completed application state, which is used for remote clone resumes after intermittent network errors.
  • Task: the work of sending clone data (on the donor) or applying it (on the client) that is performed by a single thread. A clone operation may have multiple parallel threads.
  • Task ID: an ID associated with a task. The main clone thread has a constant ID of zero, and the additional spawned worker threads have unique non-zero IDs. ID zero may become non-unique when a clone resumes after a network error: both the old, still-connected main thread and the newly connecting main thread have the same ID.

enum Ha_clone_type

Sometimes a single flag and sometimes a bitset of:

  • HA_CLONE_BLOCKING, HA_CLONE_REDO, HA_CLONE_PAGE, HA_CLONE_HYBRID: Supposed to indicate, in an InnoDB-centric way, how much the clone operation can block in a storage engine. Only one of these is supposed to be set. In practice, as of 8.0.28, HA_CLONE_REDO and HA_CLONE_PAGE are unused, and the clone type is always set to HA_CLONE_HYBRID on the donor and HA_CLONE_BLOCKING on the client.
  • HA_CLONE_MULTI_TASK: Supposed to indicate that the storage engine can clone using multiple concurrent threads. In practice as of 8.0.28 the code assumes it is always set and ignores it otherwise.
  • HA_CLONE_RESTART: Supposed to indicate that the storage engine supports remote clone resume after a network error. In practice as of 8.0.28 the code assumes it is always set and ignores it otherwise.
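The "only one of these is supposed to be set" rule can be sketched with a standard bitset. This is a minimal illustration and not the server code: the enumerator order, the `Ha_clone_flagset` alias shape, and the helpers `has_single_blocking_level` and `toy_donor_flags` are assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Stand-in for the flags named above. The enumerator order and values are
// assumptions for this sketch, not copied from the server headers.
enum Ha_clone_type : std::size_t {
  HA_CLONE_BLOCKING,
  HA_CLONE_REDO,
  HA_CLONE_PAGE,
  HA_CLONE_HYBRID,
  HA_CLONE_MULTI_TASK,
  HA_CLONE_RESTART,
  HA_CLONE_TYPE_MAX  // number of flags, used as the bitset width
};

using Ha_clone_flagset = std::bitset<HA_CLONE_TYPE_MAX>;

// Hypothetical helper: true if exactly one of the four "how much may the
// engine block" flags is set.
inline bool has_single_blocking_level(const Ha_clone_flagset &flags) {
  int count = 0;
  for (std::size_t i = HA_CLONE_BLOCKING; i <= HA_CLONE_HYBRID; ++i) {
    if (flags.test(i)) ++count;
  }
  return count == 1;
}

// Hypothetical helper: a flag set matching what the text says the donor
// uses as of 8.0.28 (hybrid, with multi-task and restart assumed set).
inline Ha_clone_flagset toy_donor_flags() {
  Ha_clone_flagset flags;
  flags.set(HA_CLONE_HYBRID);
  flags.set(HA_CLONE_MULTI_TASK);
  flags.set(HA_CLONE_RESTART);
  assert(has_single_blocking_level(flags));
  return flags;
}
```

Setting a second blocking-level flag, for example HA_CLONE_PAGE on top of HA_CLONE_HYBRID, makes `has_single_blocking_level` return false.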

enum Ha_clone_mode

For both donor and client, indicates the type of the operation being started. One of:

  • HA_CLONE_MODE_START: Start a new clone session.
  • HA_CLONE_MODE_RESTART: Restart an existing clone session after a recoverable error (i.e. an intermittent network failure).
  • HA_CLONE_MODE_ADD_TASK: A new thread has been started for the ongoing clone session and is joining it.
  • HA_CLONE_MODE_VERSION: Used only on the client, instructs the client to prepare a version locator that is used for clone version negotiation with the donor.
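As a rough sketch of how a caller might pick one of these modes, the following stand-in maps "which thread is this, and is the session resuming" to a mode. The enum is redeclared here so the example compiles on its own; `mode_valid_on_donor` and `donor_begin_mode` are hypothetical helpers, and the assumption that worker threads always join with HA_CLONE_MODE_ADD_TASK is mine, not the source's.

```cpp
#include <cassert>

// Stand-in declaration; the enumerator values are assumptions.
enum Ha_clone_mode {
  HA_CLONE_MODE_START,
  HA_CLONE_MODE_RESTART,
  HA_CLONE_MODE_ADD_TASK,
  HA_CLONE_MODE_VERSION
};

// HA_CLONE_MODE_VERSION is described above as client-only.
inline bool mode_valid_on_donor(Ha_clone_mode mode) {
  return mode != HA_CLONE_MODE_VERSION;
}

// Hypothetical helper: choose the mode a donor thread passes when it
// begins or joins a session.
inline Ha_clone_mode donor_begin_mode(bool is_main_thread, bool resuming) {
  Ha_clone_mode mode = HA_CLONE_MODE_ADD_TASK;  // workers join the session
  if (is_main_thread) {
    mode = resuming ? HA_CLONE_MODE_RESTART : HA_CLONE_MODE_START;
  }
  assert(mode_valid_on_donor(mode));
  return mode;
}
```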

Clone_interface_t

A storage engine wishing to support clone must implement the handlerton API defined in the Clone_interface_t struct. Here we discuss the Oracle MySQL API, without the MyRocks clone extensions. TODO: link to them

Donor & Client

  • using Clone_capability_t = void (*)(Ha_clone_flagset &flags): return the bitset of clone capabilities this storage engine supports. As of 8.0.28, all SEs must support everything. Failing to provide a capability will assert in a debug build and will be ignored otherwise.
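A capability callback for a toy engine could look like the sketch below. The enum and bitset alias are stand-ins so the example compiles outside the server tree. `toy_clone_capability` and `engine_supports_everything` are hypothetical names, and reading "everything" as hybrid, multi-task, and restart is an assumption based on the flag descriptions above.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Stand-in declarations; order and values are assumptions for this sketch.
enum Ha_clone_type : std::size_t {
  HA_CLONE_BLOCKING,
  HA_CLONE_REDO,
  HA_CLONE_PAGE,
  HA_CLONE_HYBRID,
  HA_CLONE_MULTI_TASK,
  HA_CLONE_RESTART,
  HA_CLONE_TYPE_MAX
};
using Ha_clone_flagset = std::bitset<HA_CLONE_TYPE_MAX>;

// Signature as given in the text.
using Clone_capability_t = void (*)(Ha_clone_flagset &flags);

// Hypothetical engine callback: report the supported capabilities.
void toy_clone_capability(Ha_clone_flagset &flags) {
  flags.reset();
  flags.set(HA_CLONE_HYBRID);
  flags.set(HA_CLONE_MULTI_TASK);
  flags.set(HA_CLONE_RESTART);
}

// Hypothetical caller-side check of the "must support everything" rule.
bool engine_supports_everything(Clone_capability_t capability) {
  assert(capability != nullptr);
  Ha_clone_flagset flags;
  capability(flags);
  return flags.test(HA_CLONE_HYBRID) && flags.test(HA_CLONE_MULTI_TASK) &&
         flags.test(HA_CLONE_RESTART);
}
```

An engine whose callback leaves out HA_CLONE_RESTART would fail this check. Per the text, the real server asserts in a debug build in that situation and otherwise ignores the gap.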

Donor

  • using Clone_begin_t = int (*)(handlerton *hton, THD *thd, const uchar *&loc, uint &loc_len, uint &task_id, Ha_clone_type type, Ha_clone_mode mode): start, or resume, or attach to, the clone session.
  • using Clone_copy_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, Ha_clone_cbk *cbk): send the clone data through the provided callbacks.
  • using Clone_ack_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, int in_err, Ha_clone_cbk *cbk): acknowledge three types of events coming from clients: 1) completion of client-side application of a particular clone stage, allowing the move to the next one; 2) successful application of memory buffer data (as opposed to file data); 3) clone application errors.
  • using Clone_end_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, int in_err): finish the clone session. If the session had multiple threads attached, it is called for each one.
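The sketch below shows one plausible order in which a single donor thread could drive these callbacks: begin, then copy, then end. It uses empty stand-in types for handlerton, THD, and Ha_clone_cbk, omits Clone_ack_t for brevity, and every `toy_` name is hypothetical. The call order and the task ID assignment are assumptions derived from the descriptions above, not taken from the server source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Empty stand-ins so the sketch compiles outside the server tree.
using uchar = unsigned char;
using uint = unsigned int;
struct handlerton {};
struct THD {};
struct Ha_clone_cbk {};
enum Ha_clone_type { HA_CLONE_HYBRID };
enum Ha_clone_mode {
  HA_CLONE_MODE_START,
  HA_CLONE_MODE_RESTART,
  HA_CLONE_MODE_ADD_TASK
};

using Clone_begin_t = int (*)(handlerton *hton, THD *thd, const uchar *&loc,
                              uint &loc_len, uint &task_id, Ha_clone_type type,
                              Ha_clone_mode mode);
using Clone_copy_t = int (*)(handlerton *hton, THD *thd, const uchar *loc,
                             uint loc_len, uint task_id, Ha_clone_cbk *cbk);
using Clone_end_t = int (*)(handlerton *hton, THD *thd, const uchar *loc,
                            uint loc_len, uint task_id, int in_err);

struct Toy_donor_api {
  Clone_begin_t begin;
  Clone_copy_t copy;
  Clone_end_t end;
};

// Toy engine state: a fixed locator, a task ID counter, and a call log.
static const uchar toy_locator[] = {'t', 'o', 'y'};
static uint toy_next_task_id = 1;
static std::vector<std::string> toy_calls;

int toy_begin(handlerton *, THD *, const uchar *&loc, uint &loc_len,
              uint &task_id, Ha_clone_type, Ha_clone_mode mode) {
  if (mode == HA_CLONE_MODE_START) {
    loc = toy_locator;  // a new session hands out its locator
    loc_len = sizeof(toy_locator);
    toy_next_task_id = 1;
    task_id = 0;  // the main thread always has ID zero
  } else if (mode == HA_CLONE_MODE_RESTART) {
    task_id = 0;  // a resuming main thread reuses ID zero
  } else {
    task_id = toy_next_task_id++;  // workers get unique non-zero IDs
  }
  toy_calls.push_back("begin " + std::to_string(task_id));
  return 0;
}

int toy_copy(handlerton *, THD *, const uchar *loc, uint loc_len,
             uint task_id, Ha_clone_cbk *) {
  assert(loc != nullptr && loc_len > 0);
  toy_calls.push_back("copy " + std::to_string(task_id));
  return 0;
}

int toy_end(handlerton *, THD *, const uchar *, uint, uint task_id, int) {
  toy_calls.push_back("end " + std::to_string(task_id));
  return 0;
}

// One thread's view of a session: begin (or attach), copy, then end.
int run_toy_donor_task(const Toy_donor_api &api, handlerton *hton, THD *thd,
                       const uchar *&loc, uint &loc_len, Ha_clone_mode mode,
                       Ha_clone_cbk *cbk) {
  uint task_id = 0;
  int err = api.begin(hton, thd, loc, loc_len, task_id, HA_CLONE_HYBRID, mode);
  if (err != 0) return err;
  err = api.copy(hton, thd, loc, loc_len, task_id, cbk);
  // end runs once per attached task and receives any earlier error.
  int end_err = api.end(hton, thd, loc, loc_len, task_id, err);
  return err != 0 ? err : end_err;
}
```

Calling `run_toy_donor_task` once with HA_CLONE_MODE_START and again with HA_CLONE_MODE_ADD_TASK, reusing the same locator, records a begin/copy/end triple for task 0 followed by one for task 1.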

Client

  • using Clone_apply_begin_t = int (*)(handlerton *hton, THD *thd, const uchar *&loc, uint &loc_len, uint &task_id, Ha_clone_mode mode, const char *data_dir): start the clone session, or attach to a clone session, or get the version negotiation locator.
  • using Clone_apply_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, int in_err, Ha_clone_cbk *cbk): apply the next received chunk of data from the donor.
  • using Clone_apply_end_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, int in_err): finish the clone session. If the session had multiple threads attached, it is called for each one.

Clone of Replication Coordinates

Cloning handles replication coordinates as follows:

  • Positional: each transaction commit updates the transaction system header in InnoDB. The clone enables ordered commit (if not already enabled) and waits for any unordered transactions to commit. At the end of the InnoDB redo log copy, XA operations are blocked, so the binary log position is consistent with the redo log.
  • GTID: not discussed in depth at the moment. It is copied consistently with the last committed transaction without any locking.

After the clone finishes, both positional and GTID replication coordinates are available in the performance_schema.clone_status table. The clone plugin on the client side uses this table to set up replication, including setting gtid_executed. Note that the clone_status table does not have information about storage engine log positions.

InnoDB Clone Internals

This is far from a complete description of InnoDB clone internals; it covers only what is needed for cross-engine synchronization.

InnoDB clone concepts:

  • InnoDB Snapshot. At any time multiple snapshots can be active.
  • InnoDB Clone. At any time multiple clones can be active and attached to a single snapshot, but this is not a currently used feature.

The clone operation in InnoDB proceeds in stages:

  • file copy: the tablespace files are copied. While it is in progress, a separate thread tracks all the flushed page IDs.
  • page copy: the flushed pages from the previous stage are copied. While it is in progress, the redo log from the last checkpoint LSN is archived. The last archived LSN is the LSN of the cloned instance. A check at MTR commit guards against overwriting unarchived logs (log0write.cc::log_writer_wait_on_archiver).
  • redo log copy: the archived redo log is copied.
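Why three stages are enough to reach a consistent copy can be seen in a toy model. This is a heavily simplified sketch and not how InnoDB represents anything: pages are integers in a map, "redo" is a list of (page, value) records, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Page_id = int;
using Pages = std::map<Page_id, int>;                // page ID -> contents
using Redo = std::vector<std::pair<Page_id, int>>;   // ordered page updates

struct Toy_donor {
  Pages pages;
  std::set<Page_id> flushed_during_file_copy;  // tracked during stage 1
  Redo archived_redo;                          // archived during stage 2
};

// Stage 1: copy every page as it currently is.
Pages file_copy(const Toy_donor &donor) { return donor.pages; }

// A page flushed while file copy is running: its ID is tracked.
void write_during_file_copy(Toy_donor &donor, Page_id id, int value) {
  donor.pages[id] = value;
  donor.flushed_during_file_copy.insert(id);
}

// Stage 2: re-copy only the pages tracked during stage 1.
void page_copy(const Toy_donor &donor, Pages &clone) {
  for (Page_id id : donor.flushed_during_file_copy) {
    clone[id] = donor.pages.at(id);
  }
}

// A change made while page copy is running: recorded in the archived redo.
void write_during_page_copy(Toy_donor &donor, Page_id id, int value) {
  donor.pages[id] = value;
  donor.archived_redo.emplace_back(id, value);
}

// Stage 3: ship the archived redo; replaying it brings the clone up to date.
void redo_copy(const Toy_donor &donor, Pages &clone) {
  for (const auto &rec : donor.archived_redo) {
    clone[rec.first] = rec.second;
  }
}

Pages run_toy_clone() {
  Toy_donor donor;
  donor.pages = {{1, 10}, {2, 20}, {3, 30}};

  Pages clone = file_copy(donor);
  write_during_file_copy(donor, 2, 21);  // the copied page 2 is now stale
  page_copy(donor, clone);               // page 2 is refreshed
  write_during_page_copy(donor, 3, 31);  // the copied page 3 is now stale
  redo_copy(donor, clone);               // page 3 is refreshed via redo

  assert(clone == donor.pages);  // state as of the last archived record
  return clone;
}
```

In this model each stage only has to repair what changed during the previous one, which is why the final state corresponds to the last archived redo record.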

[Figure: innodb_clone-2]

Support for concurrent DDL during clone seems to consist mostly of filesystem-level operation tracking.