Long context training recipes

Model settings

Settings to avoid OOMs when scaling up the context length (CL).

Context parallelism

TODO...

Tensor parallelism

TODO...

Dataset settings

You'll need to set/change the following options (a combined example follows the list):

  1. (required) Set dataset.sequence_length to your target CL.
  2. (required) Set train_module.max_sequence_length to your target CL.
  3. (optional) Set dataset.max_target_sequence_length to the longest CL you intend to train with on a given dataset. This ensures the token-by-token data order is consistent across different CLs, and allows you to restart a run in the middle of an epoch while changing the CL. Note that this option isn't supported by all numpy dataset types.
  4. (optional) If you're training with in-loop evaluator callbacks, you should also set their sequence length appropriately, e.g. --trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384.
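
For example, these settings can all be passed as command-line overrides when launching a training script. This is only a sketch: the launch script path, run name, and GPU count below are hypothetical placeholders, and only the dotted option names come from the list above. It assumes a target CL of 16384 with runs at up to 65536 planned later on the same dataset.

```bash
# Hypothetical launch command; adjust the script path, run name, and GPU count for your setup.
torchrun --nproc-per-node=8 src/scripts/train/my_train_script.py my_run_name \
  --dataset.sequence_length=16384 \
  --train_module.max_sequence_length=16384 \
  --dataset.max_target_sequence_length=65536 \
  --trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384
```

Keeping dataset.max_target_sequence_length fixed at the largest planned CL (65536 here) across runs is what preserves a consistent data order if you later relaunch at a longer CL.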

Dataset types

Default FSL dataset (concat + chunk)

TODO...

Document packing (best-fit packing)

TODO...

Document interleaving

TODO...