# Long context training recipes

## Model settings
Settings to avoid OOMs when scaling up the context length (CL).
### Context parallelism
TODO...
### Tensor parallelism
TODO...
## Dataset settings
You'll need to set/change the following options (a combined example follows the list):

- (required) Set `dataset.sequence_length` to your target CL.
- (required) Set `train_module.max_sequence_length` to your target CL.
- (optional) Set `dataset.max_target_sequence_length` to the longest CL you intend to train with on a given dataset. This ensures token-by-token data order is consistent across different CLs and allows you to restart a run in the middle of an epoch while changing the CL. Note that this option isn't supported by all numpy dataset types.
- (optional) If you're training with in-loop evaluator callbacks, you should also set their sequence length appropriately, e.g. `--trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384`.
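
Putting these together, a launch command for a 16K-CL run might look like the sketch below. Only the dotted-path overrides correspond to the options listed above; the launcher, GPU count, training script path, and the 65536 `max_target_sequence_length` value are placeholders you'd replace with your own.

```bash
# Hypothetical launch command -- script path and world size are placeholders.
# The dotted-path overrides mirror the dataset/train-module options listed above.
torchrun --nproc-per-node=8 path/to/your/train_script.py \
  --dataset.sequence_length=16384 \
  --train_module.max_sequence_length=16384 \
  --dataset.max_target_sequence_length=65536 \
  --trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384
```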
## Dataset types
### Default FSL dataset (concat + chunk)
TODO...
### Document packing (best-fit packing)
TODO...
### Document interleaving
TODO...