Long context training recipes

Model settings

Settings to avoid OOMs when scaling up the context length (CL).

Context parallelism

TODO...

Tensor parallelism

TODO...

Dataset settings

You'll need to set/change the following options (a combined example follows the list):

  1. (required) Set dataset.sequence_length to your target CL.
  2. (required) Set train_module.max_sequence_length to your target CL.
  3. (optional) Set dataset.max_target_sequence_length to the longest CL you intend to train with on a given dataset. This ensures the token-by-token data order is consistent across different CLs, and allows you to restart a run in the middle of an epoch while changing the CL. Note that this option isn't supported by all numpy dataset types.
  4. (optional) If you're training with in-loop evaluator callbacks, you should also set their sequence length appropriately, e.g. --trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384.
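
For example, these settings can all be passed as command-line overrides when launching a training script. This is only a sketch: the launch script path, run name, and GPU count below are hypothetical placeholders, and only the dotted option names come from the list above. It assumes a target CL of 16384 with runs at up to 65536 planned later on the same dataset.

```bash
# Hypothetical launch command; adjust the script path, run name, and GPU count for your setup.
torchrun --nproc-per-node=8 src/scripts/train/my_train_script.py my_run_name \
  --dataset.sequence_length=16384 \
  --train_module.max_sequence_length=16384 \
  --dataset.max_target_sequence_length=65536 \
  --trainer.callbacks.lm_evaluator.eval_dataset.sequence_length=16384
```

Keeping dataset.max_target_sequence_length fixed at the largest planned CL (65536 here) across runs is what preserves a consistent data order if you later relaunch at a longer CL.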

Dataset types

Default FSL dataset (concat + chunk)

TODO...

Document packing (best-fit packing)

TODO...

Document interleaving

TODO...