GPT-J

About

  • A 6-billion-parameter English autoregressive language model
  • Trained on The Pile
  • Released in June 2021
  • Part of the MLPerf Inference benchmark
  • Developed by EleutherAI
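
A minimal text-generation sketch using the Hugging Face transformers library (the model id EleutherAI/gpt-j-6b and the prompt are assumptions, not taken from this page):

```python
# Minimal GPT-J generation sketch with Hugging Face transformers.
# The model id "EleutherAI/gpt-j-6b" is assumed; fp32 weights are ~24 GB.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

inputs = tokenizer("EleutherAI trained GPT-J on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```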

Characteristics

| Model component | Value |
| --- | --- |
| Layers | 28 |
| Model dimension | 4096 |
| Feed-forward dimension | 16384 |
| Attention heads | 16 |
| Head dimension | 256 |
| Rotary position embedding (RoPE) | Applied to 64 dimensions per head |
| Tokenizer vocabulary | 50257 |
| BPE tokenization | Same as GPT-2/GPT-3 |
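
As a rough sanity check on the 6-billion figure, the table values alone nearly reproduce the parameter count (a sketch assuming an untied output projection and ignoring biases and layer norms):

```python
# Back-of-the-envelope parameter count for GPT-J from the table above.
# Biases and layer norms are omitted; they contribute well under 1%.
d_model, d_ff, n_layers, vocab = 4096, 16384, 28, 50257

attn = 4 * d_model * d_model          # Q, K, V, and output projections
mlp = 2 * d_model * d_ff              # up- and down-projection
per_layer = attn + mlp                # ~201M per layer

embed = vocab * d_model               # input token embedding
lm_head = vocab * d_model             # output projection (assumed untied)

total = n_layers * per_layer + embed + lm_head
print(f"{total / 1e9:.2f}B parameters")  # ~6.05B
```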

Training details

  • Trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod
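
Dividing tokens by steps gives the average tokens consumed per optimizer step, roughly 1.05M (which would correspond to, e.g., 512 sequences of 2048 tokens; the exact batch configuration is an assumption, not stated here):

```python
# Average tokens per training step, from the figures above.
tokens, steps = 402e9, 383_500
print(f"{tokens / steps / 1e6:.2f}M tokens/step")  # ~1.05M
```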

See also