EleutherAI

About

  • A grass-roots, non-profit AI research group
  • Often described as an open-source counterpart to OpenAI
  • Formed in July 2020 to organize a replication of GPT-3

The Pile

  • A curated dataset of diverse text for training LLMs
  • An 825 GiB English text corpus drawn from 22 different sources
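
The Pile is distributed as zstandard-compressed JSON Lines shards, where each record carries a `text` field and a `meta` field naming the source subset. A minimal sketch of streaming one shard, assuming a local file named `00.jsonl.zst` and the `zstandard` package:

```python
import io
import json

import zstandard  # pip install zstandard

# Stream records from one Pile shard without decompressing it to disk.
# Assumes a local shard "00.jsonl.zst" in the Pile's jsonlines format.
with open("00.jsonl.zst", "rb") as fh:
    reader = zstandard.ZstdDecompressor().stream_reader(fh)
    for line in io.TextIOWrapper(reader, encoding="utf-8"):
        record = json.loads(line)
        text = record["text"]                     # raw document text
        source = record["meta"]["pile_set_name"]  # e.g. "Pile-CC", "ArXiv"
        print(source, text[:80])
        break  # show just the first record
```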

GPT-J

  • A 6B-parameter, open-source English autoregressive language model
  • Trained on the Pile
  • Released in June 2021
  • Part of the MLPerf Inference benchmark
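
Because the weights are open, GPT-J can be loaded through the Hugging Face `transformers` library. A minimal generation sketch, assuming the `EleutherAI/gpt-j-6B` checkpoint (roughly 24 GB in fp32, so a GPU or half precision is advisable in practice):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the open GPT-J-6B checkpoint published by EleutherAI.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Autoregressive generation: the model predicts one token at a time.
inputs = tokenizer("EleutherAI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```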

See also