BERT

About

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model that is pre-trained on large text corpora and then fine-tuned for downstream NLP tasks.
Quote

'BERT is a substantial breakthrough and has helped researchers and data engineers across the industry achieve state-of-the-art results in many NLP tasks' - AWS blog

MobileBERT

  • Lightweight version of BERT designed for mobile devices
  • Fewer parameters, faster inference
  • Lower accuracy compared to BERT
  • Ideal for mobile applications where efficiency is crucial (a loading sketch follows this list)
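
A minimal loading sketch, assuming the Hugging Face transformers library (with PyTorch) is installed; the checkpoint name google/mobilebert-uncased refers to the public MobileBERT release and is used here purely for illustration.

```python
# Minimal sketch: loading MobileBERT with Hugging Face transformers
# (assumes `transformers` + PyTorch and the public
# "google/mobilebert-uncased" checkpoint).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModel.from_pretrained("google/mobilebert-uncased")

inputs = tokenizer("Running BERT on a phone.", return_tensors="pt")
outputs = model(**inputs)

# Contextual token embeddings: shape (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```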

DistilBERT

| Aspect | BERT | DistilBERT |
|---|---|---|
| Model size | Larger (e.g., BERT-base has 110M params) | Smaller (around 66M params) |
| Training process | Pre-training + fine-tuning | Pre-training + distillation |
| Performance | High performance on NLP tasks | Slightly lower performance than BERT |
| Inference speed | Slower due to larger model size | Faster due to smaller model size |
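
The size difference in the table can be checked directly. Below is a minimal sketch, assuming the Hugging Face transformers library is installed and using the public bert-base-uncased and distilbert-base-uncased checkpoints as examples.

```python
# Minimal sketch: comparing BERT-base and DistilBERT parameter counts
# (checkpoint names assumed: "bert-base-uncased", "distilbert-base-uncased").
from transformers import AutoModel

for name in ("bert-base-uncased", "distilbert-base-uncased"):
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")

# Expected order of magnitude: ~110M for BERT-base, ~66M for DistilBERT.
```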

FAQ

  • What is BERT useful for? (a fill-mask sketch follows this list)
  • Why is it so widely quoted and important?
  • How is DistilBERT different from BERT?
  • Is DistilBERT better than BERT?
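
As a small illustration of the first question, the sketch below runs BERT's masked-language-model objective via the Hugging Face fill-mask pipeline; it assumes the transformers library is installed and uses the public bert-base-uncased checkpoint.

```python
# Minimal sketch: BERT's masked-language-model objective through the
# Hugging Face "fill-mask" pipeline (assumes "bert-base-uncased").
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the most likely tokens for the [MASK] position.
for prediction in fill_mask("BERT is a [MASK] model."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```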

See more