BERT - AshokBhat/ml GitHub Wiki
About
- Bidirectional Encoder Representations from Transformers (BERT)
- A pre-training technique for Natural Language Processing (NLP)
- Open-sourced by Google in 2018
- Uses a self-attention mechanism to learn the relationships between the words in a sentence (see the sketch below)
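A minimal sketch of inspecting BERT's self-attention weights with the Hugging Face `transformers` library (assumed installed); the `bert-base-uncased` checkpoint and the example sentence are just illustrative choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One attention tensor per layer, each shaped (batch, heads, seq_len, seq_len)
last_layer_attention = outputs.attentions[-1][0]
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
print(last_layer_attention.shape)  # e.g. torch.Size([12, 8, 8]) for 12 heads, 8 tokens
```

Each row of an attention matrix shows how strongly one token attends to every other token, which is how BERT captures word-to-word relationships in both directions.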
Quote
'BERT is a substantial breakthrough and has helped researchers and data engineers across the industry achieve state-of-the-art results in many NLP tasks' - AWS blog
MobileBERT
- Lightweight version of BERT designed for mobile devices
- Fewer parameters, faster inference
- Slightly lower accuracy than the full BERT model
- Ideal for mobile applications where efficiency is crucial
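A minimal sketch of loading MobileBERT and counting its parameters with the `transformers` library; the checkpoint name `google/mobilebert-uncased` is an assumption.

```python
from transformers import AutoModel

# Checkpoint name is an assumption; any MobileBERT checkpoint on the Hub would do
mobilebert = AutoModel.from_pretrained("google/mobilebert-uncased")
n_params = sum(p.numel() for p in mobilebert.parameters())
print(f"MobileBERT parameters: {n_params / 1e6:.1f}M")  # roughly 25M, vs ~110M for BERT-base
```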
DistilBERT
| Aspect | BERT | DistilBERT |
|---|---|---|
| Model Size | Larger (BERT-base: ~110M parameters) | Smaller (~66M parameters) |
| Training Process | Pre-training + Fine-tuning | Knowledge distillation from BERT during pre-training + Fine-tuning |
| Performance | High performance on NLP tasks | Slightly lower than BERT on most NLP tasks |
| Inference Speed | Slower due to larger model size | Faster due to smaller model size |
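A minimal sketch comparing the two rows of the table above in code: parameter counts and a rough single-sentence CPU latency, assuming the `transformers` library and the standard `bert-base-uncased` / `distilbert-base-uncased` checkpoints.

```python
import time
import torch
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()

    inputs = tokenizer("BERT versus DistilBERT", return_tensors="pt")
    with torch.no_grad():
        start = time.perf_counter()
        model(**inputs)
        elapsed = time.perf_counter() - start

    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: ~{n_params / 1e6:.0f}M parameters, {elapsed * 1000:.1f} ms for one forward pass")
```

Exact timings depend on hardware and batch size; the point is the relative gap, not the absolute numbers.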
FAQ
- What is BERT useful for?
- Why is it so quoted and important?
- How is DistilBERT different from BERT?
- Is DistilBERT better than BERT?
See more
- Google Research BERT Repo - https://github.com/google-research/bert
- GPT