Brainstorming - AIMLOps-C4-G16/aimlops-capstone-project GitHub Wiki

Possible prototypes

  • Image captioning from Google Cloud (link)
  • Microsoft Image Captioning (link)
  • Hugging Face (link)
  • Text-to-image search using the standard OpenAI CLIP model + FAISS vector store (link) (link)
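
A rough sketch of the retrieval step behind the text-to-image search idea, to make the moving parts concrete. It assumes image and text embeddings already exist in a shared space (which is what CLIP provides) and does an exact inner-product search over unit-normalized vectors, the same computation a flat inner-product FAISS index such as `faiss.IndexFlatIP` performs. Everything named here (`FlatIndex`, `IMAGE_EMBEDDINGS`, the file names, the 4-dimensional vectors) is hypothetical and hand-made for illustration; a real prototype would take the vectors from a CLIP model and store them in FAISS.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class FlatIndex:
    """Hypothetical brute-force index: exact inner-product search over unit vectors."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, vec):
        # Store the normalized vector so search scores are cosine similarities.
        self.ids.append(item_id)
        self.vectors.append(normalize(vec))

    def search(self, query, k=3):
        # Score every stored vector against the query and keep the top k.
        q = normalize(query)
        scored = [(dot(q, v), item_id) for v, item_id in zip(self.vectors, self.ids)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Assumption: made-up stand-ins for CLIP image embeddings (real ones are
# much higher-dimensional and come from the model's image encoder).
IMAGE_EMBEDDINGS = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.0, 0.2],
    "city_skyline.jpg": [0.0, 0.8, 0.6, 0.1],
    "cat_on_sofa.jpg": [0.7, 0.0, 0.1, 0.7],
}

index = FlatIndex()
for name, embedding in IMAGE_EMBEDDINGS.items():
    index.add(name, embedding)

# Assumption: made-up stand-in for the text embedding of a query such as
# "a dog playing on the sand", produced by the model's text encoder.
query_embedding = [1.0, 0.0, 0.0, 0.1]

results = index.search(query_embedding, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Normalizing both sides keeps the scores comparable across images. Brute force is fine for a small demo set; FAISS becomes worthwhile once the image collection grows and approximate indexes are needed.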

Architectures

  • YOLOv3 encoder + Llama 2 decoder
  • Fine-tune an existing VLM, e.g. Llama 3.2 Vision Instruct