AI - kamack38/Essentials GitHub Wiki

AI Tools

Upscaling

Upscayl - Free and open-source AI image upscaler for Linux, macOS and Windows.

AlphaGeometry

This article describes how to set up AlphaGeometry inside a Docker container.

PDF

OCRmyPDF - adds an OCR text layer to scanned PDF files, allowing them to be searched

OCR

Frog - Intuitive text extraction tool for GNOME. Can extract text from images, videos, QR codes, etc.

Deepfake

facefusion - Industry-leading face manipulation platform with Docker support
Deep-Live-Cam - real-time face swap and one-click video deepfake using only a single image

Speech to Text

transcribe-anything - Multi-backend Whisper app

Subtitles

subsai - Subtitle generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants

docker pull absadiki/subsai:main
docker run --gpus=all -p 8501:8501 -v /path/to/your/media_files/folder:/media_files absadiki/subsai:main
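
The `docker run` line above combines three options: `--gpus=all` exposes the host GPUs to the container (this normally requires the NVIDIA Container Toolkit), `-p 8501:8501` publishes the container's port 8501 on the host, and `-v host_dir:/media_files` bind-mounts a host folder into the container. A minimal sketch of a wrapper that fills in the host folder and port is shown below. The function name `subsai_cmd` and its two arguments are hypothetical and not part of subsai; the function only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper (not part of subsai): prints the docker command
# for a given media folder and host port instead of executing it.
subsai_cmd() {
  # $1: host folder to mount (defaults to the placeholder path used above)
  # $2: host port to publish (defaults to 8501)
  media_dir="${1:-/path/to/your/media_files/folder}"
  host_port="${2:-8501}"
  printf 'docker run --gpus=all -p %s:8501 -v %s:/media_files absadiki/subsai:main\n' \
    "$host_port" "$media_dir"
}

# With no arguments this reproduces the command shown above.
subsai_cmd
```

Once the printed command has been run, the Web-UI should be reachable at `http://localhost:<host port>`, assuming the container serves it on port 8501 as the port mapping suggests. A folder path containing spaces would need to be quoted before the printed command is executed.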

Text to Speech

TTS - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Zonos - Open-weight text-to-speech model (Zonos-v0.1) trained on more than 200k hours of varied multilingual speech; its authors describe its expressiveness and quality as on par with, or even surpassing, top TTS providers.
bark - Text-Prompted Generative Audio Model