FFmpegOutputHandler - shibotsu/obs-clone GitHub Wiki

FFmpegRecorder

FFmpegRecorder is a C++ class for real-time video and audio recording built on the FFmpeg libraries. It encodes video (e.g., H.264) and audio (e.g., AAC) and writes them to a file or a streaming endpoint.

This recorder handles frame timing, audio resampling, color format conversion, and interleaving of the encoded streams into a single muxed output.

✅ Features

Video encoding to H.264 (YUV420P)
Audio encoding to AAC (interleaved float input converted to the encoder's sample format)
Automatic resampling and pixel format conversion
Output to files (.mp4, .mkv, etc.) or network streams (e.g., rtmp://)
Audio/video synchronization and timestamping
FIFO buffering to align audio input to the encoder's frame size
Adjustable parameters (resolution, FPS, sample rate, etc.)

🔧 Initialization

bool initialize(const char* outputFile, int width, int height, int fps, int sampleRate, int channels);

This method configures the output context, selects encoders based on whether the destination is a file or a stream, and sets up:

Video and audio codec contexts
Resampling (SwrContext) and pixel format conversion (SwsContext)
FIFO buffer for audio
File or stream I/O
FFmpeg muxing headers

Arguments:

outputFile: Target file path or stream URL
width, height: Video resolution
fps: Video frame rate
sampleRate: Audio sample rate (e.g., 44100)
channels: Number of audio channels
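To make the argument list concrete, here is a minimal sketch of values a recorder could derive from these arguments before opening any FFmpeg context. The `DerivedSettings` struct and `deriveSettings` function are hypothetical, and the 1/fps and 1/sampleRate time bases are an assumption (a common choice), not something confirmed by the source. The even-size check follows from YUV420P's 2x2 chroma subsampling.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical: values derived from the initialize() arguments.
struct DerivedSettings {
    int videoTimeBaseDen;          // assumed video time_base = 1/fps
    int audioTimeBaseDen;          // assumed audio time_base = 1/sampleRate
    std::size_t rgbaFrameBytes;    // 4 bytes per input pixel
    std::size_t yuv420pFrameBytes; // full-size Y plane + quarter-size U and V planes
};

DerivedSettings deriveSettings(int width, int height, int fps, int sampleRate, int channels) {
    if (width <= 0 || height <= 0 || fps <= 0 || sampleRate <= 0 || channels <= 0)
        throw std::invalid_argument("all parameters must be positive");
    // 4:2:0 halves chroma resolution in both directions, so odd sizes would
    // leave a partial chroma sample; libx264 rejects them.
    if (width % 2 != 0 || height % 2 != 0)
        throw std::invalid_argument("YUV420P requires even width and height");
    const std::size_t w = width, h = height;
    const std::size_t lumaBytes = w * h;
    const std::size_t chromaBytes = (w / 2) * (h / 2);
    return {fps, sampleRate, w * h * 4, lumaBytes + 2 * chromaBytes};
}
```

For 1920x1080 this gives 8,294,400 bytes per RGBA input frame and 3,110,400 bytes per YUV420P frame.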

🟢 Recording

bool sendVideoFrame(unsigned char* rgbaData, int64_t timestamp);

Converts incoming RGBA video data to YUV420P and sends it for encoding.

Timestamps are rescaled to the video stream's time_base (e.g., with av_rescale_q).
An AVFrame is allocated and prepared for encoding.
Encoded packets are written to the output.
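In the recorder, the RGBA-to-YUV420P step is performed by SwsContext. The dependency-free sketch below only illustrates the arithmetic involved, assuming BT.601 limited-range coefficients (swscale's default matrix for this conversion) and simple 2x2 averaging for chroma. It is not bit-exact with swscale, and `rgbaToYuv420p` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar YUV420P: full-size Y plane, quarter-size U and V planes.
struct Yuv420pFrame {
    int width = 0, height = 0;
    std::vector<std::uint8_t> y, u, v;
};

// BT.601 limited-range integer approximation. For U and V the +128 offset is
// folded in before the shift (32768 + 128) so the shifted value is never negative.
static std::uint8_t lumaOf(int r, int g, int b) {
    return static_cast<std::uint8_t>(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}
static std::uint8_t cbOf(int r, int g, int b) {
    return static_cast<std::uint8_t>((-38 * r - 74 * g + 112 * b + 32896) >> 8);
}
static std::uint8_t crOf(int r, int g, int b) {
    return static_cast<std::uint8_t>((112 * r - 94 * g - 18 * b + 32896) >> 8);
}

// Converts tightly packed RGBA (alpha ignored) to YUV420P.
// Assumes even width and height; each chroma sample covers a 2x2 pixel block.
Yuv420pFrame rgbaToYuv420p(const std::uint8_t* rgba, int width, int height) {
    Yuv420pFrame out;
    out.width = width;
    out.height = height;
    out.y.resize(static_cast<std::size_t>(width) * height);
    out.u.resize(static_cast<std::size_t>(width / 2) * (height / 2));
    out.v.resize(out.u.size());
    // Luma: one sample per pixel.
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col) {
            const std::size_t idx = static_cast<std::size_t>(row) * width + col;
            const std::uint8_t* p = rgba + 4 * idx;
            out.y[idx] = lumaOf(p[0], p[1], p[2]);
        }
    // Chroma: average the RGB of each 2x2 block, then convert once.
    for (int cy = 0; cy < height / 2; ++cy)
        for (int cx = 0; cx < width / 2; ++cx) {
            int sum[3] = {0, 0, 0};
            for (int dy = 0; dy < 2; ++dy)
                for (int dx = 0; dx < 2; ++dx) {
                    const std::size_t idx =
                        static_cast<std::size_t>(2 * cy + dy) * width + (2 * cx + dx);
                    for (int c = 0; c < 3; ++c) sum[c] += rgba[4 * idx + c];
                }
            const int r = (sum[0] + 2) / 4, g = (sum[1] + 2) / 4, b = (sum[2] + 2) / 4;
            const std::size_t i = static_cast<std::size_t>(cy) * (width / 2) + cx;
            out.u[i] = cbOf(r, g, b);
            out.v[i] = crOf(r, g, b);
        }
    return out;
}
```

With these coefficients pure white maps to (Y, U, V) = (235, 128, 128) and pure red to (82, 90, 240).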

⏱ Timestamping

int64_t getRecordingTimestamp();

Returns the number of milliseconds elapsed since recording started, measured with a high-resolution clock. Both the audio and video paths use it to stay in sync.
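A minimal sketch of the timing pieces, assuming a millisecond clock (time base 1/1000) whose values are later rescaled to a stream's time_base. `RecordingClock` and `rescale` are hypothetical. `rescale` mirrors the round-to-nearest behavior of FFmpeg's av_rescale_q for non-negative values but has no overflow protection. The sketch uses `std::chrono::steady_clock` because `high_resolution_clock` is allowed to be an alias of the non-monotonic `system_clock`.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical recording clock; steady_clock never jumps backwards.
class RecordingClock {
public:
    void start() { m_start = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since start().
    std::int64_t elapsedMs() const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   std::chrono::steady_clock::now() - m_start).count();
    }

private:
    std::chrono::steady_clock::time_point m_start = std::chrono::steady_clock::now();
};

// Minimal stand-in for AVRational.
struct Rational { std::int64_t num, den; };

// Rescales a non-negative timestamp from one time base to another,
// rounding to nearest with halves rounded up. No overflow checks.
std::int64_t rescale(std::int64_t ts, Rational from, Rational to) {
    const std::int64_t n = ts * from.num * to.den;
    const std::int64_t d = from.den * to.num;
    return (n + d / 2) / d;
}
```

For example, with a 1/30 video time base, 33 ms rescales to pts 1 and 1000 ms to pts 30; with a 1/48000 audio time base, 1500 ms becomes 72000.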

🎵 Audio Flow

The audio path is designed to:

Accept interleaved AV_SAMPLE_FMT_FLT samples
Use SwrContext to convert the samples to the format the encoder accepts
Accumulate samples in a FIFO until a full frame (e.g., 1024 samples) can be encoded
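The sketch below stands in for the SwrContext + FIFO combination to show how interleaved input becomes fixed-size planar frames. It assumes the encoder wants planar float (AV_SAMPLE_FMT_FLTP, the only format FFmpeg's native AAC encoder accepts) and it only changes the sample layout, not the sample rate. `PlanarAudioFifo` is a hypothetical class; in the actual recorder SwrContext does the conversion and m_audioFifo does the buffering.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in: interleaved float (FLT) in, planar float (FLTP)
// frames of exactly frameSize samples per channel out. No rate conversion.
class PlanarAudioFifo {
public:
    PlanarAudioFifo(int channels, int frameSize)
        : m_channels(channels), m_frameSize(frameSize), m_planes(channels) {}

    // samples holds frameCount * channels floats, e.g. L R L R ... for stereo.
    void write(const float* samples, std::size_t frameCount) {
        for (std::size_t i = 0; i < frameCount; ++i)
            for (int c = 0; c < m_channels; ++c)
                m_planes[c].push_back(samples[i * m_channels + c]);
    }

    // Samples currently buffered per channel.
    std::size_t size() const { return m_planes.empty() ? 0 : m_planes[0].size(); }

    // Pops one full frame per channel; returns false if not enough is buffered.
    bool readFrame(std::vector<std::vector<float>>& out) {
        if (size() < static_cast<std::size_t>(m_frameSize)) return false;
        out.assign(m_channels, std::vector<float>(m_frameSize));
        for (int c = 0; c < m_channels; ++c)
            for (int i = 0; i < m_frameSize; ++i) {
                out[c][i] = m_planes[c].front();
                m_planes[c].pop_front();
            }
        return true;
    }

private:
    int m_channels;
    int m_frameSize;
    std::vector<std::deque<float>> m_planes; // one queue per channel
};
```

Feeding 1500 stereo samples yields one 1024-sample frame and leaves 476 samples buffered for the next call.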

Key Members (Audio):

m_aacFrameSize: Number of samples per channel in one AAC frame (typically 1024)
m_audioFifo: Sample buffer that aligns input to the required frame size
m_audioSamplesProcessed, m_totalAudioSamplesQueued: Counters of processed and queued samples
m_audioTimingInitialized: Ensures sync logic starts from a stable point
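One plausible way these members fit together, sketched under two assumptions: audio pts use a 1/sampleRate time base (so one sample equals one tick), and the first audio chunk is anchored to the value of getRecordingTimestamp() at that moment. `AudioTimeline` and its methods are hypothetical. Deriving pts from a running sample count, rather than reading the clock for every frame, keeps consecutive AAC frames exactly m_aacFrameSize ticks apart.

```cpp
#include <cstdint>

// Hypothetical helper mirroring m_audioTimingInitialized,
// m_audioSamplesProcessed and m_aacFrameSize.
class AudioTimeline {
public:
    AudioTimeline(int sampleRate, int frameSize)
        : m_sampleRate(sampleRate), m_frameSize(frameSize) {}

    // Anchors the timeline once, using the recording timestamp (ms)
    // at which the first audio arrived. Later calls are ignored.
    void initialize(std::int64_t recordingMs) {
        if (m_initialized) return;
        m_startOffset = recordingMs * m_sampleRate / 1000; // ms -> samples
        m_initialized = true;
    }

    bool initialized() const { return m_initialized; }

    // Returns the pts of the next encoded frame, then advances by one frame.
    std::int64_t nextFramePts() {
        const std::int64_t pts = m_startOffset + m_samplesProcessed;
        m_samplesProcessed += m_frameSize;
        return pts;
    }

private:
    int m_sampleRate;
    int m_frameSize;
    bool m_initialized = false;
    std::int64_t m_startOffset = 0;
    std::int64_t m_samplesProcessed = 0;
};
```

At 48 kHz, audio that first arrives 500 ms into the recording gets pts 24000, and the following frame gets 25024.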

🧠 Internal Design Notes

Stream Format Decision:

Chooses the container and codec formats based on outputFile (e.g., flv for RTMP URLs, guessed from the file extension for .mp4)

Encoder Hints:

Preset: "medium"
CRF: "23", a balance between quality and file size
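A sketch of the container decision, assuming the rule described above: force flv for RTMP URLs and otherwise let FFmpeg guess from the file name. `chooseMuxer` is hypothetical; its result would be passed as the `format_name` argument of `avformat_alloc_output_context2` (nullptr for std::nullopt). The encoder hints would typically be applied as libx264 private options, e.g. `av_opt_set(codecCtx->priv_data, "preset", "medium", 0)` and likewise for `"crf"`.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the muxer short name to force, or
// std::nullopt to let FFmpeg guess the container from the file name.
std::optional<std::string> chooseMuxer(const std::string& output) {
    auto startsWith = [&](const char* prefix) {
        return output.rfind(prefix, 0) == 0;
    };
    // RTMP (and RTMPS) carry FLV-muxed data, so the container must be forced.
    if (startsWith("rtmp://") || startsWith("rtmps://"))
        return std::string("flv");
    // e.g. "out.mp4" -> mp4, "out.mkv" -> matroska, guessed by FFmpeg.
    return std::nullopt;
}
```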

Robust Logging:

Uses av_strerror to turn FFmpeg error codes into readable messages
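A small helper in this spirit, assuming errors are reported as "operation failed: message (code)". `describeFfmpegError` is hypothetical. In real code `toString` would be `av_strerror` itself, whose signature is `int av_strerror(int errnum, char *errbuf, size_t errbuf_size)`; taking it as a parameter keeps the sketch compilable without FFmpeg.

```cpp
#include <cstddef>
#include <string>

// Same size as FFmpeg's AV_ERROR_MAX_STRING_SIZE (64).
constexpr std::size_t kErrBufSize = 64;

// Formats "<operation> failed: <message> (<code>)". toString must be callable
// like av_strerror: int(int errnum, char* buf, size_t size).
template <typename ToString>
std::string describeFfmpegError(const char* operation, int errnum, ToString toString) {
    char buf[kErrBufSize] = {0}; // zeroed so the string is terminated even if toString fails
    toString(errnum, buf, sizeof(buf));
    return std::string(operation) + " failed: " + buf + " (" + std::to_string(errnum) + ")";
}
```

A typical call after a negative return value would be `describeFfmpegError("avio_open", ret, av_strerror)`.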

🛑 Dependencies

You need FFmpeg development libraries installed:

libavformat
libavcodec
libswscale
libswresample
libavutil