
🧪 Code Profiling: sendVideoFrame()

In this section, we'll compare two versions of a function responsible for sending video frames with FFmpeg. The first contains poor practices that highlight several performance and memory issues, while the second demonstrates a clean, production-ready implementation.


โŒ The original version: sendVideoFrame()

bool FFmpegRecorder::sendVideoFrame(unsigned char* rgbaData, int64_t timestamp) {
    if (!rgbaData || !videoCodecContext) {
        fprintf(stderr, "Invalid input for sendVideoFrameWorse\n");
        return false;
    }

    SwsContext* tempSwsContext = sws_getContext(
        videoCodecContext->width, videoCodecContext->height, AV_PIX_FMT_BGRA,
        videoCodecContext->width, videoCodecContext->height, AV_PIX_FMT_YUV420P,
        SWS_BICUBIC, nullptr, nullptr, nullptr);

    if (!tempSwsContext) {
        fprintf(stderr, "Failed to allocate SWS context in sendVideoFrameWorse\n");
        return false;
    }

    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Failed to allocate video frame in worse version\n");
        return false;
    }

    frame->width = videoCodecContext->width;
    frame->height = videoCodecContext->height;
    frame->format = videoCodecContext->pix_fmt;

    int ret = av_frame_get_buffer(frame, 1);
    if (ret < 0) {
        char errBuf[AV_ERROR_MAX_STRING_SIZE] = { 0 };
        av_strerror(ret, errBuf, AV_ERROR_MAX_STRING_SIZE);
        fprintf(stderr, "Failed to allocate video frame buffer: %s\n",
        errBuf);
        av_frame_free(&frame);
        sws_freeContext(tempSwsContext);
        return false;
    }

    uint8_t* srcData[1] = { rgbaData };
    int srcLinesize[1] = { 4 * videoCodecContext->width };

    sws_scale(tempSwsContext, srcData, srcLinesize, 0,
        videoCodecContext->height, frame->data, frame->linesize);

    frame->pts = av_rescale_q(timestamp, {1, 1000}, videoCodecContext->time_base);

    bool success = (encodeVideoFrame(frame) >= 0);

    av_frame_free(&frame);

    // โŒ Forgot to free tempSwsContext
    return success;
}

๐Ÿ” Why this version is problematic

  1. Re-initializing sws_getContext on every frame
    • sws_getContext is computationally expensive and meant to be created once and reused (see the sketch after this list)
    • High CPU usage, frame drops, and overall poor performance
  2. Allocating AVFrame and buffer for every frame
    • Repeatedly calling av_frame_alloc and av_frame_get_buffer without reuse
    • Increased memory churn and slower execution
  3. Missing sws_freeContext() on failure paths
    • Memory leak if error occurs before freeing tempSwsContext
    • Steady memory growth → eventual crash in long-running apps
  4. Poor buffer alignment with av_frame_get_buffer(frame, 1)
    • 1-byte alignment is insufficient for many SIMD operations
    • Slower pixel processing
  5. No call to av_frame_make_writable()
    • If frames were pooled/reused, they might be read-only
    • May silently fail or corrupt data
  6. Using external timestamp directly for PTS
    • timestamp might be jittery or non-monotonic
    • Playback glitches, A/V desync, or encoder errors
  7. Memory leak: forgot to sws_freeContext()
    • tempSwsContext is never released
    • Massive memory leak across many frames
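
The improved version below relies on a scaler context that is created once during initialization and released once during shutdown. As a point of reference, here is a minimal sketch of what that setup could look like; the members swsContext and videoCodecContext mirror the improved code, but initializeScaler() and releaseScaler() are hypothetical helper names, not functions taken from the project.

#include <cstdio>
extern "C" {
#include <libswscale/swscale.h>
}

// Hypothetical one-time setup: the scaler context is created once and reused for every frame.
bool FFmpegRecorder::initializeScaler() {
    swsContext = sws_getContext(
        videoCodecContext->width, videoCodecContext->height, AV_PIX_FMT_BGRA,
        videoCodecContext->width, videoCodecContext->height, AV_PIX_FMT_YUV420P,
        SWS_BICUBIC, nullptr, nullptr, nullptr);
    if (!swsContext) {
        fprintf(stderr, "Failed to allocate SWS context\n");
        return false;
    }
    return true;
}

// Hypothetical one-time teardown: sws_freeContext() accepts nullptr, so calling this twice is safe.
void FFmpegRecorder::releaseScaler() {
    sws_freeContext(swsContext);
    swsContext = nullptr;
}

With a shared context like this, sendVideoFrame() only has to call sws_scale(), which removes the per-frame setup cost described in point 1 and the leak described in point 7.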

✅ The Improved Version: sendVideoFrame()

bool FFmpegRecorder::sendVideoFrame(unsigned char* rgbaData, int64_t timestamp) {
    if (!rgbaData || !videoCodecContext || !swsContext) {
        fprintf(stderr, "Invalid input for sendVideoFrame\n");
        return false;
    }

    int64_t currentRecordingRelativeMs = getRecordingTimestamp();
    int64_t video_pts_val = av_rescale_q(currentRecordingRelativeMs,
        AVRational{ 1, 1000 },
        videoCodecContext->time_base);

    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Failed to allocate video frame\n");
        return false;
    }

    frame->width = videoCodecContext->width;
    frame->height = videoCodecContext->height;
    frame->format = videoCodecContext->pix_fmt;

    int ret = av_frame_get_buffer(frame, 32);
    if (ret < 0) {
        char errBuf[AV_ERROR_MAX_STRING_SIZE] = { 0 };
        av_strerror(ret, errBuf, AV_ERROR_MAX_STRING_SIZE);
        fprintf(stderr, "Failed to allocate video frame buffer: %s\n", errBuf);
        av_frame_free(&frame);
        return false;
    }

    ret = av_frame_make_writable(frame);
    if (ret < 0) {
        char errBuf[AV_ERROR_MAX_STRING_SIZE] = { 0 };
        av_strerror(ret, errBuf, AV_ERROR_MAX_STRING_SIZE);
        fprintf(stderr, "Failed to make video frame writable: %s\n", errBuf);
        av_frame_free(&frame);
        return false;
    }

    uint8_t* srcData[1] = { rgbaData };
    int srcLinesize[1] = { 4 * videoCodecContext->width };

    sws_scale(swsContext, srcData, srcLinesize, 0,
        videoCodecContext->height, frame->data, frame->linesize);

    frame->pts = video_pts_val;
    videoFrameCount++;

    bool success = (encodeVideoFrame(frame) >= 0);

    av_frame_free(&frame);
    return success;
}

✅ Why this version is better

  • Reuses swsContext, initialized once in initialize().
  • Manages memory properly: allocates and frees per frame without leaks.
  • Uses 32-byte buffer alignment, which satisfies common SIMD alignment requirements.
  • Calls av_frame_make_writable() to prevent undefined behavior on reused frames.
  • Uses a consistent timestamp source (getRecordingTimestamp()), which ensures monotonicity (see the sketch after this list).
  • Clean encoding pipeline: receives data → converts → encodes → frees.
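
The benefit of the consistent timestamp source depends on getRecordingTimestamp() never moving backwards. The project's actual implementation isn't shown on this page; as a rough sketch under that assumption, it could simply report the milliseconds elapsed on a steady clock since recording started (recordingStartTime being an assumed member set once when recording begins).

#include <chrono>

// Hypothetical sketch: milliseconds since recording started, taken from a steady
// (monotonic) clock, so consecutive calls can never return a smaller value.
int64_t FFmpegRecorder::getRecordingTimestamp() const {
    using namespace std::chrono;
    return duration_cast<milliseconds>(steady_clock::now() - recordingStartTime).count();
}

Rescaling such a value with av_rescale_q(ms, AVRational{1, 1000}, videoCodecContext->time_base) then yields non-decreasing PTS values; for example, 1500 ms with a 1/30 time base becomes pts = 45.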