Desktop Overview - shibotsu/obs-clone GitHub Wiki

Desktop and Audio Capture: System Overview

🧠 Purpose

This module captures both the screen and audio of a Windows system for real-time monitoring, visualization, or recording. It utilizes low-level Windows APIs to achieve efficient and precise access to system-level inputs and outputs.


🧰 Key Technologies

  • Direct3D 11
    A low-level graphics API used to interface with the GPU. We use it here not for rendering but to access the screen as a GPU texture.

  • DXGI (DirectX Graphics Infrastructure)
    DXGI enumerates adapters (graphics cards) and outputs (monitors), and handles the duplication of screen content.

  • WASAPI (Windows Audio Session API)
    Provides access to real-time audio input and output streams.

  • COM (Component Object Model)
    A foundational Windows technology that defines how objects are created, referenced, and used. Direct3D, DXGI, and WASAPI all expose their functionality through COM interfaces.


🧱 Component Architecture

🖥️ 1. Screen Capture (Desktop Duplication)

1. Device Creation

The capture process begins by creating a Direct3D 11 device and its immediate context with D3D11CreateDevice. The device creates GPU resources such as textures; the context submits commands such as copies and maps.

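2. COM Interface Querying

Once we have a D3D device, we walk from it to the DXGI objects needed for duplication:

  • IDXGIDevice: The DXGI view of the D3D device, obtained with QueryInterface on ID3D11Device.
  • IDXGIAdapter: The graphics adapter (GPU), obtained with IDXGIDevice::GetAdapter.
  • IDXGIOutput: A display output (monitor), obtained with IDXGIAdapter::EnumOutputs.
  • IDXGIOutput1: Needed to call DuplicateOutput, obtained with QueryInterface on IDXGIOutput.

QueryInterface asks the same object for a different interface. Moving to a related object (device to adapter, adapter to output) uses a getter or enumeration method instead.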

3. Creating a Duplication Session

Calling IDXGIOutput1::DuplicateOutput() with the D3D device returns an IDXGIOutputDuplication interface, which exposes the desktop image as a GPU texture. This lets us acquire frames in near real-time.
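
The path from device creation to a duplication session can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the names createDuplication and safeRelease are hypothetical, the default hardware adapter and output index 0 are arbitrary choices, and the block is guarded with _WIN32 so it compiles to nothing on other platforms.

```cpp
#ifdef _WIN32
#include <d3d11.h>
#include <dxgi1_2.h>

// Hypothetical helper: release a COM pointer and null it out.
template <typename T>
static void safeRelease(T*& p) {
    if (p) { p->Release(); p = nullptr; }
}

// Hypothetical function: creates a D3D11 device and a duplication session for
// output 0 of the adapter that owns the device. On success the caller owns
// (and must Release) all three returned interfaces.
HRESULT createDuplication(ID3D11Device** outDevice,
                          ID3D11DeviceContext** outContext,
                          IDXGIOutputDuplication** outDuplication) {
    ID3D11Device* device = nullptr;
    ID3D11DeviceContext* context = nullptr;
    IDXGIDevice* dxgiDevice = nullptr;
    IDXGIAdapter* adapter = nullptr;
    IDXGIOutput* output = nullptr;
    IDXGIOutput1* output1 = nullptr;
    IDXGIOutputDuplication* duplication = nullptr;

    // Default hardware adapter, no creation flags, default feature levels.
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                   nullptr, 0, D3D11_SDK_VERSION,
                                   &device, nullptr, &context);
    // Same object, different interface: QueryInterface.
    if (SUCCEEDED(hr))
        hr = device->QueryInterface(__uuidof(IDXGIDevice),
                                    reinterpret_cast<void**>(&dxgiDevice));
    // Different object: ask the DXGI device for its adapter.
    if (SUCCEEDED(hr)) hr = dxgiDevice->GetAdapter(&adapter);
    // Different object: index 0 is the first output on this adapter.
    if (SUCCEEDED(hr)) hr = adapter->EnumOutputs(0, &output);
    // Same object, newer interface: QueryInterface again.
    if (SUCCEEDED(hr))
        hr = output->QueryInterface(__uuidof(IDXGIOutput1),
                                    reinterpret_cast<void**>(&output1));
    if (SUCCEEDED(hr)) hr = output1->DuplicateOutput(device, &duplication);

    // The intermediate interfaces are no longer needed either way.
    safeRelease(output1);
    safeRelease(output);
    safeRelease(adapter);
    safeRelease(dxgiDevice);

    if (FAILED(hr)) {
        safeRelease(duplication);
        safeRelease(context);
        safeRelease(device);
        return hr;
    }
    *outDevice = device;
    *outContext = context;
    *outDuplication = duplication;
    return S_OK;
}
#endif  // _WIN32
```

Each step runs only if the previous one succeeded, so the first failing HRESULT is the one returned to the caller.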

4. Frame Acquisition and Copying

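Each frame returned by AcquireNextFrame is a GPU-only texture (ID3D11Texture2D). We create a CPU-readable "staging texture", copy the frame into it with CopyResource, map the staging texture to access its pixel data, and convert the result to a QImage. The frame is then handed back with ReleaseFrame so the next one can be acquired.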
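
One detail worth illustrating is the row stride of a mapped texture. ID3D11DeviceContext::Map reports a RowPitch that can be larger than width × 4 bytes, so reading the mapping as one contiguous block can skew the image. The helper below is a hypothetical, platform-independent sketch (the name copyRows and the assumption of 4 bytes per pixel, as in a 32-bit BGRA desktop image, are not taken from the project) that repacks padded rows into a tight buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `width` 32-bit pixels from a
// mapped buffer whose rows start `rowPitch` bytes apart into a tightly packed
// buffer (width * 4 bytes per row, no padding).
std::vector<std::uint8_t> copyRows(const std::uint8_t* mapped,
                                   std::size_t rowPitch,
                                   std::size_t width,
                                   std::size_t height) {
    const std::size_t rowBytes = width * 4;  // assumption: 4 bytes per pixel
    assert(rowPitch >= rowBytes);
    if (rowBytes == 0 || height == 0) return {};

    std::vector<std::uint8_t> packed(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip the padding at the end of each source row.
        std::memcpy(packed.data() + y * rowBytes, mapped + y * rowPitch, rowBytes);
    }
    return packed;
}
```

An alternative that avoids the copy is to pass the pitch to QImage as its bytesPerLine argument, as long as the mapping stays valid for the lifetime of that QImage.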

🔊 2. Audio Capture (WASAPI)

1. Audio Client Initialization

  • Uses IMMDeviceEnumerator to select:
    • Default microphone (input)
    • Default output device (system loopback)
  • Activates an IAudioClient on each device, then obtains an IAudioCaptureClient from it to read the captured data.

2. Input & Output Capture

  • Both input and output clients are initialized in event-driven mode.
  • Audio buffers are read as they become available to retrieve raw PCM data.
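
The loopback half of this setup might look roughly like the sketch below. It is not the project's code: the function name openLoopback is hypothetical, shared mode with the device mix format is an assumption, and the block is guarded with _WIN32 so it compiles to nothing elsewhere. Combining the loopback and event-callback flags relies on the Windows 10+ platform requirement stated in this document.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

// Hypothetical function: opens the default render endpoint for loopback
// capture in shared, event-driven mode. COM must already be initialized on
// the calling thread (CoInitializeEx). On success the caller owns the two
// interfaces (Release) and the event handle (CloseHandle).
HRESULT openLoopback(IAudioClient** outClient,
                     IAudioCaptureClient** outCapture,
                     HANDLE* outEvent) {
    IMMDeviceEnumerator* enumerator = nullptr;
    IMMDevice* device = nullptr;
    IAudioClient* client = nullptr;
    IAudioCaptureClient* capture = nullptr;
    WAVEFORMATEX* format = nullptr;
    HANDLE event = nullptr;

    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                                  __uuidof(IMMDeviceEnumerator),
                                  reinterpret_cast<void**>(&enumerator));
    // eRender plus the loopback flag captures what the output device plays;
    // eCapture without that flag would select the microphone instead.
    if (SUCCEEDED(hr))
        hr = enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);
    if (SUCCEEDED(hr))
        hr = device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                              reinterpret_cast<void**>(&client));
    if (SUCCEEDED(hr)) hr = client->GetMixFormat(&format);
    // Shared-mode, event-driven streams pass 0 for both duration and period.
    if (SUCCEEDED(hr))
        hr = client->Initialize(AUDCLNT_SHAREMODE_SHARED,
                                AUDCLNT_STREAMFLAGS_LOOPBACK |
                                    AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                                0, 0, format, nullptr);
    if (SUCCEEDED(hr)) {
        event = CreateEventW(nullptr, FALSE, FALSE, nullptr);
        hr = event ? client->SetEventHandle(event)
                   : HRESULT_FROM_WIN32(GetLastError());
    }
    if (SUCCEEDED(hr))
        hr = client->GetService(__uuidof(IAudioCaptureClient),
                                reinterpret_cast<void**>(&capture));

    // Temporaries are released whether or not setup succeeded.
    if (format) CoTaskMemFree(format);
    if (device) device->Release();
    if (enumerator) enumerator->Release();

    if (FAILED(hr)) {
        if (capture) capture->Release();
        if (client) client->Release();
        if (event) CloseHandle(event);
        return hr;
    }
    *outClient = client;
    *outCapture = capture;
    *outEvent = event;
    return S_OK;
}
#endif  // _WIN32
```

After this, a capture thread would typically call IAudioClient::Start, wait on the event, and drain packets with IAudioCaptureClient::GetBuffer and ReleaseBuffer.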

3. Volume Level Extraction

  • Extracts RMS (Root Mean Square) levels from audio buffers.
  • Converts volume into decibels (dB) for visualization or analysis.
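
As a minimal sketch of the level calculation, assuming the buffer holds 32-bit float samples in the range [-1, 1] (shared-mode mix formats are commonly float, but this is an assumption about the project), the RMS is the square root of the mean of the squared samples, and the decibel value is 20 · log10(RMS). The function names and the -96 dB floor are illustrative choices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Root mean square of `count` float samples.
double rms(const float* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    if (count == 0) return 0.0;

    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Converts a linear level to decibels relative to full scale, clamped to
// `floorDb` so that silence does not produce negative infinity.
double toDecibels(double level, double floorDb = -96.0) {
    if (level <= 0.0) return floorDb;
    const double db = 20.0 * std::log10(level);
    return db < floorDb ? floorDb : db;
}
```

A full-scale signal maps to 0 dB and half amplitude to roughly -6 dB, which is a convenient range for a level meter.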

3. COM: What You Need to Know

🔧 What is COM?

The Component Object Model (COM) is a Microsoft architecture for software components that can interact regardless of language or binary boundaries. All Direct3D and DXGI objects are COM objects under the hood.

Core Concepts

  • Interfaces
    COM objects are accessed through interfaces — e.g., ID3D11Device, IDXGIOutput. You never interact with the object directly, only through its interface.

  • Reference Counting
    Every COM object has a reference count. A successful QueryInterface or creation call increments it and hands you a reference that you own. You must call Release() on that reference when done, or the object will leak.

  • QueryInterface
    This is used to ask a COM object if it supports another interface (like asking a D3D11Device if it also implements IDXGIDevice).

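    IDXGIDevice* dxgiDevice = nullptr;
    HRESULT hr = d3dDevice->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
    if (FAILED(hr)) { /* IDXGIDevice is unavailable: report the error and stop */ }

  • HRESULT and Error Checking
    COM methods return an HRESULT, which must be checked with the SUCCEEDED() or FAILED() macros rather than compared against S_OK alone, since other success codes such as S_FALSE exist.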

Lifetime Management Guidelines

  • Always match each successful QueryInterface or object creation with a Release.
  • Never assume an interface stays valid if its parent is released.
  • Use smart pointers (ATL's CComPtr or WRL's ComPtr) where possible to automate cleanup. They are not currently used in this project, but are good practice for future extensions.
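
To make the counting rules concrete without pulling in Windows headers, the sketch below uses a stand-in class. FakeComObject and ScopedRef are hypothetical names and are not real COM types; they only mirror the AddRef/Release contract and the kind of automatic cleanup that CComPtr or ComPtr would provide.

```cpp
#include <cassert>

// Stand-in for a COM object: NOT a real COM interface, just the same
// AddRef/Release contract so the counting can be observed.
class FakeComObject {
public:
    static int live;  // number of objects currently alive (for observation)

    // Like a COM creation function: the caller receives one owned reference.
    static FakeComObject* create() { return new FakeComObject(); }

    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        assert(refs_ > 0);
        const unsigned long left = --refs_;
        if (left == 0) delete this;  // last reference gone: destroy the object
        return left;
    }

private:
    FakeComObject() { ++live; }
    ~FakeComObject() { --live; }
    unsigned long refs_ = 1;
};

int FakeComObject::live = 0;

// Minimal RAII holder in the spirit of ComPtr/CComPtr: it takes ownership of
// one reference and releases it when it goes out of scope.
template <typename T>
class ScopedRef {
public:
    explicit ScopedRef(T* p = nullptr) : p_(p) {}
    ~ScopedRef() { if (p_) p_->Release(); }

    ScopedRef(const ScopedRef&) = delete;
    ScopedRef& operator=(const ScopedRef&) = delete;

    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_;
};
```

With a holder like this, every early return on a failed HRESULT releases the interface automatically, which is the main practical benefit over manual Release calls.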

🗃️ Source Files

  • AudioCapture.h/.cpp — WASAPI-based audio input/output
  • ScreenCapture.h/.cpp — Direct3D-based screen duplication
  • MainWindow.cpp (example integration with Qt GUI)

🖥️ Platform

  • OS: Windows 10+
  • Compiler: MSVC / MinGW
  • Dependencies: Qt, DirectX 11, Windows SDK