Desktop Overview - shibotsu/obs-clone GitHub Wiki
Desktop and Audio Capture: System Overview
🧠 Purpose
This module captures both the screen and the audio of a Windows system for real-time monitoring, visualization, or recording. It uses low-level Windows APIs to access system-level inputs and outputs efficiently and precisely.
🧰 Key Technologies
- Direct3D 11
  A low-level graphics API for working with the GPU. Here it is used not for rendering but to access the screen as a GPU texture.
- DXGI (DirectX Graphics Infrastructure)
  Enumerates adapters (graphics cards) and outputs (monitors), and handles the duplication of screen content.
- WASAPI (Windows Audio Session API)
  Provides access to real-time audio input and output streams.
- COM (Component Object Model)
  A foundational Windows technology that defines how objects are created, referenced, and used. Direct3D, DXGI, and WASAPI all expose their functionality through COM interfaces.
🧱 Component Architecture
🖥️ 1. Screen Capture (Desktop Duplication)
1. Device Creation
The capture process begins by creating a Direct3D 11 device and its immediate context using D3D11CreateDevice. These objects allow access to GPU resources and command submission.
2. COM Interface Querying
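Once we have a D3D device, we walk from it to the monitor we want to capture. Each step yields a COM interface obtained from the previous one:
- IDXGIDevice: The DXGI view of the D3D device, obtained with QueryInterface.
- IDXGIAdapter: The graphics adapter (GPU), typically obtained with IDXGIDevice::GetAdapter.
- IDXGIOutput: A display output (monitor), typically obtained with IDXGIAdapter::EnumOutputs.
- IDXGIOutput1: Needed to call DuplicateOutput, obtained from IDXGIOutput with QueryInterface.
Only the first and last steps are QueryInterface calls on the same underlying object. The adapter and the output are separate objects reached through DXGI accessor methods.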
3. Creating a Duplication Session
Using IDXGIOutput1::DuplicateOutput(), we create a duplication interface (IDXGIOutputDuplication) that gives access to the desktop image as a GPU texture. This lets us acquire frames in near real time.
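The steps so far (device creation, walking to IDXGIOutput1, and calling DuplicateOutput) can be sketched roughly as follows. This is a minimal sketch rather than the project's actual code: the names DuplicationSession and openDuplication are hypothetical, it uses Microsoft::WRL::ComPtr for cleanup (which the project itself does not), and it assumes the first output of the default adapter is the monitor of interest.

```cpp
#ifdef _WIN32
#include <d3d11.h>
#include <dxgi1_2.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical bundle of the objects the capture loop needs later.
struct DuplicationSession {
    ComPtr<ID3D11Device> device;
    ComPtr<ID3D11DeviceContext> context;
    ComPtr<IDXGIOutputDuplication> duplication;
};

// Returns S_OK and fills `out` on success, or the first failing HRESULT.
HRESULT openDuplication(DuplicationSession& out) {
    D3D_FEATURE_LEVEL level{};
    HRESULT hr = D3D11CreateDevice(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,   // default adapter, no flags
        nullptr, 0, D3D11_SDK_VERSION,                    // default feature levels
        out.device.ReleaseAndGetAddressOf(), &level,
        out.context.ReleaseAndGetAddressOf());
    if (FAILED(hr)) return hr;

    ComPtr<IDXGIDevice> dxgiDevice;
    hr = out.device.As(&dxgiDevice);                      // QueryInterface under the hood
    if (FAILED(hr)) return hr;

    ComPtr<IDXGIAdapter> adapter;
    hr = dxgiDevice->GetAdapter(adapter.GetAddressOf());
    if (FAILED(hr)) return hr;

    ComPtr<IDXGIOutput> output;
    hr = adapter->EnumOutputs(0, output.GetAddressOf());  // assumption: output 0 is the target
    if (FAILED(hr)) return hr;

    ComPtr<IDXGIOutput1> output1;
    hr = output.As(&output1);                             // QueryInterface again
    if (FAILED(hr)) return hr;

    return output1->DuplicateOutput(out.device.Get(),
                                    out.duplication.ReleaseAndGetAddressOf());
}
#endif
```

The function needs d3d11.lib at link time. The `_WIN32` guard only keeps the snippet from breaking a non-Windows build; it has no effect on Windows.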
4. Frame Acquisition and Copying
Each acquired frame is a GPU-only texture (ID3D11Texture2D). We create a CPU-readable "staging texture", copy the frame into it with CopyResource, then map it to read the pixel data and convert it to a QImage.
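One detail worth illustrating is that a mapped texture's rows are RowPitch bytes apart, which can be larger than width × 4, so the pixels should be copied row by row. The sketch below is an assumption-laden illustration, not the project's implementation: packRows and grabFrame are hypothetical names, the desktop image is assumed to be 32-bit BGRA, and a staging texture is created per frame for brevity (a real loop would likely cache it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#ifdef _WIN32
#include <d3d11.h>
#include <dxgi1_2.h>
#include <wrl/client.h>
#endif

// Copies `height` rows of 4-byte pixels from a surface whose rows start
// `rowPitch` bytes apart into a tightly packed buffer.
std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t rowPitch,
                                   std::size_t width, std::size_t height) {
    const std::size_t rowBytes = width * 4;
    assert(rowPitch >= rowBytes);
    std::vector<std::uint8_t> packed(rowBytes * height);
    if (rowBytes == 0) return packed;
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(packed.data() + y * rowBytes, src + y * rowPitch, rowBytes);
    }
    return packed;
}

#ifdef _WIN32
using Microsoft::WRL::ComPtr;

// Grabs one frame into `pixels`; returns false on timeout or error.
bool grabFrame(ID3D11Device* device, ID3D11DeviceContext* context,
               IDXGIOutputDuplication* duplication,
               std::vector<std::uint8_t>& pixels, UINT& width, UINT& height) {
    DXGI_OUTDUPL_FRAME_INFO info{};
    ComPtr<IDXGIResource> resource;
    if (FAILED(duplication->AcquireNextFrame(100, &info, resource.GetAddressOf()))) {
        return false;  // includes DXGI_ERROR_WAIT_TIMEOUT
    }

    ComPtr<ID3D11Texture2D> frame, staging;
    D3D11_TEXTURE2D_DESC desc{};
    bool ok = SUCCEEDED(resource.As(&frame));
    if (ok) {
        frame->GetDesc(&desc);
        desc.Usage = D3D11_USAGE_STAGING;            // CPU-readable copy target
        desc.BindFlags = 0;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
        desc.MiscFlags = 0;
        ok = SUCCEEDED(device->CreateTexture2D(&desc, nullptr, staging.GetAddressOf()));
    }
    D3D11_MAPPED_SUBRESOURCE mapped{};
    if (ok) {
        context->CopyResource(staging.Get(), frame.Get());
        ok = SUCCEEDED(context->Map(staging.Get(), 0, D3D11_MAP_READ, 0, &mapped));
    }
    if (ok) {
        pixels = packRows(static_cast<const std::uint8_t*>(mapped.pData),
                          mapped.RowPitch, desc.Width, desc.Height);
        width = desc.Width;
        height = desc.Height;
        context->Unmap(staging.Get(), 0);
    }
    duplication->ReleaseFrame();  // always hand the frame back to DXGI
    return ok;
}
#endif
```

The packed buffer can then be handed to a QImage constructor that takes a data pointer, width, height, and bytes per line (here width × 4). If AcquireNextFrame returns DXGI_ERROR_ACCESS_LOST, for example after a display mode change, the duplication interface has to be recreated.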
🔊 2. Audio Capture (WASAPI)
1. Audio Client Initialization
- Uses IMMDeviceEnumerator to select:
  - Default microphone (input)
  - Default output device (system loopback)
- Activates an IAudioClient on each device, then obtains an IAudioCaptureClient from it to read the captured data.
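The initialization steps above might look roughly like the sketch below. CaptureStream and openStream are hypothetical names, not taken from the project. The sketch assumes COM is already initialized on the calling thread, uses the device's shared-mode mix format, and assumes event-driven loopback works on the target Windows 10 build (older Windows versions are documented as not signalling events for loopback streams).

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical holder for one capture stream. The caller frees `format`
// with CoTaskMemFree and closes `event` with CloseHandle.
struct CaptureStream {
    ComPtr<IAudioClient> client;
    ComPtr<IAudioCaptureClient> capture;
    WAVEFORMATEX* format = nullptr;
    HANDLE event = nullptr;
};

// flow = eCapture for the default microphone, eRender for system loopback.
HRESULT openStream(EDataFlow flow, CaptureStream& out) {
    ComPtr<IMMDeviceEnumerator> enumerator;
    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                                  IID_PPV_ARGS(enumerator.GetAddressOf()));
    if (FAILED(hr)) return hr;

    ComPtr<IMMDevice> device;
    hr = enumerator->GetDefaultAudioEndpoint(flow, eConsole, device.GetAddressOf());
    if (FAILED(hr)) return hr;

    hr = device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                          reinterpret_cast<void**>(out.client.ReleaseAndGetAddressOf()));
    if (FAILED(hr)) return hr;

    hr = out.client->GetMixFormat(&out.format);
    if (FAILED(hr)) return hr;

    DWORD flags = AUDCLNT_STREAMFLAGS_EVENTCALLBACK;
    if (flow == eRender) flags |= AUDCLNT_STREAMFLAGS_LOOPBACK;  // capture what is being played
    // Shared-mode, event-driven streams pass 0 for both duration arguments.
    hr = out.client->Initialize(AUDCLNT_SHAREMODE_SHARED, flags, 0, 0, out.format, nullptr);
    if (FAILED(hr)) return hr;

    out.event = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    if (!out.event) return HRESULT_FROM_WIN32(GetLastError());
    hr = out.client->SetEventHandle(out.event);
    if (FAILED(hr)) return hr;

    hr = out.client->GetService(IID_PPV_ARGS(out.capture.ReleaseAndGetAddressOf()));
    if (FAILED(hr)) return hr;
    return out.client->Start();
}
#endif
```

Calling openStream(eCapture, mic) and openStream(eRender, loopback) would yield the two streams described above.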
2. Input & Output Capture
- Both input and output clients are initialized in event-driven mode.
- Audio buffers are accessed periodically to retrieve raw PCM data.
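A minimal sketch of retrieving PCM data in event-driven mode is shown below. The function name drainPackets, the 200 ms wait, and the choice to accumulate raw bytes in a std::vector are all assumptions made for illustration; the project's capture loop may be structured differently.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <audioclient.h>
#include <cstddef>
#include <vector>

// Waits for one buffer event, then appends every pending packet to `pcm`.
// `blockAlign` is WAVEFORMATEX::nBlockAlign (bytes per audio frame).
// Returns S_FALSE if the event was not signalled within the timeout.
HRESULT drainPackets(IAudioCaptureClient* capture, HANDLE event, UINT32 blockAlign,
                     std::vector<BYTE>& pcm) {
    if (WaitForSingleObject(event, 200) != WAIT_OBJECT_0) return S_FALSE;

    UINT32 packetFrames = 0;
    HRESULT hr = capture->GetNextPacketSize(&packetFrames);
    while (SUCCEEDED(hr) && packetFrames != 0) {
        BYTE* data = nullptr;
        UINT32 frames = 0;
        DWORD flags = 0;
        hr = capture->GetBuffer(&data, &frames, &flags, nullptr, nullptr);
        if (FAILED(hr)) break;

        const std::size_t bytes = static_cast<std::size_t>(frames) * blockAlign;
        if (flags & AUDCLNT_BUFFERFLAGS_SILENT) {
            pcm.insert(pcm.end(), bytes, BYTE{0});      // data is not meaningful; store silence
        } else {
            pcm.insert(pcm.end(), data, data + bytes);  // raw PCM in the stream's format
        }

        hr = capture->ReleaseBuffer(frames);
        if (FAILED(hr)) break;
        hr = capture->GetNextPacketSize(&packetFrames);
    }
    return hr;
}
#endif
```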
3. Volume Level Extraction
- Extracts RMS (Root Mean Square) levels from audio buffers.
- Converts volume into decibels (dB) for visualization or analysis.
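The level computation can be sketched as below. It assumes the samples are 32-bit floats in the range [-1, 1], which is common for the WASAPI shared-mode mix format but should be confirmed from the stream's WAVEFORMATEX. The function names and the -96 dB floor used for silence are illustrative choices, not values taken from the project.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Root mean square of `count` float samples; 0 for an empty buffer.
double rmsLevel(const float* samples, std::size_t count) {
    if (count == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Converts a linear level to decibels relative to full scale (1.0 maps to 0 dB),
// clamping silence and very quiet input to `floorDb`.
double toDecibels(double rms, double floorDb = -96.0) {
    if (rms <= 0.0) return floorDb;
    return std::max(floorDb, 20.0 * std::log10(rms));
}
```

For interleaved multi-channel audio, this treats all channels as one stream; per-channel meters would need the samples de-interleaved first.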
3. COM: What You Need to Know
🔧 What is COM?
The Component Object Model (COM) is a Microsoft architecture for software components that can interact regardless of language or binary boundaries. All Direct3D and DXGI objects are COM objects under the hood.
Core Concepts
- Interfaces
  COM objects are accessed through interfaces, e.g., ID3D11Device, IDXGIOutput. You never interact with the object directly, only through its interface.
- Reference Counting
  Every COM object has a reference count. A successful QueryInterface or creation method hands you a reference and increments the count. You must call Release() when done, or the object will leak.
- QueryInterface
  This is used to ask a COM object whether it supports another interface (like asking an ID3D11Device if it also implements IDXGIDevice).
IDXGIDevice* dxgiDevice = nullptr;
HRESULT hr = d3dDevice->QueryInterface(__uuidof(IDXGIDevice), reinterpret_cast<void**>(&dxgiDevice));
if (SUCCEEDED(hr)) { /* use dxgiDevice */ dxgiDevice->Release(); }
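- HRESULT and Error Checking
  COM methods return an HRESULT, which must be checked to confirm the call succeeded. Test it with the SUCCEEDED or FAILED macros rather than comparing against S_OK, because other success codes such as S_FALSE exist.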
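As a small illustration of HRESULT checking, the helper below formats a result code for logging. describeHresult is a hypothetical name, and the non-Windows branch only provides stand-in definitions so the sketch compiles elsewhere; on Windows the real HRESULT type and the SUCCEEDED/FAILED macros come from the Windows SDK headers.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins for illustration only; real code gets these from <windows.h>.
using HRESULT = std::int32_t;
#define SUCCEEDED(hr) (static_cast<HRESULT>(hr) >= 0)
#define FAILED(hr) (static_cast<HRESULT>(hr) < 0)
#endif

// Formats an HRESULT for logging, e.g. "FAILED 0x80004005".
// Failure codes have the top bit set, which is what FAILED tests.
std::string describeHresult(HRESULT hr) {
    char text[32];
    std::snprintf(text, sizeof(text), "%s 0x%08lX", FAILED(hr) ? "FAILED" : "OK",
                  static_cast<unsigned long>(static_cast<std::uint32_t>(hr)));
    return text;
}

// Typical use after a COM call:
//   HRESULT hr = d3dDevice->QueryInterface(...);
//   if (FAILED(hr)) { /* log describeHresult(hr) and bail out */ }
```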
Lifetime Management Guidelines
- Always match each successful QueryInterface or object creation with a Release.
- Never assume an interface stays valid if its parent is released.
- Use smart pointers (CComPtr, ComPtr) if possible to automate cleanup (not currently used in this project, but good practice for future extensions).
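To show what such a smart pointer automates, here is a portable toy model of the AddRef/Release contract. CountedObject and ScopedRef are hypothetical stand-ins written for this illustration; they are not real COM types, and ScopedRef only mimics a small part of what ComPtr or CComPtr provide.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a COM object: only the reference-counting contract is modelled.
class CountedObject {
public:
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        return --refs_;  // a real COM object would destroy itself at zero
    }
    unsigned long refs() const { return refs_; }

private:
    unsigned long refs_ = 1;  // creation hands the caller one reference
};

// Minimal RAII holder: adopts one existing reference and releases it on scope exit.
// (ComPtr's raw-pointer constructor calls AddRef instead; adopting matches ComPtr::Attach.)
template <typename T>
class ScopedRef {
public:
    explicit ScopedRef(T* ptr = nullptr) : ptr_(ptr) {}
    ScopedRef(const ScopedRef& other) : ptr_(other.ptr_) {
        if (ptr_) ptr_->AddRef();  // each copy owns its own reference
    }
    ScopedRef(ScopedRef&& other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
    ScopedRef& operator=(ScopedRef other) noexcept {
        std::swap(ptr_, other.ptr_);  // the old reference is released with `other`
        return *this;
    }
    ~ScopedRef() {
        if (ptr_) ptr_->Release();
    }
    T* get() const { return ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};
```

With a holder like this, every early return releases the reference automatically, which is the main benefit the guideline above refers to.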
🗃️ Source Files
- AudioCapture.h/.cpp — WASAPI-based audio input/output
- ScreenCapture.h/.cpp — Direct3D-based screen duplication
- MainWindow.cpp (example integration with Qt GUI)
🖥️ Platform
- OS: Windows 10+
- Compiler: MSVC / MinGW
- Dependencies: Qt, DirectX 11, Windows SDK