User Manual - LamEmil/Audio-Reactive-Video-Animator GitHub Wiki
Audio Reactive Video Animator - User Manual
Welcome to the Audio Reactive Video Animator! This guide will walk you through the features and functionalities of the application, helping you create stunning video animations that respond to your audio.
1. Introduction
The Audio Reactive Video Animator is a tool designed for artists, musicians, and content creators who want to add dynamic, audio-synchronized visuals to their projects. By analyzing an audio file, the application can modulate various visual effects on an input image or video, such as optical flow, zoom, brightness, blur, saturation, and color shifts.
This manual assumes you have successfully installed the application and its dependencies as outlined in the `README.md` file.
2. Getting Started
2.1. Launching the Application
- Windows: Double-click the `run.bat` file located in the project directory.
- macOS/Linux (and Windows via command line):
- Open your terminal or command prompt.
- Navigate to the project directory.
- Activate your Python virtual environment (e.g., `source venv/bin/activate` or `.\venv\Scripts\activate`).
- Run the command: `python main_gui.py`
2.2. The Main Window
Upon launching, you'll see the main application window, which is divided into two primary sections:
- Controls Panel (Left): This is where you'll load your files and adjust all the parameters for your animation.
- Preview Panel (Right): This area displays a preview of your input image/video and will play the generated video.
At the bottom of the Controls Panel, you'll find:
- Progress Bar: Shows the progress of video generation.
- Status Label: Displays messages about the application's current state or actions.
3. Controls Panel - In Detail
The Controls Panel is organized into an input/output section and several tabs for detailed parameter adjustments.
3.1. Input / Output & FFMPEG
This top section allows you to manage your project files.
- Input Image/Video:
- Field: Shows the path to your selected image or video.
- Browse Button: Click to open a file dialog and select your source visual.
- Supported image formats: PNG, JPG, JPEG, BMP, GIF.
- Supported video formats: MP4, AVI, MOV, MKV, WEBM, WMV, FLV.
- Tip: If you use a video as input, the current version primarily uses only its first frame as the base for generating the new animated frames. The video's original frame rate and duration can still influence the output unless overridden.
- A preview of the selected image or the first frame of the video will appear in the "Input Image Preview" area on the right.
- Audio File (Optional):
- Field: Shows the path to your selected audio file.
- Browse Button: Click to select an audio file.
- Supported formats: WAV, MP3, FLAC.
- Crucial: This file is essential for all audio-reactive effects. If no audio file is provided, effects that depend on audio analysis (like peak-triggered flow or breathing effects) will either not work or might produce static results.
- Output Video:
- Field: Specify the path and filename for your generated video. It defaults to `animated_output.mp4` in the directory where the application is run.
- Browse Button: Click to choose a different save location and filename. The output is always an MP4 file.
- FFMPEG Path (Optional):
- Field: Path to your FFMPEG executable.
- Browse Button: Click to locate your `ffmpeg.exe` (Windows) or `ffmpeg` (macOS/Linux) file.
- Explanation: FFMPEG is required to combine the generated video frames with your audio track into the final MP4 file.
- If `ffmpeg` is already in your system's PATH environment variable, you can leave this field as `ffmpeg`.
- If not, or if you want to use a specific FFMPEG version, you must provide the full path here.
- Refer to the `README.md` for FFMPEG installation guidance.
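If you are unsure whether FFMPEG is reachable from your PATH, you can check it yourself before launching the application. The following is a minimal sketch of the lookup rule described above, not the application's own code: `resolve_ffmpeg` is a hypothetical helper name, and it assumes the field holds either a bare command name or a full path.

```python
import os
import shutil


def resolve_ffmpeg(field_value="ffmpeg"):
    """Return a usable FFMPEG executable path, or None if nothing is found.

    Hypothetical helper: a bare name is looked up on PATH, while a value
    containing a directory is treated as an explicit path.
    """
    value = field_value.strip() or "ffmpeg"
    # An explicit path must point at an existing file.
    if os.path.dirname(value):
        return value if os.path.isfile(value) else None
    # A bare command name is searched for on the system PATH.
    return shutil.which(value)


if __name__ == "__main__":
    found = resolve_ffmpeg("ffmpeg")
    print(found or "ffmpeg is not on PATH; enter the full path in the GUI.")
```

If this prints a path, leaving the field as `ffmpeg` should be enough. If it reports that nothing was found, browse to the executable instead.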
3.2. Parameter Tabs
These tabs contain all the settings to customize your animation.
3.2.1. General Settings Tab
Basic settings for your output video.
- FPS (Frames Per Second):
- Range: 1 to 300. Default: 24.
- What it does: Determines the frame rate of your output video. Higher FPS means smoother animation but more frames to generate (longer processing time).
- Best use: 24-30 FPS is standard for video. Higher values can be used for specific artistic effects or if you plan to slow down the footage later.
- Seed (-1 for random):
- Default: -1.
- What it does: Seeds any random number generation used by the effects. Randomness currently matters only for specific initializations and possible future effects. When it is involved, the same seed with the same parameters produces the same output.
- Best use: Set to -1 for a different result each time (if randomness is used). Use a specific positive integer if you want to reproduce a particular random outcome.
- Override Frames (0=auto):
- Range: 0 to 999999. Default: 0.
- What it does: Allows you to set a specific number of frames for the output video.
- If set to `0` (auto), the video duration is determined by the length of the audio file.
- If set to a positive number, the video will have exactly that many frames, potentially truncating or looping audio analysis data to fit.
- Best use: Use `0` for most cases. Set a specific number if you need a fixed-length video loop or a specific duration independent of the audio length.
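To get a feel for how FPS and Override Frames interact, the arithmetic can be sketched as follows. This is an illustration only: `total_frames` and `video_seconds` are hypothetical names, and rounding the automatic frame count up is an assumption, not confirmed behavior of the application.

```python
import math


def total_frames(audio_seconds, fps, override_frames=0):
    """Number of frames to render under the Override Frames rule."""
    if override_frames > 0:
        # A positive override wins, regardless of the audio length.
        return override_frames
    # Assumption: round up so the video covers the whole audio track.
    return math.ceil(audio_seconds * fps)


def video_seconds(frame_count, fps):
    """Duration of a video with the given frame count and frame rate."""
    return frame_count / fps


if __name__ == "__main__":
    print(total_frames(180.0, 24))       # 3-minute track at 24 FPS: 4320 frames
    print(total_frames(180.0, 24, 150))  # override: exactly 150 frames
    print(video_seconds(150, 24))        # 6.25 seconds of video
```

A quick estimate like this helps you pick an Override Frames value for short test renders.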
3.2.2. Breathing Effects Tab
These effects create continuous, often subtle, modulations based on the overall energy (RMS - Root Mean Square) of the audio, creating a "breathing" or "pulsing" feel.
- Enable Breathing Effects (Checkbox):
- Default: Checked.
- What it does: Master switch for all effects in this tab. Uncheck to disable all breathing modulations.
- Breathing General Controls:
- Smoothing Window (s):
- Range: 0.0 to 10.0 seconds. Default: 0.5.
- What it does: Defines the time window (in seconds) over which the audio's RMS energy is averaged. A larger window results in a smoother, slower "breathing" envelope. A smaller window makes the effect more responsive to quick changes in audio energy.
- Best use: 0.2s to 1.0s is a good starting range. Experiment to match the tempo and feel of your audio.
- RMS Target Percentile:
- Range: 1 to 100. Default: 75.
- What it does: The audio's RMS energy is normalized relative to this percentile of its values. A lower percentile makes the effect more sensitive to quieter parts of the audio (the "breath" will be stronger even in softer sections). A higher percentile means only louder sections will significantly drive the effect.
- Best use: 70-90 is a good range. Adjust based on your audio's dynamic range.
- Breathing Zoom:
- Enable Zoom (Checkbox): Default: Checked. Toggles the breathing zoom effect.
- Max Zoom Factor:
- Range: 0.0 to 2.0. Default: 0.05.
- What it does: Determines the maximum amount of additional zoom applied when the breath intensity is at its peak (1.0). A value of 0.05 means up to 5% additional zoom. The zoom is centered.
- Best use: Small values (0.01 to 0.1) for subtle pulsing. Larger values for more dramatic effects.
- Breathing Brightness:
- Enable Brightness (Checkbox): Default: Unchecked. Toggles the breathing brightness effect.
- Min Brightness:
- Range: 0.0 to 2.0. Default: 0.9.
- What it does: Brightness multiplier when breath intensity is at minimum (0.0). 1.0 is original brightness.
- Max Brightness:
- Range: 0.0 to 2.0. Default: 1.1.
- What it does: Brightness multiplier when breath intensity is at maximum (1.0).
- Best use: Use values around 1.0 (e.g., 0.8 to 1.2) for subtle pulsing. Extreme values can lead to fully black or white images.
- Breathing Blur:
- Enable Blur (Checkbox): Default: Unchecked. Toggles the breathing blur effect.
- Max Blur Radius:
- Range: 0.0 to 10.0. Default: 1.0.
- What it does: Maximum Gaussian blur radius (in pixels) applied when breath intensity is at maximum.
- Best use: Small values (0.5 to 2.0) for subtle pulsing focus. Higher values create a more pronounced blur.
- Breathing Saturation:
- Enable Saturation (Checkbox): Default: Unchecked. Toggles the breathing saturation effect.
- Min Saturation:
- Range: 0.0 to 2.0. Default: 0.8.
- What it does: Color saturation multiplier at minimum breath intensity. 0.0 is grayscale, 1.0 is original.
- Max Saturation:
- Range: 0.0 to 2.0. Default: 1.2.
- What it does: Saturation multiplier at maximum breath intensity.
- Best use: Values around 1.0 (e.g., 0.7 to 1.3) for dynamic color vibrancy.
- Breathing Color Shift:
- Enable Color Shift (Checkbox): Default: Unchecked. Toggles the breathing color shift effect.
- Min Hue Shift:
- Range: 0.0 to 1.0. Default: 0.0.
- What it does: Hue shift amount (as a fraction of the 360-degree color wheel) at minimum breath intensity.
- Max Hue Shift:
- Range: 0.0 to 1.0. Default: 0.15.
- What it does: Hue shift amount at maximum breath intensity.
- Best use: Small values (e.g., 0.0 to 0.2) for subtle color cycling. Larger values create more dramatic shifts.
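The breathing controls above can be pictured as a two-step process: build a smoothed, normalized "breath intensity" between 0.0 and 1.0, then blend each effect between its minimum and maximum setting. The sketch below illustrates that idea under stated assumptions. The moving average, the nearest-rank percentile, the clipping at 1.0, and all function names are hypothetical choices made for illustration; the application's actual analysis may differ.

```python
import math


def moving_average(values, window):
    """Centered moving average; window is a sample count (at least 1)."""
    half = max(window, 1) // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def breath_intensity(rms_per_frame, fps, smoothing_s=0.5, target_pct=75):
    """Map per-frame RMS values to a 0..1 breath intensity (assumed scheme)."""
    window = max(1, round(smoothing_s * fps))
    smooth = moving_average(rms_per_frame, window)
    reference = percentile(smooth, target_pct)
    if reference <= 0:
        return [0.0] * len(smooth)
    # Anything at or above the reference level counts as full intensity.
    return [min(v / reference, 1.0) for v in smooth]


def lerp(low, high, t):
    """Linear blend between a Min and a Max setting."""
    return low + (high - low) * t


def frame_parameters(b, max_zoom=0.05, brightness=(0.9, 1.1), hue=(0.0, 0.15)):
    """Effect values for one frame, using the default settings from this tab."""
    return {
        "zoom_scale": 1.0 + max_zoom * b,
        "brightness": lerp(brightness[0], brightness[1], b),
        "hue_degrees": lerp(hue[0], hue[1], b) * 360.0,
    }


if __name__ == "__main__":
    rms = [0.02, 0.05, 0.30, 0.60, 0.40, 0.10]
    for b in breath_intensity(rms, fps=24, smoothing_s=0.0):
        print(round(b, 2), frame_parameters(b))
```

In this sketch, lowering `target_pct` lowers the reference level, so quieter passages reach full intensity sooner, which matches the behavior described for RMS Target Percentile.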
3.2.3. Peak/Flow Effects Tab
These effects are typically more sudden and are triggered by distinct peaks (transients) in the audio, like drum beats. The primary effect here is optical flow, which creates a warping or motion effect.
- Enable Peak/Optical Flow Effects (Checkbox):
- Default: Checked.
- What it does: Master switch for all effects in this tab. Uncheck to disable peak-driven optical flow.
- Audio Peak Analysis (for Flow):
- Peak Threshold Multiplier:
- Range: 0.0 to 5.0. Default: 0.8.
- What it does: Controls sensitivity to audio peaks. Lower values detect more (quieter) peaks. Higher values detect only louder peaks. The detection threshold is the mean onset strength plus this multiplier times its standard deviation.
- Best use: 0.5 to 1.5 is a common range. Adjust based on how many transients in your audio you want to trigger effects.
- Peak Hold Frames:
- Range: 0 to 300. Default: 12.
- What it does: The duration (in frames) that a peak-triggered effect will last (attack, sustain, decay).
- Best use: Depends on FPS and desired effect length. For 24 FPS, 12 frames is 0.5 seconds.
- Alternate Direction Every N Peaks:
- Range: 0 to 100. Default: 1.
- What it does: If greater than 0, the direction of the peak flow strength (e.g., zoom in vs. zoom out) will alternate after this many detected peaks. If 0, it will not alternate based on this count (sticking to the initial direction or a fixed pattern).
- Best use: `1` makes every peak alternate direction. `2` switches direction after every second peak, and so on.
- Peak Effect Envelope (for Flow): Defines the shape of the effect's strength over its `Peak Hold Frames` duration.
- Attack Ratio:
- Range: 0.0 to 1.0. Default: 0.2.
- What it does: Proportion of `Peak Hold Frames` for the effect to ramp up to full strength (e.g., 0.2 means 20% of the hold time is attack).
- Sustain Ratio:
- Range: 0.0 to 1.0. Default: 0.5.
- What it does: Proportion of `Peak Hold Frames` for the effect to stay at full strength after attack.
- Decay Ratio:
- Range: 0.0 to 1.0. Default: 0.3.
- What it does: Proportion of `Peak Hold Frames` for the effect to ramp down from full strength to zero.
- Note: The sum of these ratios should ideally be around 1.0 (the defaults of 0.2, 0.5, and 0.3 add up to exactly 1.0).
- Easing Type:
- Options: `linear`, `sine`, `quad_in`, `quad_out`. Default: `sine`.
- What it does: Determines the curve of the attack and decay phases. `linear` is a straight line, `sine` is smoother, `quad_in` starts slow and accelerates, `quad_out` starts fast and decelerates.
- Best use: `sine` often looks natural. Experiment for different feels.
- Flow Target & Calculation: Settings for how the base optical flow field is generated. This base field is then scaled by the peak envelope.
- Flow Target Mode:
- Options: `Zoom_In_Flow_Target`, `Zoom_Out_Flow_Target`. Default: `Zoom_In_Flow_Target`.
- What it does: Determines if the target frame (used to calculate flow against the original frame) is a zoomed-in or zoomed-out version of the original. This defines the inherent "direction" of the flow.
- Target Transform Amount:
- Range: 0.0 to 2.0. Default: 0.15.
- What it does: The amount of zoom (as a factor) applied to create the flow target frame. E.g., 0.15 means the target is 15% more zoomed in/out than the original.
- Best use: Small values (0.05 to 0.2) are typical. Larger values create more extreme base flows.
- Flow Calc Scale Factor:
- Range: 0.01 to 1.0. Default: 0.5.
- What it does: Downscales the image before calculating optical flow. Lower values (e.g., 0.25) significantly speed up flow calculation but may result in less detailed or blockier flow. 1.0 uses full resolution (slowest, highest detail).
- Best use: 0.5 is a good balance. Lower for faster previews, higher for final renders if quality is paramount.
- Flow Strength:
- Idle Flow Strength:
- Range: -2.0 to 2.0. Default: 0.0.
- What it does: A constant multiplier for the base flow field, applied on every frame, even when no peak is active. Can create a subtle, continuous background warp.
- Best use: Small values (e.g., -0.05 to 0.05) if used. 0.0 means no flow when idle.
- Peak Strength (Zoom In):
- Range: -2.0 to 2.0. Default: 0.4.
- What it does: Multiplier for the base flow field when a peak triggers a "zoom in" phase (as determined by `Flow Target Mode` and `Alternate Direction Every N Peaks`). Positive values enhance the target mode's direction (e.g., more zoom in if target was zoom in), negative values reverse it.
- Peak Strength (Zoom Out):
- Range: -2.0 to 2.0. Default: -0.3.
- What it does: Multiplier for the base flow field for the "zoom out" phase.
- Best use: Experiment with magnitudes. Positive values generally mean "outward" or "expansive" flow, negative values "inward" or "contracting" flow, relative to the base flow direction.
- Interpolation & Boundary (for Flow): How pixel values are determined during image transformations.
- Warp Interpolation (cv2):
- Options: `Nearest_cv2`, `Linear_cv2`, `Cubic_cv2`, `Lanczos4_cv2`. Default: `Linear_cv2`.
- What it does: Method used by OpenCV to interpolate pixel colors when warping the image with optical flow.
- Best use: `Linear_cv2` is fast and often good enough. `Cubic_cv2` or `Lanczos4_cv2` can be smoother but are slower. `Nearest_cv2` is very fast but blocky.
- Zoom Interpolation (PIL):
- Options: `Nearest`, `Bilinear`, `Bicubic`, `Lanczos`. Default: `Lanczos`.
- What it does: Method used by Pillow (PIL) for resizing operations, specifically for generating the flow target frame and for the breathing zoom effect.
- Best use: `Lanczos` or `Bicubic` generally offer the best quality for scaling. `Bilinear` is faster. `Nearest` is blocky.
- Boundary Mode (cv2):
- Options: `Constant_cv2`, `Replicate_cv2`, `Reflect_cv2`, `Wrap_cv2`, `Reflect_101_cv2`. Default: `Reflect_101_cv2`.
- What it does: Defines how pixels outside the original image boundaries are handled when the image is warped. `Reflect_101_cv2` (default) usually gives good results by reflecting the image without repeating the border pixel. `Constant_cv2` would fill with black (or a specified color).
- Best use: `Reflect_101_cv2` or `Reflect_cv2` are often good choices to avoid hard edges.
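To see how Peak Hold Frames, the three envelope ratios, the easing type, and the direction alternation fit together, here is a small sketch. It is an illustration under assumptions, not the application's implementation: the function names, the exact easing formulas, the rounding of frame counts, and the behavior when alternation is set to 0 are all hypothetical.

```python
import math

# Assumed easing curves: each maps progress t in 0..1 to strength in 0..1.
EASINGS = {
    "linear": lambda t: t,
    "sine": lambda t: 0.5 - 0.5 * math.cos(math.pi * t),
    "quad_in": lambda t: t * t,
    "quad_out": lambda t: 1 - (1 - t) ** 2,
}


def peak_envelope(hold_frames=12, attack=0.2, sustain=0.5, decay=0.3,
                  easing="sine"):
    """Per-frame strength (0..1) of one peak-triggered effect."""
    ease = EASINGS[easing]
    n_attack = round(hold_frames * attack)
    n_sustain = round(hold_frames * sustain)
    n_decay = round(hold_frames * decay)
    env = [ease((i + 1) / n_attack) for i in range(n_attack)]
    env += [1.0] * n_sustain
    env += [1.0 - ease((i + 1) / n_decay) for i in range(n_decay)]
    # Fit to the hold length when the ratios do not add up to exactly 1.0.
    env = env[:hold_frames]
    return env + [0.0] * (hold_frames - len(env))


def peak_strength(peak_index, alternate_every=1,
                  strength_in=0.4, strength_out=-0.3):
    """Pick the zoom-in or zoom-out strength for a detected peak (0-based)."""
    if alternate_every <= 0:
        # Assumption: without alternation the initial direction is kept.
        return strength_in
    zoom_in_phase = (peak_index // alternate_every) % 2 == 0
    return strength_in if zoom_in_phase else strength_out


if __name__ == "__main__":
    print([round(v, 2) for v in peak_envelope(easing="linear")])
    print([peak_strength(i, alternate_every=2) for i in range(6)])
```

With the defaults, a 12-frame hold splits into roughly 2 attack frames, 6 sustain frames, and 4 decay frames. Multiplying each envelope value by the chosen peak strength gives a per-frame scale for the base flow field in this sketch.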
3.3. Generate Video Button
- Action: Once you've set your inputs and parameters, click this button to start the video generation process.
- The button will be disabled during processing. The progress bar and status label will update.
4. Preview Panel (Right Side)
- Input Image Preview: Displays the image or the first frame of the video you selected as input. This helps you confirm you've loaded the correct visual.
- Video Widget: After video generation is successful, the output video will be loaded here.
- Play/Pause Video Button: Becomes enabled after a video is generated and loaded. Click to play or pause the output video in the Video Widget.
5. Workflow & Tips for Best Results
- Load Files: Start by loading your input image/video and your audio file. Set your desired output video path.
- Configure FFMPEG: Ensure the FFMPEG path is correct, especially if it's not in your system PATH.
- Start Simple:
- Begin with either "Breathing Effects" or "Peak/Flow Effects" enabled, not necessarily both at once, to understand how each set of parameters works.
- Use a short audio clip or set "Override Frames" to a small number (e.g., 100-200 frames) for quicker test renders.
- Tweak General Settings: Set your desired FPS.
- Adjust Effect Parameters:
- For Breathing Effects:
- Start with `Smoothing Window` and `RMS Target Percentile` to get the general responsiveness.
- Then, enable one effect at a time (e.g., Breathing Zoom) and adjust its `Max Zoom Factor` or Min/Max values.
- For Peak/Flow Effects:
- Adjust `Peak Threshold Multiplier` to match how many audio events you want to react to.
- Set `Peak Hold Frames` for the desired effect duration per peak.
- Experiment with `Peak Strength` values.
- `Flow Calc Scale Factor` is key for balancing speed and flow detail.
- Iterate: Generate a short preview. If you like the direction, refine the parameters. If using an input video, remember the effects are based on the first frame of that video.
- Combine Effects: Once you're comfortable with individual effect groups, try enabling both Breathing and Peak/Flow effects to see how they interact. The breathing effects will modify the base image, and then the peak flow effects will warp that modified image.
- Audio Characteristics Matter:
- Percussive Audio: Good for `Peak/Flow Effects`. Clear transients will trigger the flow.
- Sustained/Ambient Audio: Good for `Breathing Effects`. The RMS envelope will capture the swells and fades.
- Dynamic Audio: Audio with a good mix of quiet and loud parts will allow `RMS Target Percentile` and `Peak Threshold Multiplier` to work effectively. Highly compressed audio (always loud) might make some reactive effects less nuanced.
- Performance vs. Quality:
- Higher FPS, higher resolution input images, and a `Flow Calc Scale Factor` closer to 1.0 will increase processing time but generally yield better quality.
- `Lanczos` or `Cubic` interpolation methods are higher quality but slower than `Linear` or `Nearest`.
6. Troubleshooting
- "Librosa failed to import...": See `README.md` for advice on Numba/NumPy compatibility.
- "FFMPEG not found" / Silent Video: Ensure FFMPEG is installed and the path in the GUI is correct or `ffmpeg` is in your system PATH.
- Slow Generation: This is expected for complex effects. Reduce frame count for tests, lower `Flow Calc Scale Factor`, or use a lower input resolution.
- No Reaction to Audio:
- Check if the correct audio file is loaded.
- For Peak effects: `Peak Threshold Multiplier` might be too high. Try lowering it.
- For Breathing effects: `RMS Target Percentile` might be too high, or `Smoothing Window` too large for the audio's dynamics.
- Ensure the respective "Enable" checkboxes for the effect groups are checked.
- Video Playback Issues in GUI: The built-in player is for convenience. If it has trouble with a generated video, try playing the MP4 file in a standard media player (like VLC) to confirm the file itself is okay.
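If peak effects never seem to fire, it can help to see how strongly the Peak Threshold Multiplier changes the number of detected peaks. The sketch below uses a made-up onset-strength curve and a simple rule (local maxima above mean plus multiplier times standard deviation). `detect_peaks` is a hypothetical function written for illustration, and the application's own peak picking may be more elaborate.

```python
import statistics


def detect_peaks(onset_strength, threshold_multiplier=0.8):
    """Indices of local maxima above mean + multiplier * standard deviation."""
    mean = statistics.fmean(onset_strength)
    std = statistics.pstdev(onset_strength)
    threshold = mean + threshold_multiplier * std
    peaks = []
    for i in range(1, len(onset_strength) - 1):
        value = onset_strength[i]
        is_local_max = onset_strength[i - 1] < value >= onset_strength[i + 1]
        if is_local_max and value > threshold:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    # Made-up onset strengths: four hits of different loudness.
    onset = [0, 1, 0, 0, 3, 0, 0, 2, 0, 0, 5, 0]
    for multiplier in (0.0, 0.8, 2.0, 5.0):
        print(multiplier, detect_peaks(onset, multiplier))
```

In this example, a multiplier of 0.0 catches all four hits, the default of 0.8 keeps only the two loudest, and 5.0 detects nothing at all, which is the situation the troubleshooting tip above describes.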
We hope this manual helps you explore the creative possibilities of the Audio Reactive Video Animator! Happy animating!