FAQ: HD UHD 4k HDR Video Playback

High Performance Video Playback of high resolution, or deep color, or HDR movies

High resolution, high framerate, deep color, or HDR video playback can be taxing on your CPU, graphics card, and disks, so first and foremost you'll want a fast machine. Psychtoolbox is highly optimized for such playback scenarios and provides access to the similarly optimized GStreamer media framework for high performance playback. On many combinations of operating system and graphics hardware (gpu), Psychtoolbox can configure GStreamer to take advantage of gpu hardware accelerated video decoding for especially efficient playback. What works and what doesn't is often context dependent, though, so there are different routes.

Option 1: For arbitrary movies, also with sound, use the standard playback approach:

This is demonstrated by PlayMoviesDemo(moviename [, hdr=0]), with the optional hdr flag left out or set to zero for Standard Dynamic Range (SDR) movies, or set to 1 for High Dynamic Range (HDR) movies, e.g., HDR-10 movies with 10 bits per color channel (bpc). Psychtoolbox will play back via GStreamer, with automatic audio-video synchronization, and will also perform the colorspace conversions and similar processing needed for HDR content. Non-HDR movies can also use an optimized playback path: set the optional pixelFormat parameter to 11 to select Psychtoolbox's especially performance-optimized gpu decoding shaders. Psychtoolbox will configure GStreamer to utilize hardware accelerated video decoding on your graphics card (gpu) if the movie is encoded in a format the gpu supports for fast hardware decoding. Otherwise a cpu-only software fallback is chosen.
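A minimal playback sketch along these lines, under the assumption that moviename points at an existing SDR movie file (see PlayMoviesDemo.m for the full, robust version):

```matlab
% Minimal SDR playback sketch, assuming 'moviename' points at a movie file:
PsychDefaultSetup(2);
win = PsychImaging('OpenWindow', max(Screen('Screens')), 0);

% pixelFormat 11 selects the performance optimized gpu decoding path:
movie = Screen('OpenMovie', win, moviename, [], [], [], 11);
Screen('PlayMovie', movie, 1); % Start playback at normal 1x speed, with sound.

while ~KbCheck
    % Fetch the next video frame as a texture, blocking until it arrives.
    % A return value of -1 signals end of movie:
    tex = Screen('GetMovieImage', win, movie);
    if tex < 0
        break;
    end
    Screen('DrawTexture', win, tex);
    Screen('Flip', win);
    Screen('Close', tex); % Release the texture after drawing it.
end

Screen('PlayMovie', movie, 0);
Screen('CloseMovie', movie);
sca;
```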

On modern state of the art graphics cards from AMD and NVidia, or also from Intel, and even on the RaspberryPi computer, hardware accelerated video decoding allows for high performance. E.g., a modern NVidia GeForce RTX 5000 "Blackwell" series gpu supports hardware decoding of all common video formats like MPEG-1, MPEG-2, VC-1, VP8 and VP9, as well as modern formats like H264 (AVC), H265 (HEVC) and even AV1, with both 8 bpc content and 10 bpc HDR content, sometimes even 12 bpc content (cf. https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new). Such gpus can decode 1920x1080 full HD movies at 10 bpc in HEVC / H265 format at ~1500 fps! Cf. https://docs.nvidia.com/video-technologies/video-codec-sdk/12.1/nvdec-application-note/index.html#nvdec-performance

Option 2: For long movies without sound, to reduce chances of dropped frames for demanding content, use buffering:

The approach is the same as for Option 1, but with some special flags set in Screen('OpenMovie', ...). In your code, set the optional specialFlags1 parameter to 2 + 256 to disable any decoding and output of sound (saving all the computation time and overhead for that), and to disable deinterlacing (pointless for non-interlaced footage, but it may get a processing block out of the way for slightly lower overhead anyway). Then set the optional async flag to 4, and set preloadSecs to the amount of movie footage to prebuffer, e.g., 4 for 4 seconds or thereabouts. This will prebuffer preloadSecs seconds of movie footage internally, so that during slight performance fluctuations the movie playback engine has a reservoir of already decoded video content to draw from. You then start playback of the movie via Screen('PlayMovie', ...), but wait for a few seconds before entering the actual movie playback loop in your script, to allow the playback engine to predecode and prebuffer multiple seconds of video content in RAM. Clever selection of preloadSecs and of the wait time before the start of the actual playback loop, plus sufficient installed RAM, can make it possible to play longer movies without dropping frames, even if the hardware is a bit too slow to play back a demanding movie in real time. This approach is demonstrated in more detail in the following post: https://psychtoolbox.discourse.group/t/parallel-playback-of-multiple-4k-hdr-videos-on-different-displays/5189/4

Sketched out approach:

  1. Screen('OpenMovie', ...) with the async flag set to 4.
  2. Start movie playback. This starts the decoding process.
  3. Wait for a few seconds to prebuffer data.
  4. Start your Screen('GetMovieImage') fetch and draw loop.

As soon as playback is started, the engine will decode video buffers and queue them in an internal queue, until playback is stopped. Screen('GetMovieImage') will fetch the oldest buffer (FIFO order) and convert it to a texture. You can control the maximum amount of buffered video via the preloadSecs parameter (default = 1 second); a setting of -1 allows infinite buffering, i.e., until you run out of system memory.

You'll probably have to use Priority() to make sure your main thread isn't deprived of computation time by all the GStreamer threads running at maximum decoding speed.
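Putting Option 2 together, a hedged sketch: the specific values (preloadSecs = 4, a 5 second prebuffer wait) are illustrative assumptions, not prescriptions, and an already opened onscreen window win and existing file moviename are assumed:

```matlab
% Buffered, soundless playback sketch.
% specialFlags1 = 2 + 256: disable sound (2) and deinterlacing (256).
% async = 4 enables background buffering, preloadSecs = 4 buffers ~4 seconds:
movie = Screen('OpenMovie', win, moviename, 4, 4, 2 + 256);

Screen('PlayMovie', movie, 1);
WaitSecs(5); % Give the engine time to prebuffer several seconds of video.

Priority(MaxPriority(win)); % Protect our thread from the decoder threads.
while ~KbCheck
    tex = Screen('GetMovieImage', win, movie);
    if tex < 0
        break; % End of movie.
    end
    Screen('DrawTexture', win, tex);
    Screen('Flip', win);
    Screen('Close', tex);
end
Priority(0);

Screen('PlayMovie', movie, 0);
Screen('CloseMovie', movie);
```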

Option 3: For short movies, without audio track, and highest need for timing and control:

If you are concerned about dropped frames, the safest option is to preload all of your video frames into VRAM, and then play them back from there. LoadMovieIntoTexturesDemo is good for determining what is possible in terms of load times and playback rates, when called in benchmark mode:

LoadMovieIntoTexturesDemo(<video file name>, [], [], [], 1)

This demo shows how to open a movie file in a format and encoding supported by GStreamer and its plugins, and then convert each movie frame upfront into a Psychtoolbox texture. This stores the images in your graphics card's (gpu) video memory (VRAM), and also in your computer's system RAM. Once loaded, drawing the textures to the onscreen window for presentation is extremely fast, and even faster if the whole movie fits into VRAM. Usually the limiting factor is decoding the video into textures, not drawing the textures to the screen.
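A minimal sketch of this preload-then-play approach, assuming an open window win and existing file moviename; note that with playback rate 1 the loading phase is paced at real time, and LoadMovieIntoTexturesDemo.m shows a more complete and faster implementation:

```matlab
% Preload all frames into textures upfront, then play back from VRAM.
movie = Screen('OpenMovie', win, moviename, [], [], 2); % specialFlags1=2: no sound.
Screen('PlayMovie', movie, 1);

textures = [];
while true
    tex = Screen('GetMovieImage', win, movie);
    if tex < 0
        break; % All frames fetched.
    end
    textures(end+1) = tex; %#ok<AGROW>
end
Screen('PlayMovie', movie, 0);
Screen('CloseMovie', movie);

% Playback from textures, one frame per video refresh cycle:
for i = 1:numel(textures)
    Screen('DrawTexture', win, textures(i));
    Screen('Flip', win);
end
Screen('Close', textures); % Release all textures at once.
```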


Q: What kind of file→texture decode rates are possible?

A: Using Version 3.0.22 (Build date: May 25 2025), on an Apple Silicon MacBook Pro with M2 Pro SoC and built-in fast SSD drive as source storage:

Video: 4k HDR-10 3840x2160 encoded as H265 / HEVC, on a 16 GB RAM machine:

Decoding and movie-to-texture conversion speed is 107 frames per second ~ 846 Megapixels/second. Movie texture playback rate is 503 frames per second ~ 3982 Megapixels/second.

Of course, a 10 bit per color channel HDR-10 video frame at this resolution typically takes up about 3 bytes per pixel, i.e., 23.73 MB of RAM per frame, so a 10 second movie clip playing at 60 fps would consume 13.9 GB of RAM. In other words, even rather short 4k high resolution clips require a lot of RAM, so tread carefully, or buy lots of RAM.
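For reference, the same back-of-the-envelope arithmetic spelled out:

```matlab
% RAM needed for a fully preloaded 10 second 4k clip at 60 fps,
% assuming ~3 bytes per pixel as stated above:
bytesPerFrame = 3840 * 2160 * 3;           % 24883200 bytes ~ 23.73 MB per frame
framesTotal   = 10 * 60;                   % 10 seconds at 60 fps = 600 frames
totalGB = bytesPerFrame * framesTotal / 2^30   % ~ 13.9 GB
```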

On the topic of video formats that might decode fastest, Tobias Wolf said about the popular "Handbrake" video encoding application:

Just my 2 cents: Handbrake's defaults enable all the bells and whistles that codec developers have come up with in the last 10 years to make the file as small as possible. This means decoding is complex. So don't use the defaults if you want fast decoding.

There are profiles in H.264 that limit decoding complexity. In x264 they are exposed as:

    --profile <string>      Force the limits of an H.264 profile.
                            Overrides all settings.
                            - baseline, main, high, high10, high422, high444

I guess you want baseline. Note that high10 can give you 10-bit color and high444 gives you RGB. Otherwise you get 4:2:0 chroma subsampled color, with much reduced chroma resolution. Not all decoders support these profiles, but GStreamer does.

You can also try --tune fastdecode in addition. Don't worry about quality; that is governed by the CRF constant quality factor. The file will just be bigger, but faster to decode.

FFmpeg might use a different MP4 muxer than Handbrake (for both ASP and AVC); the encoder is the same. I would go with the Matroska (mkv) container in any case. QuickTime won't play that, though.
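As a hedged example of such a fast-decode encode, assuming the ffmpeg executable with libx264 is installed and on the system path (file names and the CRF value are placeholders, not recommendations from the Psychtoolbox authors), one could call it from MATLAB:

```matlab
% Re-encode a movie for fast decoding: baseline profile + fastdecode tune,
% constant quality via CRF, 4:2:0 pixel format (required by baseline),
% audio stripped (-an), muxed into a Matroska container:
system(['ffmpeg -i input.mp4 -c:v libx264 -profile:v baseline ' ...
        '-tune fastdecode -pix_fmt yuv420p -crf 18 -an output.mkv']);
```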


Note that while decoding a video frame takes the bulk of the time, uploading that frame to VRAM isn't instantaneous either. See MakeTextureTimingTest2 to find out how long this step takes.


Some more thoughts and tips:

For playing back movies back-to-back, with no or minimal breaks between two successive movies, try PlayGaplessMoviesDemo2.m or PlayGaplessMoviesDemo1.m, as they can make use of GStreamer's builtin gapless playback support.

Choice of operating system: Linux is generally the best choice for high performance movie playback, followed by Microsoft Windows. Apple macOS, as of May 2025, will generally have the lowest movie playback performance.

Further possible micro-optimizations to consider if performance is of the essence:

You can use the dontclear=2 flag in Screen('Flip', ...) to prevent clearing the framebuffer if you're overdrawing the video stimulus anyway – may save up to one msec or so.

Using the additional Screen('Preference', 'ConserveVRAM', ...) setting 512, aka kPsychAvoidCPUGPUSync, could also make sense (see help ConserveVRAMSettings; the numbers of all used flags add up). This disables any kind of OpenGL/GPU error checking in texture creation, DrawTexture, Flip etc. Usually not recommended, but once your code works error-free it may save some fraction of a msec.

Clever use of Screen('DrawingFinished') after the last drawing command, before you do other things like KbChecks, may also help to increase parallelism between CPU and GPU. It could be that remaining skipped frames are due to delays on the GPU, not the CPU – you can only time the CPU with GetSecs, tic/toc, the profiler etc. For the GPU there is special profiling support on supported GPUs, as shown in DrawingSpeedTest if you follow its gpumeasure flag.
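A hedged sketch combining these micro-optimizations in an inner playback loop; win and movie are assumed to already exist, as in the Option 1 sketch above:

```matlab
% Apply once at startup, only after your code already runs error-free:
% 512 = kPsychAvoidCPUGPUSync disables OpenGL/GPU error checking.
Screen('Preference', 'ConserveVRAM', 512);

% ... open window 'win', open movie 'movie', start playback, then:
while true
    tex = Screen('GetMovieImage', win, movie);
    if tex < 0
        break; % End of movie.
    end
    Screen('DrawTexture', win, tex);
    Screen('DrawingFinished', win, 2); % Kick off GPU rendering early; 2 = dontclear.
    keyIsDown = KbCheck;               % CPU work now overlaps GPU rendering.
    Screen('Flip', win, [], 2);        % dontclear=2 skips the framebuffer clear.
    Screen('Close', tex);
    if keyIsDown
        break;
    end
end
```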

In the end, if we talk about occasional misses by a (few) msecs, we're in the world of endless tweaks: e.g., running the GPU always at its highest performance setting to avoid interference from GPU power management, choosing the right operating system instead of the wrong one, tweaking CPU power management and other settings on operating systems that support such things, and so on.
