Tech Art Compendium
This compendium focuses on topics relevant to this Blender addon, primarily addressing side effects and important considerations when using vertex offsets and storing data in UVs and Vertex Colors. It may be worth reading, whether you use this addon or not.
Unique vertex attributes
A vertex can only have one attribute of a given type:
- One position
- One normal
- One RGBA vertex color
- One UV coordinate per UV map
So what about UV seams, where a single vertex would seemingly need two UV coordinates in the same UV map?
In simple terms, the mesh will be cut, and these vertices will be duplicated so that one set of vertices has one UV coordinate, and the other set has the other UV coordinate.
This can easily be proven. When exporting two cylinders to Unreal Engine, one with no UV map and the other with a UV map containing a single seam, the latter contains two extra vertices for the seam.
This principle applies to all vertex attributes, such as normals. To create a hard edge or, more broadly, to create flat faces, vertices must be duplicated so that some have their normals pointing in one direction, and others have their normals pointing in another direction.
Again, this can be easily demonstrated. When exporting two spheres to Unreal Engine, one with smooth shading and the other with flat shading, the latter contains a significantly higher number of vertices.
That's simply because each face duplicates all its vertices so they can have unique normals.
The same principle applies to vertex colors. For a face to have a vertex color different from an adjacent face, the mesh must be cut, and the connected vertices must be duplicated so that some store one color, while the rest store another color.
This results in an increased vertex count that can be predicted and confirmed in any game engine.
[!NOTE] This concept may cause confusion for new artists and tech artists, mainly because DCC software tries its best to hide this process for a better user experience. However, it still happens under the hood in all DCC applications, and it is certainly true for all game engines as well. When in doubt, if you’re using UVs or Vertex Colors to store arbitrary data or manipulating normals in an unusual way, just ask yourself whether the data is per face or per vertex. Most of the time, it will be per vertex, with UV seams and hard edges being the most notable exceptions.
[!IMPORTANT] This isn’t a concern for VAT, BAT, OAT, and Pivot Painter. The extra UV map they all create assigns a single UV coordinate per vertex, so no splits are induced and no vertices are duplicated. However, this may be a concern when baking arbitrary data in UVs, such as a random value per face, but this is a very rare use case. Most reasons for baking data into UVs, such as storing pivots, axes, etc., all assign a single UV per vertex, so no splits are induced. TL;DR: you shouldn't worry too much about this, at least in the context of using this addon. The concept of a vertex being limited to storing a single attribute of a given type is fundamental to tech art and was mentioned in this documentation for the sake of completeness.
UV map cost and count
There are many misconceptions about the cost of UV maps, with some claiming that each additional UV map induces an extra draw call. This is simply not true.
While UV maps do have a cost, it’s not as drastic as often suggested. The main impact is on the mesh's memory footprint: two 16-bit floats per vertex per UV map, in Unreal Engine at least, unless 'full precision UVs' is enabled, in which case the cost increases to two 32-bit floats per vertex per UV map.
The increased memory footprint can be easily predicted. For example, for a mesh with 400 vertices, adding one UV map results in an additional 1.6 kilobytes of memory. This can be measured and confirmed in Unreal Engine using the Size Map feature.
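For reference, here's the arithmetic behind that figure, assuming the default layout of two 16-bit floats (2 bytes each) per UV coordinate:

$400 \text{ vertices} \times 2 \text{ floats} \times 2 \text{ bytes} = 1600 \text{ bytes} \approx 1.6 \text{ kB}$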
While an increased memory footprint does have some impact on memory bandwidth sooner or later, that's about it. Adding an extra UV map to a static mesh is unlikely to cause a measurable performance impact. However, using 8 UV maps on all your assets could tell a different story. If you need that many UV maps to store arbitrary data, one might question whether UV maps are the best medium for storing this data.
[!WARNING] In most game engines, including Unreal Engine, there’s a hard limit on the number of UV maps: 8 for static meshes and 4 for skeletal meshes.
UV Precision
In most game engines, UVs are stored as 16-bit floats for performance purposes and to save memory. 16-bit floats provide sufficient precision for everyday tasks, such as sampling up to 4K textures, and allow for positive and negative values across a wide range. However, when storing arbitrary data in UVs, 16-bit floats may not provide enough precision, especially when using bit-packing techniques. In such cases, 32-bit UVs can be enabled in most game engines.
In Unreal Engine, this is exposed through the 'full precision UVs' option in the static mesh editor.
Similarly, when using VAT, BAT, OAT, Pivot Painter, or any other technique that requires vertices to sample a very specific texel, Nearest sampling is often mandatory to avoid data corruption. In such cases, 32-bit UVs might also be necessary, depending on the texture resolution and the texel precision required. For example, 4K textures will likely require 32-bit UVs to achieve texel accuracy.
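To illustrate what texel accuracy means in practice, here's a minimal sketch of how a bake step typically centers each vertex's UV on its texel. The function and parameter names are illustrative, not part of any specific engine or of this addon's output:

```hlsl
// Illustrative sketch: computing a U coordinate that lands dead-center on a texel.
// 'vertexIndex' and 'textureWidth' are assumed inputs, not addon-specific names.
float ComputeVertexU(uint vertexIndex, uint textureWidth)
{
    // The +0.5 centers the coordinate on the texel, which is what Nearest
    // sampling (and texel-accurate reads in general) relies on. With 16-bit UVs
    // and large textures, this value may not be representable precisely enough.
    return (vertexIndex + 0.5) / (float)textureWidth;
}
```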
Lightmap UV
Good old lightmaps require unique UV layouts. In game engines where using lightmaps is common practice, or at least was, like Unreal Engine, it’s typical to see lightmap UVs automatically generated upon mesh import. This automatic generation can interfere with your setup and overwrite any additional UV map you've used to store pivots, etc. Always double-check that such options don’t get in your way.
A static mesh can be further edited in the static mesh editor in case it was imported with the wrong settings.
Collisions
Applying any kind of vertex movement or offset in a vertex shader occurs on the GPU, near the end of the rendering pipeline, and the CPU is completely unaware of this step.
As a result, collisions, which are typically solved on the CPU except in a few specific custom game engines, will still be computed based on the original collision primitives you may have set up in your game engine, or the mesh's triangles, as if no vertex shader had been used.
There's no way around it. Some game engines might expose settings to bake a fixed vertex offset into some kind of collision data, like Unreal Engine's landscape that may account for the landscape's material WPO, but it's extremely limited and it is best assumed that collisions and vertex animation don't go hand in hand, period.
[!NOTE] This doesn't mean you can't use vertex animation with colliding meshes! Just be prepared to account for it when authoring collisions, or accept that things won’t be perfect. Things don't have to be perfect!
Bounds
Bounds are used by the CPU to determine if a mesh is in view and thus, if it should be rendered. Similarly to collisions, bounds are precomputed based on the static mesh's raw vertex data.
As a result, a mesh that may appear to be in view based on its precomputed bounds might no longer be in view once its vertices are displaced on the GPU by a vertex shader.
Similarly, a mesh that isn’t in view and is therefore occluded might actually be in view after its vertices are displaced on the GPU by a vertex shader, but by that point, it’s already been culled. So, you can effectively make objects disappear!
You get the idea: this widely used, bounds-based occlusion process doesn’t work well with vertex animation and vertex offsets. Unfortunately, there’s no magic solution to this issue, except to arbitrarily increase the mesh’s bounds by a certain amount to account for vertex animation or the maximum expected offset.
This will make the CPU think the mesh takes more screen-space than it actually does and prevent undesired occlusion culling.
[!WARNING] Increasing bounds will reduce the chances of the mesh being culled, essentially leading to a performance impact, as the mesh will statistically be rendered more often due to its increased apparent overall size. The impact is hard to quantify and depends on the specific mesh and scene. Bounds should be extended with care.
Distance Fields
Still on the topic of precomputed data: distance fields. They have become quite popular and are now widely used in almost all game engines.
A distance field is essentially a 3D volume encapsulating a mesh, where each voxel encodes the distance to the nearest surface of the mesh, negative if inside it.
With this data, and using a few derivative tricks, the direction to the nearest surface can be deduced. For example, sampling the field at a given point, and then slightly to the left of that point, reveals whether the distance increases or decreases in that direction. This indicates which way to go to get closer to the surface along the X axis. Repeat this process for Y and Z, and the resulting three components can be normalized into a direction vector (D).
Together with the distance (d), this allows the nearest position on the mesh to be found from any sample location (p) within the 3D volume.
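As a rough sketch of that derivative trick, assuming the distance field is available as a 3D texture sampled in its own 0-1 UVW space ('DistanceField', 'DFSampler' and the epsilon are illustrative names, not engine API):

```hlsl
Texture3D<float> DistanceField; // signed distance per voxel (negative inside)
SamplerState DFSampler;

float3 NearestSurfacePoint(float3 p, float eps)
{
    // Distance to the nearest surface at the sample location.
    float d = DistanceField.SampleLevel(DFSampler, p, 0);

    // Central differences along X, Y and Z: does the distance grow or shrink
    // when the sample point is nudged in each direction?
    float3 grad;
    grad.x = DistanceField.SampleLevel(DFSampler, p + float3(eps, 0, 0), 0)
           - DistanceField.SampleLevel(DFSampler, p - float3(eps, 0, 0), 0);
    grad.y = DistanceField.SampleLevel(DFSampler, p + float3(0, eps, 0), 0)
           - DistanceField.SampleLevel(DFSampler, p - float3(0, eps, 0), 0);
    grad.z = DistanceField.SampleLevel(DFSampler, p + float3(0, 0, eps), 0)
           - DistanceField.SampleLevel(DFSampler, p - float3(0, 0, eps), 0);

    float3 D = normalize(grad); // direction in which the distance increases
    return p - D * d;           // step back by the stored distance to reach the surface
}
```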
However, computing the distance to the nearest surface of the mesh for each voxel can be quite a taxing process for the CPU and/or GPU, which is why distance fields are almost always precomputed offline based on the mesh’s rest pose. As a result, no vertex animation or offset is taken into account, which can mess up lighting, shadows, and any other rendering feature that relies on distance fields in one way or another. A mesh may be animated on the GPU using a vertex shader, but its distance field will remain fixed just like any effect computed based on it.
[!NOTE] Similar to bounds, there’s no magic solution to this, and the one "hack" typically exposed in game engines like Unreal Engine is the ability to offset the distance field self-shadowing distance. Unfortunately, this doesn’t offer much flexibility. Moving vertices on the GPU isn’t really compatible with techniques that rely on precomputed data, that’s just how things are.
Virtualized Rendering Systems (Nanite & VSM)
Nanite and similar virtualized geometry techniques don’t work well with vertex animations and offsets for reasons that are well beyond the scope of this documentation.
The same goes for virtual shadow maps, which are, at their core, a caching technique. And just like with any caching technique, the cache isn't meant to be trashed every frame just because a few vertices have moved. It can be, but the cost of cache invalidation is usually very high.
[!NOTE] I’m not up to date with the solutions currently offered in Unreal Engine to alleviate these issues, and I’d imagine it's a similar situation in other engines implementing similar technologies. Unfortunately, offsetting vertices on the GPU doesn’t work well with these new technologies; that's just the reality of the situation. It doesn’t mean it’s impossible, but you’ll need to dig deep and find the tricks that work best for your specific use case (e.g. it’s possible to tell Nanite a maximum offset per material, tweak the VSM invalidation cache behavior per asset etc.)
[!WARNING] @TODO Further research needs to be conducted
Lumen
[!WARNING] @TODO Further research needs to be conducted
Raytracing
This is not my area of expertise, but vertex shaders are typically supported in ray-tracing implementations. Some options might need to be enabled to account for any vertex offset applied in the vertex shader, as this feature may not be enabled by default due to the increased cost. In Unreal Engine, this used to be exposed through the ‘Evaluate World Position Offset’ option at one point. However, I’m not up to date and can't guarantee that this is still the case, as ray-tracing implementations are still in their early days and all rapidly evolving.
[!WARNING] @TODO Further research needs to be conducted
Motion Vectors
The game industry has largely shifted towards temporal solutions for effects that require multiple frames to converge or are too costly to compute in a single frame. These include Temporal Anti-Aliasing, Temporal Super Resolution, Global Illumination and Denoising solutions, and many more. For better or for worse, the rendering stack of many game engines is now heavily temporal. While using these features isn't mandatory, it’s becoming increasingly difficult to ship something that doesn't rely on a temporal effect at some point in the pipeline. These algorithms may not be the most appreciated, but they do make sense and, at times, are a necessary evil.
A vertex shader that induces pixel motion via vertex animation can cause problems because these temporal effects rely heavily on motion vectors to minimize ghosting and help the algorithms understand how to treat each pixel’s history. The GPU will treat a pixel rendering a moving object, using the velocity buffer/motion vectors, differently than a pixel rendering a static one.
The issue is that vertex shaders don’t output motion vectors by default in most game engines. Therefore, it’s up to the artist to provide the shader with the vertex position of the previous frame so the engine can derive velocity and render motion vectors in the velocity buffer.
In Unreal Engine, this is exposed in material graphs through the previous frame switch node. This static switch lets the shader branch between the pass where the engine applies the vertex shader for the current frame and the pass where it renders motion vectors by computing the difference between this frame’s and the previous frame’s vertex positions.
Simply put, you need to plug the current vertex offset into the ‘current frame’ pin, just like you would plug in the world position offset input in the material node.
You also need to plug the previous vertex offset into the ‘previous frame’ pin.
Since there’s no way to cache that data, it must be recomputed. This may sound complex, but most vertex shaders are time-based, so the same logic can be simply duplicated. One might think that time should be replaced by $time - dt$ but this is automatically done for you in the vertex shader, so you can just copy & paste the same logic. This will output the vertex offset for the previous frame and allow the engine to write motion vectors.
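As a rough illustration of that duplication (the offset function, 'time' and 'prevTime' are stand-ins for whatever drives your animation and for the engine-provided time values; none of these are actual Unreal Engine names):

```hlsl
// Hypothetical time-based offset; stands in for whatever logic drives the animation.
float3 ComputeOffset(float3 localPos, float time)
{
    // Simple sine-based sway along X, purely illustrative.
    return float3(sin(time + localPos.y), 0.0, 0.0) * 10.0;
}

// 'time' and 'prevTime' would be fed by the engine (prevTime being time - dt).
void ComputeOffsets(float3 localPos, float time, float prevTime,
                    out float3 currentOffset, out float3 previousOffset)
{
    // Current frame: drives the actual vertex displacement (world position offset).
    currentOffset  = ComputeOffset(localPos, time);
    // Previous frame: identical logic, re-evaluated so the engine can derive
    // velocity and write motion vectors.
    previousOffset = ComputeOffset(localPos, prevTime);
}
```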
This will essentially double the cost of the vertex shader, but there’s no way around it. It’s a necessary evil to avoid a pixelated mess when using vertex animation along with temporal effects.
[!IMPORTANT] This may have changed in recent UE versions, as there seems to be no need for that with time-based effects anymore. The vertex shader now has a special HLSL function to compute the previous world position offset, which replaces $time$ with $time - dt$, making the previous frame switch unnecessary. This applies only to time-based effects, as effects with Time provided by external systems, such as particles with Dynamic Material Parameters, will still require the previous frame switch and duplicated logic, where the 'custom Time' is replaced by a 'custom PreviousTime'.
Here are VAT-animated mesh particles with a custom Time provided by Niagara, without the use of the previous frame switch:
And with it. As you can see, this allows temporal effects to eliminate most artifacts and enables motion blur to function properly.
Texture compression settings, interpolation & nearest sampling
When using textures to store arbitrary data, it’s important to understand not only how the data can be stored and compressed but also how it is sampled.
Texture formats are discussed in great detail in the remapping section. To summarize, textures are most often RGBA 8-bit integers, commonly referred to as RGBA8. This format works well for everyday use and is more than sufficient for most PBR maps (such as diffuse, roughness, etc.), even allowing for different DXT compression modes depending on the use case.
That said, RGBA8 is likely impractical for storing arbitrary data in many technical applications, such as VATs or Pivot Painter, among others. The range is simply too limiting. HDR textures, either 16- or 32-bit, are typically required for these applications. Additionally, texture compression is usually not an option. Compression typically scrambles bits and performs optimizations in blocks of pixels, something that can't be allowed in a lot of cases. VATs and similar techniques store critical information per pixel that can't be averaged with nearby pixels without causing undesired issues. Moreover, Pivot Painter uses bit-packing methods, and any form of compression would scramble bits, corrupting the packed data.
For all these reasons, you'll likely want to limit yourself to uncompressed RGBA8, HDR16, or HDR32 texture compression settings when using techniques like VATs.
Having a solid understanding of the technical implications of working with a specific data storage format is just the beginning. Sampling can be just as complex.
Most often, textures are sampled using coordinates stored in UV maps. As mentioned here, these UVs are typically stored as 16-bit floats, which limits precision. This is usually not a problem, as small amounts of jitter or imprecision in UVs typically don't cause visible issues when sampling regular textures, such as diffuse or roughness maps. However, for VATs, Pivot Painter, and similar techniques where each vertex needs to precisely read the data stored in a particular pixel, UV jitter resulting from the use of 16-bit floats can become an issue.
By default, GPUs perform bilinear interpolation between the four texels (a, b, c, d) closest to the given UV coordinates. Texels a and b are linearly interpolated based on the U coordinate, and c and d in the same way. Then, the results of those two interpolations are themselves interpolated based on the V coordinate, producing the final bilinear result.
Therefore, if you want to read the exact value stored in a specific texel rather than a filtered blend of its neighborhood, this interpolation becomes an issue.
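For reference, the blend just described can be written as follows, where $u_f$ and $v_f$ denote the sample point's fractional position between the four texels (notation introduced here purely for illustration):

$\text{bilinear}(a, b, c, d) = \text{lerp}\big(\text{lerp}(a, b, u_f),\ \text{lerp}(c, d, u_f),\ v_f\big)$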
Let’s assume the VAT is 2x2, like in the illustration above. An unrealistically low resolution, but let’s go with it. Each texel is separated by half a unit, with one vertex centered on the left texel at 0.25 and another centered on the right texel at 0.75.
The smallest positive value that can be represented by a 16-bit float is 2⁻¹⁴ × (1/1024) = 2⁻²⁴ ≈ 0.000000059604645. This gives you an idea of how precise UVs can be.
With such a low-resolution VAT, even considering that level of imprecision, UVs will be dead-centered on each texel, and GPU interpolation won’t be an issue, as this small amount of deviation wouldn’t translate into a meaningful deviation percentage. In other words, the other three nearest texels won’t contribute to the sample in a meaningful way.
[!WARNING] This assumes data isn't packed using a bit-packing algorithm like Pivot Painter's! In that case, even the tiniest amount of interpolation could scramble the bits and corrupt the packed data!
That said, floats do not have a fixed precision step. Instead, they get increasingly imprecise as you move away from 0. For a 16-bit float, the step between representable values ranges from ~0.000000059604645 near 0, to ~0.000244140625 around 0.49, and up to ~0.00048828125 around 0.99.
Let’s assume the VAT is now 400x200, still quite a low resolution. Each texel is separated by 1/400, or 0.0025, along the U axis. The first vertex should be centered on the first texel at 0.00125, the second vertex centered at 0.00375, and so on. That’s already quite precise and likely to cause interpolation issues when sampling a VAT.
We would expect UVs to be very precise for the first few texels, near 0, but closer to 1.0, an imprecision of ~0.00048828125 would equate to ~19.5% of imprecision! Sampling a texel would then output a value that includes about a fifth of the neighboring texel’s value. This is especially problematic with VAT and similar techniques, where neighboring texels may encode offsets for vertices moving in a completely different manner than the current texel. Even small amounts of interpolation could drastically alter the original expected value.
At this point, Nearest sampling becomes mandatory. This method instructs the GPU to avoid interpolation and select the nearest texel based on a given UV coordinate. In this case, the UV coordinate is closest to texel 'b'.
However, we’re still not out of trouble.
Let’s assume the VAT is 4096x500. Each texel is separated by 1/4096, or 0.00024414062, along the U axis. This is extremely precise, and compared to ~0.00048828125 of deviation, it represents a 200% deviation. In other words, even with Nearest sampling, a 16-bit UV could instruct the GPU to pick the wrong texel for a given vertex! At this point, 32-bit UVs become mandatory, which means Nearest sampling may no longer be necessary, depending on the VAT resolution, due to the much more precise UVs and significantly reduced interpolation.
All in all, relying on GPU interpolation is perfectly fine for sampling everyday textures, but it may only be practical in certain scenarios for more technical applications. Typically, techniques that involve storing specific data in specific texels don't work well with interpolation (or any kind of texture compression, for that matter) and require Nearest sampling, 32-bit UVs or both.
[!NOTE] As mentioned in the VAT section, pixel interpolation can provide frame interpolation for free. However, as we've noted, Nearest sampling may be the preferred choice to avoid unexpected results. In that case, frame interpolation can still be achieved, but you'll need an additional texture sample to fetch the data one frame ahead and perform the interpolation manually.
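For illustration, here's a minimal sketch of that manual frame interpolation, assuming a VAT with one frame per row and Nearest (point) filtering; 'frameV', 'nextFrameV' and 'frameBlend' are illustrative inputs that a VAT setup would provide:

```hlsl
Texture2D OffsetVAT;
SamplerState NearestSampler; // point/nearest filtering

float3 SampleInterpolatedOffset(float vertexU, float frameV, float nextFrameV, float frameBlend)
{
    // Fetch the offset for the current frame and for the next frame.
    float3 current = OffsetVAT.SampleLevel(NearestSampler, float2(vertexU, frameV), 0).xyz;
    float3 next    = OffsetVAT.SampleLevel(NearestSampler, float2(vertexU, nextFrameV), 0).xyz;

    // Interpolation is done manually instead of relying on the GPU's bilinear filter.
    return lerp(current, next, frameBlend);
}
```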
To summarize, it’s not the higher texture resolution itself that causes precision issues. Rather, for VATs and similar techniques, the problem arises because vertices need to be dead-centered on texels, and as texture resolution increases, the texels become smaller, making imprecision issues more apparent.
Fixing Normals
Offsetting vertices in a vertex shader does not update the normals, and for good reasons.
Normals can be computed in your DCC software, like Blender, in many different ways and re-evaluated at will (e.g., smooth/flat/weighted normals). Some of these methods require averaging the normals of all triangles surrounding a vertex, which a vertex shader simply can't do.
Additionally, there's no direct correlation between a vertex's position (or its offset) and its normal, so the normal can't be derived from the offset alone. For example, a vertex moved along its normal would change position, but its normal wouldn't. Yet it may very well change the normal of neighboring vertices!
This is a more complex topic than many tech artists may initially realize. For this reason, normals often have to be baked along with offsets when using vertex animation textures. However, normals can still be manually fixed in certain cases, such as when using object animation textures where the object's rotation and pivot point are encoded in the texture, allowing the normal to be rotated and corrected. Bone animation textures also encode bone pivot and quaternion information, which can be used to fix the normal.
Long story short, when rotations are involved, normals can be fixed, but they most often cannot be when dealing with arbitrary offsets.
[!NOTE] DDX/DDY can be used in pixel shaders to derive flat normals from the surface position but it results in a faceted look that is most often undesired.
NPOT Textures
While non-power-of-two (NPOT) textures were once simply not an option in most game engines, the situation has greatly improved, but there are still some important things to note. It’s very hard to find information on what happens under the hood in older and more recent GPUs, and coming up with absolute truths on such a broad and obscure topic is unwise.
That said, it wouldn’t be unrealistic to assume that, on some hardware (e.g. mobile?), an NPOT texture may be stored as the next power-of-two (POT) texture. One may even read here and there older reports stating that an NPOT texture may be automatically padded with black pixels to be converted to a POT texture, potentially causing interpolation issues on borders, but we digress. A 400x200 texture may be stored as a 512x256 texture depending on your targeted hardware, game engine, graphics API, etc. It is, however, not something I have directly experienced with Unreal Engine.
While this wouldn’t directly affect the user if it were true (it’s a memory layout thing), it would waste precious GPU memory and may therefore affect memory bandwidth. This shouldn't be worrying for smaller textures, but a 2049x2048 texture, for example, would theoretically be stored as a 4096x2048 texture on some, likely ancient, hardware, just because of that extra pixel in width! Worrying, but again, it doesn't seem to be the case on recent hardware. Everything seems to point to an NPOT texture behaving just like a POT texture in memory, but, you know, GPUs... They can’t be trusted!
Moreover, while NPOT textures are now widely supported in most game engines, that doesn't mean they are widely supported by the hardware that you may target with said game engines (e.g. mobile?). Support may even be partial or bugged. It may be a good idea to double-check your targeted hardware specs and ensure NPOT textures behave well on it.
NPOT textures can also cause problems with mipmapping and most compression algorithms like DXT often have requirements on the texture resolution (POT or multiple of 4). While this doesn't apply to VATs, OATs and BATs (which shouldn’t be compressed nor mipmapped), it’s still worth mentioning.
In short, NPOT should work fine for most use cases in 2025. This isn’t an absolute truth, of course, so don’t take my word for it. That said, I can’t help but think that some efforts should be made to ensure reasonable NPOT resolutions. Personally, I wouldn’t feel comfortable shipping a 2049x2048 VAT. I’d ditch that extra frame or vertex just to be safe, but hey, you do you!
Remapping
Storing data in a texture, UVs, or Vertex Colors requires an understanding of the format you're working with.
Most textures use 8-bit integers per pixel per channel, which allow for storing 256 values per channel, ranging from 0 to 255. However, textures can also be configured to use 16-bit integers, 16-bit floats, 32-bit integers, or 32-bit floats per pixel per channel. In most game engines, you'll primarily use 8-bit integer textures for everyday tasks (such as storing diffuse, roughness data, etc.) and 16- to 32-bit float HDR textures for more specific use cases (tech art, VFXs etc.).
[!NOTE] 8-bit textures can be compressed using various algorithms like DXT, assuming their resolution meets the requirements of these algorithms (POT or multiples of 4). HDR textures can also be compressed, but this won't be covered in this documentation, and it will be assumed that HDR textures are uncompressed.
UVs can be 16- to 32-bit floats, while Vertex Colors are typically 8-bit integers.
While both 16- and 32-bit floats offer the ability to store any value —small or large, positive or negative— and differ mainly in the precision they provide and their range (a 16-bit float tops out at 65504, while a 32-bit float reaches about 3.4×10³⁸, though it can only represent consecutive integers exactly up to 16777216), 8-bit integers require a different approach. Being limited to 256 integers, you can draw two main conclusions:
- The range is very small, from 0 to 255.
- It only supports positive integers.
Therefore, storing arbitrary values in an 8-bit integer often requires a process called remapping.
For example, let's assume we want to encode a normal in an 8-bit texture. Since a normal is a unit vector that may point in any direction, its XYZ components may each range from -1 to 1. This range needs to be remapped to [0:255] for storage in an 8-bit integer. Thus, the math to remap a unit vector from [-1:1] to [0:255] is as follows:
- First, remap the value from [-1:1] to [0:1] using the formula: $(x + 1) * 0.5$.
- Then, multiply by 255 and use the floor function to round the result down to an integer in the [0:255] range
[!NOTE] $(x + 1) * 0.5$ is the same as $(x * 0.5) + 0.5$, and this operation is often referred to as a constant-bias-scale. The convention is to usually apply the bias first.
[!NOTE] Using the floor function might be unnecessary as the process of writing any value to an 8-bit integer will itself floor the value.
This remapping process is what causes our beloved normal maps to look the way they do. Normal maps are mostly baked in tangent space, meaning the orientation is relative to the underlying low-poly surface. Because of this, it’s likely that the orientation of the source high-poly surface relative to the target low-poly surface will produce a vector that is almost always mostly pointing towards +Z and neutral in X and Y.
As a result of the remapping process, this typically results in a light bluish color in the normal map, as the remapped XYZ values tend to center around the positive Z axis, with minimal displacement in the X and Y axes. This bluish tint reflects the neutral, upward-facing direction in tangent space.
Next, when sampling the texture and reading the normal in the [0:255] range, the opposite operation needs to be performed:
- Divide the value in the [0:255] range by 255 to bring it back to the [0:1] range.
- Remap from [0:1] to [-1:1] using the formula: $(x - 0.5) * 2$
[!NOTE] $(x - 0.5) * 2$ is the same as $(x * 2) - 1$ and is just a constant-bias-scale operation with different bias and scale parameters.
[!NOTE] In most game engines, sampling an 8-bit texture usually doesn’t spit out values in the [0:255] range but right away in the [0:1] range, so the first step is likely unnecessary.
[!NOTE] Most game engines use specific texture compression settings for normal maps, which allow texture samplers to identify normal maps and automatically perform the remapping under the hood. When sampling the RGB channels, the engine can then output the initial XYZ normal unit vector, simplifying the workflow for artists and developers, as they don’t need to manually handle the remapping.
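As a minimal sketch of the manual decode, assuming the sampler already returns the 8-bit channels in the [0:1] range (as most engines do) and that no normal-map compression setting is doing the remap for you:

```hlsl
// [0:1] -> [-1:1], i.e. the inverse constant-bias-scale described above.
float3 DecodeUnitVector(float3 encoded01)
{
    // Normalize to compensate for the quantization introduced by 8-bit storage.
    return normalize(encoded01 * 2.0 - 1.0);
}
```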
Such operations are lossy! Assuming the normal XYZ components were initially stored as 32-bit floats with great sub-decimal precision, converting to 8-bit integers obviously reduces this precision and rounds the remapped XYZ components down to the nearest corresponding integer amongst 256 possibilities. For a unit vector, this is usually not a significant issue: normal maps, for instance, are almost always stored in 8-bit BC5 compressed textures where one component of the unit vector is discarded and reconstructed when sampled. However, for more arbitrary values, like positions and offsets, 8-bit could be problematic depending on your use case. Moreover, for arbitrary values, the remapping process involves one extra step.
Let’s assume we want to store an XYZ position in the RGB channels of an 8-bit texture. Such a position’s range is theoretically infinite. It could be something like (-127.001, 253.321, 15.314) or (1558.324, -5428.256, -94644.135), or anything, really. Thus, first, it needs to be remapped to a [-1:1] range. This involves identifying the greatest absolute position or offset in the entire set of positions or offsets you want to bake. Once you have the highest value, all positions can be divided by it to bring all values back into the [-1:1] range.
The formula ends up being
- $(((pos/absmaxpos)+1)*0.5)*255$
And to retrieve the position when sampling the texture, the inverse needs to be performed:
- $(((value/255)-0.5)*2)*absmaxpos$
The $absmaxpos$ value needs to be computed in advance and stored to correctly decode positions encoded in 8-bit integers. This addon makes extensive use of this technique, but often with a slight variation: instead of using the maximum absolute value, it computes both a minimum and a maximum value, enabling the use of the following formulas:
Remapping from [min:max] to [0:1] and then to [0:255]
- $((val - minval) / (maxval - minval)) * 255$
Remapping from [0:255] to [0:1] and then to [min:max]
- $((val / 255) * (maxval - minval)) + minval$
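Here's a minimal HLSL sketch of these two formulas. 'minVal' and 'maxVal' stand for the per-bake constants the addon stores alongside the data; the names are illustrative. The 0-255 step is shown explicitly, but most engines already hand you 8-bit data in the [0:1] range, in which case only the min/max step is needed:

```hlsl
float EncodeToByte(float val, float minVal, float maxVal)
{
    float normalized = (val - minVal) / (maxVal - minVal); // [min:max] -> [0:1]
    return floor(normalized * 255.0);                      // [0:1] -> [0:255]
}

float DecodeFromByte(float stored, float minVal, float maxVal)
{
    float normalized = stored / 255.0;                     // [0:255] -> [0:1]
    return normalized * (maxVal - minVal) + minVal;        // [0:1] -> [min:max]
}
```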
[!IMPORTANT] The same exact principles apply to Vertex Colors, which can be used to store a unit vector, like a normal, or an offset or position with the same exact constraints. Since Vertex Colors are typically stored as 8-bit integers, you will face similar limitations in terms of range and precision. When storing a unit vector or other data, the values will need to be remapped from their original range (e.g. [-1:1]) to fit within the 8-bit integer range of [0:255]. The remapping process will result in a loss of precision, and care must be taken when using Vertex Colors to store data like positions or offsets.
Packing
Storing data in UVs or Textures brings up an interesting topic called bit-packing. Bit-packing can be thought of as the process of storing more data than what’s typically possible in a given format, like a 32-bit float, by using some kind of packing and unpacking algorithm. This usually involves some precision loss, as you can’t expect to, say, pack two 32-bit floats into one and maintain the same level of precision.
That said, some algorithms are extremely clever, like the "smallest three" method, which compresses a 4-component quaternion into a single 32-bit float with sufficient precision for most real-time applications. This is discussed in greater detail in a later section.
Pivot Painter 2.0 also famously uses a complex bit-packing algorithm to encode a 16-bit integer into a 16-bit float in such a way that it survives the 16-to-32-bit float conversion, which is also discussed in more detail in a later section.
Taking a step back, one may not even need to resort to low-level bit-wise operations to pack data; simple arithmetic can serve as a starting point.
Arithmetic - Integer/Fraction
A simple packing method involves using the integer part of a 32-bit float to store the first piece of data, and using its fractional part to store the second.
An example of this would be storing an object’s XYZ position components and its forward vector XYZ components in just three 32-bit floats. Each position component would be rounded to the nearest integer, which, assuming the position is expressed in centimeters, would result in a precision loss of 1 centimeter—arguably not a significant issue. The forward vector/unit axis XYZ components would then be remapped and stored in the fractional parts.
Extra care must be taken with the latter step. Storing a unit value ranging from [0:1] in the fractional part could cause issues because the fractional part of 1.0 is 0, meaning 1.0 can’t be packed. As such, some kind of remapping needs to be performed to squash the data packed in the fractional part from [0:1] into a range that stays strictly below 1 (e.g. [0:0.99]), with enough margin to account for 32-bit float precision issues.
Once this hurdle is overcome, packing the position component (x) and the axis component (y) into a single 32-bit float (w) becomes quite straightforward.
Packing:
w = floor(x) + y
Unpacking:
x = w - frac(w)
y = frac(w)
Let’s assign values to x, y:
let x=432.124, y=0.5643
Packing:
w = floor(432.124) + 0.5643 = 432.564
Unpacking:
x = 432.564 - frac(432.564) = 432.564 - 0.564 = 432
y = frac(432.564) = 0.564
[!NOTE] The X could arguably be left as-is, assuming the value it encodes is in centimeters and you’re not concerned about a deviation of up to 1 cm.
[!NOTE] Minimal precision loss can be expected.
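Here's a minimal HLSL sketch of the integer/fraction method, including the safety margin discussed above so the fractional part never reaches 1.0 (the 0.99 factor and the function names are arbitrary choices for this example):

```hlsl
// 'position' is assumed to be in centimeters, 'axis01' already remapped to [0:1].
float PackPositionAndAxis(float position, float axis01)
{
    // Integer part stores the (floored) position, fractional part stores the axis.
    return floor(position) + saturate(axis01) * 0.99;
}

void UnpackPositionAndAxis(float packed, out float position, out float axis01)
{
    position = floor(packed);       // integer part: the position, with up to 1 cm of loss
    axis01   = frac(packed) / 0.99; // fractional part: the remapped axis component
}
```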
Arithmetic - Two floats
Another simple packing method involves scaling two normalized 32-bit floats (x,y) to fit them into one 32-bit float (w). This method is quite rudimentary and results in moderate precision loss.
Packing:
a = floor(x * (4096 - 1)) * 4096
b = floor(y * (4096 - 1))
w = a+b
Unpacking:
x = floor(w / 4096) / (4096 - 1)
y = (w % 4096) / (4096 - 1)
Again, let’s assign two values to x and y
let x=0.3341, y=0.7644
Packing:
a = floor(0.3341 * (4096 - 1)) * 4096 = 5603328
b = floor(0.7644 * (4096 - 1)) = 3130
w = 5603328+3130 = 5606458
Unpacking:
x = floor(5606458 / 4096) / (4096 - 1) = 0.3340
y = (5606458 % 4096) / (4096 - 1) = 0.7643
The unpacked values differ slightly from the original ones, but the precision might be sufficient for some use cases.
[!NOTE] Remapping can be used to bring any value into the range [0:1] for packing, and then back to its original range after unpacking. However, the imprecision introduced during packing may make this impractical.
Arithmetic - Three floats
Similarly, a different packing method involves scaling three normalized 32-bit floats (x,y,z) to fit them into one 32-bit float (w). This method is equally rudimentary and results in severe precision loss, making it impractical for packing anything other than unit vectors.
Given three x, y, z 32-bit floats, the packing algorithm is as follows:
a = ceil(x*100)*10
b = ceil(y*100)*0.1
c = ceil(z*100)*0.001
w = a + b + c
Unpacking:
x = (w*0.001)
y = (w*0.1 - floor(w*0.1))
z = (w*10 - floor(w*10))
Let’s assign values to x, y, and z:
let x=0.3341, y=0.7644, z=0.0123
Packing:
a = ceil(0.3341*100)*10 = 340
b = ceil(0.7644*100)*0.1 = 7.7
c = ceil(0.0123*100)*0.001 = 0.002
w = a+b+c = 340 + 7.7 + 0.002 = 347.702
Unpacking:
x = (347.702*0.001) = 0.347702
y = (347.702*0.1 - floor(347.702*0.1)) = 0.7702
z = (347.702*10 - floor(347.702*10)) = 0.019999
As you can see, the unpacked values deviate quite a bit from the packed values. This is the result of packing, and the precision loss may be acceptable for some use cases.
These packing methods can be extremely useful when storing data in various media, especially UVs, as the number of UV maps is limited and extra UV maps consume precious memory. For instance, the forward axis' XY components of grass blades could be packed and unpacked almost for free alongside their XY pivots using the 'Integer/Fraction' method, assuming they are stored in centimeters and you don't mind a one-centimeter precision loss.
This axis could then be used to modulate the amount of grass rotation around the pivot based on the wind, taking into account both the wind direction and the orientation of the grass blades.
Using arithmetic to pack data is fine, but bit-wise operations typically offer much more flexibility and are an essential tool in any tech artist's toolbox.
Bit-packing - Introduction
First, a brief introduction to binary numbers and bit-wise operations.
Let’s take an arbitrary integer: 742
Its binary representation can be easily computed by summing the largest powers of two until the integer is reached:
1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
Indeed, $512 + 128 + 64 + 32 + 4 + 2 = 742$
So, the binary representation of 742 is: 1011100110
Bit-wise operations can be performed on this binary number. In the context of bit-packing, these operations often involve:
- Shifting bits to the left with the `<<` operator
- Shifting bits to the right with the `>>` operator
- Masking a certain number of bits with the `&` operator, also called AND
- Combining a certain number of bits with the `|` operator, also called OR
The `&` operator outputs 1 if both input bits are 1. For instance, if we want to read just the four rightmost bits, we can perform an AND operation like this:
1011100110
& 0000001111
= 0000000110
We end up with `0110`, which are indeed the four rightmost bits of the original binary number.
This process is called masking, as it discards any bits outside the mask. To create such a mask, the bit-wise `<<` operator can be used. If we want to mask out the four rightmost bits, the operation to generate the mask would be: $(1 << 4) - 1$. Let's see why.
First, the `<<` operator shifts all bits to the left and fills in the empty bits with zeros. So, shifting the decimal number 1 (which is also 1 in binary) four bits to the left gives us:
0000000001
<< 0000010000
Now, `010000` in binary equals `16` in decimal. Using the formula described above, we know that $1 << 4 = 16$ and we know that $16 - 1 = 15$. `15` in binary is `001111`. Compared to `010000`, the binary representation of `16`, we can observe the following: any power of two minus one simply sets all bits to the right to 1s—handy! This is how we can create the necessary mask to isolate the four rightmost bits.
If we were to mask out the 9 rightmost bits, the same principle applies:
1011100110
& 0111111111 (1 << 9) - 1
= 0011100110
Which translates to this in HLSL code:
uint MyInteger = 742;
uint MaskedBits = MyInteger & ((1 << 9) - 1);
But what if we want to read the 7 leftmost bits? In other words, discard the 3 rightmost bits. Simple! Let's use the `>>` operator to shift 3 bits to the right.
1011100110
>> 0001011100
We end up with `1011100`, which are indeed the 7 leftmost bits of the original binary number.
Finally, what if we want to extract the 4 center bits?
1011100110
000XXXX000 ?
First, we shift 3 bits to the right.
1011100110
>> 0001011100
Then we mask the four rightmost bits.
0001011100
& 0000001111 (1<<4) - 1
= 0000001100
We end up with `1100`, which are indeed the 4 center bits of the original binary number.
That's cool and all, but what are we supposed to do with all this?
It’s important to understand that when you use the bits of an integer to pack data, the resulting bits no longer represent a valid integer that can be interpreted directly. The resulting decimal value loses its meaningful significance, as the bits are now being used arbitrarily to store data based on your chosen packing method, rather than representing a single integer according to the original format and bit usage.
Let’s consider a 32-bit unsigned integer: 00000000000000000000000000000000
[!NOTE] Signed integers use the most significant bit for the sign and employ two’s complement, which makes bit-packing impractical.
You might create an algorithm that packs three arbitrary data values—X, Y, and Z—using various numbers of bits depending on the precision needed. For example: XXXXXXXXYYYYYYYYYYYYYYYYYYZZZZZZ
Storing just a `1` in the X data would result in the binary number `00000001000000000000000000000000`, or $1*2^{24}$, which equals `16777216` in decimal. This is why you shouldn’t focus too much on the resulting value—it’s just the outcome of arbitrarily packed bits. The X, Y, and Z components are meant to be individually extracted using the bitwise methods explained above.
- The X component needs to be shifted 24 bits to the right.
- The Y component requires both right-shifting and masking to be properly extracted.
- The Z component would only need to be masked with $(1 << 6) - 1$, or `0111111`.
Which leads to the topic of how to create such binary numbers in the first place.
It’s crucial to understand the precision you're working with. For instance, the X component only occupies 8 of the 32 bits.
As you may know, 8 bits are enough to represent 256 unique values, ranging from 0 to 255. Therefore, any data packed into the X component must first be remapped to the range [0:1], then multiplied by 255 for storage in integer form. During unpacking, once the X component bits are isolated and read, the inverse operation must be performed: division by 255 to return it to the range [0:1], followed by further remapping to restore its original range.
Retrieving the original value is definitely not guaranteed, as it was floored to the nearest of the 256 possible values that the 8 bits can represent.
Similar principles apply to the Y and Z components. The Y component, for example, uses 18 bits of precision with the chosen packing scheme, which allows it to floor the value to the nearest of the 262144 possible values that 18 bits can represent, offering significantly greater precision.
On the other hand, the Z component is more limited, with only 6 bits, which restricts the range of values it can represent. This could be suitable for packing an enum state or similar data.
With each of the components floored to an integer, bitwise operations can once again be used to pack all the bits.
First, the X component would need to be shifted 24 bits to the left.
000000000000000000000000XXXXXXXX
<< XXXXXXXX000000000000000000000000
Second, the Y component would need to be shifted 6 bits to the left.
00000000000000YYYYYYYYYYYYYYYYYY
<< 00000000YYYYYYYYYYYYYYYYYY000000
Third, the resulting bits for the X, Y & Z components might be combined with the `|` operator, also called 'OR':
XXXXXXXX000000000000000000000000
| 00000000YYYYYYYYYYYYYYYYYY000000
| 00000000000000000000000000ZZZZZZ
= XXXXXXXXYYYYYYYYYYYYYYYYYYZZZZZZ
And voilà! The X, Y, and Z values are now stored in integer form—with different bit lengths—inside a single 32-bit integer. They can then be extracted using bit-shifting and masking operations, as demonstrated above, and converted back from their integer form to a float in the range [0:1] using a simple division.
- Division by $(1 << 8) - 1$, or 255, for the X component, because it is stored in 8 bits.
- Division by $(1 << 18) - 1$, or 262143, for the Y component, because it is stored in 18 bits.
- Division by $(1 << 6) - 1$, or 63, for the Z component, because it is stored in 6 bits.
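To consolidate the walkthrough above, here's a minimal HLSL sketch of the 8/18/6-bit layout; it assumes x, y and z have already been remapped to [0:1], and the function names are illustrative:

```hlsl
uint PackXYZ(float x, float y, float z)
{
    // Quantize each component to the number of values its bit budget allows.
    uint xi = (uint)floor(saturate(x) * 255.0);    //  8 bits -> 0..255
    uint yi = (uint)floor(saturate(y) * 262143.0); // 18 bits -> 0..262143
    uint zi = (uint)floor(saturate(z) * 63.0);     //  6 bits -> 0..63

    // Shift each component into place and combine with OR.
    return (xi << 24) | (yi << 6) | zi;
}

void UnpackXYZ(uint packedBits, out float x, out float y, out float z)
{
    // Shift back, mask, and divide to return to the [0:1] range.
    x = ((packedBits >> 24) & 0xFF)    / 255.0;
    y = ((packedBits >> 6)  & 0x3FFFF) / 262143.0;
    z = (packedBits & 0x3F)            / 63.0;
}
```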
The same principle applies to floating-point numbers. A 32-bit float uses 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa (with 1 implicit bit), allowing it to represent numbers—large or small—using scientific binary notation.
For example, the decimal number 5.75 is written as `01000000101110000000000000000000` in pure binary form according to the IEEE 754 standard, or 1.0111 × 2² in scientific binary notation, which can be read as (1 + 0×2⁻¹ + 1×2⁻² + 1×2⁻³ + 1×2⁻⁴)×2² = 5.75.
The key difference when bit-packing into a float instead of an integer is the risk of generating an invalid floating-point number, also known as 'NaN' (Not a Number). Floating-point values are designed to represent a wide range of numbers, but they are still limited by 32 bits of precision. When all of the exponent bits are set to 1s, the float no longer represents a regular number: it signals to the system reading the float that the bits do not represent a valid value, and the result should be interpreted as infinity (zero mantissa) or 'NaN' (non-zero mantissa).
This is especially important when using the 32 bits of a float to pack arbitrary data, as you might inadvertently set bits in a way that results in a 'NaN'. While this isn’t necessarily a technical problem—the float can’t be interpreted as a valid decimal number, but the bits still contain the data you packed—it can be a dealbreaker in certain scenarios. For example, when packing data into vertex attributes like UVs, most FBX exporters/importers and game engines won’t process a mesh if the UVs contain 'NaNs'.
I’m not yet well-versed in the techniques used to prevent this, but one simple approach is to lock one of the bits checked for 'NaNs'—in practice, one of the exponent bits—to 0, so the exponent can never be all 1s. This effectively reduces the usable bits from 32 to 31, helping avoid accidentally creating a 'NaN' while still allowing you to pack data reliably.
Bit-packing - Two floats
Following the instructions described above, an algorithm can be created to pack two floats in 16 and 15 bits respectively, with one bit left out to prevent 'NaNs'.
Here's the documented packing algorithm: HLSL
And here's the documented unpacking algorithm: HLSL
Bit-packing - Three floats
Following the instructions described above, an algorithm can be created to pack three floats in 11, 10 and 10 bits respectively, with one bit left out to prevent 'NaNs'.
Here's the documented packing algorithm: HLSL
And here's the documented unpacking algorithm: HLSL
Bit-packing - Data corruption
This is a topic partially covered in the vertex animation section, but it's worth emphasizing now that you're more familiar with bit-packing and the idea of manipulating integer or float bits in arbitrary ways. This technique comes with some caveats, and one should be especially cautious about how bit-packed data can be corrupted—most notably due to interpolation.
You're probably already familiar with interpolation: it's the process of blending between two values, A and B, to get a value C that lies somewhere in between. The most common form is linear interpolation, or lerp, using the formula: C = A + (B - A) * t. If t = 0, then C = A; if t = 1, then C = B; and if t = 0.25, then C = A + 0.25B - 0.25A = 0.75A + 0.25B, and so on.
This arithmetic operation is performed on the decimal values of A and B—regardless of whether they’re stored as integers or floats. The problem is, once you've bit-packed data into these values, the decimal representation becomes meaningless. You're no longer dealing with numbers that should be blended—you're working with structured bit patterns that represent packed information.
So, interpolating between two bit-packed values—no matter how small the blend—can mix completely unrelated and seemingly random values together. This scrambles the bits in a chaotic and unpredictable way. In short, any arithmetic operation on the decimal form of a bit-packed value is very likely to corrupt the data.
The natural solution might seem to be: just don’t perform any operations on bit-packed values! And while that’s the right instinct, the tricky part is that this can happen automatically—and sometimes without you even realizing it.
Let’s assume UVs were used to bit-pack the object’s XYZ position. Depending on the bits stored in each U and V float, the resulting decimal values might read like -0.000002135 or 13548.123. That’s the nature of bit-packing: once those bits are interpreted as a float or integer, the decimal representation becomes unpredictable and seemingly random, because it's no longer meaningful in a traditional numeric sense.
Previously, I demonstrated how to use data baked into UVs in a vertex shader—for example, retrieving baked pivot points to rotate grass blades around their individual origins. But now let’s say we want to read that data in a pixel shader. Imagine we're trying to generate an emissive color gradient based on pivot data bit-packed into UVs.
Here’s where things get tricky.
A pixel shader interpolates vertex attributes to produce values for each individual pixel. For instance, imagine a triangle where vertex A’s color is red, vertex B’s is green, and vertex C’s is blue. For every pixel inside the triangle, the shader blends these colors based on how close the pixel is to each vertex.
This is done using barycentric coordinates—(λA, λB, λC)—which determine each vertex’s influence over a pixel. These weights are calculated so that: λA + λB + λC = 1
For a pixel in the exact center of the triangle, each weight might be approximately ⅓, resulting in a perfectly even mix: ⅓A + ⅓B + ⅓C.
Now, if each vertex of the mesh contains the exact same bit-packed data in their UVs—like the object’s position—then A = B = C. Logically, interpolating identical values should return the same value, right? This may be more easily demonstrated with a simple lerp. If A = B, then lerp(A, B, t) should always return A, no matter the value of t. The same principle applies to barycentric interpolation.
In theory, yes. In practice, not quite.
Each barycentric coordinate is itself a float with limited precision. For example, ⅓ isn’t exactly representable in binary—it’s approximated. That means the sum of the weights might not be exactly 1.0, just very close to it.
So what does this have to do with data corruption?
Well, remember that the UVs we’re interpolating contain bit-packed data. Their decimal representations might be huge or tiny, but either way, they don’t carry real-world numeric meaning—they're just encoded bits. Performing any arithmetic on this kind of data, even something as subtle as the tiniest amount of interpolation, alters those bits in unpredictable ways.
As a result, the interpolated UV in the pixel shader will most likely not match the original UV stored in the vertex, even if all three vertices had identical values. The small inaccuracies introduced by the floating-point weights will scramble the packed bits, resulting in corrupted data.
This can be visualized in Unreal Engine by using one of the debug material functions to read the bits. Some bits will appear unreadable or glitched. That’s simply because the debug function is evaluated in the pixel shader, where the bit-packed data I stored in UVs is interpolated—unexpectedly scrambling the bits due to floating-point precision issues.
Fortunately, the solution is simple: bit-packed data should be read and unpacked in the vertex shader, where each vertex is processed individually and no interpolation occurs. The resulting unpacked decimal values can then be passed to the pixel shader via vertex attributes. In Unreal Engine, this is easily done using a vertex interpolator. In this case, the actual decoded pivot position is stored in the vertex attributes (extra UVs), so the pixel shader can safely read and interpolate the decoded data (which is fine) and use it to perform the distance check and generate the blend I originally intended. Phew!
Pivot Painter
Pivot Painter 2 uses a special algorithm to store the indices of individual mesh elements in a 16-bit float HDR texture. These elements can be numerous, meaning the indices may reach into the thousands or even tens of thousands. Such a range of integers cannot be accurately represented with a 16-bit float. This wikipedia page includes an insightful precision table showing that beyond 2048, a 16-bit float can only represent integers in steps of two. Beyond 4096, it can only represent them in steps of four, and so on. You can experiment with this in this float-toy. Moreover, storing these indices in a 32-bit float HDR texture would raise memory usage concerns.
The solution is to store a 16-bit integer within the bits of a 16-bit float. One might think that you could simply tell the GPU to store the 16-bit integer in the 16-bit float using the `asfloat(int_index)` HLSL method, and then convert it back with `asint(float_index)`, but this overlooks an important detail.
Sampling a 16-bit float HDR texture in a material graph in Unreal Engine induces a 32-bit float conversion, scrambling the bits and making the asfloat() and asint() methods unusable. Therefore, a special algorithm must be used to 'split' the integer bits and distribute them across the sign, exponent, and mantissa components of the 16-bit float in a way that survives the 16-bit to 32-bit conversion.
Here's the documented packing algorithm: HLSL
And here's the documented unpacking algorithm: HLSL
[!NOTE] This may seem complex and costly, but it's not. The packing is only applied during the baking process, offline, and the unpacking, computed in the vertex shader, involves just a couple of bitwise operations—the fastest operations. The cost is well buried under the cost of the GPU having to wait for the dependent texture fetches anyway, as discussed here.
'Three Smallest'
The method for packing the three smallest quaternion components is well-documented, so this section will be brief. In short, the method leverages quaternion properties to identify the largest component among the XYZ and W components, discarding it. We’re left with the 'three smallest' components, hence the name of the method. These three components are known to be no greater than $1/\sqrt{2}$ in absolute value, thanks to certain quaternion properties, allowing their range to be remapped to minimize precision loss during packing. The three components can be converted into 10-bit integers and stored in 30 bits of a 32-bit float, leaving 2 bits to identify which component was discarded (its sign can be forced positive by negating the whole quaternion, since q and -q represent the same rotation). This allows the quaternion to be fully reconstructed during decoding with minimal precision loss. While the method can be pushed further to encode a quaternion in fewer than 30 bits with decent precision, 32 bits are most commonly used for the convenience of the format.
Here's the documented packing algorithm: HLSL
And here's the documented unpacking algorithm: HLSL
Space Switching
The concept of switching space can be a bit tricky at first, but it's something every tech artist should fully understand. It’s particularly relevant when baking data into UVs, Vertex Colors, Normals, or Textures, since this data often describes vertex attributes—like positions—based on how the mesh is transformed (positioned, rotated, and scaled) at the time of baking. If the mesh is transformed again after the bake, space switching might be necessary.
Let’s start simple and assume we’re baking XY pivots into the UVs. To keep things straightforward, we’ll ignore any coordinate system differences between applications (like Blender and Unreal Engine) and assume that the X and Y positions, in meters, are stored directly in the UVs, as-is. To make it even simpler, the mesh consists of a single cube positioned at 0.5, 0.153, and 0.014 in X, Y, and Z respectively.
In this setup, all the vertices of the cube are collapsed onto the UV location that corresponds to its XY position: (0.5, 0.153)—Z is discarded for simplicity's sake.
Now, you might wonder: what exactly is a 'position'? What gives this cube its specific XYZ coordinates in the first place?
Simply put, its XYZ position describes how far it is offset from the origin: (0, 0, 0).
If you had control over that origin and moved it to (0.5, 0.1, 0.0) in XYZ, the cube’s position would then read as (0.0, 0.053, 0.014)—relative to this new origin.
Moving the origin towards the cube is the same as moving the cube towards the origin. This idea of relativity is at the heart of space switching. It's all about deciding what the point of reference in the world is, and how everything else is positioned, rotated, or scaled in relation to it.
Now, if we assume this world origin becomes the mesh’s own origin when exported from Blender to FBX (which is usually the case), the data stored in the UVs would then describe the cube’s XY position relative to the mesh’s origin: (0.5, 0.153). This is what we refer to as local space.
The vertex UVs would still read (0.5, 0.153) regardless of what happens to the object afterward. It could be placed in a different world, offset from the origin, rotated, or scaled—but the values in the UVs would not change. They’ll always represent the position as if the mesh were centered at the world origin, unrotated and unscaled—in other words, the position in local space, relative to the mesh’s own origin.
To summarize, the local space is essentially a self-contained world where the 'static mesh asset' is always at the center, untransformed.
Knowing this, once the mesh is placed in a game engine and further transformed, the pivot stored in the UVs—in local space—would no longer correlate to its actual position in this new world, which is now determined by the mesh’s new transform. This new context is what we refer to as world space.
To go from local to world space, the solution is simple: read the local-space position from the UVs and apply the mesh’s 4x4 world matrix.
A 4x4 matrix is like a magic black box—four rows of four values, precisely ordered so that when a vector is multiplied by this matrix, all the affine transformations it describes are applied: translation, rotation, scale, and even shear.
A 4x4 matrix that doesn't apply any transform is called the identity matrix. It's the equivalent of "1" in matrix multiplication, just like how multiplying by 1 in arithmetic doesn't change a number.
|1|0|0|0|
|0|1|0|0|
|0|0|1|0|
|0|0|0|1|
A translation matrix contains three specific values encoding the X, Y, and Z offset.
|1|0|0|X|
|0|1|0|Y|
|0|0|1|Z|
|0|0|0|1|
A scale matrix has values that define the X, Y, and Z scaling.
|X|0|0|0|
|0|Y|0|0|
|0|0|Z|0|
|0|0|0|1|
A rotation matrix defines the forward, right, and up directions using XYZ unit vectors—just one of many ways to encode a rotation. For example, a rotation matrix for a counterclockwise rotation by an angle θ around the X-axis is:
|1|0|0|0|
|0|cos(θ)|-sin(θ)|0|
|0|sin(θ)|cos(θ)|0|
|0|0|0|1|
While a rotation matrix for a counterclockwise rotation by an angle θ around the Y-axis is:
|cos(θ)|0|sin(θ)|0|
|0|1|0|0|
|-sin(θ)|0|cos(θ)|0|
|0|0|0|1|
These individual matrices can be combined into a single transformation matrix using matrix multiplications in a specific order—typically scale first, then rotation, then translation. When this combined matrix is multiplied by a vector (X, Y, Z, W), all the transformations are applied in sequence.
|a|b|c|d| |X| = |aX + bY + cZ + dW| = |Xₜ|
|e|f|g|h| |Y| = |eX + fY + gZ + hW| = |Yₜ|
|i|j|k|l| × |Z| = |iX + jY + kZ + lW| = |Zₜ|
|m|n|o|p| |W| = |mX + nY + oZ + pW| = |Wₜ|
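For illustration, here's a minimal sketch, with a hypothetical helper name, of composing the three matrices in that order and applying the result to a position, using the same column-vector convention as the matrices above:

```hlsl
// Minimal sketch: build a TRS matrix (scale, then rotate around X, then translate)
// and apply it to a local-space position. Column-vector convention, matching the
// matrix layouts shown above (translation in the last column).
float4 TransformLocalToWorld(float3 localPos, float3 translation, float angleX, float3 scale)
{
    float4x4 T = float4x4(
        1, 0, 0, translation.x,
        0, 1, 0, translation.y,
        0, 0, 1, translation.z,
        0, 0, 0, 1);

    float4x4 R = float4x4(
        1, 0,            0,           0,
        0, cos(angleX), -sin(angleX), 0,
        0, sin(angleX),  cos(angleX), 0,
        0, 0,            0,           1);

    float4x4 S = float4x4(
        scale.x, 0,       0,       0,
        0,       scale.y, 0,       0,
        0,       0,       scale.z, 0,
        0,       0,       0,       1);

    // With column vectors, the rightmost matrix applies first: scale, then rotation, then translation.
    float4x4 M = mul(T, mul(R, S));
    return mul(M, float4(localPos, 1.0));
}
```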
[!NOTE] To inexperienced tech artists, this might seem quite expensive at first glance—lots of multiplications and additions. But keep in mind that matrix operations are what GPUs do all day long. They’ve been optimized down to the bone and are massively parallelized. That does not mean they’re free, but it does mean you shouldn’t be afraid to use a matrix multiplication here and there out of fear it will hurt performance. It likely won’t, and your bottleneck is almost certainly elsewhere. When in doubt, profile!
Of course, no one is expected to build 4x4 matrices by hand. It’s cumbersome, error-prone, and generating the axes that describe a rotation involves trigonometry—something no one is likely to compute in their head anyway. Most game engines and DCC software expose ways to build them from individual location, rotation, and scale components.
Back to our mesh example—let’s say it’s simply translated by 1 meter along the X axis in this new world. Its world matrix would reflect that translation.
|1|0|0|1|
|0|1|0|0|
|0|0|1|0|
|0|0|0|1|
- The upper-left 3×3 portion is the identity matrix → no rotation, no scale, no shear.
- The last column [1,0,0,1] adds a translation of +1 along the X-axis.
- The bottom row [0,0,0,1] is standard for homogeneous coordinates.
Now, recall the local XY position we read from the UVs: (0.5, 0.153). To turn this into a full 3D position, we can simply append a 0 for the Z axis, resulting in the vector (0.5, 0.153, 0.0). To perform matrix multiplication, we need to write this vector in homogeneous coordinates by adding a 1 at the end: (0.5, 0.153, 0.0, 1.0).
Multiplying this vector by the mesh’s world matrix gives us the final world-space position of the cube.
(1×0.5 + 0×0.153 + 0×0.0 + 1×1) = 1.5
(0×0.5 + 1×0.153 + 0×0.0 + 0×1) = 0.153
(0×0.5 + 0×0.153 + 1×0.0 + 0×1) = 0
(0×0.5 + 0×0.153 + 0×0.0 + 1×1) = 1
And that makes sense—the cube’s X position was 0.5 at the time of baking (in local space), and the mesh was offset by 1.0 in X in world space, so the result is exactly what we expect: (1.5, 0.153, 0.0).
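The same computation expressed in HLSL could look like the minimal sketch below, using `mul` with the column-vector convention the matrix above is written in:

```hlsl
// The worked example above: a translation-only world matrix applied to the local pivot.
float4x4 worldMatrix = float4x4(
    1, 0, 0, 1,   // +1 translation along X (last column)
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1);

float4 localPivot = float4(0.5, 0.153, 0.0, 1.0); // XY read from the UVs, Z = 0, W = 1
float4 worldPivot = mul(worldMatrix, localPivot); // column-vector convention: M * v
// worldPivot = (1.5, 0.153, 0.0, 1.0)
```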
The same process applies to a mesh that is not only offset from the origin, as in the example above, but also rotated and scaled. The 4x4 world matrix would include all these transformations, and the math remains exactly the same. It won’t be demonstrated here to avoid unnecessary clutter.
[!NOTE] Matrices can be stored row-major or column-major. This dictates how the elements are laid out in memory, row by row or column by column, and goes hand in hand with the multiplication convention.
With row vectors, the vector is written on the left: v * M.
With column vectors, the vector is written on the right: M * v.
If you multiply in the wrong order or mix conventions, your transforms will be wrong—rotations might be skewed, translations might be off, etc.
Reinterpreting the same block of memory under the opposite convention is equivalent to transposing the matrix. HLSL supports both and lets you pick the memory layout (constant buffers default to column-major packing unless row_major is specified), while Unreal Engine's shader code generally uses the row-vector convention, i.e. mul(v, M).
Just like building 4x4 matrices, no one is expected to multiply a vector by a matrix manually. Shading languages provide handy intrinsics to do this: in HLSL, a simple `mul(xyzw, matrix)` will do! Most game engines expose them to the user in various ways. In Unreal Engine’s material graph, for example, this is available as the 'TransformPosition' node.
You might also notice a similar node called 'TransformVector'.
So, what’s the difference? The 'TransformVector' node assumes that the vector you're transforming represents a direction, not a position. Metaphorically, if you're standing in your home and point North, and then do the same in your local grocery store, your orientation remains unchanged, regardless of your location. That’s why 'TransformVector' skips the translation part of the matrix entirely and only applies rotation and scale: uniform scaling changes the vector’s length, while non-uniform scaling can both change its length and skew it, altering its direction!
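In matrix terms, the difference boils down to the homogeneous W component. Here's a minimal sketch with hypothetical helper names, using the column-vector convention:

```hlsl
// Positions carry W = 1, so the translation column of the matrix applies.
float3 TransformPositionLocalToWorld(float4x4 worldMatrix, float3 position)
{
    return mul(worldMatrix, float4(position, 1.0)).xyz; // rotation + scale + translation
}

// Directions carry W = 0, so translation is ignored; only rotation and scale apply,
// which is why the result may need to be re-normalized if the mesh is scaled.
float3 TransformVectorLocalToWorld(float4x4 worldMatrix, float3 direction)
{
    return mul(worldMatrix, float4(direction, 0.0)).xyz;
}
```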
You’ll also see that both nodes offer several options. For example, transforming a position from local space to camera space is like asking: “What is this position—originally defined relative to the mesh’s own origin—relative to the camera’s world transform?”
Here's a breakdown of the available options:
- Tangent Space: Relative to the object’s surface at each pixel. Only available in the 'TransformVector' node and relevant in pixel shaders.
- Local Space: Relative to the object’s own origin.
- Absolute World Space: Relative to the world origin. Simply named 'World Space' in the 'TransformVector' node.
- Periodic World Space: Similar to world space, but with the world origin moved to the center of the tile the camera is in, based on a given tile size. Logically, this is similar to `(CameraAbsoluteWorldPosition % TileSize + CameraRelativeWorldPosition)`. It offers better precision and scalability than regular 'World Space', especially for large worlds. Only available in the 'TransformPosition' node.
- Camera Relative World Space: Same as world space (i.e., world space rotation and scale), but with the position relative to the camera.
- Camera Space: Relative to the camera. Only available in the 'TransformPosition' node.
- View Space: Similar to camera space, but in cases like shadow passes, where the 'camera' might not technically be the camera, but rather the light’s point of view.
- Instance & Particle Space: Relative to each individual instance in an instanced static mesh or to each particle in a particle system.
Hopefully, this brief chapter has shed some light on the principle of switching space, a fundamental concept in graphics programming, tech art, and game development in general.
The key takeaway is consistency. You can retrieve data in local space and perform computations there before transforming it into world space. Or, you can retrieve data in local space, transform it to world space, and then perform computations.
The two material graphs above produce the exact same result: an oscillating rotation about the world X axis, around a pivot point stored in the UVs. Notice how all inputs to the 'Rotate About Axis' node are in the same space, and its output remains in that space as well—meaning it must be transformed to world space before feeding into the material’s 'World Position Offset', which, as the name implies, operates in world space.
- The vector (1, 0, 0) is an arbitrarily chosen direction representing the X axis in Unreal Engine's world coordinate system—hence, it's in world space. This is why it must be made relative to the mesh: if the mesh is rotated 90°, then the world X axis effectively becomes the world Y axis—(0, 1, 0) relative to it. As explained earlier, the 'TransformVector' node works with direction vectors and applies the scale and rotation matrix. This means the vector’s length may be affected by the mesh’s scale. In my case, the mesh was scaled down by around 60% in world space, so the (1, 0, 0) vector, when transformed to local space, is no longer normalized—hence the need to normalize it and remove the scaling effect.
- Pivots are read from UVs in local space, and they can be used as rotation pivot points if the vertex position is also read in local space. Otherwise, a transformation is necessary to ensure both the pivot point and the vertex position to rotate are in the same space.
- The 'Rotate About Axis' node outputs the rotated vertex offset: $ComputedRotatedPosition - InputPosition$. This offset is in the same space as the input position. If the input is in local space, you need to transform the offset (treating it as a vector, not a position) from local to world space in order to wire it into the 'World Position Offset' pin.
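To make that last point concrete, here's a minimal sketch, not UE's exact node code, of a Rodrigues-style rotation about an axis through a pivot, returning the offset in the same space as its inputs:

```hlsl
// Minimal sketch, not UE's exact 'Rotate About Axis' code: rotate a vertex around
// 'unitAxis' passing through 'pivot' and return the offset (rotated - input),
// all in the same space as the inputs (assumed to be local space here).
float3 RotateAboutAxisOffset(float3 unitAxis, float angle, float3 pivot, float3 vertexPos)
{
    float3 p = vertexPos - pivot;
    float3 rotated = p * cos(angle)
                   + cross(unitAxis, p) * sin(angle)
                   + unitAxis * dot(unitAxis, p) * (1.0 - cos(angle));
    return (rotated + pivot) - vertexPos; // ComputedRotatedPosition - InputPosition
}
```

The returned local-space offset would then be transformed as a vector (not a position) from local to world space before being wired into 'World Position Offset'.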
[!NOTE] The 'Local Position' node was recently introduced in UE and replaces the now-deprecated 'Pre-Skinned Local Position' node. Technically, it's not even necessary, since transforming 'Absolute World Position' from world to local space gives the same result—just with an unnecessary matrix multiplication.
[!IMPORTANT] Normals can be used to store arbitrary data, but only with caution, for the reasons previously explained and also because of the automatic space switching that typically occurs with normals, unlike data baked into UVs, Vertex Colors, or Textures.
Normals are automatically updated to match the mesh's world-space orientation, even though game engines typically also provide ways to access the local-space normals initially stored in the vertices. However, this option may only be available in vertex shaders, like in Unreal Engine, where a vertex interpolator is required to store the normals in vertex attributes for the pixel shader to interpolate.
Custom/Vertex Interpolators
The following section explains what’s happening behind Unreal Engine’s Vertex Interpolator material node, which hides a somewhat puzzling mechanism behind a simple-looking node. However, the underlying principle applies across game engines—for example, Unity’s Custom Interpolators feature. Vertex Interpolators have been mentioned several times throughout this wiki and, on occasion, partially explained. Still, I felt the concept was important enough to deserve its own dedicated section. I personally thought I had a solid grasp of interpolators—turns out, I didn’t! Thanks to Deathrey for the clarifications.
First, one has to understand the difference between a vertex shader (VS) and a pixel shader (PS).
| Feature | Vertex Shader | Pixel Shader (Fragment Shader) |
|---|---|---|
| Executes Per | Vertex | Fragment (potential screen pixel) |
| Input | Vertex data: position, normal, UVs, etc. | Interpolated data from vertex attributes (position, UVs, ...) |
| Output | Transformed vertex position (to screen space) | Final pixel color (and sometimes depth or other data) |
| Main Use | Geometry transformation | Lighting, texturing, coloring... |
| Stage in Pipeline | Early (before rasterization) | Later (after rasterization) |
To summarize, both operate at different stages of the pipeline—the vertex shader before the pixel shader—and have different inputs and outputs for different purposes: one processes vertices, the other processes fragments. A fragment is a candidate for a pixel, generated during rasterization. It's considered a potential pixel because not every fragment becomes visible—some may be discarded due to depth testing, alpha blending, stencil testing, clipping, and so on.
This isn't particularly well visualized in Unreal Engine's material graph, but the following illustration might help:
Any graph built and connected to a pixel shader pin is evaluated in the pixel shader. Similarly, graphs built and connected to a vertex shader pin (WPO, essentially) are evaluated in the vertex shader.
There’s one exception: the Vertex Interpolator node, which can be summarized with the following illustration.
A node graph connected to the Vertex Interpolator node is computed in the vertex shader and the result is 'sent' to the pixel shader. But how does a vertex shader send data to a pixel shader? Simply put—through interpolation. Interpolation is a built-in part of the GPU’s fixed-function pipeline. It’s not programmable like shaders and is handled automatically during rasterization. When a triangle is rasterized, the GPU takes the outputs from the vertex shader (such as colors, UVs, normals, etc.) and interpolates them across the triangle’s surface for each fragment using barycentric coordinates based on the triangle’s three vertices.
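In plain HLSL terms, the mechanism looks roughly like this (hypothetical names and a deliberately simplified vertex transform; the code Unreal Engine actually generates is more involved):

```hlsl
// What a custom interpolator boils down to: a value written per vertex,
// interpolated by the rasterizer, and read per fragment.
struct Interpolants
{
    float4 Position   : SV_POSITION;
    float3 CustomData : TEXCOORD8; // computed in the VS, interpolated for each fragment
};

Interpolants MainVS(float3 position : POSITION, float2 uv : TEXCOORD0)
{
    Interpolants Out;
    Out.Position   = float4(position, 1.0);        // clip-space transform omitted for brevity
    Out.CustomData = float3(uv, length(position)); // any per-vertex computation
    return Out;
}

float4 MainPS(Interpolants In) : SV_Target
{
    // By the time this runs, CustomData has been interpolated across the triangle.
    return float4(In.CustomData, 1.0);
}
```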
Here is an illustration of vertex UVs being interpolated for a given fragment using barycentric coordinates. Each of the triangle’s three vertices is assigned a weight, and the sum of all three weights equals one.
This interpolation happens after the vertex shader but before the pixel shader runs, so interpolated values are available in the pixel shader.
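Expressed as code, the interpolation for one fragment is just a weighted sum of the three vertex values using those barycentric weights:

```hlsl
// Sketch of what the fixed-function stage computes for each fragment:
// 'weights' holds the fragment's barycentric coordinates, with weights.x + weights.y + weights.z = 1.
float2 InterpolateUV(float2 uv0, float2 uv1, float2 uv2, float3 weights)
{
    return uv0 * weights.x + uv1 * weights.y + uv2 * weights.z;
}
```

In practice the hardware also applies perspective correction to these weights, but the principle is the same.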
Unlike constant buffers or textures, interpolators don’t use GPU memory buffers that you manage directly. Instead:
- Values are passed from the vertex shader via temporary registers.
- These are then interpolated by hardware, and the interpolated results are passed to the pixel shader through registers. In rare cases, such as with Nanite, interpolation might be handled in software rather than by the fixed-function hardware pipeline.
- The vertex cache helps avoid redundant computation for shared vertices.
GPUs have a limited number of registers for interpolation—32 floats on Shader Model 6 in Unreal Engine, and likely fewer on mobile platforms. Using too many interpolators can reduce the number of registers available for other operations, and may force the compiler to spill values into memory, which is slower. Additionally, each interpolated value consumes clock cycles per pixel, and excessive use can put extra strain on the vertex cache.
That said, these limitations typically shouldn’t deter you from using vertex interpolators. When used appropriately, the performance and flexibility benefits often far outweigh the costs.
There are usually three main reasons to use a vertex interpolator:
- To access data in the pixel shader that isn’t otherwise exposed
- To offload computation from the pixel shader to the vertex shader for potential performance gains
- To control what's interpolated
Performance is all about the trade-off between vertex and pixel processing. Imagine a rock displayed at medium distance on a 4K screen. The mesh would cover a large number of pixels—likely far more than the number of vertices it contains. This means any computation done in the pixel shader is repeated many more times than if it were done in the vertex shader. In such cases, moving computation from the pixel shader to the vertex shader can lead to a performance boost. However, this isn’t always true.
For example, the mesh may move farther from the camera and appear smaller, but if it has no LODs (or poorly implemented ones), the vertex count remains unchanged. The cost of the vertex shader stays the same, while the pixel shader cost drops significantly. At some point, the performance balance tips. Also, keep in mind that pixel shaders run after rasterization, on the visible fragments. So the back side of the mesh doesn’t affect pixel shader cost. That’s not the case with vertices: all of them are always processed in the vertex shader.
[!IMPORTANT] Virtualized geometry (Nanite) forces a shift in thinking, as the statements above no longer fully apply. Vertex density is now tied to screen size and scales similarly to pixel density, which changes how you evaluate and balance vertex versus pixel processing.
Furthermore, not all computations can be performed on vertices. Some effects need to be evaluated per pixel, using the interpolated UVs, as shown in the illustration below. A spherical gradient can’t be magically created in the middle of a 4-vertex plane in the vertex shader. More technically, it can be computed there, but the values evaluated at the four vertices would all be 0.0 (they fall outside the sphere mask), meaning the pixel shader would only have zeros to interpolate between.
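As a simple illustration, consider a generic radial mask (not UE's exact SphereMask node):

```hlsl
// Generic radial mask sketch: 1 at the center, fading to 0 at 'radius'.
float RadialMask(float2 uv, float2 center, float radius)
{
    return saturate(1.0 - length(uv - center) / radius);
}

// Evaluated per vertex on a 4-vertex plane (corner UVs (0,0), (1,0), (0,1), (1,1))
// with center = (0.5, 0.5) and a small radius, all four corners return 0.0,
// so the interpolated value in the pixel shader is 0.0 across the whole face.
// Evaluated per pixel with interpolated UVs, the gradient appears as expected.
```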
Dynamic changes in topology can also cause visual artifacts when vertex interpolators are used to drive visual parameters. Virtualized geometry (like Nanite) is a good example—it can break shaders that rely heavily on vertex interpolators.
Some data must be passed to the pixel shader via vertex interpolators because vertex and pixel shaders take different inputs and operate at different stages—before and after rasterization, respectively. This also depends on the rendering engine and what data the developers decide to pass to each shader. A good example is Unreal Engine’s Pre-Skinned Nodes, which aren’t directly available in the pixel shader, but can be accessed if passed through a vertex interpolator.
Another example is the PixelNormalWS node which, as the name suggests, provides data that’s only available per pixel—thanks to rasterization. Pixels only 'exist' after rasterization, which is made possible by triangles formed from vertices. Since the vertex shader runs before rasterization, that data simply isn't available at that stage. The Fresnel node is another case: it relies on the dot product between the view direction and the per-pixel surface normal, which doesn't exist yet at the vertex level. To summarize, imagine a point cloud, where each point represents a vertex—that’s what the vertex shader operates on. Anything related to the mesh’s surface can’t be accessed in the vertex shader because the surface hasn’t been rasterized yet!
[!NOTE] Obviously, you can't use a vertex interpolator inside a vertex shader (WPO). The pixel shader is executed much later than the vertex shader, and the rendering pipeline flows in one direction—so a vertex can't rely on pixel interpolation. It simply doesn't make sense.
The Vertex Interpolator node is often referenced in Unreal Engine’s documentation and various tutorials as a way to simplify the now-deprecated method of using custom UVs to perform computations in the vertex shader and store the results in UV channels for the pixel shader to interpolate. Both methods are still valid today and are not mutually exclusive. Vertex Interpolators and custom UVs both go through the same fixed-function interpolation stage and use registers, but Vertex Interpolators do not 'use' custom UVs. You still retain access to all 8 UV sets as usual, as far as I know.