Shader Tips, Tricks and Optimizations - crosire/reshade-shaders GitHub Wiki

Note: A lot of this is taken from the Minecraft Shaders wiki

Implementation of Built-in Functions

a / b == a * rcp(b)

1.0/a == rcp(a)

sqrt(a) == a * rsqrt(a)

pow(a, b) == exp2(log2(a) * b)

exp(a) == exp2(a * constant) // constant == log2(e) or 1/log(2) or about 1.44269504088896340736

normalize(a) == a * rsqrt(dot(a, a))

lerp(a, b, c) == (b-a) * c + a
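
The rewrites above can be sanity-checked numerically on the CPU. The following is a minimal Python sketch, not shader code: `rcp`, `rsqrt`, `exp2` and `dot` are hypothetical stand-ins written here for illustration, and `check_builtin_rewrites` is an assumed helper name. It only confirms that both sides agree for positive inputs; it says nothing about what a given GPU compiler actually emits.

```python
import math

# Hypothetical CPU-side stand-ins for the shader intrinsics (illustration only).
def rcp(x):
    return 1.0 / x

def rsqrt(x):
    return 1.0 / math.sqrt(x)

def exp2(x):
    return 2.0 ** x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

LOG2_E = 1.0 / math.log(2.0)  # the "constant" in the exp() rewrite, about 1.442695

def check_builtin_rewrites(a, b, c):
    """Return True if each rewrite matches the direct computation (a, b > 0)."""
    v = (a, b, c)
    pairs = [
        (a / b, a * rcp(b)),                       # division via rcp
        (math.sqrt(a), a * rsqrt(a)),              # sqrt via rsqrt
        (a ** b, exp2(math.log2(a) * b)),          # pow via exp2/log2
        (math.exp(a), exp2(a * LOG2_E)),           # exp via exp2
        (a * (1.0 - c) + b * c, (b - a) * c + a),  # lerp in MAD form
    ]
    # normalize: divide by the length vs. multiply by rsqrt(dot(v, v))
    length = math.sqrt(dot(v, v))
    pairs += [(x / length, x * rsqrt(dot(v, v))) for x in v]
    return all(math.isclose(lhs, rhs, rel_tol=1e-9) for lhs, rhs in pairs)
```

Calling `check_builtin_rewrites(2.0, 3.0, 0.25)` compares every pair with a small relative tolerance, because the two sides round differently in floating point.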

Intrinsic macros

distance(a, b) == length(b-a) // length is another macro

length(a) == sqrt(dot(a, a)) // see sqrt above

Tips

Negation (-) and abs are instruction modifiers and free if used on input.

Saturate is a free instruction modifier if used on output.

(a * b + c) is called MAD form and results in a single MAD instruction (MAD == Multiply And Add). It's one way to do more math using fewer resources.

Another is the dot product, which multiplies two vectors (float2, float3 or float4) component by component and sums the results in a single cycle.

dot(float4(a, b, c, d), float4(a, b, c, d)) == a*a + b*b + c*c + d*d // ASM for this is DP4

If followed by an addition, the dot product of two float2s or two float3s turns into DP2_ADD or DP3_ADD respectively, both of which are single cycle instructions as well. A dot product can also be used to sum up the components of a vector by multiplying with 1.

dot(float4(a, b, c, d), float4(1.0, 1.0, 1.0, 1.0))

Remember that subtraction can be done by adding a negative number, and since negation is a free modifier this works fine with subtraction too.
Any time you need 4 or fewer pairs of multiply operations involving scalars and/or constants added and/or subtracted together a dot() will do.
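
As a rough illustration of the two uses of dot() described above, here is a small CPU-side Python sketch. The `dot` helper and the function names `horizontal_sum` and `mixed_terms` are made up for this example, and the expression in `mixed_terms` is an arbitrary one chosen to show negated inputs and a constant sitting in the same dot.

```python
def dot(u, v):
    # Multiply component pairs and sum them, like the shader intrinsic.
    return sum(p * q for p, q in zip(u, v))

def horizontal_sum(v):
    # Summing components: multiply every component by 1.0 and add.
    return dot(v, (1.0,) * len(v))

def mixed_terms(a, b, c, d):
    # a*b - c*d + a*d - 2.0*c written as one four-component dot.
    # Subtraction is expressed by negating an input; a constant is just
    # another component.
    return dot((a, -c, a, -2.0), (b, d, d, c))
```

For example, `mixed_terms(1.0, 2.0, 3.0, 4.0)` evaluates 2 - 12 + 4 - 6, the same four multiply pairs the long form would need.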

If the MAD form is not possible to use, you might still save an instruction when multiplying by using a common multiplier.
Multiply by 0.5, 2.0 or 4.0 and the GPU driver will turn this into a free modifier for most scalar instructions.
Note that this is not done by the HLSL or GLSL compiler but afterwards by the driver, and thus you cannot see it by examining the ASM output.
You can even get creative and multiply by a common multiplier in one part and then later multiply by another common multiplier to get certain non-common multipliers for free.
Say you need to multiply the end result by 0.25. This might also be done by multiplying a partial result by 0.5 and at the end multiply by 0.5.
Be aware however that because the compiler does not know about this optimization it might fight you and try to combine both the 0.5 multipliers into one 0.25 multiplier, thinking it's helping when in fact it is doing you a disservice.

Tricks

Check if two values are both 0 or not?

if (abs(a) == -abs(b)) // Are both values 0?  
//Single instruction compare because abs and negation are free here.

if (abs(a) != -abs(b)) // Is any value not 0?  
//A faster any() if you just need to compare two scalars and not vectors.
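
Why this works: abs(a) is never negative and -abs(b) is never positive, so the two can only be equal when both are 0. A minimal Python sketch of the same comparison follows; `both_zero` and `any_nonzero` are hypothetical names, and the sketch only checks the logic on the CPU, not the instruction count on a GPU.

```python
def both_zero(a, b):
    # abs(a) >= 0 and -abs(b) <= 0, so they can only meet at 0.
    return abs(a) == -abs(b)

def any_nonzero(a, b):
    # The negated test: true as soon as either value differs from 0.
    return abs(a) != -abs(b)
```

Note that `both_zero(-2.0, 2.0)` is false: the comparison is 2.0 against -2.0, so values of equal magnitude do not produce a false positive.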

Inside the 0 to 1 range or not?

float2 range = saturate(uv * uv - uv);
bool is_inside = range.x == -range.y; //abs is not needed because the saturate above ensures range is always positive or 0

float2 range = saturate(uv * uv - uv);
bool is_outside = range.x != -range.y; //and of course if we are not inside we are outside.
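
The reasoning can be traced per component: uv * uv - uv equals uv * (uv - 1), which is negative or 0 for uv in the 0 to 1 range and positive outside it, so after saturate each component is 0 exactly when that coordinate is inside. A CPU-side Python sketch of this, with `saturate` and `is_inside` written here as hypothetical stand-ins:

```python
def saturate(x):
    # Clamp to the 0..1 range, like the shader intrinsic.
    return min(max(x, 0.0), 1.0)

def is_inside(uv):
    u, v = uv
    # Each component is 0 when the coordinate lies in [0, 1], positive otherwise.
    range_x = saturate(u * u - u)
    range_y = saturate(v * v - v)
    # Both are non-negative, so they are equal-and-opposite only if both are 0.
    return range_x == -range_y
```

In this sketch the borders count as inside, since both 0 and 1 make uv * (uv - 1) equal to 0.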

Faster smoothstep

x = x * x * (3.0 - 2.0 * x) // faster smoothstep(0.0, 1.0, x) alternative  
//Only works in the 0 to 1 range though.
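
To show both the equivalence and the range restriction, here is a small Python comparison. It assumes the usual definition of smoothstep (clamp the normalized input, then apply the cubic); `smoothstep`, `saturate` and `fast_smoothstep01` are stand-ins written for this sketch.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def smoothstep(edge0, edge1, x):
    # Reference form: normalize, clamp to 0..1, then the cubic Hermite curve.
    t = saturate((x - edge0) / (edge1 - edge0))
    return t * t * (3.0 - 2.0 * t)

def fast_smoothstep01(x):
    # The shortcut above: same cubic, but without the normalize and clamp.
    return x * x * (3.0 - 2.0 * x)
```

Inside 0 to 1 the two agree. Outside they diverge: at x = 1.5 the shortcut yields 0.0 because (3.0 - 2.0 * 1.5) is 0, while the clamped version yields 1.0, so saturate the input first if it can leave the range.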

Identities

pow(a+b, 2.0) == (a+b) * (a+b) == a*a + 2.0*a*b + b*b ==
    a*a + a*b + a*b + b*b == dot(float4(a, a, a, b), float4(a, b, b, b))

pow(a-b, 2.0) == (a-b) * (a-b) == a*a - 2.0*a*b + b*b ==
    a*a - a*b - a*b + b*b == dot(float4(a,-a,-a, b), float4(a, b, b, b))

pow(a+b, 2.0) + pow(a-b, 2.0) == (a+b) * (a+b) + (a-b) * (a-b) ==
2.0 * (a*a + b*b) == dot(float4(a, a, b, b), float4(a, a, b, b)) 

pow(a+b, 2.0) - pow(a-b, 2.0) == (a+b) * (a+b) - (a-b) * (a-b) ==
(a * b) * 4.0 == dot(float4(a, a, a, a), float4(b, b, b, b))

(a+b) * (a-b) == a*a - b*b == dot(float2(a, -b), float2(a, b))

(x+a) * (x+b) == x*x + x*(a+b) + a*b == x*x + x*a + x*b + a*b ==
dot(float4(x, x, x, a), float4(x, a, b, b))

pow(a, 3.0) + pow(b, 3.0) == (a*a*a) + (b*b*b) == dot(float3(a, -a, b), float3(a, b, b)) * (a + b) 

pow(a, 3.0) - pow(b, 3.0) == (a*a*a) - (b*b*b) == dot(float3(a, a, b), float3(a, b, b)) * (a - b) 
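
The dot() forms above are easy to get wrong by a sign or a swapped component, so a numeric check can help when adapting them. This is a minimal CPU-side Python sketch; `dot` and `check_dot_identities` are hypothetical helpers, and tuples stand in for float2, float3 and float4.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def check_dot_identities(a, b, x):
    """Return True if each dot() form matches the expanded expression."""
    pairs = [
        ((a + b) ** 2, dot((a, a, a, b), (a, b, b, b))),
        ((a - b) ** 2, dot((a, -a, -a, b), (a, b, b, b))),
        ((a + b) ** 2 + (a - b) ** 2, dot((a, a, b, b), (a, a, b, b))),
        ((a + b) ** 2 - (a - b) ** 2, dot((a, a, a, a), (b, b, b, b))),
        ((a + b) * (a - b), dot((a, -b), (a, b))),
        ((x + a) * (x + b), dot((x, x, x, a), (x, a, b, b))),
        (a ** 3 + b ** 3, dot((a, -a, b), (a, b, b)) * (a + b)),
        (a ** 3 - b ** 3, dot((a, a, b), (a, b, b)) * (a - b)),
    ]
    return all(
        math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12) for lhs, rhs in pairs
    )
```

A tolerance is used because the factored and expanded forms round differently; the identities are exact algebraically but not bit for bit in floating point.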

pow(x, 1.5) == (x * x) * rsqrt(x) // because rsqrt(x) == pow(x,-0.5).
// BTW `sqrt(x) == pow(x,0.5)` but using a `sqrt` would be slower than using a `rsqrt`
// this is because rsqrt is so commonly used it has its own instruction

exp(a) * exp(b) == exp(a + b)

pow(pow(a, b), c) == pow(a, b * c)

a / pow(b, c) == a * pow(b, -c)

log(a) + log(b) == log(a*b)

log(a/b) == log(a) - log(b)

log(pow(a, b)) == b * log(a)

log(sqrt(a)) == log(a) * 0.5
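
The pow, exp and log identities above hold for positive arguments. A quick CPU-side Python check is sketched below; `rsqrt` and `check_exp_log_identities` are hypothetical names, and `math.log` is the natural logarithm, matching log() in HLSL.

```python
import math

def rsqrt(x):
    return 1.0 / math.sqrt(x)

def check_exp_log_identities(a, b, c):
    """Return True if both sides of each identity agree (requires a, b > 0)."""
    pairs = [
        (a ** 1.5, (a * a) * rsqrt(a)),
        (math.exp(a) * math.exp(b), math.exp(a + b)),
        ((a ** b) ** c, a ** (b * c)),
        (a / b ** c, a * b ** -c),
        (math.log(a) + math.log(b), math.log(a * b)),
        (math.log(a / b), math.log(a) - math.log(b)),
        (math.log(a ** b), b * math.log(a)),
        (math.log(math.sqrt(a)), math.log(a) * 0.5),
    ]
    return all(
        math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12) for lhs, rhs in pairs
    )
```

With zero or negative inputs several of these stop being valid (log and fractional powers are undefined there), so the sketch deliberately restricts itself to positive values.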

cross(a, cross(b, c)) == b * dot(a, c) - c * dot(a, b)
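
The last identity replaces two cross products with two dot products and a MAD-style combine. A minimal Python sketch to compare both sides on the CPU follows; `dot`, `cross`, `triple_direct` and `triple_expanded` are stand-ins written for this example, using 3-tuples in place of float3.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    # Standard right-handed cross product of two 3-component vectors.
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def triple_direct(a, b, c):
    return cross(a, cross(b, c))

def triple_expanded(a, b, c):
    # b * dot(a, c) - c * dot(a, b), applied per component.
    ac = dot(a, c)
    ab = dot(a, b)
    return tuple(bi * ac - ci * ab for bi, ci in zip(b, c))
```

For a = (1, 2, 3), b = (-1, 0.5, 2), c = (4, -2, 1), both functions give (-27, 13.5, 0).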

References

Low-Level GPU Documentation (web archive link, most links to pdfs are saved, try a different snapshot if needed)
More math identities than you could ever need (web archive link, all pages viewable, try a different snapshot if needed)
Simplify your math with Wolfram Alpha
GLSL Optimizations