Comparing Speed of Different Compression Algorithms - svanoort/rest-compress GitHub Wiki

#Why?

You want to know when to compress dynamic data, and what will give YOU the best performance for your systems, with YOUR data.

Compression benchmarks normally report a compression ratio (compressed size / original size) and a speed (MB/s). That is enough when a file is compressed once and copied many times. It does not give the whole picture for dynamic content, because every response must be compressed on the server and decompressed on the client. If that processing takes longer than the transmission time it saves, compression makes the transfer slower, not faster.

#Goal:

Find which compression method delivers the highest data throughput (and thus fastest transmission), once compression ratio and round-trip time to compress and decompress are factored in.

#Definitions:

To = time to send data, uncompressed

Bo = bytes of data (uncompressed)

Bc = bytes of data (compressed)

Vo = velocity at which data is sent (real bandwidth, with no compression, in MB/s)

Rc = Compression ratio (bytes compressed / bytes uncompressed)

Vc = velocity (speed) at which data can be compressed and then decompressed (round trip), in MB/s

Tc = time to compress and decompress the data

Tf = final time to send all the data with compression (processing plus transmission)

Vf = speed at which data is sent with compression, including compression and decompression time

#What's the real speed with compression (including processing time)?

T = B / V

V = B / T

To = Bo / Vo :: Uncompressed transmission time

Tc = Bo / Vc

Bc = Bo * Rc

Tf = Bc / Vo + Bo/Vc = (Bo * Rc)/(Vo) + Bo/Vc = Bo(Rc/Vo + 1/Vc)

Vf = Bo / Tf = Bo / (Bo * (Rc/Vo + 1/Vc)) = 1/(Rc/Vo + 1/Vc)
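As a minimal sketch, the throughput formula can be written as a small function. The name `effective_throughput` and the sample numbers are illustrative assumptions, not measurements.

```python
def effective_throughput(rc: float, vo: float, vc: float) -> float:
    """Vf = 1 / (Rc/Vo + 1/Vc); vo and vc must use the same unit (e.g. MB/s)."""
    if vo <= 0 or vc <= 0:
        raise ValueError("speeds must be positive")
    return 1.0 / (rc / vo + 1.0 / vc)


# Hypothetical numbers: 10 MB/s link, data shrinks to 40% of its size,
# compress + decompress round trip runs at 100 MB/s.
vf = effective_throughput(rc=0.4, vo=10.0, vc=100.0)
print(f"Vf = {vf:.1f} MB/s")  # roughly double the raw 10 MB/s link
```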

If Vf > Vo, then final throughput is higher with compression than without. This occurs when: Vo < 1/(Rc/Vo + 1/Vc)

Take the reciprocal of both sides (every quantity is positive, so the inequality flips): 1/Vo > Rc/Vo + 1/Vc

Multiply by Vo: 1 > Rc + Vo/Vc

Subtract Rc: 1 - Rc > Vo/Vc

Which is to say: use compression when the fraction of bytes removed is greater than the ratio of transmit speed to compression speed. This intuitively makes sense: if you reduce data by 50% (1-Rc = 0.5), then you're cutting transmission time in half, but the compression algorithm still has to handle the original, full-sized data, so it must run more than twice as fast as the network!

So, to sum up:

##Use compression when 1 - Compression_Ratio > Network Bandwidth / Compression Speed
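A minimal sketch of this decision as code follows. Compression pays off exactly when Vf > Vo, which works out to 1 - Rc > Vo/Vc. The function name `should_compress` and the numbers are hypothetical.

```python
def should_compress(rc: float, vo: float, vc: float) -> bool:
    """True when compressing raises end-to-end throughput (Vf > Vo)."""
    # Vf > Vo  <=>  1 - Rc > Vo / Vc
    return (1.0 - rc) > vo / vc


# Hypothetical case: data halves in size (Rc = 0.5) on a 50 MB/s link.
slow_codec = should_compress(rc=0.5, vo=50.0, vc=80.0)   # under 2x link speed
fast_codec = should_compress(rc=0.5, vo=50.0, vc=200.0)  # over 2x link speed
print(slow_codec, fast_codec)
```

With these numbers only the second codec is worth using, matching the "must run twice as fast" intuition.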

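#Multiple Compression Algorithms:

This is a bit trickier. For this, we're going to use V1 and V2 for the round-trip compression speeds, R1 and R2 for the ratios, and Vf1 and Vf2 for the final throughputs of algorithm 1 and 2.

Vf1 = 1/(R1/Vo + 1/V1)

Vf2 = 1/(R2/Vo + 1/V2)

When is Vf1 > Vf2?

1/(R1/Vo + 1/V1) > 1/(R2/Vo + 1/V2)

Take reciprocal of both sides (this flips the inequality): R1/Vo + 1/V1 < R2/Vo + 1/V2

Subtract 1/V1 from both sides: R1/Vo < R2/Vo + 1/V2 - 1/V1

Subtract R2/Vo from both sides: R1/Vo - R2/Vo < 1/V2 - 1/V1

Simplify: (R1 - R2) / Vo < 1/V2 - 1/V1

Multiply both sides by Vo: R1 - R2 < Vo(1/V2 - 1/V1)

Divide by the subtracted reciprocals. The direction now depends on their sign. If algorithm 1 is the slower one (V1 < V2), then 1/V2 - 1/V1 is negative and the inequality flips again: (R1 - R2)/(1/V2 - 1/V1) > Vo

Flipping it around: when algorithm 1 is the slower one, Vf1 is higher than Vf2 when:

Vo < (R1 - R2)/(1/V2 - 1/V1)

If algorithm 1 is the faster one (V1 > V2), the condition is instead Vo > (R1 - R2)/(1/V2 - 1/V1). Either way, the slower algorithm only wins if it also compresses better, and then only below this tipping-point bandwidth.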
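The tipping point can be computed directly, as in the sketch below. The function names and the two codec profiles are hypothetical. The final throughputs are also evaluated on either side of the tipping point, so the direction of the inequality can be checked rather than assumed.

```python
def tipping_point(r1: float, v1: float, r2: float, v2: float) -> float:
    """Bandwidth Vo at which both algorithms give the same final throughput."""
    if v1 == v2:
        raise ValueError("equal speeds: the better ratio wins at every bandwidth")
    return (r1 - r2) / (1.0 / v2 - 1.0 / v1)


def final_throughput(r: float, v: float, vo: float) -> float:
    """Vf = 1 / (R/Vo + 1/V) for one algorithm at network speed vo."""
    return 1.0 / (r / vo + 1.0 / v)


# Hypothetical codecs: 1 is slow but strong, 2 is fast but weak (speeds in MB/s).
r1, v1 = 0.30, 50.0
r2, v2 = 0.50, 400.0
vo_star = tipping_point(r1, v1, r2, v2)
print(f"tipping point: {vo_star:.2f} MB/s")

for vo in (5.0, 20.0):  # one bandwidth below, one above the tipping point
    vf1 = final_throughput(r1, v1, vo)
    vf2 = final_throughput(r2, v2, vo)
    print(f"Vo={vo}: Vf1={vf1:.2f}, Vf2={vf2:.2f}")
```

In this example the slower, stronger codec 1 wins on the 5 MB/s link and the faster codec 2 wins on the 20 MB/s link.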

Plug in your benchmarked compression ratios and compression speeds, and that'll tell you the tipping point at which you get the best performance from one or the other.
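One way to obtain those numbers is sketched below, using Python's standard `zlib` module as a stand-in for whatever codec you are evaluating. The helper name `benchmark_zlib`, the repeat count, and the sample payload are assumptions. The timing covers compress plus decompress, matching the round-trip definition of Vc.

```python
import time
import zlib
from typing import Tuple


def benchmark_zlib(data: bytes, level: int = 6, repeats: int = 5) -> Tuple[float, float]:
    """Return (Rc, Vc): compressed/original size and round-trip speed in MB/s."""
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        restored = zlib.decompress(compressed)
        elapsed = time.perf_counter() - start
        assert restored == data  # sanity check on the round trip
        best = min(best, elapsed)  # keep the fastest run to reduce timing noise
    rc = len(compressed) / len(data)
    vc = (len(data) / 1e6) / best
    return rc, vc


# Placeholder payload; substitute a representative sample of YOUR data.
sample = b'{"id": 12345, "name": "example", "tags": ["a", "b"]}' * 20000
rc, vc = benchmark_zlib(sample)
print(f"Rc = {rc:.3f}, Vc = {vc:.1f} MB/s")
```

Results vary with hardware, compression level, and payload, so run this on the machines and data you actually serve.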