serpent_audit - xero/leviathan-crypto GitHub Wiki
Audit of the leviathan-crypto WebAssembly Serpent-256 implementation (AssemblyScript) against the Serpent AES submission reference, with full security analysis covering all published cryptanalytic results and the Encrypt-then-MAC composition used by SerpentCipher.
- 1. Algorithm Correctness
- 2. Security Analysis
- 2.1 Side-Channel Analysis
- 2.2 Cryptanalytic Attack Papers
- Paper 1 β Amplified Boomerang Attacks (FSE 2000)
- Paper 2 β Chosen-Plaintext Linear Attacks (IET 2013)
- Paper 3 β Differential-Linear Attack on 12-Round Serpent (FSE 2008)
- Paper 4 β Linear Cryptanalysis of Reduced Round Serpent (FSE 2001)
- Paper 5 β The Rectangle Attack (EUROCRYPT 2001)
- Consolidated Verdict Table
- Final Assessment
- 2.3 Biclique Cryptanalysis (Full 32-Round)
- 2.4 SerpentCipher: Verify-then-Decrypt and the Cryptographic Doom Principle
| Meta | Description |
|---|---|
| Conducted: | Week of 2026-03-09 |
| Target: |
leviathan-crypto WebAssembly implementation (AssemblyScript) |
| Reference implementation: | Serpent AES submission C code (floppy1/serpent-reference.c, Frank Stajano) |
| Spec: | Serpent AES submission, Anderson/Biham/Knudsen 1998 |
| Test vectors: | AES submission KAT (serpent_ecb_vt/vk/tbl/iv.txt), Monte Carlo (serpent_*_e_m/d_m.txt), NESSIE 128/192/256 |
Note
A complete mirror of the AES submission floppy1 package is preserved on the
floppy1 branch,
including the original test vectors, reference implementation, and S-box
tables. Also included is the ctr_harness we developed for CTR mode vector
generation, a self-generated tool with no external authority.
We implement all 8 forward S-boxes (sb0βsb7) and 8 inverse S-boxes (si0βsi7) as Boolean logic circuits in AssemblyScript (src/asm/serpent/serpent.ts:86β233). Each function operates on 5 working register slots using rget/rset helpers that read/write fixed memory offsets in WASM. The operations are exclusively &, |, ^, and ~βno lookups, no data-dependent branches.
These Boolean expansions match the 4-bit lookup tables in the reference C implementation (serpent-reference.c, SBox[8][16] and SBoxInverse[8][16]). The reference tables are:
S0: { 3, 8,15, 1,10, 6, 5,11,14,13, 4, 2, 7, 0, 9,12 }
S1: {15,12, 2, 7, 9, 0, 5,10, 1,11,14, 8, 6,13, 3, 4 }
S2: { 8, 6, 7, 9, 3,12,10,15,13, 1,14, 4, 0,11, 5, 2 }
S3: { 0,15,11, 8,12, 9, 6, 3,13, 1, 2, 4,10, 7, 5,14 }
S4: { 1,15, 8, 3,12, 0,11, 6, 2, 5, 4,10, 9,14, 7,13 }
S5: {15, 5, 2,11, 4,10, 9,12, 0, 3,14, 8,13, 6, 7, 1 }
S6: { 7, 2,12, 5, 8, 4, 6,11,14, 9, 1,15,13, 3,10, 0 }
S7: { 1,13,15, 0,14, 8, 2,11, 7, 4,12,10, 9, 3, 5, 6 }
Correctness is established by the full test-vector suite: KAT, S-box entry tests (ecb_tbl), NESSIE vectors, and Monte Carlo tests all pass.
The lk function (serpent.ts:272β294) implements the forward linear transform (LT) combined with the next round's key XOR. The spec defines a 10-step bitslice transform. Here's how we implement each step:
| Step | Spec | leviathan-crypto (lk) |
|---|---|---|
| 1 | Xβ = Xβ <<< 13 | rset(a, rotl<i32>(rget(a), 13)) |
| 2 | Xβ = Xβ <<< 3 | rset(c, rotl<i32>(rget(c), 3)) |
| 3 | Xβ = Xβ ^ Xβ ^ Xβ |
rset(b, rget(b) ^ rget(a)) ... rset(b, rget(b) ^ rget(c))
|
| 4 | Xβ = Xβ ^ Xβ ^ (Xβ << 3) |
rset(e, rget(a) << 3) ... rset(d, rget(d) ^ rget(c)) ... rset(d, rget(d) ^ rget(e))
|
| 5 | Xβ = Xβ <<< 1 | rset(b, rotl<i32>(rget(b), 1)) |
| 6 | Xβ = Xβ <<< 7 | rset(d, rotl<i32>(rget(d), 7)) |
| 7 | Xβ = Xβ ^ Xβ ^ Xβ |
rset(a, rget(a) ^ rget(b)) ... rset(a, rget(a) ^ rget(d))
|
| 8 | Xβ = Xβ ^ Xβ ^ (Xβ << 7) |
rset(e, rget(e) << 7) ... rset(c, rget(c) ^ rget(d)) ... rset(c, rget(c) ^ rget(e))
|
| 9 | Xβ = Xβ <<< 5 | rset(a, rotl<i32>(rget(a), 5)) |
| 10 | Xβ = Xβ <<< 22 | rset(c, rotl<i32>(rget(c), 22)) |
All rotation amounts match the spec: 13, 3, 1, 7, 5, 22 for the forward transform.
The kl function (serpent.ts:297β319) implements the inverse linear transform with correct inverse rotations:
| Forward | Inverse | Verify: (fwd + inv) mod 32 |
|---|---|---|
| ROTL(13) | ROTL(19) | 13 + 19 = 32 |
| ROTL(3) | ROTL(29) | 3 + 29 = 32 |
| ROTL(1) | ROTL(31) | 1 + 31 = 32 |
| ROTL(7) | ROTL(25) | 7 + 25 = 32 |
| ROTL(5) | ROTL(27) | 5 + 27 = 32 |
| ROTL(22) | ROTL(10) | 22 + 10 = 32 |
Note
In the WASM implementation, rotl<i32> is an AssemblyScript built-in that compiles to the WASM i32.rotl instruction, a single CPU instruction on all modern architectures. The TypeScript version required a manual rotW function with masking (& this.wMax) to preserve 32-bit arithmetic in JavaScript; the WASM version does not need this because i32 is natively 32-bit.
loadKey (serpent.ts:350β408) implements the full Serpent key schedule. The process has three stages:
Key loading and padding (lines 351β372):
The function validates the key (16, 24, or 32 bytes; returns -1 otherwise), then zeros the 132-word subkey buffer. It sets a padding bit by storing 1 at word position keyLenβmatching the reference C shortToLongKey(). Next, it reverse-copies key bytes (key[k] = input[keyLen - k - 1]), following the AES submission byte ordering convention (see 1.5). Finally, it repacks 8 groups of 4 byte-valued words into 8 little-endian uint32 words.
Prekey expansion (lines 383β392):
w[i] = (w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ 0x9E3779B9 ^ i) <<< 11
The implementation uses a sliding-window approach with 5 working registers r[0..4] and a keyIt helper that reads from SUBKEY_OFFSET + a*4 and XORs with rget(b), rget(c), rget(d), 0x9e3779b9, and i, then applies rotl<i32>(..., 11). The iteration pattern (two keyIt calls per loop iteration before the break check, then three more) produces the same 132-word sequence as the reference C makeSubkeysBitslice() function.
Reference C (serpent-reference.c, makeSubkeysBitslice):
w[i] = rotateLeft(w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ PHI ^ i, 11)The golden ratio constant Ο = 0x9E3779B9 matches in both implementations.
Subkey derivation (lines 395β405):
Round keys are derived by applying bitslice S-boxes to groups of 4 prekey words. The S-box selection follows S_{(3-n) mod 8} per the spec:
- Kββ uses Sβ, Kββ uses Sβ, Kββ uses Sβ , ..., cycling through all 8 S-boxes.
The implementation iterates from ri=128 down to ri=0 (Kββ down to Kβ), using kc(n) to encode the register slot permutations and rj starting at 3 to select the S-box (rj % 8).
Reference C applies S-boxes via nibble extraction and scatter: for each of the 32 nibble positions across 4 words, extract the nibble, apply S[whichS], and scatter back. The WASM bitslice approach applies the same S-box to all 32 bits simultaneously via Boolean circuits. Mathematical equivalence is confirmed by test vectors.
Encryption (serpent.ts:414β450):
Load PT (byte-reverse) β K(0) XOR β 32 rounds β K(32) XOR β Store CT (byte-reverse)
Round structure for n = 0..31:
- Rounds 0β30:
S[n % 8]βLK(linear transform + K(n+1) XOR) - Round 31:
S[31 % 8] = S[7]βK(32)XOR (no linear transform)
This matches the spec exactly. The reference C encryptGivenKHat():
- Rounds 0β30: XOR KΜα΅’, Ε(i), LT
- Round 31: XOR KΜββ, Ε(31), XOR KΜββ (no LT)
Decryption (serpent.ts:456β490):
Load CT (byte-reverse) β K(32) XOR β 32 inverse rounds β K(0) XOR β Store PT (byte-reverse)
Inverse round structure for n = 0..31:
- Round 0 (first decryption round):
SI[7]βKL(inverse LT + K(31) XOR) - Rounds 1β30:
SI[7 - n%8]βKL(inverse LT + K(32-n) XOR) - Round 31 (last decryption round):
SI[0]βK(0)XOR (no inverse LT)
The inverse S-box selection SI[7 - n%8] correctly reverses the forward order: SI7, SI6, SI5, SI4, SI3, SI2, SI1, SI0, SI7, SI6, ...
The final K(0) XOR in decryption uses slots (2, 3, 1, 4) rather than the default (0, 1, 2, 3), because the last inverse S-box (SI0) leaves its output in a non-standard register arrangement. The plaintext is stored from registers r[4, 1, 3, 2]. This is the correct slot permutation for the decrypt path, as determined by the DC constant encoding.
leviathan-crypto uses the original Serpent AES submission byte ordering:
Input loading (encrypt, serpent.ts:418β422):
r[0] = bytes[15..12] as LE uint32 (MSW of reversed block)
r[1] = bytes[11..8] as LE uint32
r[2] = bytes[7..4] as LE uint32
r[3] = bytes[3..0] as LE uint32 (LSW of reversed block)
Output storing (encrypt, serpent.ts:441β449):
ct[0..3] = r[3] as BE bytes
ct[4..7] = r[2] as BE bytes
ct[8..11] = r[1] as BE bytes
ct[12..15] = r[0] as BE bytes
This is not the NESSIE convention. NESSIE test vectors require word-reversal and byte-swap preprocessing before comparison, which the NESSIE test harness handles. The AES submission vectors (floppy4/) work directly with this convention.
Key loading (serpent.ts:361β362):
key[k] = input_byte[keyLen - k - 1]
Key bytes are reversed before packing as LE uint32 words. This matches the reference C makeUserKeyFromKeyMaterial() function, which processes hex digits in the same reversed order.
serpent_unrolled.ts is auto-generated (bun bench/generate_unrolled.ts) and contains fully unrolled versions of encryptBlock and decryptBlock. All 32 rounds are expanded with hardcoded slot indices (the EC/DC constant values pre-resolved at generation time).
The unrolled variant:
- Imports
sb0βsb7,si0βsi7,lk,kl,keyXorfromserpent.ts(same functions, not copies). - Uses identical byte-reversal load/store logic.
- Round 31 correctly skips the linear transform in both encrypt and decrypt.
- The S-box and slot assignments in each expanded round match the values that the loop-based version would produce via EC/DC lookup.
This is the version called by CTR and CBC modes (index.ts re-exports encryptBlock_unrolled as encryptBlock). The design intent is to enable V8 TurboFan alias analysis to promote working registers from WASM linear memory to CPU registers, since all slot indices are compile-time constants.
CTR mode (ctr.ts):
- Counter format: 128-bit little-endian, byte 0 is LSB.
-
resetCounter(): copies NONCE buffer to COUNTER buffer (nonce = initial counter value). -
incrementCounter(): LE byte-by-byte increment with carry propagation. Early-exit on no carry is a minor timing leak (see 2.1). -
processBlock(): copies counter to BLOCK_PT, encrypts, XORs keystream with plaintext. Supports partial final blocks (1β16 bytes). -
setCounter(lo, hi): absolute 128-bit counter positioning for worker pool parallelism.
CBC mode (cbc.ts):
-
cbcEncryptChunk(): C[i] = Encrypt(P[i] XOR C[i-1]), C[-1] = IV. Requireslento be a positive multiple of 16. -
cbcDecryptChunk(): P[i] = Decrypt(C[i]) XOR C[i-1]. Reads from CHUNK_CT; original ciphertext is preserved because decryptBlock writes to BLOCK_PT, not CHUNK_CT. - Chaining block (CBC_IV_OFFSET) is updated in-place after each chunk.
- PKCS7 padding is handled in the TypeScript wrapper, not in WASM.
Both modes delegate to encryptBlock_unrolled / decryptBlock_unrolled.
The Serpent WASM module uses static buffer allocation in linear memory (buffers.ts):
| Offset | Size | Name | Purpose |
|---|---|---|---|
| 0 | 32 | KEY_BUFFER | Raw key bytes (padded to 32) |
| 32 | 16 | BLOCK_PT_BUFFER | Single-block plaintext |
| 48 | 16 | BLOCK_CT_BUFFER | Single-block ciphertext |
| 64 | 16 | NONCE_BUFFER | CTR nonce |
| 80 | 16 | COUNTER_BUFFER | 128-bit LE counter |
| 96 | 528 | SUBKEY_BUFFER | 33 subkeys Γ 4 words Γ 4 bytes |
| 624 | 65,536 | CHUNK_PT_BUFFER | Streaming plaintext |
| 66,160 | 65,536 | CHUNK_CT_BUFFER | Streaming ciphertext |
| 131,696 | 20 | WORK_BUFFER | 5 Γ i32 working registers |
| 131,716 | 16 | CBC_IV_BUFFER | CBC chaining block |
Total: 131,732 bytes < 196,608 (3 Γ 64KB pages).
No dynamic allocation (memory.grow()) is used. All offsets are compile-time constants. Buffer regions do not overlap. The WORK_BUFFER (5 registers) is placed after the large chunk buffers to avoid any possibility of chunk data overwriting working state.
wipeBuffers() (serpent.ts:495β506) zeros all buffers: KEY (32B), BLOCK_PT (16B), BLOCK_CT (16B), NONCE (16B), COUNTER (16B), SUBKEY (528B), WORK (20B), CHUNK_PT (64KB), CHUNK_CT (64KB), CBC_IV (16B). This covers all sensitive material including key material and intermediate state.
The TypeScript classes (src/ts/serpent/index.ts) provide the public API:
Serpent: ECB-level operations. Validates key length (16/24/32), copies key to WASM memory, calls loadKey(). encryptBlock()/decryptBlock() validate 16-byte block size. dispose() calls wipeBuffers().
SerpentCtr: CTR mode streaming. beginEncrypt(key, nonce) validates key and 16-byte nonce, loads key and resets counter. encryptChunk() validates chunk β€ CHUNK_SIZE. decryptChunk() delegates to encryptChunk() (CTR is symmetric). JSDoc carries authentication warning.
SerpentCbc: CBC mode with PKCS7. encrypt() applies pkcs7Pad() in TypeScript, processes in 64KB chunks via WASM. decrypt() validates ciphertext is a non-zero multiple of 16, processes chunks, then applies pkcs7Strip(). JSDoc carries authentication warning.
PKCS7 validation (pkcs7Strip in shared-ops.ts): Uses constant-time XOR-accumulate validation over a fixed 16-byte tail window. The per-byte pad-region mask is derived arithmetically as ((16 - padLen - i - 1) >> 31) & 0xff β a signed arithmetic shift collapses the sign of 16 - padLen - i - 1 to 0xff when i >= 16 - padLen and 0x00 otherwise, with no ternary, no Math.min/Math.max, and no secret-derived clamp. Every pad byte is examined on every call; the result is accumulated into a bad flag and folded with the out-of-range checks (padLen == 0, padLen > 16) before a single terminal throw. This closes the Vaudenay-2002 padding-oracle surface at the validation level β no engine branch is emitted over the secret byte, and no length-dependent work is done between the tail read and the throw.
The TypeScript layer performs no cryptographic computation. It writes inputs to WASM memory, calls WASM exports, and reads outputs. This is the correct architecture per ARCHITECTURE.md.
The EC (encrypt), DC (decrypt), and KC (key schedule) constants encode 5-slot register permutations as magic integers. For each round n, the constant m = EC[n] determines working register assignments via m%5, m%7, m%11, m%13, m%17, which must produce all five distinct indices {0,1,2,3,4}.
In the WASM implementation, these are switch statements (serpent.ts:30β80) rather than the TypeScript version's Uint32Array lookup tables. The values are identical:
EC[0] = 44255, EC[1] = 61867, EC[2] = 45034, ...
DC[0] = 44255, DC[1] = 60896, DC[2] = 28835, ...
KC[0] = 7788, KC[1] = 63716, KC[2] = 84032, ...
These constants cannot be verified by static inspection. If any constant is wrong, the register shuffle will corrupt data silently without obvious structure. The intermediate-value tests (ecb_iv.txt) are specifically designed to catch such errors round by round, and the full test suite confirms correctness empirically.
The unrolled variant (serpent_unrolled.ts) pre-resolves all EC/DC values at generation time, expanding the slot indices directly into each round's function call arguments. This eliminates the switch dispatch at runtime and is a pure optimization; the mathematical result is identical.
| Component | Implementation | Constant-Time? |
|---|---|---|
| S-boxes | Boolean logic (AND/OR/XOR/NOT) via WASM i32 ops |
Yes |
| Linear transform |
i32.rotl + XOR |
Yes |
| Key schedule | Fixed operations, no branches on key data | Yes |
| CBC XOR | Byte-by-byte XOR | Yes |
| CTR counter increment | Carry-propagation loop with early exit | No (minor) |
| PKCS7 padding validation | XOR-accumulate, no early return | Yes |
S-box timing safety: The bitslice Boolean circuit S-boxes are constant-time by construction. All 8 forward and 8 inverse S-boxes use only &, |, ^, ~ on i32 values in WASM. Every bit is processed unconditionally on every call. No lookup tables, no data-dependent branches, no data-dependent memory access patterns.
WASM vs JavaScript timing guarantees: The prior TypeScript audit noted that JavaScript's bitwise operators (|, &, ^, ~) map to CPU integer instructions on modern V8/SpiderMonkey but the JS spec does not guarantee constant-time execution. The WASM implementation substantially improves this situation:
- WASM integer operations (
i32.and,i32.or,i32.xor,i32.rotl) have well-defined, fixed-width semantics. - WASM modules are compiled ahead-of-time by the engine's optimizing compiler (V8 Liftoff β TurboFan, SpiderMonkey Cranelift). The JIT's speculative optimizations (type guards, inline caches, deoptimization) do not apply to WASM.
- WASM's
i32type is always 32-bit; there is no polymorphic integer representation that the engine might specialize differently based on observed values. - The structured control flow of WASM (no computed gotos, no dynamic dispatch within the S-box circuits) leaves no optimization surface for speculative execution to exploit.
While WASM does not carry a formal constant-time guarantee in its specification (the spec defines semantics, not timing), the practical constant-time properties are significantly stronger than JavaScript. The uniform, branch-free, fixed-width integer operations in the S-box circuits are as close to constant-time as a browser execution environment can provide without native code.
CTR counter increment (ctr.ts:33β40): The 128-bit LE counter increment uses an early-exit break when no carry occurs. This leaks the carry-propagation depth, which correlates with the counter value. The counter value is not secret in CTR mode (it is either derived from a nonce or is itself the nonce incremented by a public block index), so this is a low-severity concern. A fully constant-time increment (always iterating all 16 bytes) would be marginally cleaner but is not a security requirement.
Every attack examined across 5 academic papers targets reduced-round Serpent. The minimum security margin across all papers is 20 rounds (32 β 12), and the best attack provides only ~6.6 bits of advantage over brute force on 12 rounds. The leviathan-crypto WASM implementation makes it structurally impossible to invoke fewer than 32 rounds. The round count is hardcoded in both the loop-based (serpent.ts:430β435) and unrolled (serpent_unrolled.ts) implementations, with no parameter, configuration, or conditional logic to reduce it.
Authors: John Kelsey, Tadayoshi Kohno, Bruce Schneier Published: FSE 2000, LNCS 1978, pp. 75-93
| Attack | Rounds | Model | Data | Time | Type |
|---|---|---|---|---|---|
| Amplified boomerang distinguisher | 7 | Chosen-plaintext | 2^113 | < brute force | Distinguisher |
| Amplified boomerang key recovery | 8 | Chosen-plaintext | 2^113 | 2^179 | Key recovery (68 subkey bits) |
Core technique: The cipher is split into two halves: Eβ (rounds 1β4) and Eβ (rounds 5β7). A 4-round differential with probability 2^{β31} and a 3-round differential with probability 2^{β16} are combined via the amplified boomerang framework. Differences spread rapidly through Serpent's linear transform. The authors state: "differences spread out, so that it is possible to find reasonably good characteristics for three or four rounds at a time, but not for larger numbers of rounds."
Analysis: The best result reaches 8 rounds with a 24-round security margin. The attack exploits algebraic properties of the S-boxes and linear transform that are inherent to the Serpent design, not implementation-specific. The WASM implementation faithfully reproduces these components. The authors explicitly confirm: "this attack does not threaten the full 32-round Serpent."
Verdict: NOT APPLICABLE β 24-round security margin.
Authors: Jialin Huang, Xuejia Lai Published: IET Information Security, Vol. 7, Iss. 4, pp. 293-299, 2013
| Attack | Rounds | Model | Data | Time | Type |
|---|---|---|---|---|---|
| Single linear (all keys) | 10 | Chosen-plaintext | 2^92 | 2^84.68 | Key recovery |
| Single linear (192/256-bit) | 10 | Chosen-plaintext | 2^80 | 2^180.68 | Key recovery |
| Multidimensional linear | 10 | Chosen-plaintext | 2^88 | 2^84.07 | Key recovery |
| Multidimensional linear | 11 | Chosen-plaintext | 2^116 | 2^144 | Key recovery |
| Experimental validation | 5 | Chosen-plaintext | ~2^20 | Trivial | Key recovery (12 bits) |
Core technique: By fixing specific S-box inputs in the first round of a linear approximation, inactive S-boxes have correlation exactly Β±1 instead of 2^{β1}, boosting the overall approximation correlation. This reduces data complexity by up to 2^22 for single-approximation attacks and dramatically reduces time complexity for multidimensional attacks.
Analysis: The best result reaches 11 rounds. The 9-round approximation has correlation 2^{β54}; extending to 32 rounds would push the bias below 2^{β64}, where data requirements exceed the 2^{128} codebook. Full 32-round Serpent retains a 21-round security margin. The S-boxes (serpent.ts:86β157) are the standard Serpent S-boxes; the linear approximation properties exploited are inherent to the truth tables.
Verdict: NOT APPLICABLE β 21-round security margin.
Authors: Orr Dunkelman, Sebastiaan Indesteege, Nathan Keller Published: FSE 2008
| Attack | Rounds | Model | Data | Time | Type |
|---|---|---|---|---|---|
| Improved differential-linear | 11 | Chosen-plaintext | 2^121.8 | 2^135.7 | Key recovery |
| Inverted differential-linear | 11 | Chosen-ciphertext | 2^113.7 | 2^137.7 | Key recovery |
| 12-round differential-linear | 12 | Chosen-plaintext | 2^123.5 | 2^249.4 | Key recovery |
| Improved 10-round (128-bit) | 10 | Chosen-plaintext | 2^97.2 | 2^128 | Key recovery |
| Related-key (modified Serpent) | 32* | Related-key CP | 2^125 | Negligible | Distinguisher |
*Targets a non-standard Serpent variant with key schedule constants removed.
Core technique: A 9-round differential-linear approximation combining a 3-round truncated differential (probability 2^{β6}) with a 6-round linear approximation (bias 2^{β27}). The 12-round attack extends by prepending one round with 2^112 subkey guesses.
The 12-round result is the best classical attack across all papers examined. At 2^249.4 time complexity vs. 2^256 brute force, it provides only ~6.6 bits of advantage, a purely certificational result. The progression from 11 to 12 rounds required an increase of 2^113.7 in time complexity.
Related-key attack (Section 5): Exploits a rotation property requiring removal of the 0x9e3779b9 ^ i constants from the key schedule. leviathan-crypto's key schedule (serpent.ts:324β325) includes these constants via the keyIt helper: rotl<i32>(... ^ 0x9e3779b9 ^ i, 11). This attack is entirely inapplicable.
Analysis: The 12-round attack covers only 12 of 32 rounds, leaving a 20-round security margin. The time complexity is within ~6.6 bits of brute force at 12 rounds; extending to 13 rounds would push complexity well beyond 2^256.
Verdict: NOT APPLICABLE β 20-round security margin.
Authors: Eli Biham, Orr Dunkelman, Nathan Keller Published: FSE 2001, LNCS 2355, pp. 16-27
| Attack | Rounds | Model | Data | Time | Type |
|---|---|---|---|---|---|
| 9-round linear approximation | 9 | Known-plaintext | N/A | N/A | Approximation |
| 10-round key recovery | 10 | Known-plaintext | 2^118 | 2^89 | Key recovery |
| 11-round key recovery (192/256) | 11 | Known-plaintext | 2^118 | 2^187 | Key recovery |
Core technique: Systematic search for linear approximations identified a 9-round approximation with bias 2^{β52} (39 active S-boxes), 4β8x stronger than the bounds claimed by the Serpent designers. The authors note: "there is a huge distance between a 9-round approximation and attacking 32 rounds, or even 16 rounds of Serpent."
Analysis: The bias progression shows roughly 5β13 bits of degradation per additional round. A 32-round approximation would have bias far below 2^{β128}, requiring more data than the 2^{128} codebook. Full 32-round Serpent retains a 21-round security margin. The 11-round attack's memory requirement of 2^{193} bits is astronomically beyond any physical storage.
Verdict: NOT APPLICABLE β 21-round security margin.
Authors: Eli Biham, Orr Dunkelman, Nathan Keller Published: EUROCRYPT 2001, LNCS 2045, pp. 340-357
| Attack | Rounds | Model | Data | Time | Type |
|---|---|---|---|---|---|
| Differential attack (all keys) | 7 | Chosen-plaintext | 2^84 | 2^85 | Key recovery |
| Differential attack (256-bit) | 8 | Chosen-plaintext | 2^84 | 2^213 | Key recovery |
| Rectangle attack (256-bit) | 10 | Chosen-plaintext | 2^126.8 | 2^207.4 | Key recovery |
Core technique: The rectangle attack decomposes the cipher into two halves and counts over all intermediate differences at the boundary, replacing single-characteristic probability with sums of squared differential probabilities. The paper proves the best 3-round differential characteristic has probability 2^{β15} (7 active S-boxes), confirming the S-boxes are well-designed against differential attack.
Analysis: The best result covers 10 rounds. The 6-round differential at the core has probability 2^{β93}. Each additional round adds at least 2^{β15} probability degradation. Full 32-round Serpent retains a 22-round security margin.
Verdict: NOT APPLICABLE β 22-round security margin.
| Paper | Best Attack (Rounds) | Security Margin | Time Complexity | Verdict |
|---|---|---|---|---|
| Amplified Boomerang (FSE 2000) | 8 | 24 rounds | 2^179 | NOT APPLICABLE |
| Chosen-Plaintext Linear (IET 2013) | 11 | 21 rounds | 2^144 | NOT APPLICABLE |
| Differential-Linear (FSE 2008) | 12 | 20 rounds | 2^249.4 | NOT APPLICABLE |
| Linear Cryptanalysis (FSE 2001) | 11 | 21 rounds | 2^187 | NOT APPLICABLE |
| Rectangle Attack (EUROCRYPT 2001) | 10 | 22 rounds | 2^207.4 | NOT APPLICABLE |
Minimum security margin across all papers: 20 rounds (62.5% of the cipher untouched) Best classical attack advantage over brute force: ~6.6 bits (differential-linear on 12 rounds)
Every attack in this corpus shares one fundamental limitation: they work only on reduced-round Serpent. The best result, the 12-round differential-linear attack (Dunkelman, Indesteege, Keller, 2008), achieves a time complexity of 2^249.4, which is barely distinguishable from the 2^256 brute-force bound. Each additional round costs exponentially more: the jump from 11 to 12 rounds alone required a 2^113.7x increase in time complexity. Extending to 13 rounds would push the attack beyond brute force.
The remaining 20 rounds represent an exponential barrier that no known cryptanalytic technique can bridge. The Serpent designers chose 32 rounds specifically to provide this defense-in-depth, roughly doubling the rounds needed for security at the time of design.
leviathan-crypto's round count is not configurable. The loop-based implementation (serpent.ts:430β435) runs from n=0 to n>=31 with no configurable parameter. The unrolled implementation (serpent_unrolled.ts) has all 32 rounds expanded at code-generation time. The key schedule generates all 33 subkeys (KββKββ) unconditionally. Both CTR and CBC modes delegate to the same full 32-round block cipher. There is no API to request reduced-round encryption.
Residual concern: unauthenticated modes. Neither CBC nor CTR mode provides integrity or authentication. Chosen-ciphertext attacks (padding oracles, bit-flipping) are a more realistic threat than any reduced-round algebraic attack. The TypeScript wrapper includes JSDoc warnings on both SerpentCtr and SerpentCbc directing users to pair with HMAC-SHA256 (Encrypt-then-MAC) or use XChaCha20Poly1305 instead. The PKCS7 padding validation uses constant-time XOR-accumulate comparison, mitigating padding oracle attacks at the validation level.
The following section incorporates independent research conducted against the leviathan Serpent-256 implementation using the BicliqueFinder tool. The full research document is available at github.com/xero/BicliqueFinder.
Biclique cryptanalysis is the only known technique that applies to the full 32-round Serpent-256. Unlike the reduced-round attacks in Section 2.2, biclique attacks cover the entire cipher by exploiting key-related structures. The best published biclique attack on Serpent-256 (Menezes et al. 2020) achieved 2^{255.21} time complexity with 2^{88} data complexity, only ~0.8 bits better than brute-force key search.
A separate biclique construction using generator sets (de Carvalho & Kowada) achieved 2^{255.39} time with 2^{4} data complexity, a different tradeoff with substantially lower data requirements.
The BicliqueFinder Java tool was validated against both published papers. Initial comparison revealed a ~1.2-bit shortfall in time complexity, traced to four bugs in the tool's complexity computation:
| Bug | Description |
|---|---|
| 1 | Method dispatch always routed to FUTURE cipher formula, not Serpent |
| 2 | Hardcoded S-box denominator of 160 (FUTURE's total) instead of cipher-specific value |
| 3 |
getNUM_SBOXES_TOTAL() dimensional error: 4480 instead of correct 2080 |
| 4 | Hardcoded biclique dimension instead of deriving from differential count |
The correct S-box denominator of 2080 was independently derived from two sources: the Serpent specification (32 rounds Γ 32 nibbles = 1024 state S-boxes + 33 groups Γ 32 nibbles = 1056 key schedule S-boxes) and the reference C implementation. After patching, both published attacks were reproduced to within Β±0.01 bits.
A hand calculation of the dim-4 attack's complexity components verified the published 2^{255.21} result:
| Component | Value |
|---|---|
| C_biclique | 2^{2.00} |
| C_precomp | 2^{3.807} |
| C_recomp | 2^{7.006} |
| C_falsepos | 2^{0.00} |
| Per-group | 2^{7.205} |
| Total | 2^{255.205} (vs. published 2^{255.21} β 0.005-bit agreement) |
C_recomp dominates at 87% of per-group cost.
With the corrected tool, systematic search was conducted across three parameter dimensions:
Variable v position search: 2,912 candidates (91 states Γ 32 nibbles) were tested with the generator-set biclique (delta K31 n6, nabla K18 n11, states #91β#96). Every candidate produced an independent biclique (100% independence rate). Late-state positions (states 61β90) consistently outperformed earlier ones due to less diffusion between v and the biclique boundary.
Multi-nibble v optimization: At the best single-nibble position (state 66, nibble 8), combining nibbles 8+9 (|v| = 8 bits) eliminated the false-positive term entirely: C_falsepos dropped from 2^{4.00} to 2^{0.00}. Adding nibble 9 increased C_recomp by only +0.04 bits (23 additional S-boxes), because nibbles 8 and 9 occupy the same byte and share most of their diffusion path.
Joint biclique nibble search: 16,384 configurations (64 delta Γ 64 nabla nibbles Γ 4 v positions) were evaluated. The best result was delta nibble 0 (byte 0, high nibble of K31), nabla nibble 13, v = state 66 nibbles 8+9, yielding 2^{255.20}. Delta nibble 0 activates fewer S-boxes in the recomputation phase due to favorable key schedule propagation.
Phase 2 improvement summary:
| Step | Configuration | Time | Data |
|---|---|---|---|
| Paper published | K31/K18, d6/n11, v=s75n31 | 2^{255.39} | 2^{4} |
| Better v position | K31/K18, d6/n11, v=s66n8+9 | 2^{255.21} | 2^{4} |
| Better biclique nibbles | K31/K18, d0/n13, v=s66n8+9 | 2^{255.20} | 2^{4} |
The search was broadened to test whether delta key indices other than K31 could yield better results. Three parallel searches tested K29, K30, and K31 delta pairs against all valid nabla pairs, totaling 1,327,104 evaluations in 6.7 hours of compute time.
Independence rates:
| Delta Key Index | Independent / Total | Rate |
|---|---|---|
| K29 | 14,512 / 106,496 | 13.6% |
| K30 | 85,808 / 110,592 | 77.6% |
| K31 | 89,200 / 114,688 | 77.8% |
K29's dramatically lower rate reflects greater key schedule distance: differences introduced at K29 propagate through 2 extra recurrence steps, creating more S-box conflicts between delta and nabla differentials.
Data complexity: the single most important finding.
| Delta Key Index | Min Data | Max Data | % with data = 2^{4} |
|---|---|---|---|
| K29 | 2^{56} | 2^{80} | 0% |
| K30 | 2^{16} | 2^{40} | 0% |
| K31 | 2^{4} | 2^{4} | 100% |
Data complexity decreases monotonically as the delta key index moves closer to ciphertext: each step closer removes one key schedule propagation step. K31/K32 are the subkeys applied directly in the biclique states; no propagation is needed, and data complexity is at the theoretical minimum. No delta key index other than K31 achieves data complexity within practical reach.
Nabla pair landscape: K17/K18 is universally optimal across all three delta indices. The time complexity curve follows a symmetric U-shape centered on K17. The paper's choice of K18 was near-optimal; K17 outperforms it by 8 recomputation S-boxes (a new finding not in the published papers).
| Parameter | Value |
|---|---|
| Delta | K31, nibble 0 (0xf0 mask) |
| Nabla | K17, nibble 10 (0x0f mask) |
| v | state 66, nibbles 8+9 |
| Time | 2^{255.19} |
| Data | 2^{4} |
| Recomp | 1,041 / 2,080 |
Complete improvement chain:
| Step | Configuration | Time | Improvement |
|---|---|---|---|
| Paper published | K31/K18, d6/n11, s75n31 | 2^{255.39} | β |
| Better v | K31/K18, d6/n11, s66n8+9 | 2^{255.21} | β0.18 bits |
| Better biclique | K31/K18, d0/n13, s66n8+9 | 2^{255.20} | β0.01 bits |
| Better nabla pair | K31/K17, d0/n10, s66n8+9 | 2^{255.19} | β0.01 bits |
| Total | β0.20 bits |
-
K31 is uniquely necessary, not just optimal. The data complexity progression from K29 (min 2^{56}) through K30 (min 2^{16}) to K31 (fixed 2^{4}) demonstrates a monotonic structural constraint tied to the biclique construction. Each key schedule step between the delta key index and the biclique states introduces active S-boxes that demand exponentially more chosen plaintexts. The paper's choice of K31 was not arbitrary; it is the only viable option.
-
K17 outperforms K18 as the nabla pair. This is a new finding not present in the published papers. K17 produces 8 fewer recomputation S-boxes than K18 at the optimal biclique configuration. The universal optimality of K17/K18 across all three delta indices, with the separation |deltaβ17| increasing by exactly 1 per index step (12, 13, 14), points to a fixed structural feature of the Serpent key schedule where the recurrence
w[i] = (w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ phi ^ i) <<< 11produces optimal cancellation properties when the nabla difference enters at K17. -
The time-data tradeoff moving away from ciphertext is never favorable. K29 achieves 2^{255.13} time complexity (better than K31's 2^{255.19} by 0.06 bits) but requires 2^{56} data, a gap that renders the time improvement meaningless. Each step away from ciphertext trades a marginal time improvement for an exponential data complexity increase.
The best biclique attack on full 32-round Serpent-256 achieves 2^{255.19} time complexity with 2^{4} data complexity. This provides less than 1 bit of advantage over exhaustive key search (2^{256}). The attack is purely certificational. It demonstrates a theoretical distinction from an ideal 256-bit cipher but has zero practical impact on security.
To contextualize the scale: 2^{255.19} operations at 10^{18} operations per second (beyond any current or foreseeable hardware) would require approximately 10^{58} years, roughly 10^{48} times the age of the universe. The biclique attack reduces this by a factor of less than 2, which does not change the conclusion that brute-force key search against Serpent-256 is computationally infeasible.
The leviathan-crypto implementation is not affected by this attack in any way that a code change could address. The biclique structure exploits inherent algebraic properties of the Serpent S-boxes and key schedule. These properties are identical in any correct implementation.
The Cryptographic Doom Principle (Duong, 2011): if you have to perform any cryptographic operation before verifying the MAC on a message you've received, it will somehow inevitably lead to doom. The canonical counter-examples are the SSL/TLS padding oracle (Vaudenay 2002) and the SSH plaintext recovery attack, both consequences of MAC-then-encrypt designs where decryption must run before the MAC can be checked, giving an attacker a decryption oracle through observable error behavior.
Section 2.2 identified unauthenticated modes as the primary residual concern for raw SerpentCtr and SerpentCbc. SerpentCipher (src/ts/serpent/cipher-suite.ts) addresses this by composing SerpentCbc with HMAC_SHA256 in strict Encrypt-then-MAC order, per chunk, used via SealStream and OpenStream (src/ts/stream/seal-stream.ts, src/ts/stream/open-stream.ts).
plaintext β SerpentCbc.encrypt(enc_key, iv, plaintext) β ciphertext β HMAC_SHA256(mac_key, counterNonce β u32be(aad_len) β aad β ciphertext) β ciphertext β tag
The HMAC is computed over the ciphertext plus context (counter nonce, AAD), not the plaintext. The tag covers exactly what travels on the wire, plus the chunk's position and associated data.
const ct = chunk.subarray(0, chunk.length - 32);
const receivedTag = chunk.subarray(chunk.length - 32);
const tagInput = concat(counterNonce, aadLenBuf, aadBytes, ct);
const expectedTag = hmac.hash(macKey, tagInput);
// CRITICAL: Verify HMAC BEFORE decrypting.
if (!constantTimeEqual(expectedTag, receivedTag))
throw new AuthenticationError('serpent');
// ONLY decrypt after authentication succeeds
const plaintext = cbc.decrypt(encKey, iv, ct);cbc.decrypt is never called before constantTimeEqual returns true. There is no code path (no early return, no fallthrough, no branch) that produces plaintext from a chunk that has not passed its MAC. This is the doom principle enforced structurally, not by convention.
The constantTimeEqual comparison itself (XOR-accumulate over all 32 bytes, no early exit) prevents a timing oracle on the tag. An attacker cannot distinguish a one-byte tag mismatch from a 32-byte mismatch by measuring response latency. At audit time the comparison ran as a JS loop in utils.ts; post-audit it was moved into a dedicated WASM SIMD module (v128 XOR-accumulate with branch-free reduction), with no JS fallback. The constant-time property is preserved and strengthened. See asm_ct.md.
At stream construction, HKDF-SHA-256 derives three keys from the master key and a 16-byte random nonce:
HKDF-SHA-256(masterKey, nonce, "serpent-sealstream-v2", 96) β enc_key[0:32] β mac_key[32:64] β iv_key[64:96]
Position binding is achieved through the 12-byte counter nonce (11-byte big-endian counter + 1-byte final flag). The HMAC covers counterNonce β u32be(aad_len) β aad β ciphertext, binding authentication to chunk position and associated data. The CBC IV for each chunk is derived deterministically: HMAC-SHA-256(iv_key, counterNonce)[0:16].
A chunk transplanted from a different stream (different HKDF salt β different keys), a different position (different counterNonce β different IV and different HMAC input), or misrepresented as terminal or non-terminal (different final flag in counterNonce) will fail MAC verification before any decryption runs. The SSH plaintext recovery attack works by feeding an arbitrary ciphertext block to a recipient who decrypts the first four bytes and interprets them as a length, an operation taken before MAC verification. SerpentCipher has no equivalent: there is no length field inside the encrypted payload, chunk boundaries are determined externally by the caller, and every byte of every chunk is MAC-verified before any of it is decrypted.
The SSL/TLS vulnerability arises because the MAC covers the plaintext (MAC-then-encrypt), so the MAC cannot be checked until after decryption and padding removal. An attacker who can elicit different error responses for padding errors versus MAC errors gains a byte-at-a-time decryption oracle.
SerpentCipher uses CBC with PKCS7 padding, so padding does exist (unlike the v1 CTR-based construction). The verify-then-decrypt ordering ensures that PKCS7 padding is evaluated only after HMAC authentication succeeds. This prevents Vaudenay-style padding oracle attacks:
-
HMAC mismatch β
AuthenticationErrorthrown (before any decryption) - Valid HMAC, invalid padding β unreachable (padding is always valid after authentic CBC decryption with the correct key; an attacker cannot produce a valid HMAC for a ciphertext that would cause a padding error)
The error path produces a single observable outcome: AuthenticationError. There is no second error class (no padding error distinct from MAC error) and no timing difference between failure modes.
SerpentCipher satisfies the Cryptographic Doom Principle by construction. MAC verification is the unconditional gate on the open path; decryption is unreachable until that gate clears. The 3-key HKDF derivation (enc_key, mac_key, iv_key) with counter-nonce position binding extends this guarantee to stream integrity: reordering, truncation, and cross-stream substitution are all detected at the MAC layer before any plaintext is produced.
| Document | Description |
|---|---|
| index | Project Documentation index |
| architecture | architecture overview, module relationships, buffer layouts, and build pipeline |
| sha2_audit | SHA-256 / SHA-512 / SHA-384 implementation audit |
| sha3_audit | SHA-3 / Keccak implementation audit |
| hmac_audit | HMAC-SHA256 audit (used in SerpentCipher) |
| hkdf_audit | HKDF-SHA256 audit (used in SerpentCipher) |
| chacha_audit | XChaCha20-Poly1305 implementation audit |
| serpent | TypeScript API for Serpent-256 |
| asm_serpent | WASM implementation details |
| serpent_reference | algorithm specification and known attacks |