liams.website

Seed-independent collisions on wyhash and rapidhash

How static secrets in wyhash and rapidhash enable HashDoS attacks regardless of seed randomisation, and why randomising secrets fixes it.

6 minute read

What is HashDoS?

Hash tables are one of those data structures that feel almost like cheating. You want to look up a value by key? Just hash the key, use that hash to pick a bucket, and there’s your data. O(1) lookup. Constant time. The computer science equivalent of having a really good filing system.

The catch is that hash tables only work well when keys spread evenly across buckets. When two keys hash to the same value – a collision – they end up in the same bucket, and you have to search through that bucket linearly. A few collisions are fine. Hash tables are designed to handle them. But what happens if every key lands in the same bucket?

Then your O(1) lookup becomes O(n). Your hash table is now a linked list with extra steps. And if you’re a web server processing user-provided keys, that user has just found a very cheap way to make your CPU do a lot of work.

This is HashDoS: denial-of-service via hash collisions. In December 2011, researchers demonstrated at the Chaos Communication Congress that virtually every major web framework was vulnerable. A carefully crafted 2MB POST request containing colliding keys could trigger 40 billion string comparisons and lock up a server’s CPU. Minimal bandwidth, maximum damage. PHP, Java, Python, Ruby, ASP.NET, and V8 were all affected.

The fix seemed straightforward. Instead of using a predictable hash function, you randomise it. Generate a secret seed at startup, mix that same seed into every hash computation, and now the attacker can’t predict which keys will collide. They don’t know your seed, so they can’t craft adversarial inputs. SipHash was designed specifically for this purpose – a hash function fast enough for hash tables but cryptographically strong enough to resist differential attacks. Rust adopted randomised SipHash by default. Many other languages followed. The problem was solved.

But SipHash, while fast for a cryptographic construction, is still slower than non-cryptographic alternatives. For performance-critical applications, developers reached for faster options: MurmurHash, CityHash, wyhash, rapidhash. These hashes support random seeds too. Surely that’s good enough?

When seeds don’t help

A seed-independent collision is exactly what it sounds like: two inputs that collide regardless of what seed you use. The seed is supposed to make the hash function’s output unpredictable, but if you can find inputs that collide for every possible seed, the randomisation provides no protection at all.

This turns out to be more common than you might hope. Orson Peters has a detailed post explaining seed-independent collisions on CityHash64, MurmurHash2/3, and FarmHash64. The attacks exploit mathematical properties of the operations these hash functions use – fixed points in multiplication, invertible operations that let you work backwards from desired outputs.

Wyhash and rapidhash, two of the fastest non-cryptographic hash functions available today, have their own seed-independent collision vulnerability in the C++ implementations. I documented this in March 2025 while working on the Rust rapidhash crate, having missed an older issue that documents the same. A month later, CVE-2025-27209 announced that Node.js v24.0.0 was vulnerable to HashDoS through V8’s use of rapidhash. The attack is embarrassingly simple. And so is the fix.

The wyhash construction

To understand the vulnerability, we need to look at how wyhash handles the tail end of longer inputs. Like most fast hash functions, wyhash uses a folded multiply – multiplying two 64-bit values into a 128-bit result, then XORing the upper and lower halves together. This provides excellent bit diffusion in a single operation.

Wyhash also uses “secrets” – large constants mixed into the computation at various points. In theory, you could randomise these secrets to provide additional unpredictability. In practice, the C and Rust implementations use static, hardcoded secrets.

Here’s the relevant code for inputs longer than 16 bytes:

cpp
// a = penultimate 8 bytes
// b = final 8 bytes
a = _wyr8(p + i - 16);
b = _wyr8(p + i - 8);

// XOR with secret[1] and the seed
a ^= secret[1];
b ^= seed;

// Folded multiply
_wymum(&a, &b);

// Final mix
return _wymix(a ^ secret[0] ^ len, b ^ secret[1]);

The seed only affects b. The secrets are static constants. Do you see the problem?

If we set the penultimate 8 bytes of our input to equal secret[1], then after the XOR a becomes zero. When you multiply zero by anything, you get zero. The folded multiply produces zero for both a and b, regardless of what b was – which means regardless of what the seed was.

The final mix then operates on secret[0] ^ len and secret[1]. Both are constants. For any fixed input length, we can generate infinitely many inputs that all hash to the same value by:

  1. Choosing any arbitrary bytes for the first part of the input
  2. Setting bytes at position len - 16 to len - 9 equal to secret[1]
  3. Choosing any arbitrary bytes for the final 8

Every such input will hash identically, no matter what seed the hash table is using.

Rapidhash inherits the problem

Rapidhash is derived from wyhash with performance optimisations. It uses a similar tail-end structure, the C++ implementation:

cpp
if (len <= 16) {
    // Short input handling...
} else {
    // Process chunks...

    // Handle tail, where i is the length of the input
    a = rapid_read64(p + i - 16) ^ i;
    b = rapid_read64(p + i - 8);
}
a ^= secret[1];
b ^= seed;
rapid_mum(&a, &b);
return rapid_mix(a ^ secret[7], b ^ secret[1] ^ i);

There’s a small difference: rapidhash XORs i (the length) into a before the secret XOR. But the attack still works – just set the penultimate 8 bytes to secret[1] ^ len instead of just secret[1].

Here’s a test demonstrating the attack on rapidhash v3:

rust
#[test]
fn seed_independent_hash_collisions() {
    const SECRET1: u64 = 0x8bb84b93962eacc9;
    const EXPECTED_HASH: u64 = 18446744073709551615;

    fn random_slice() -> Vec<u8> {
        // generate a random 32-byte input
        let mut data = vec![0; 32];
        let rng = &mut rand::rng();
        rng.fill_bytes(data.as_mut_slice());

        // set penultimate 8 bytes to secret[1], XORed with the input length 32
        let offset = data.len() - 16;
        let a = &mut data[offset .. offset + 8];
        a.copy_from_slice(&(SECRET1 ^ 32).to_le_bytes());

        data
    }

    // 10 hashes with different seeds and input data, all colliding
    for _ in 0..10 {
        let hash = rapidhashcc_v3(&random_slice(), rand::random::<u64>());
        assert_eq!(hash, EXPECTED_HASH, "Expected seed-independent hash collision.");
    }
}

We generate ten completely different byte sequences with random seeds, ensure only that bytes 16–23 equal secret[1] ^ len, and they all collide to the same hash value.

The fix

The solution is almost insultingly obvious: randomise the secrets, not just the seed.

The Rust rapidhash crate now supports two approaches:

  1. Random secrets: Generate secrets randomly at startup for ephemeral hashing. Maximum unpredictability. RapidHashMap makes use of this by default with its initialisation through RandomState.
  2. Seed-derived secrets: Call RapidSecrets::seed(seed) to generate deterministic but unique secrets from your seed. Same seed, same secrets, reproducible hashing – but an attacker can’t exploit static constants.

Either approach defeats the attack. If the attacker doesn’t know secret[1], they can’t construct the magic bytes that cause zero collapse. Foldhash and other hashes with a similar folded-multiply construction also usually work around the same problem of seed-independent collisions by randomising the secrets.

An attacker who can observe raw hash outputs, or collect enough timing data to infer when collisions occur, might still find ways through – non-cryptographic hash functions aren’t designed to resist that level of scrutiny. But for a typical hash table processing untrusted input, where the attacker only controls the keys and not the observation channel, randomising the secrets can provide a reasonable level of DoS-resistance for most use cases.

Rapidhash with randomised secrets is still very fast. It just has one more thing to randomise.