How much do hash collisions help an attacker?

I was asked this interesting question: suppose there are 2^{256} possible MACs (message authentication codes) and that each MAC is hashed to a 256 bit hash. Suppose we have a hash obtained from a MAC, and want to find a MAC with the same hash.

The object of this post is to give a short proof that the average number of MACs we must hash is between 0.0163 \times 2^{256} \approx 2^{250} and 0.418 \times 2^{256}. This contrasts with the 0.5 \times 2^{256} hashes that would on average be needed if there was a unique hash for each MAC.

Update. The ‘asymptotically correct’ answer is 2^{256} / \mathrm{e} = 0.368 \times 2^{256}: see end of this post.

Generalize by having N MACs, each of which is hashed to one of N possible values. We model an idealised hash using balls and urns: take N urns, representing the possible values of the hash, and suppose that each ball is thrown into an urn chosen uniformly at random. The probability that after N throws, urn 1 contains exactly r balls is

\binom{N}{r} \bigl( 1 - \frac{1}{N} \bigr)^{N-r} \frac{1}{N^r}.

For large N we have (1-1/N)^N \approx \mathrm{e}^{-1}. If r is small compared to N then (1-1/N)^{-r} \approx 1 and

\binom{N}{r} \frac{1}{N^r} = \frac{N(N-1) \ldots (N-r+1)}{N^r} \frac{1}{r!} \approx \frac{1}{r!}.

It follows that the probability of seeing exactly r balls in urn 1 is approximately \mathrm{e}^{-1}/r!. (This is the Poisson distribution with parameter 1.) Hence, if r is small compared to N, then an excellent approximation for the probability that a given urn contains exactly r balls is \mathrm{e}^{-1}/r!.

Now, doing subtly the wrong thing, suppose that urn 1 is non-empty, and we draw balls at random until we find a ball in urn 1, throwing out balls that turn out to be in different urns. If urn 1 has r balls then on average (N+1)/(r+1) balls must be drawn. (This is easily proved by induction, using that if e_{N} is the expected number of drawings with N balls then e_1 = 1 and, conditioning on the first ball drawn, e_N = r/N + (1-r/N)(1 + e_{N-1}) = 1 + (1-r/N) e_{N-1}.) Conditioned on being non-empty, the probability that urn 1 has r balls is 1/(\mathrm{e}-1) r!. Hence the expected number of balls drawn is

\begin{aligned}  \sum_{r=1}^\infty \frac{1}{\mathrm{e}-1} \frac{1}{r!} \frac{N+1}{r+1} &= \frac{N+1}{\mathrm{e}-1} \sum_{s=2}^\infty \frac{1}{s!} \\  &= \frac{N+1}{\mathrm{e}-1}\bigl( \mathrm{e} - 1 - 1 \bigr) \\ &= (N+1)\frac{\mathrm{e}-2}{\mathrm{e}-1}  \\  &\approx 0.418N. \end{aligned}

If each urn has exactly one ball, then on average (N+1)/2 \approx 0.5N drawings are needed. So we find a ball in urn 1 about 80\% faster than in the uniform case.

However, this does not capture the original problem. A hash from a MAC chosen uniformly at random is disproportionately likely to be the common hash of several different MACs. Correspondingly, in the ball-and-urns model, the probability of choosing an urn should be proportional to the number of balls in it. Choosing a non-empty urn at random gives too much weight to the urns with only one ball. So 0.418N is only an upper bound on the amount of work required.

For a lower bound, suppose that the original ball is always chosen from the urn containing the most balls. Lemma 6.20 in these notes on hashing shows that the probability that there is a urn containing at least t balls is at most N\mathrm{e}^t/t^t. This probability is small for t = \log N / \log \log N (provided N is sufficiently large). So an asymptotic lower bound for the amount of work required is N\log \log N/\log N.

For the original problem we have N = 2^{256} and 2^{256}\mathrm{e}^t/t^t < 1/1000 provided t \ge 60. So with probability at least 999/1000, no urn contains more than 60 balls. The expected numbers of balls drawn is therefore at least

\frac{999}{1000} \times \frac{2^{256} + 1}{61} \ge 0.0163 \times 2^{256}.

Update. I posted the question to MathOverflow, and it got an unexpectedly elegant answer from Douglas Zare. When we choose a ball number uniformly at random, it is k times more likely to be in an urn containing k balls. In the Poisson model, this event has probability k/\mathrm{e}/k! = 1/\mathrm{e}(k-1)!. So if X is the number of balls in the urn of the chosen ball, not counting the chosen ball, then
\mathbb{P}[X=j] = 1/\mathrm{e}k!. Hence X is distributed as a Poisson random variable with mean 1, and the expected number of balls drawn is 1/\mathbb{E}[X+1+1], which is shown to be N/\mathrm{e} in Zare’s post.

Since \log_2(\mathrm{e}) = 1/\log 2 = 1.443, the hash collisions mean an attacker saves about 1.443 bits of work.


One Response to How much do hash collisions help an attacker?

  1. pollyshaw says:

    Thank you! This really clears it up for me. I feel a bit sorry I can’t give you lots of reputation points.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: