Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Newsgroups: comp.os.linux.development.system
Subject: xfrm spi hash
Date: Sun, 29 Jul 2012 20:23:25 +0100
Lines: 52
Message-ID: <87y5m25q36.fsf@sapphire.mobileactivedefense.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: individual.net 3OscHhSpIfxNTcVtO35s/QKzzZySf5oB/+f0wTgHCfAhqswJOXh4JDxU4cdutb7GE=
Cancel-Lock: sha1:nAcwofJSdXpbVcgGQf8FrCtcdrc= sha1:PHHCv84ZOt19hLzXJzUisGs6FQs=
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
Xref: csiph.com comp.os.linux.development.system:457

Leading remark: The following text will probably very cryptic to
anynone not familiar with the inner workings of the racoon IKE
implementation and the process of establishing IPsec-based 'VPN
tunnels'. There's a more general question at the end but I think I
should also include enough background information to communicate the
general situation.

Recently, I replaced the SADB_DUMP-based mechanism racoon uses to get
completely rid of all phase2 IPsec SAs associated with a particular
phase1 SA with 'something sane' based on using already available
information to determine the phase2 SAs to be removed. Because I need
to do accounting for 'user VPNs', deleteing a kernel phase2 SA/ xfrm
state is a two step procedure comprised of first sending a SADB_GET
message to the kernel in order to get the traffic counters and then
deleting it. The code which deals with the replies to this get message
needs to find the corresponding ph2handle structures of the daemon
based on using SPI, source and destination address and the IPsec
protocol ID. The racoon code keeps all ph2handles on a global linked
list and locates a specific one by searching this list using 'some
comparison criterion'. In the case of a SPI-based search, that's a
particular atrocious procedure because a SPI may be stored in one of
two members of a saproto structure, a linked list of which is hooked
to a saprop structure and the ph2handle structure itself contains two
linked lists of these. This seemed a little gross to be and because of
that, I added a static (64K pointers) hash table to be used for these
lookups.

Since the same problem needs to be solved by the kernel, I
looked at the kernel source in order to see what kind of hash function
Linux uses for this. For IPv4, it is essentially

	x = spi ^ proto ^ ntohl(daddr);

The spi is a (pseudo-)random 32-bit value, proto is the eight bit
protocol ID and daddr is the IPv4 destination address. This value is
then post-processed as

	x = x ^ (x >> 10) ^ (x >> 20);

to calculate the hash value. Since it is usually pointless to change
something which is known to work, I used the same calculation for my
code. But since this is a relativley expensive calculation, I
continued to think about it and some doubts about it arose: Assuming
the PRNG the kernel uses to generate the SPI is 'good' in the sense
that a sequences of SPIs is statistically indistinguishable from a
random sequence, the only effect of these calculations ought to be to
transform a random set of bits into a different, equally random set of
bits with no specific relation to any of the other input values
whatsoever. I'm therefore very much tempted to drop all of it and just
use the SPI as hash value. Is there some kind of obvious error in this
reasoning or does anyone know something about the raison d'etre for
this specific hash function?