Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: Rainer Weikusat Newsgroups: comp.os.linux.development.system Subject: xfrm spi hash Date: Sun, 29 Jul 2012 20:23:25 +0100 Lines: 52 Message-ID: <87y5m25q36.fsf@sapphire.mobileactivedefense.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: individual.net 3OscHhSpIfxNTcVtO35s/QKzzZySf5oB/+f0wTgHCfAhqswJOXh4JDxU4cdutb7GE= Cancel-Lock: sha1:nAcwofJSdXpbVcgGQf8FrCtcdrc= sha1:PHHCv84ZOt19hLzXJzUisGs6FQs= User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux) Xref: csiph.com comp.os.linux.development.system:457 Leading remark: The following text will probably very cryptic to anynone not familiar with the inner workings of the racoon IKE implementation and the process of establishing IPsec-based 'VPN tunnels'. There's a more general question at the end but I think I should also include enough background information to communicate the general situation. Recently, I replaced the SADB_DUMP-based mechanism racoon uses to get completely rid of all phase2 IPsec SAs associated with a particular phase1 SA with 'something sane' based on using already available information to determine the phase2 SAs to be removed. Because I need to do accounting for 'user VPNs', deleteing a kernel phase2 SA/ xfrm state is a two step procedure comprised of first sending a SADB_GET message to the kernel in order to get the traffic counters and then deleting it. The code which deals with the replies to this get message needs to find the corresponding ph2handle structures of the daemon based on using SPI, source and destination address and the IPsec protocol ID. The racoon code keeps all ph2handles on a global linked list and locates a specific one by searching this list using 'some comparison criterion'. In the case of a SPI-based search, that's a particular atrocious procedure because a SPI may be stored in one of two members of a saproto structure, a linked list of which is hooked to a saprop structure and the ph2handle structure itself contains two linked lists of these. This seemed a little gross to be and because of that, I added a static (64K pointers) hash table to be used for these lookups. Since the same problem needs to be solved by the kernel, I looked at the kernel source in order to see what kind of hash function Linux uses for this. For IPv4, it is essentially x = spi ^ proto ^ ntohl(daddr); The spi is a (pseudo-)random 32-bit value, proto is the eight bit protocol ID and daddr is the IPv4 destination address. This value is then post-processed as x = x ^ (x >> 10) ^ (x >> 20); to calculate the hash value. Since it is usually pointless to change something which is known to work, I used the same calculation for my code. But since this is a relativley expensive calculation, I continued to think about it and some doubts about it arose: Assuming the PRNG the kernel uses to generate the SPI is 'good' in the sense that a sequences of SPIs is statistically indistinguishable from a random sequence, the only effect of these calculations ought to be to transform a random set of bits into a different, equally random set of bits with no specific relation to any of the other input values whatsoever. I'm therefore very much tempted to drop all of it and just use the SPI as hash value. Is there some kind of obvious error in this reasoning or does anyone know something about the raison d'etre for this specific hash function?