Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.tele.dk!feed118.news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Newsgroups: comp.lang.python
Date: Tue, 22 Jan 2013 23:25:45 -0800 (PST)
In-Reply-To: <mailman.828.1358882199.2939.python-list@python.org>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=94.68.70.179; posting-account=DYJQ-woAAACEPH85Au2BhUVfFTfSfVa4
References: <adcfb222-a038-4700-8959-38e452c04b85@googlegroups.com> <50fe787e$0$30003$c3e8da3$5496439d@news.astraweb.com> <f4298c6f-81a2-45c7-903b-015e9f17d5a7@googlegroups.com> <mailman.784.1358857784.2939.python-list@python.org> <mailman.785.1358858844.2939.python-list@python.org> <50fe8e69$0$30003$c3e8da3$5496439d@news.astraweb.com> <0459659d-4ec2-4c7d-bee3-b4e363c916dd@googlegroups.com> <mailman.790.1358865192.2939.python-list@python.org> <ec8f1a56-d0f7-46a6-a8a3-9425d3aabf8e@googlegroups.com> <mailman.796.1358868351.2939.python-list@python.org> <12a22c5b-88a9-4577-a642-abe1e56cce5e@googlegroups.com> <mailman.803.1358871083.2939.python-list@python.org> <8ad4a124-37a8-41fc-938d-9535b8affcbf@googlegroups.com> <mailman.828.1358882199.2939.python-list@python.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Subject: Re: Using filepath method to identify an .html page
From: Ferrous Cranus <nikos.gr33k@gmail.com>
To: comp.lang.python@googlegroups.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: python-list@python.org
Precedence: list
Message-ID: <mailman.869.1358925949.2939.python-list@python.org>
Lines: 175
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:37421

On Tuesday, 22 January 2013 9:16:34 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
>
> >> all of it. You are asking something that is fundamentally
> >> impossible[1]. There simply are not enough numbers to go around.
>
> > Fundamentally impossible?
> >
> > Well....
> >
> > OK: How about this in Perl:
> >
> > $ cat testMD5.pl
> > use strict;
> >
> > foreach my $url(qw@ /index.html /about/time.html @){
> >         hashit($url);
> > }
> >
> > sub hashit {
> >    my $url=shift;
> >    my @ltrs=split(//,$url);
> >    my $hash = 0;
> >
> >    foreach my $ltr(@ltrs){
> >         $hash = ( $hash + ord($ltr)) %10000;
> >    }
> >    printf "%s: %0.4d\n",$url,$hash
> >
> > }
> >
> > which yields:
> > $ perl testMD5.pl
> > /index.html: 1066
> > /about/time.html: 1547
>
> $ cat clashes.pl
> use strict;
>
> foreach my $url(qw@
>     /public/fails.html
>     /large/cannot.html
>     /number/being.html
>     /hope/already.html
>     /being/really.html
>     /index/breath.html
>     /can/although.html
> @){
>         hashit($url);
> }
>
> sub hashit {
>    my $url=shift;
>    my @ltrs=split(//,$url);
>    my $hash = 0;
>
>    foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
>    }
>    printf "%s: %0.4d\n",$url,$hash
>
> }
> $ perl clashes.pl
> /public/fails.html: 1743
> /large/cannot.html: 1743
> /number/being.html: 1743
> /hope/already.html: 1743
> /being/really.html: 1743
> /index/breath.html: 1743
> /can/although.html: 1743
>
> Hm, I must be holding it wrong...
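
For readers following along in Python, here is a minimal sketch of the quoted hashit routine. It assumes nothing beyond what the Perl above does (add up the character codes, reduce modulo 10000); the name `hashit` is carried over and the URL list is just a sample taken from the two quoted scripts.

```python
def hashit(url: str) -> int:
    """Sum of the character codes, reduced modulo 10000 (mirrors the quoted Perl)."""
    total = 0
    for ch in url:
        total = (total + ord(ch)) % 10000
    return total


# Sample paths taken from the quoted testMD5.pl and clashes.pl runs.
urls = [
    "/index.html",
    "/about/time.html",
    "/public/fails.html",
    "/large/cannot.html",
]

for url in urls:
    print(f"{url}: {hashit(url):04d}")
```

The last two paths both come out as 1743, which is the clash shown in the quoted output: a plain sum only sees which characters are present, not where they are.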

my @i = split(//,$url); # put each letter in its own bin
my $j=0;   # Initialize our
my $k=1;   # hashing increment values
my @m=();  # workspace
foreach my $n(@i){
       my $q=ord($n);  # ASCII code for the character
       $k += $j;       # Increment our hash offset
       $q += $k;       # add our "old" value
       $j = $k;        # store that.
       push @m,$q;     # save the offset value
}

my $hashval=0;  # initialize our hash value
# Generate that
map { $hashval = ($hashval + $_) % 10000} @m;


The idea behind that method is that ABC.html and CBA.html should now get different values, because each letter position's value gets bumped up by an increasing amount from left to right.
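
That claim about ABC.html and CBA.html is easy to check. Below is a line-by-line Python transliteration of the Perl fragment above, offered as a sketch: it assumes the fragment is meant to be the whole hash computation, and `positional_hash` and `clashes` are names made up for this example.

```python
def positional_hash(url: str) -> int:
    """Transliteration of the Perl fragment: add a growing offset to each
    character code, then sum everything modulo 10000."""
    j, k = 0, 1          # hashing increment values
    shifted = []         # workspace (@m in the Perl)
    for ch in url:
        k += j           # offset goes 1, 2, 4, 8, ...
        shifted.append(ord(ch) + k)
        j = k
    hashval = 0
    for value in shifted:
        hashval = (hashval + value) % 10000
    return hashval


# The seven paths from the quoted clashes.pl run.
clashes = [
    "/public/fails.html",
    "/large/cannot.html",
    "/number/being.html",
    "/hope/already.html",
    "/being/really.html",
    "/index/breath.html",
    "/can/although.html",
]

print(positional_hash("ABC.html"))  # 936
print(positional_hash("CBA.html"))  # 936
print({positional_hash(u) for u in clashes})  # {3886}
```

Both names give 936. The offsets are added to the character codes and everything is then summed, so for a name of n characters the result is just the plain character sum plus 2**n - 1, whatever the order of the letters. The seven 18-character paths from clashes.pl therefore still collide, now at 3886. Making the sum depend on order would need something other than an added offset, for example multiplying each code by a position-dependent weight, and with only 10000 possible values some collisions remain unavoidable in any case.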