
Using filepath method to identify an .html page

Started by	Ferrous Cranus <nikos.gr33k@gmail.com>
First post	2013-01-22 02:07 -0800
Last post	2013-01-22 17:27 -0800
Articles	20 on this page of 92 — 15 participants




#37320

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 10:13 -0800
Message-ID	<a15028e4-1475-40d6-b2b4-2cf58d84bcc3@googlegroups.com>
In reply to	#37300

On Tuesday, 22 January 2013 6:23:16 PM UTC+2, Leonard, Arah wrote:
> > Thank you but the number needs to be a 4-digit integer only, if it's to be stored in the database table correctly.
>
> Okay, I think we need to throw the flag on the field at this point.  What you're asking for has gone into a realm where you clearly don't even appear to understand what you're asking for.
>
> What is the reason for your integer being limited to only 4 digits?  Not even databases are limited in such a way.  So what are you doing that imposes that kind of a limit, and why?


a) I'm a reseller, I have unlimited FTP quota, hence database space
b) I'm feeling compelled to do it this way
c) I DO NOT want to use BIG absolute paths to identify files, just small numbers, which are easier to maintain.

I know your solution works, and I thank you very much for providing it to me!

Can you please help with the errors that http://superhost.gr gives?


#37331

From	Michael Torrie <torriem@gmail.com>
Date	2013-01-22 11:43 -0700
Message-ID	<mailman.822.1358880240.2939.python-list@python.org>
In reply to	#37320

On 01/22/2013 11:13 AM, Ferrous Cranus wrote:
> a) I'm a reseller, I have unlimited FTP quota, hence database space

Space doesn't even come into the equation.  There's virtually no
difference between a 4-digit number and a 100-character string.  Yes
there is an absolute difference in storage space, but the difference is
so minuscule that there's no point even thinking about it.  Especially
if you are dealing with less than a million database rows.

> b) I'm feeling compelled to do it this way

Why?  Who's compelling you?  Your boss?

> c) I DO NOT want to use BIG absolute paths to identify files, just
> small numbers, which are easier to maintain.

No it won't be easier to maintain.  I've done my share of web
development over the years.  There's no difference between using a
string index and some form of number index.  And if you have to go over
the database by hand, having a string is infinitely easier for your
brain to comprehend than a magic number.  Now don't get me wrong.  I've
done plenty of tables linked by index numbers, but it's certainly harder
to fix the data by hand since an index number only has meaning in the
context of a query with another table.
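To make the comparison concrete, here is a minimal sketch of the usual compromise: keep the path as a unique string column and let the database hand out a small integer for it. This assumes SQLite through Python's standard sqlite3 module; the table name pages, the column names, and the helper get_page_id are made up for illustration and are not taken from the original poster's site.

```python
import sqlite3

def get_page_id(conn, path):
    """Return the integer id for path, creating a row the first time it is seen."""
    # INSERT OR IGNORE does nothing when the UNIQUE path already exists.
    conn.execute("INSERT OR IGNORE INTO pages (path) VALUES (?)", (path,))
    row = conn.execute("SELECT id FROM pages WHERE path = ?", (path,)).fetchone()
    return row[0]

# In-memory database, used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)"
)

first = get_page_id(conn, "/index.html")
second = get_page_id(conn, "/about/time.html")
again = get_page_id(conn, "/index.html")
print(first, second, again)
```

The number is small and unique because the database assigns it, not because it was computed from the path, so no two paths can ever share it.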

> 
> I know your solution works, and I thank you very much for
> providing it to me!
>
> Can you please help with the errors that http://superhost.gr gives?

Sorry I cannot, since I don't have access to your site's source code, or
your database.


#37293

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 07:46 -0800
Message-ID	<mailman.800.1358870153.2939.python-list@python.org>
In reply to	#37284

Thank you but the number needs to be a 4-digit integer only, if it's to be stored in the database table correctly.

pin = int( htmlpage.encode("hex"), 16 )

I just tried what you gave me

This produces a number of: 140530319499494727...677522822126923116L

Visit http://superhost.gr to see that displayed error. I think it

What did you use "hex" for? To encode the string to hexadecimal? What for?
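As a hedged illustration of what the "hex" step is doing: str.encode("hex") only exists in Python 2, where it writes every byte of the string as two hexadecimal digits, and int(..., 16) then reads that whole digit string as one number. A rough Python 3 equivalent, with the helper names path_to_int and int_to_path invented for this sketch, could look like this:

```python
def path_to_int(path):
    """Treat the UTF-8 bytes of path as one big base-256 number."""
    return int.from_bytes(path.encode("utf-8"), "big")

def int_to_path(number):
    """Inverse of path_to_int (assumes the path did not start with a NUL byte)."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")

n = path_to_int("/foo/bar/baz")
print(n)               # one integer that encodes every character of the path
print(int_to_path(n))  # the original path, recovered from the integer
```

The number is huge precisely because it is reversible: it has to keep all the information in the path, which is why it cannot fit in four digits.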


#37294

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 07:59 -0800
Message-ID	<12a22c5b-88a9-4577-a642-abe1e56cce5e@googlegroups.com>
In reply to	#37284

On Tuesday, 22 January 2013 5:25:42 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > I insist, perhaps compelled, to use a key to associate a number to a
> > filename. Would you help please?
> >
> > I don't know how this is supposed to be written. I just know I need this:
> >
> > number = function_that_returns_a_number_out_of_a_string(
> > absolute_path_of_a_html_file)
> >
> > Would someone help me write that in Python code? We are talking 1 line
> > of code here....
>
> Since you insist:
>
> >>> def function_that_returns_a_number_out_of_a_string(absolute_path_of_a_html_file):
> ...     return int(absolute_path_of_a_html_file.encode("hex"), 16)
> ...
> >>> function_that_returns_a_number_out_of_a_string("/foo/bar/baz")
> 14669632128886499728813089146L
>
> As a bonus here is how to turn the number back into a path:
>
> >>> x = 14669632128886499728813089146
> >>> "{:x}".format(x).decode("hex")
> '/foo/bar/baz'
>
> ;)

Thank you but no... no, that would be unnecessarily complex.

I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(


#37297

From	Chris Angelico <rosuav@gmail.com>
Date	2013-01-23 03:11 +1100
Message-ID	<mailman.803.1358871083.2939.python-list@python.org>
In reply to	#37294

On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(

Either you are deliberately trolling, or you have a major
comprehension problem. Please go back and read, carefully, all the
remarks you've been offered in this thread. Feel free to ask for
clarification of anything that doesn't make sense, but be sure to read
all of it. You are asking something that is fundamentally
impossible[1]. There simply are not enough numbers to go around.

ChrisA
[1] Well, impossible in decimal. If you work in base 4294967296, you
could do what you want in four "digits".
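The "not enough numbers" point is the pigeonhole principle: there are only 10000 four-digit values, so any function that maps paths to them must repeat itself once it has seen 10001 different paths. A small sketch of that, where four_digit_id is a hypothetical stand-in (CRC-32 reduced modulo 10000) for whatever function one might pick, and the page names are invented:

```python
import zlib

def four_digit_id(path):
    """Hypothetical mapping from a path to 0..9999 (CRC-32 reduced mod 10000)."""
    return zlib.crc32(path.encode("utf-8")) % 10000

def first_collision(paths, id_func):
    """Return (earlier_path, later_path, shared_id) for the first repeat, or None."""
    seen = {}
    for path in paths:
        key = id_func(path)
        if key in seen:
            return seen[key], path, key
        seen[key] = path
    return None

# 10001 distinct paths cannot all get different values out of 10000.
paths = ["/page%d.html" % n for n in range(10001)]
collision = first_collision(paths, four_digit_id)
print(collision)
```

Swapping in a different id_func changes which two paths clash and how early, but with 10001 inputs first_collision can never return None.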


#37324

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 10:26 -0800
Message-ID	<8ad4a124-37a8-41fc-938d-9535b8affcbf@googlegroups.com>
In reply to	#37297

On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
>
> Either you are deliberately trolling, or you have a major
> comprehension problem. Please go back and read, carefully, all the
> remarks you've been offered in this thread. Feel free to ask for
> clarification of anything that doesn't make sense, but be sure to read
> all of it. You are asking something that is fundamentally
> impossible[1]. There simply are not enough numbers to go around.
>
> ChrisA
> [1] Well, impossible in decimal. If you work in base 4294967296, you
> could do what you want in four "digits".

Fundamentally impossible?

Well....

OK: How about this in Perl:

$ cat testMD5.pl
use strict;

foreach my $url(qw@ /index.html /about/time.html @){
        hashit($url);
}

sub hashit {
   my $url=shift;
   my @ltrs=split(//,$url);
   my $hash = 0;

   foreach my $ltr(@ltrs){
        $hash = ( $hash + ord($ltr)) %10000;
   }
   printf "%s: %0.4d\n",$url,$hash
   
}


which yields:
$ perl testMD5.pl 
/index.html: 1066
/about/time.html: 1547


#37332

From	MRAB <python@mrabarnett.plus.com>
Date	2013-01-22 18:49 +0000
Message-ID	<mailman.823.1358880581.2939.python-list@python.org>
In reply to	#37324

On 2013-01-22 18:26, Ferrous Cranus wrote:
> On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
>> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>>
>> > I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
>>
>> Either you are deliberately trolling, or you have a major
>> comprehension problem. Please go back and read, carefully, all the
>> remarks you've been offered in this thread. Feel free to ask for
>> clarification of anything that doesn't make sense, but be sure to read
>> all of it. You are asking something that is fundamentally
>> impossible[1]. There simply are not enough numbers to go around.
>>
>> ChrisA
>>
>> [1] Well, impossible in decimal. If you work in base 4294967296, you
>> could do what you want in four "digits".
>
> Fundamentally impossible?
>
Yes.

> Well....
>
> OK: How about this in Perl:
>
> $ cat testMD5.pl
> use strict;
>
> foreach my $url(qw@ /index.html /about/time.html @){
>          hashit($url);
> }
>
> sub hashit {
>     my $url=shift;
>     my @ltrs=split(//,$url);
>     my $hash = 0;
>
>     foreach my $ltr(@ltrs){
>          $hash = ( $hash + ord($ltr)) %10000;
>     }
>     printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
>
That shortens the int to 4 digits.

A hash isn't guaranteed to be unique. A hash is an attempt to make an
int which is highly sensitive to a change in the data so that a small
change in the data will result in a different int. If the change is big
enough it _could_ give the same int, but the hope is that it probably
won't. (Ideally, if the hash has 4 decimal digits, you'd hope that the
chance of different data giving the same hash would be about 1 in
10000.)


#37333

From	Michael Torrie <torriem@gmail.com>
Date	2013-01-22 11:49 -0700
Message-ID	<mailman.824.1358880594.2939.python-list@python.org>
In reply to	#37324

On 01/22/2013 11:26 AM, Ferrous Cranus wrote:
> which yields:
> $ perl testMD5.pl 
> /index.html: 1066
> /about/time.html: 1547

Well, do the same thing in Python then.  Just read the docs on
hashlib so you know what kind of object it returns and how to call
methods on that object to return a big number that you can then do %
10000 on.  Note that your Perl code is guaranteed to have collisions
in the final number generated.

If you're comfortable with Perl, maybe you should use it rather than
fight a language that you are not comfortable with and do not understand.
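A minimal sketch of the hashlib route described above, assuming MD5 and UTF-8 encoded paths; the function name md5_mod_10000 is invented for this example. The hash object's hexdigest() method returns the digest as a hexadecimal string, which int(..., 16) turns into a big number.

```python
import hashlib

def md5_mod_10000(path):
    """Reduce the MD5 digest of path to a number in the range 0..9999."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10000

for path in ("/index.html", "/about/time.html"):
    print("%s: %04d" % (path, md5_mod_10000(path)))
```

The result is stable for a given path, but taking % 10000 throws away almost all of the digest, so different paths can and eventually will share a value.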


#37334

From	Dave Angel <d@davea.name>
Date	2013-01-22 14:00 -0500
Message-ID	<mailman.825.1358881241.2939.python-list@python.org>
In reply to	#37324

On 01/22/2013 01:26 PM, Ferrous Cranus wrote:
>
>> <snip>
>
> sub hashit {
>     my $url=shift;
>     my @ltrs=split(//,$url);
>     my $hash = 0;
>
>     foreach my $ltr(@ltrs){
>          $hash = ( $hash + ord($ltr)) %10000;
>     }
>     printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
>

If you use that algorithm to get a 4-digit number, it'll look good for
the first few files.  But if you try 100 files, you've got an almost 40%
chance of a collision, and if you try 10001, you've got a 100% chance.

So is it really okay to reuse the same integer for different files?

I tried to help you when you were using the md5 algorithm.  By using
enough digits/characters, you can make the likelihood of a collision
quite small.  But 4 digits, don't be ridiculous.
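The figures above follow from the birthday problem. As a sketch, assuming an idealised hash that spreads files uniformly over the 10000 possible values (the additive Perl hash is worse than that), the chance of at least one collision can be computed like this; collision_probability is a name made up for the example.

```python
def collision_probability(n_files, n_slots=10000):
    """Chance that at least two of n_files share a value, assuming uniform hashing."""
    if n_files > n_slots:
        return 1.0  # pigeonhole: more files than values
    p_all_unique = 1.0
    for i in range(n_files):
        # the (i+1)-th file must avoid the i values already taken
        p_all_unique *= (n_slots - i) / n_slots
    return 1.0 - p_all_unique

for n in (10, 100, 500, 10001):
    print("%5d files: %.1f%%" % (n, 100 * collision_probability(n)))
```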

-- 
DaveA


#37337

From	Peter Otten <__peter__@web.de>
Date	2013-01-22 20:16 +0100
Message-ID	<mailman.828.1358882199.2939.python-list@python.org>
In reply to	#37324

Ferrous Cranus wrote:

> On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico
> wrote:

>> all of it. You are asking something that is fundamentally
>> impossible[1]. There simply are not enough numbers to go around.

> Fundamentally impossible?
> 
> Well....
> 
> OK: How about this in Perl:
> 
> $ cat testMD5.pl
> use strict;
> 
> foreach my $url(qw@ /index.html /about/time.html @){
>         hashit($url);
> }
> 
> sub hashit {
>    my $url=shift;
>    my @ltrs=split(//,$url);
>    my $hash = 0;
> 
>    foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
>    }
>    printf "%s: %0.4d\n",$url,$hash
>    
> }
> 
> 
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547

$ cat clashes.pl 
use strict;

foreach my $url(qw@ 
    /public/fails.html
    /large/cannot.html
    /number/being.html
    /hope/already.html
    /being/really.html
    /index/breath.html
    /can/although.html
@){
        hashit($url);
}

sub hashit {
   my $url=shift;
   my @ltrs=split(//,$url);
   my $hash = 0;

   foreach my $ltr(@ltrs){
        $hash = ( $hash + ord($ltr)) %10000;
   }
   printf "%s: %0.4d\n",$url,$hash
   
}
$ perl clashes.pl 
/public/fails.html: 1743
/large/cannot.html: 1743
/number/being.html: 1743
/hope/already.html: 1743
/being/really.html: 1743
/index/breath.html: 1743
/can/although.html: 1743

Hm, I must be holding it wrong...
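The same clash can be reproduced in Python. This sketch reuses the sum-of-character-codes idea (additive_hash is an invented name) and groups the seven paths from clashes.pl by the value they produce:

```python
from collections import defaultdict

def additive_hash(url):
    """Sum of character codes modulo 10000, as in the Perl hashit sub."""
    return sum(ord(ch) for ch in url) % 10000

paths = [
    "/public/fails.html",
    "/large/cannot.html",
    "/number/being.html",
    "/hope/already.html",
    "/being/really.html",
    "/index/breath.html",
    "/can/although.html",
]

# Map each hash value to the list of paths that produced it.
groups = defaultdict(list)
for path in paths:
    groups[additive_hash(path)].append(path)

for value, members in sorted(groups.items()):
    if len(members) > 1:
        print("%04d is shared by %d paths" % (value, len(members)))
```

Grouping like this is also a quick way to check any candidate hash against a real list of site paths before trusting it as a key.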


#37420

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 23:25 -0800
Message-ID	<2a972302-27ee-4fcb-9928-e0e67afeec36@googlegroups.com>
In reply to	#37337

On Tuesday, 22 January 2013 9:16:34 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico
> > wrote:
>
> >> all of it. You are asking something that is fundamentally
> >> impossible[1]. There simply are not enough numbers to go around.
>
> > Fundamentally impossible?
> >
> > Well....
> >
> > OK: How about this in Perl:
> >
> > $ cat testMD5.pl
> > use strict;
> >
> > foreach my $url(qw@ /index.html /about/time.html @){
> >         hashit($url);
> > }
> >
> > sub hashit {
> >    my $url=shift;
> >    my @ltrs=split(//,$url);
> >    my $hash = 0;
> >
> >    foreach my $ltr(@ltrs){
> >         $hash = ( $hash + ord($ltr)) %10000;
> >    }
> >    printf "%s: %0.4d\n",$url,$hash
> >
> > }
> >
> >
> > which yields:
> > $ perl testMD5.pl
> > /index.html: 1066
> > /about/time.html: 1547
>
> $ cat clashes.pl
> use strict;
>
> foreach my $url(qw@
>     /public/fails.html
>     /large/cannot.html
>     /number/being.html
>     /hope/already.html
>     /being/really.html
>     /index/breath.html
>     /can/although.html
> @){
>         hashit($url);
> }
>
> sub hashit {
>    my $url=shift;
>    my @ltrs=split(//,$url);
>    my $hash = 0;
>
>    foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
>    }
>    printf "%s: %0.4d\n",$url,$hash
>
> }
> $ perl clashes.pl
> /public/fails.html: 1743
> /large/cannot.html: 1743
> /number/being.html: 1743
> /hope/already.html: 1743
> /being/really.html: 1743
> /index/breath.html: 1743
> /can/although.html: 1743
>
> Hm, I must be holding it wrong...

my @i = split(//,$url); # put each letter in its own bin
my $j=0;   # Initialize our
my $k=1;   # hashing increment values
my @m=();  # workspace
foreach my $n(@i){
       my $q=ord($n);  # ASCII for character
       $k += $j;       # Increment our hash offset
       $q += $k;       # add our "old" value
       $j = $k;        # store that.
       push @m,$q;     # save the offset value
}

my $hashval=0;  # initialize our hash value
# Generate that
map { $hashval = ($hashval + $_) % 10000} @m;


Using that method ABC.html and CBA.html now have different values because each letter position's value gets bumped up increasingly from left to right.
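A literal Python transliteration of the Perl fragment above, offered as a sketch; the function name positional_hash and the variable names are made up. Stepping through the loop, the offset k takes the values 1, 2, 4, 8, ... by position and does not depend on which character sits there, so as written the total is still the plain character sum plus a constant that depends only on the length. The check at the end compares ABC.html with CBA.html directly.

```python
def positional_hash(url):
    """Line-by-line port of the Perl fragment: add a doubling offset per position."""
    j = 0  # previous offset
    k = 1  # current offset
    shifted = []
    for ch in url:
        q = ord(ch)  # character code
        k += j       # offset becomes 1, 2, 4, 8, ...
        q += k       # shift this character's code by the offset
        j = k        # remember the offset for the next position
        shifted.append(q)
    hashval = 0
    for value in shifted:
        hashval = (hashval + value) % 10000
    return hashval

print(positional_hash("ABC.html"), positional_hash("CBA.html"))
print(positional_hash("ABC.html") == positional_hash("CBA.html"))
```

To make the result depend on character order, the position would have to multiply the character code (or otherwise be combined with it) instead of just being added alongside it, and even then four digits leave room for only 10000 distinct values.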


#37480

From	Michael Torrie <torriem@gmail.com>
Date	2013-01-23 08:25 -0700
Message-ID	<mailman.902.1358954746.2939.python-list@python.org>
In reply to	#37420

On 01/23/2013 12:25 AM, Ferrous Cranus wrote:
> <some perl code>
> Using that method ABC.html and CBA.html now have different values
> because each letter position's value gets bumped up increasingly from
> left to right.

You have run this little "hash" algorithm on a whole bunch of files, say
C:\windows\system32 right?  And how many collisions did you get?

You've already rejected using the file path or url as a key because it
could change.  Why are you wanting to do this hash based on the file's
path or url anyway?


#37485

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-23 07:56 -0800
Message-ID	<31584b1b-4e84-459d-b027-8daf65bf2910@googlegroups.com>
In reply to	#37480

On Wednesday, 23 January 2013 5:25:36 PM UTC+2, Michael Torrie wrote:
> On 01/23/2013 12:25 AM, Ferrous Cranus wrote:
> > <some perl code>
> > Using that method ABC.html and CBA.html now have different values
> > because each letter position's value gets bumped up increasingly from
> > left to right.
>
> You have run this little "hash" algorithm on a whole bunch of files, say
> C:\windows\system32 right?  And how many collisions did you get?
>
> You've already rejected using the file path or url as a key because it
> could change.  Why are you wanting to do this hash based on the file's
> path or url anyway?

No, it's inevitable, something must remain the same.

Filepath *must* be used.

Can you transliterate this code to Python code please?


#37421

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 23:25 -0800
Message-ID	<mailman.869.1358925949.2939.python-list@python.org>
In reply to	#37337

On Tuesday, 22 January 2013 9:16:34 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
>
> >> all of it. You are asking something that is fundamentally
> >> impossible[1]. There simply are not enough numbers to go around.
>
> > Fundamentally impossible?
> >
> > Well....
> >
> > OK: How about this in Perl:
> >
> > $ cat testMD5.pl
> > use strict;
> >
> > foreach my $url(qw@ /index.html /about/time.html @){
> >         hashit($url);
> > }
> >
> > sub hashit {
> >    my $url=shift;
> >    my @ltrs=split(//,$url);
> >    my $hash = 0;
> >
> >    foreach my $ltr(@ltrs){
> >         $hash = ( $hash + ord($ltr)) %10000;
> >    }
> >    printf "%s: %0.4d\n",$url,$hash
> >
> > }
> >
> >
> > which yields:
> > $ perl testMD5.pl
> > /index.html: 1066
> > /about/time.html: 1547
>
> $ cat clashes.pl
> use strict;
>
> foreach my $url(qw@
>     /public/fails.html
>     /large/cannot.html
>     /number/being.html
>     /hope/already.html
>     /being/really.html
>     /index/breath.html
>     /can/although.html
> @){
>         hashit($url);
> }
>
> sub hashit {
>    my $url=shift;
>    my @ltrs=split(//,$url);
>    my $hash = 0;
>
>    foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
>    }
>    printf "%s: %0.4d\n",$url,$hash
>
> }
> $ perl clashes.pl
> /public/fails.html: 1743
> /large/cannot.html: 1743
> /number/being.html: 1743
> /hope/already.html: 1743
> /being/really.html: 1743
> /index/breath.html: 1743
> /can/although.html: 1743
>
> Hm, I must be holding it wrong...

my @i = split(//,$url); # put each letter in its own bin
my $j=0;   # Initialize our
my $k=1;   # hashing increment values
my @m=();  # workspace
foreach my $n(@i){
       my $q=ord($n);  # ASCII for character
       $k += $j;       # Increment our hash offset
       $q += $k;       # add our "old" value
       $j = $k;        # store that.
       push @m,$q;     # save the offset value
}
       
my $hashval=0;  #initialize our hash value
# Generate that
map { $hashval = ($hashval + $_) % 10000} @m;


Using that method ABC.html and CBA.html now have different values because each letter position's value gets bumped up increasingly from left to right.
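
A line-by-line Python transliteration of the Perl fragment makes it easy to try on sample names. This is only a sketch, assuming the fragment is read exactly as posted and that `$url` holds an ASCII string; the function name `offset_hash` is invented here. Under this reading the offset `k` depends only on the position (1, 2, 4, 8, ...) and not on the character, so for a name of length n the offsets add up to 2**n - 1 whatever the order of the letters.

```python
def offset_hash(url):
    """Transliteration of the Perl fragment: add a per-position offset, then sum mod 10000."""
    j = 0          # previous offset
    k = 1          # current offset
    shifted = []   # workspace, one entry per character
    for ch in url:
        q = ord(ch)    # character code
        k += j         # grow the offset: 1, 2, 4, 8, ...
        q += k         # add the offset to the character code
        j = k          # remember it for the next position
        shifted.append(q)

    hashval = 0
    for value in shifted:
        hashval = (hashval + value) % 10000
    return hashval


for name in ("ABC.html", "CBA.html"):
    print("%s: %04d" % (name, offset_hash(name)))
```

Run as written, both names print the same number, so it seems worth testing the method on real data before relying on it to tell rearranged names apart.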

#37325

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 10:26 -0800
Message-ID	<mailman.819.1358879187.2939.python-list@python.org>
In reply to	#37297

On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
>
> Either you are deliberately trolling, or you have a major
> comprehension problem. Please go back and read, carefully, all the
> remarks you've been offered in this thread. Feel free to ask for
> clarification of anything that doesn't make sense, but be sure to read
> all of it. You are asking something that is fundamentally
> impossible[1]. There simply are not enough numbers to go around.
>
> ChrisA
>
> [1] Well, impossible in decimal. If you work in base 4294967296, you
> could do what you want in four "digits".

Fundamentally impossible?

Well....

OK: How about this in Perl:

$ cat testMD5.pl
use strict;

foreach my $url(qw@ /index.html /about/time.html @){
        hashit($url);
}

sub hashit {
   my $url=shift;
   my @ltrs=split(//,$url);
   my $hash = 0;

   foreach my $ltr(@ltrs){
        $hash = ( $hash + ord($ltr)) %10000;
   }
   printf "%s: %0.4d\n",$url,$hash
   
}


which yields:
$ perl testMD5.pl 
/index.html: 1066
/about/time.html: 1547
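
For readers following along in Python, the Perl script above can be transliterated roughly as follows. This is a sketch; it assumes ASCII paths, where Perl's `ord` and Python's `ord` return the same values.

```python
def hashit(url):
    """Sum the character codes of url, wrapping around at 10000."""
    total = 0
    for ch in url:
        total = (total + ord(ch)) % 10000
    return total


for url in ("/index.html", "/about/time.html"):
    print("%s: %04d" % (url, hashit(url)))
```

Because the result is a plain sum, any two paths whose character codes add up to the same total receive the same number, and only 10,000 distinct results exist.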

#37295

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 07:59 -0800
Message-ID	<mailman.801.1358870401.2939.python-list@python.org>
In reply to	#37284

On Tuesday, 22 January 2013 5:25:42 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > I insist, perhaps compelled, to use a key to associate a number to a
> > filename. Would you help please?
> >
> > I dont know this is supposed to be written. i just know i need this:
> >
> > number = function_that_returns_a_number_out_of_a_string(
> > absolute_path_of_a_html_file)
> >
> > Would someone help me write that in python coding? We are talking 1 line
> > of code here....
>
> Since you insist:
>
> >>> def function_that_returns_a_number_out_of_a_string(absolute_path_of_a_html_file):
> ...     return int(absolute_path_of_a_html_file.encode("hex"), 16)
> ...
> >>> function_that_returns_a_number_out_of_a_string("/foo/bar/baz")
> 14669632128886499728813089146L
>
> As a bonus here is how to turn the number back into a path:
>
> >>> x = 14669632128886499728813089146
> >>> "{:x}".format(x).decode("hex")
> '/foo/bar/baz'
>
> ;)
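
The quoted session relies on Python 2's "hex" codec, which Python 3 no longer accepts in `str.encode`. A rough Python 3 equivalent is sketched below; the names `path_to_int` and `int_to_path` are invented for illustration, and UTF-8 is an assumed encoding for the path.

```python
def path_to_int(path):
    """Encode the path as bytes and read them as one big-endian integer."""
    return int.from_bytes(path.encode("utf-8"), "big")


def int_to_path(number):
    """Inverse of path_to_int: turn the integer back into bytes, then text."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")


n = path_to_int("/foo/bar/baz")
print(n)
print(int_to_path(n))
```

The number is unique and reversible precisely because it is as long as the path itself; it is not limited to four digits.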

Thank you, but no... that would be unnecessarily complex.

I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(

#37305

From	John Gordon <gordon@panix.com>
Date	2013-01-22 16:55 +0000
Message-ID	<kdmg96$gl8$1@reader1.panix.com>
In reply to	#37295

In <mailman.801.1358870401.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:

> I just need a way to CONVERT a string(absolute path) to a 4-digit unique
> number with INT!!! That's all i want!! But i cannot make it work :(

Given your requirements, I don't think it *can* work.  There's just no
way to do it.

How can the computer guarantee that billions of possible inputs (file paths)
map to 10,000 unique outputs (4-digit numbers)?  It's not possible.
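
The counting argument can be checked directly: feed 10,001 distinct paths to any function that returns a 4-digit number and at least two of them must share a result. A small Python sketch, where `four_digit` is just one arbitrary choice of function and the paths are made up:

```python
import hashlib


def four_digit(path):
    """An arbitrary 4-digit function: SHA-256 of the path, reduced modulo 10000."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10000


def first_collision(paths, func):
    """Return (earlier path, later path, shared value) for the first clash, or None."""
    seen = {}
    for path in paths:
        value = func(path)
        if value in seen:
            return seen[value], path, value
        seen[value] = path
    return None


# 10,001 distinct made-up paths: one more than there are 4-digit numbers.
paths = ["/site/page%05d.html" % i for i in range(10001)]
print(first_collision(paths, four_digit))
```

Swapping in a different `func` changes which pair clashes, not whether one does.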

It might be possible if you had control over the format of the input strings,
but it doesn't sound like you do.

Can you maintain a separate database which maps file paths to numbers?
If so, then this is an easy problem.  Just keep a lookup table, like so:

  filepath                                         number
  --------                                         ------
  /home/files/bob/foo.html                         0001
  /home/files/bob/bar.html                         0002
  /home/files/steve/recipes/chocolate-cake.html    0003
  /home/files/mary/payroll.html                    0004
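
A lookup table like this could be kept in SQLite, for instance. The sketch below is one possible arrangement, not a prescribed design: the table name `pages` and the helpers `open_table` and `number_for` are invented for illustration, and an in-memory database is used so the example runs anywhere.

```python
import sqlite3


def open_table(db_file=":memory:"):
    """Open the database and create the path-to-number table if it is missing."""
    conn = sqlite3.connect(db_file)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " number INTEGER PRIMARY KEY,"
        " filepath TEXT NOT NULL UNIQUE)"
    )
    return conn


def number_for(conn, filepath):
    """Return the number already assigned to filepath, assigning the next one if needed."""
    row = conn.execute(
        "SELECT number FROM pages WHERE filepath = ?", (filepath,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO pages (filepath) VALUES (?)", (filepath,))
    conn.commit()
    return cursor.lastrowid


conn = open_table()
for path in ("/home/files/bob/foo.html",
             "/home/files/bob/bar.html",
             "/home/files/bob/foo.html"):
    print("%-28s %04d" % (path, number_for(conn, path)))
```

Each path is stored once, and the number handed back stays the same on every later lookup, so uniqueness comes from the table rather than from arithmetic on the path.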

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

#37318

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-22 10:07 -0800
Message-ID	<4847a0e3-aefa-4330-9252-db08f2e993df@googlegroups.com>
In reply to	#37305

On Tuesday, 22 January 2013 6:55:02 PM UTC+2, John Gordon wrote:
> In <mailman.801.1358870401.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
>
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique
> > number with INT!!! That's all i want!! But i cannot make it work :(
>
> Given your requirements, I don't think it *can* work.  There's just no
> way to do it.
>
> How can the computer guarantee that billions of possible inputs (file paths)
> map to 10,000 unique outputs (4-digit numbers)?  It's not possible.
>
> It might be possible if you had control over the format of the input strings,
> but it doesn't sound like you do.
>
> Can you maintain a separate database which maps file paths to numbers?
> If so, then this is an easy problem.  Just keep a lookup table, like so:
>
>   filepath                                         number
>   --------                                         ------
>   /home/files/bob/foo.html                         0001
>   /home/files/bob/bar.html                         0002
>   /home/files/steve/recipes/chocolate-cake.html    0003
>   /home/files/mary/payroll.html                    0004
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> gordon@panix.com              B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"

No, because i DO NOT WANT to store LOTS OF BIG absolute paths in the database.

And the .html files are not even close to 10,000.
