comp.lang.python, thread #37259
| Started by | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| First post | 2013-01-22 02:07 -0800 |
| Last post | 2013-01-22 17:27 -0800 |
| Articles | 20 on this page of 92 — 15 participants |
Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 02:07 -0800
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 11:31 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 03:53 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-22 23:26 +1100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 04:02 -0800
Re: Using filepath method to identify an .html page Lele Gaifax <lele@metapensiero.it> - 2013-01-22 13:22 +0100
Re: Using filepath method to identify an .html page Dave Angel <d@davea.name> - 2013-01-22 07:29 -0500
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 04:47 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 04:50 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-22 23:59 +1100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 04:50 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 04:47 -0800
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 13:04 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 05:57 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-23 01:33 +1100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 06:55 -0800
Re: Using filepath method to identify an .html page Dave Angel <d@davea.name> - 2013-01-22 10:05 -0500
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:21 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-23 02:27 +1100
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 11:36 -0700
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:40 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:40 +0000
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 17:07 -0700
Re: Using filepath method to identify an .html page MRAB <python@mrabarnett.plus.com> - 2013-01-23 00:40 +0000
Re: Using filepath method to identify an .html page rusi <rustompmody@gmail.com> - 2013-01-22 18:55 -0800
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-23 02:50 +0000
Re: Using filepath method to identify an .html page rusi <rustompmody@gmail.com> - 2013-01-22 19:04 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-23 15:44 +1100
Re: Using filepath method to identify an .html page Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 22:15 -0500
Re: Using filepath method to identify an .html page MRAB <python@mrabarnett.plus.com> - 2013-01-23 03:35 +0000
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 22:10 -0700
Re: Using filepath method to identify an .html page Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-23 01:13 -0500
RE: Using filepath method to identify an .html page "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-23 16:33 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 08:51 -0800
RE: Using filepath method to identify an .html page "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-23 18:19 +0000
Re: Using filepath method to identify an .html page Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-01-23 18:36 +0000
Re: Using filepath method to identify an .html page Dave Angel <d@davea.name> - 2013-01-23 17:46 -0500
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 08:51 -0800
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:34 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:35 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:34 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:36 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:36 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:37 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:39 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:38 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:39 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:37 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:38 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:39 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:36 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:35 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:37 +0000
Re: Using filepath method to identify an .html page Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 16:44 -0500
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:21 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-23 02:07 +1100
Re: Using filepath method to identify an .html page Peter Otten <__peter__@web.de> - 2013-01-22 16:25 +0100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:46 -0800
Re: Using filepath method to identify an .html page Dave Angel <d@davea.name> - 2013-01-22 11:11 -0500
RE: Using filepath method to identify an .html page "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 16:23 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:13 -0800
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 11:43 -0700
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:13 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:46 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:59 -0800
Re: Using filepath method to identify an .html page Chris Angelico <rosuav@gmail.com> - 2013-01-23 03:11 +1100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:26 -0800
Re: Using filepath method to identify an .html page MRAB <python@mrabarnett.plus.com> - 2013-01-22 18:49 +0000
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 11:49 -0700
Re: Using filepath method to identify an .html page Dave Angel <d@davea.name> - 2013-01-22 14:00 -0500
Re: Using filepath method to identify an .html page Peter Otten <__peter__@web.de> - 2013-01-22 20:16 +0100
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 23:25 -0800
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-23 08:25 -0700
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 07:56 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 07:56 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 23:25 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:26 -0800
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 07:59 -0800
Re: Using filepath method to identify an .html page John Gordon <gordon@panix.com> - 2013-01-22 16:55 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:07 -0800
Re: Using filepath method to identify an .html page John Gordon <gordon@panix.com> - 2013-01-22 18:37 +0000
Re: Using filepath method to identify an .html page Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 17:01 -0500
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 23:23 +0000
Re: Using filepath method to identify an .html page rusi <rustompmody@gmail.com> - 2013-01-22 09:33 -0800
Re: Using filepath method to identify an .html page Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-01-22 17:54 +0000
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:23 -0800
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 22:45 +0000
Re: Using filepath method to identify an .html page Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-22 22:44 +0000
Re: Using filepath method to identify an .html page Mitya Sirenef <msirenef@lightbird.net> - 2013-01-22 19:23 -0500
Re: Using filepath method to identify an .html page Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 06:55 -0800
Re: Using filepath method to identify an .html page Michael Torrie <torriem@gmail.com> - 2013-01-22 11:21 -0700
Re: Using filepath method to identify an .html page alex23 <wuwei23@gmail.com> - 2013-01-22 17:27 -0800
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:13 -0800 |
| Message-ID | <a15028e4-1475-40d6-b2b4-2cf58d84bcc3@googlegroups.com> |
| In reply to | #37300 |
On Tuesday, 22 January 2013 6:23:16 p.m. UTC+2, Leonard, Arah wrote:
> > Thank you but the number needs to be a 4-digit integer only, if its to be stored in the database table correctly.
>
> Okay, I think we need to throw the flag on the field at this point. What you're asking for has gone into a realm where you clearly don't even appear to understand what you're asking for.
>
> What is the reason for your integer being limited to only 4 digits? Not even databases are limited in such a way. So what are you doing that imposes that kind of a limit, and why?

a) I'm a reseller, I have unlimited FTP quota, hence database space.
b) I'm feeling compelled to do it this way.
c) I DO NOT want to use BIG absolute paths to identify files, just small numbers, which are easier to maintain.

Your solution, I know it works, and I thank you very much for providing it to me!

Can you help, please, with the errors that http://superhost.gr gives?
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-01-22 11:43 -0700 |
| Message-ID | <mailman.822.1358880240.2939.python-list@python.org> |
| In reply to | #37320 |
On 01/22/2013 11:13 AM, Ferrous Cranus wrote:
> a) I'm a reseller, I have unlimited FTP quota, hence database space

Space doesn't even come into the equation. There's virtually no difference between a 4-digit number and a 100-character string. Yes, there is an absolute difference in storage space, but the difference is so minuscule that there's no point even thinking about it, especially if you are dealing with fewer than a million database rows.

> b) I'm feeling compelled to do it this way

Why? Who's compelling you? Your boss?

> c) I DO NOT want to use BIG absolute paths to identify files, just
> small numbers, which are easier to maintain.

No, it won't be easier to maintain. I've done my share of web development over the years. There's no difference between using a string index and some form of number index. And if you have to go over the database by hand, having a string is infinitely easier for your brain to comprehend than a magic number. Now don't get me wrong. I've done plenty of tables linked by index numbers, but it's certainly harder to fix the data by hand, since an index number only has meaning in the context of a query with another table.

> Your solution, I know it works, and I thank you very much for
> providing it to me!
>
> Can you help, please, with the errors that http://superhost.gr gives?

Sorry, I cannot, since I don't have access to your site's source code or your database.
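Michael's remark about tables linked by index numbers points to the usual way of getting small numbers without hashing: let the database hand them out. Below is a minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical `pages` table and a hypothetical `get_page_id()` helper (neither comes from the thread). Each path is stored once under a UNIQUE constraint, and SQLite gives every new path the next `INTEGER PRIMARY KEY` value. The ids therefore never collide, and they stay at 4 digits or fewer for as long as there are fewer than 10,000 pages.

```python
import sqlite3

def open_db(db_path=":memory:"):
    """Open a database containing a hypothetical `pages` table (path -> id)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " id INTEGER PRIMARY KEY,"    # SQLite assigns 1, 2, 3, ... automatically
        " path TEXT NOT NULL UNIQUE"  # each path can be stored only once
        ")"
    )
    return conn

def get_page_id(conn, page_path):
    """Return the small integer id for page_path, creating one if needed."""
    # INSERT OR IGNORE does nothing when the path is already in the table.
    conn.execute("INSERT OR IGNORE INTO pages (path) VALUES (?)", (page_path,))
    row = conn.execute(
        "SELECT id FROM pages WHERE path = ?", (page_path,)
    ).fetchone()
    return row[0]

conn = open_db()
print(get_page_id(conn, "/index.html"))       # 1
print(get_page_id(conn, "/about/time.html"))  # 2
print(get_page_id(conn, "/index.html"))       # 1 again: same path, same id
```

When the database is on disk, call `conn.commit()` to persist new rows. Because the id is not derived from the path, renaming a page requires an `UPDATE` of its `path` column, but every table that refers to the page by id stays valid.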
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 07:46 -0800 |
| Message-ID | <mailman.800.1358870153.2939.python-list@python.org> |
| In reply to | #37284 |
Thank you, but the number needs to be a 4-digit integer only, if it's to be stored in the database table correctly.
pin = int( htmlpage.encode("hex"), 16 )
I just tried what you gave me.
This produces a number of: 140530319499494727...677522822126923116L
Visit http://superhost.gr to see that displayed error. I think it
What did you use "hex" for? To encode the string to hexadecimal? What for?
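The `"hex"` step exists only to turn the path's bytes into a single integer. `str.encode("hex")` is Python 2 only; a minimal Python 3 sketch of the same idea uses `int.from_bytes` (the helper names `path_to_int` and `int_to_path` are hypothetical). Because the conversion is reversible, the number has to carry all of the information in the path. It therefore grows by about 2.4 decimal digits per byte and can never fit into 4 digits.

```python
def path_to_int(path):
    """Interpret the UTF-8 bytes of path as one big-endian integer."""
    return int.from_bytes(path.encode("utf-8"), "big")

def int_to_path(number):
    """Invert path_to_int (assumes the path has no leading NUL bytes)."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")

n = path_to_int("/foo/bar/baz")
print(n)               # 14669632128886499728813089146
print(len(str(n)))     # 29 digits for a 12-byte path
print(int_to_path(n))  # /foo/bar/baz
```

This is the same value that Peter Otten's Python 2 session prints for `"/foo/bar/baz"`.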
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 07:59 -0800 |
| Message-ID | <12a22c5b-88a9-4577-a642-abe1e56cce5e@googlegroups.com> |
| In reply to | #37284 |
On Tuesday, 22 January 2013 5:25:42 p.m. UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > I insist, perhaps compeleld, to use a key to associate a number to a
> > filename. Would you help please?
> >
> > I dont know this is supposed to be written. i just know i need this:
> >
> > number = function_that_returns_a_number_out_of_a_string(
> > absolute_path_of_a_html_file)
> >
> > Would someone help me write that in python coding? We are talkign 1 line
> > of code here....
>
> Since you insist:
>
> >>> def function_that_returns_a_number_out_of_a_string(absolute_path_of_a_html_file):
> ...     return int(absolute_path_of_a_html_file.encode("hex"), 16)
> ...
> >>> function_that_returns_a_number_out_of_a_string("/foo/bar/baz")
> 14669632128886499728813089146L
>
> As a bonus here is how to turn the number back into a path:
>
> >>> x = 14669632128886499728813089146
> >>> "{:x}".format(x).decode("hex")
> '/foo/bar/baz'
>
> ;)
Thank you, but no... no, that would be unnecessarily complex.
I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-23 03:11 +1100 |
| Message-ID | <mailman.803.1358871083.2939.python-list@python.org> |
| In reply to | #37294 |
On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(

Either you are deliberately trolling, or you have a major comprehension problem. Please go back and read, carefully, all the remarks you've been offered in this thread. Feel free to ask for clarification of anything that doesn't make sense, but be sure to read all of it. You are asking something that is fundamentally impossible[1]. There simply are not enough numbers to go around.

ChrisA

[1] Well, impossible in decimal. If you work in base 4294967296, you could do what you want in four "digits".
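"Not enough numbers to go around" is the pigeonhole principle. A 4-digit number has only 10,000 possible values, so any function from paths to 4-digit numbers must assign the same number to at least two paths out of any 10,001 distinct ones, however clever the function is. The small sketch below illustrates this. The made-up paths, the choice of `zlib.crc32` as the mapping, and the helper names are all arbitrary assumptions; any other function into 0..9999 would also collide.

```python
import zlib

def four_digit(path):
    """One arbitrary mapping into 0..9999: CRC-32 of the path, mod 10000."""
    return zlib.crc32(path.encode("utf-8")) % 10000

def first_collision(paths, fn):
    """Return the first pair of distinct paths that fn maps to the same value."""
    seen = {}  # value -> first path that produced it
    for p in paths:
        v = fn(p)
        if v in seen:
            return seen[v], p
        seen[v] = p
    return None

# 10,001 distinct, made-up paths: more paths than there are 4-digit numbers.
paths = ["/page%d.html" % i for i in range(10001)]
print(first_collision(paths, four_digit))  # always a pair, never None
```

The footnote works for the same reason in reverse. Four "digits" in base 4294967296 (2**32) give 2**128 possible values, which is as many as a full MD5 digest can take.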
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:26 -0800 |
| Message-ID | <8ad4a124-37a8-41fc-938d-9535b8affcbf@googlegroups.com> |
| In reply to | #37297 |
On Tuesday, 22 January 2013 6:11:20 p.m. UTC+2, Chris Angelico wrote:
> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
>
> Either you are deliberately trolling, or you have a major
> comprehension problem. Please go back and read, carefully, all the
> remarks you've been offered in this thread. Feel free to ask for
> clarification of anything that doesn't make sense, but be sure to read
> all of it. You are asking something that is fundamentally
> impossible[1]. There simply are not enough numbers to go around.
>
> ChrisA
>
> [1] Well, impossible in decimal. If you work in base 4294967296, you
> could do what you want in four "digits".
Fundamentally impossible?
Well....
OK: How about this in Perl:
$ cat testMD5.pl
use strict;
foreach my $url(qw@ /index.html /about/time.html @){
hashit($url);
}
sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash
}
which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547
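A direct Python transliteration of the Perl `hashit` routine is shown below as a sketch. The name `additive_hash` is hypothetical. It matches the Perl output for ASCII paths; for non-ASCII paths, Perl without `use utf8` sums bytes while Python sums code points, so the two can differ.

```python
def additive_hash(url):
    """Python equivalent of the Perl hashit(): sum of character codes mod 10000."""
    total = 0
    for ch in url:
        total = (total + ord(ch)) % 10000
    return total

for url in ("/index.html", "/about/time.html"):
    print("%s: %04d" % (url, additive_hash(url)))
# /index.html: 1066
# /about/time.html: 1547
```

Because addition is commutative, reordering a path's characters never changes the result. For example, `/ABC.html` and `/CBA.html` get the same number, so this is a checksum with many built-in collisions, not a unique id.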
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-01-22 18:49 +0000 |
| Message-ID | <mailman.823.1358880581.2939.python-list@python.org> |
| In reply to | #37324 |
On 2013-01-22 18:26, Ferrous Cranus wrote:
> On Tuesday, 22 January 2013 6:11:20 p.m. UTC+2, Chris Angelico wrote:
>> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>>
>> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
>>
>> Either you are deliberately trolling, or you have a major
>> comprehension problem. Please go back and read, carefully, all the
>> remarks you've been offered in this thread. Feel free to ask for
>> clarification of anything that doesn't make sense, but be sure to read
>> all of it. You are asking something that is fundamentally
>> impossible[1]. There simply are not enough numbers to go around.
>>
>> ChrisA
>>
>> [1] Well, impossible in decimal. If you work in base 4294967296, you
>> could do what you want in four "digits".
>
> Fundamentally impossible?
>
Yes.
> Well....
>
> OK: How about this in Perl:
>
> $ cat testMD5.pl
> use strict;
>
> foreach my $url(qw@ /index.html /about/time.html @){
> hashit($url);
> }
>
> sub hashit {
> my $url=shift;
> my @ltrs=split(//,$url);
> my $hash = 0;
>
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> }
> printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
>
That shortens the int to 4 digits.
A hash isn't guaranteed to be unique. A hash is an attempt to make an
int which is highly sensitive to a change in the data so that a small
change in the data will result in a different int. If the change is big
enough it _could_ give the same int, but the hope is that it probably
won't. (Ideally, if the hash has 4 decimal digits, you'd hope that the
chance of different data giving the same hash would be about 1 in
10000.)
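The additive scheme lacks the property MRAB describes, where a small change in the data gives a different number. The sketch below contrasts it with a full MD5 digest from `hashlib` on two made-up paths that differ only in the order of two letters. MD5 is used here only as a familiar example of a well-mixing hash, and the helper names are hypothetical.

```python
import hashlib

def additive_hash(url):
    """Sum of character codes mod 10000 (the Perl scheme quoted above)."""
    return sum(ord(ch) for ch in url) % 10000

def md5_hex(url):
    """Full 128-bit MD5 digest of the UTF-8 path, as 32 hex characters."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

a, b = "/ab.html", "/ba.html"
print(additive_hash(a) == additive_hash(b))  # True: swapping letters changes nothing
print(md5_hex(a) == md5_hex(b))              # False: the digests differ completely
```

Even MD5 gives no uniqueness guarantee once it is cut down to 4 digits. It only spreads the values more evenly, which brings the chance that two particular paths clash close to the 1 in 10000 MRAB mentions.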
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-01-22 11:49 -0700 |
| Message-ID | <mailman.824.1358880594.2939.python-list@python.org> |
| In reply to | #37324 |
On 01/22/2013 11:26 AM, Ferrous Cranus wrote:
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547

Well, do the same in Python, then. Just read the docs on hashlib so you know what kind of object it returns and how to call methods on that object to return a big number that you can then do % 10000 on.

Note that your Perl code is guaranteed to have collisions in the final number generated.

If you're comfortable with Perl, maybe you should use it rather than fight a language that you are not comfortable with and don't understand.
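The steps Michael describes look like this as a minimal sketch, assuming MD5 and a hypothetical function name. `hashlib.md5(...).hexdigest()` returns the digest as a hex string, `int(..., 16)` turns it into a big number, and `% 10000` reduces that number to 4 digits. The last step is the one that makes collisions unavoidable.

```python
import hashlib

def md5_four_digits(path):
    """Reduce the 128-bit MD5 of path to a number in 0..9999 (NOT unique)."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()  # 32 hex characters
    big_number = int(digest, 16)                            # 0 <= n < 2**128
    return big_number % 10000

print("%04d" % md5_four_digits("/index.html"))
```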
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2013-01-22 14:00 -0500 |
| Message-ID | <mailman.825.1358881241.2939.python-list@python.org> |
| In reply to | #37324 |
On 01/22/2013 01:26 PM, Ferrous Cranus wrote:
>
>> <snip>
>
> sub hashit {
> my $url=shift;
> my @ltrs=split(//,$url);
> my $hash = 0;
>
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> }
> printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
>
If you use that algorithm to get a 4 digit number, it'll look good for
the first few files. But if you try 100 files, you've got almost 40%
chance of a collision, and if you try 10001, you've got a 100% chance.
So is it really okay to reuse the same integer for different files?
I tried to help you when you were using the md5 algorithm. By using
enough digits/characters, you can cut the likelihood of a collision
quite small. But 4 digits, don't be ridiculous.
--
DaveA
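Dave's figures come from the birthday problem. The sketch below computes the chance that at least two of n items share one of N values. It assumes the hash spreads items uniformly and independently; an additive checksum does worse than this. The function name is hypothetical. It also shows why keeping all the MD5 digits, as Dave suggests, makes a collision vanishingly unlikely.

```python
import math

def collision_probability(n_items, n_buckets):
    """Chance that at least two of n_items share a bucket, assuming a hash
    that spreads items uniformly and independently (the birthday problem)."""
    if n_items > n_buckets:
        return 1.0  # pigeonhole: a collision is certain
    # log P(no collision) = sum of log(1 - k/N) for k = 0 .. n_items-1;
    # log1p/expm1 keep very small probabilities accurate.
    log_none = sum(math.log1p(-k / n_buckets) for k in range(n_items))
    return -math.expm1(log_none)

print("%.2f" % collision_probability(100, 10000))  # 0.39 with 4 digits
print(collision_probability(10001, 10000))         # 1.0
print(collision_probability(10000, 2 ** 128))      # full MD5: roughly 1.5e-31
```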
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-01-22 20:16 +0100 |
| Message-ID | <mailman.828.1358882199.2939.python-list@python.org> |
| In reply to | #37324 |
Ferrous Cranus wrote:
> On Tuesday, 22 January 2013 6:11:20 p.m. UTC+2, Chris Angelico wrote:
>> all of it. You are asking something that is fundamentally
>> impossible[1]. There simply are not enough numbers to go around.
> Fundamentally impossible?
>
> Well....
>
> OK: How about this in Perl:
>
> $ cat testMD5.pl
> use strict;
>
> foreach my $url(qw@ /index.html /about/time.html @){
> hashit($url);
> }
>
> sub hashit {
> my $url=shift;
> my @ltrs=split(//,$url);
> my $hash = 0;
>
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> }
> printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
$ cat clashes.pl
use strict;
foreach my $url(qw@
/public/fails.html
/large/cannot.html
/number/being.html
/hope/already.html
/being/really.html
/index/breath.html
/can/although.html
@){
hashit($url);
}
sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash
}
$ perl clashes.pl
/public/fails.html: 1743
/large/cannot.html: 1743
/number/being.html: 1743
/hope/already.html: 1743
/being/really.html: 1743
/index/breath.html: 1743
/can/although.html: 1743
Hm, I must be holding it wrong...
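Peter's clashes are easy to find systematically by grouping paths by their hash value. The sketch below uses his seven paths with a Python version of the additive Perl hash. The helper names are hypothetical.

```python
from collections import defaultdict

def additive_hash(url):
    """Sum of character codes mod 10000, as in the Perl hashit()."""
    return sum(ord(ch) for ch in url) % 10000

def group_by_hash(urls, fn):
    """Map each hash value to the list of URLs that produce it."""
    groups = defaultdict(list)
    for url in urls:
        groups[fn(url)].append(url)
    return dict(groups)

urls = [
    "/public/fails.html", "/large/cannot.html", "/number/being.html",
    "/hope/already.html", "/being/really.html", "/index/breath.html",
    "/can/although.html",
]
for value, members in group_by_hash(urls, additive_hash).items():
    print("%04d: %d paths" % (value, len(members)))
# 1743: 7 paths
```

All seven share a value because their letters add up to the same total (1166 in every case, excluding the slashes and the ".html" suffix).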
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 23:25 -0800 |
| Message-ID | <2a972302-27ee-4fcb-9928-e0e67afeec36@googlegroups.com> |
| In reply to | #37337 |
On Tuesday, 22 January 2013 9:16:34 p.m. UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > On Tuesday, 22 January 2013 6:11:20 p.m. UTC+2, Chris Angelico wrote:
>
> >> all of it. You are asking something that is fundamentally
> >> impossible[1]. There simply are not enough numbers to go around.
>
> > Fundamentally impossible?
> >
> > Well....
> >
> > OK: How about this in Perl:
> >
> > $ cat testMD5.pl
> > use strict;
> >
> > foreach my $url(qw@ /index.html /about/time.html @){
> > hashit($url);
> > }
> >
> > sub hashit {
> > my $url=shift;
> > my @ltrs=split(//,$url);
> > my $hash = 0;
> >
> > foreach my $ltr(@ltrs){
> > $hash = ( $hash + ord($ltr)) %10000;
> > }
> > printf "%s: %0.4d\n",$url,$hash
> >
> > }
> >
> > which yields:
> > $ perl testMD5.pl
> > /index.html: 1066
> > /about/time.html: 1547
>
> $ cat clashes.pl
> use strict;
>
> foreach my $url(qw@
> /public/fails.html
> /large/cannot.html
> /number/being.html
> /hope/already.html
> /being/really.html
> /index/breath.html
> /can/although.html
> @){
> hashit($url);
> }
>
> sub hashit {
> my $url=shift;
> my @ltrs=split(//,$url);
> my $hash = 0;
>
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> }
> printf "%s: %0.4d\n",$url,$hash
>
> }
>
> $ perl clashes.pl
> /public/fails.html: 1743
> /large/cannot.html: 1743
> /number/being.html: 1743
> /hope/already.html: 1743
> /being/really.html: 1743
> /index/breath.html: 1743
> /can/although.html: 1743
>
> Hm, I must be holding it wrong...
my @i = split(//,$url); # put each letter in its own bin
my $j=0; # Initialize our
my $k=1; # hashing increment values
my @m=(); # workspace
foreach my $n(@i){
my $q=ord($n); # ASCII for character
$k += $j; # Increment our hash offset
$q += $k; # add our "old" value
$j = $k; # store that.
push @m,$q; # save the offsetted value
}
my $hashval=0; #initialize our hash value
# Generate that
map { $hashval = ($hashval + $_) % 10000} @m;
Using that method ABC.html and CBA.html now have different values because each letter position's value gets bumped up increasingly from left to right.
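Below is a Python transliteration of the Perl loop, given as a sketch. The function name is hypothetical. It computes exactly; for very long paths Perl's native numbers may overflow into floating point, so the two could disagree there. Tracing the loop shows that `$k` simply doubles (1, 2, 4, 8, ...) and that this offset is *added* to each character code rather than multiplied into it. The offsets therefore always sum to 2**len - 1, which depends only on the path's length. As a result, two paths of the same length made of the same characters in a different order still get the same value.

```python
def positional_hash(url):
    """Transliteration of the Perl loop: add a growing offset to each char code."""
    j, k = 0, 1        # the Perl $j and $k
    total = 0          # the Perl $hashval
    for ch in url:
        q = ord(ch)    # character code
        k += j         # k runs 1, 2, 4, 8, ...: it doubles each time
        q += k         # the offset is added, not multiplied
        j = k
        total = (total + q) % 10000
    return total

# Same letters, same length, different order: the values are equal.
print(positional_hash("ABC.html"), positional_hash("CBA.html"))  # 936 936
```

Making character order matter requires multiplying the running value before adding each character (for example, `total = (total * 31 + ord(ch)) % 10000`). Even then, the result still has only 10,000 possible values.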
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-01-23 08:25 -0700 |
| Message-ID | <mailman.902.1358954746.2939.python-list@python.org> |
| In reply to | #37420 |
On 01/23/2013 12:25 AM, Ferrous Cranus wrote:
> <some perl code>
> Using that method ABC.html and CBA.html now have different values
> because each letter position's value gets bumped up increasingly from
> left to right.

You have run this little "hash" algorithm on a whole bunch of files, say C:\windows\system32, right? And how many collisions did you get?

You've already rejected using the file path or URL as a key because it could change. Why do you want to base this hash on the file's path or URL anyway?
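Michael's question can be answered empirically. The sketch below walks a directory tree with `os.walk` and counts how many file paths share a hash value with an earlier one. The default hash is a closed form of the positional Perl loop (its offsets 1, 2, 4, ... add up to 2**len - 1). Both helper names are hypothetical, and any other function can be passed as `fn`.

```python
import os
from collections import Counter

def positional_hash(path):
    """Closed form of the Perl loop: (sum of char codes + 2**len - 1) mod 10000."""
    return (sum(ord(ch) for ch in path) + 2 ** len(path) - 1) % 10000

def count_collisions(root, fn=positional_hash):
    """Count file paths under root whose hash value repeats an earlier path's."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[fn(os.path.join(dirpath, name))] += 1
    # A value produced by c different paths accounts for c - 1 collisions.
    return sum(c - 1 for c in counts.values() if c > 1)
```

For a real directory, call e.g. `count_collisions(r"C:\windows\system32")`. A tree with more than 10,000 files is guaranteed to produce collisions.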
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-23 07:56 -0800 |
| Message-ID | <31584b1b-4e84-459d-b027-8daf65bf2910@googlegroups.com> |
| In reply to | #37480 |
On Wednesday, 23 January 2013 5:25:36 p.m. UTC+2, Michael Torrie wrote:
> On 01/23/2013 12:25 AM, Ferrous Cranus wrote:
> > <some perl code>
> > Using that method ABC.html and CBA.html now have different values
> > because each letter position's value gets bumped up increasingly from
> > left to right.
>
> You have run this little "hash" algorithm on a whole bunch of files, say
> C:\windows\system32 right? And how many collisions did you get?
>
> You've already rejected using the file path or url as a key because it
> could change. Why are you wanting to do this hash based on the file's
> path or url anyway?

No, it's inevitable; something must remain the same. The file path *must* be used.

Can you transliterate this code to Python code, please?
[toc] | [prev] | [next] | [standalone]
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-23 07:56 -0800 |
| Message-ID | <mailman.905.1358956621.2939.python-list@python.org> |
| In reply to | #37480 |
Τη Τετάρτη, 23 Ιανουαρίου 2013 5:25:36 μ.μ. UTC+2, ο χρήστης Michael Torrie έγραψε: > On 01/23/2013 12:25 AM, Ferrous Cranus wrote: > > > <some perl code> > > > Using that method ABC.html and CBA.html now have different values > > > because each letter position's value gets bumped up increasingly from > > > left to right. > > > > You have run this little "hash" algorithm on a whole bunch of files, say > > C:\windows\system32 right? And how many collisions did you get? > > > > You've already rejected using the file path or url as a key because it > > could change. Why are you wanting to do this hash based on the file's > > path or url anyway? No, its inevitable, something must remain the same. Filepath *must* be used. Can you transliterate this code to Python code please?
[toc] | [prev] | [next] | [standalone]
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 23:25 -0800 |
| Message-ID | <mailman.869.1358925949.2939.python-list@python.org> |
| In reply to | #37337 |
Τη Τρίτη, 22 Ιανουαρίου 2013 9:16:34 μ.μ. UTC+2, ο χρήστης Peter Otten έγραψε:
> Ferrous Cranus wrote:
>
>
>
> > Τη Τρίτη, 22 Ιανουαρίου 2013 6:11:20 μ.μ. UTC+2, ο χρήστης Chris Angelico
>
> > έγραψε:
>
>
>
> >> all of it. You are asking something that is fundamentally
>
> >> impossible[1]. There simply are not enough numbers to go around.
>
>
>
> > Fundamentally impossible?
>
> >
>
> > Well....
>
> >
>
> > OK: How about this in Perl:
>
> >
>
> > $ cat testMD5.pl
>
> > use strict;
>
> >
>
> > foreach my $url(qw@ /index.html /about/time.html @){
>
> > hashit($url);
>
> > }
>
> >
>
> > sub hashit {
>
> > my $url=shift;
>
> > my @ltrs=split(//,$url);
>
> > my $hash = 0;
>
> >
>
> > foreach my $ltr(@ltrs){
>
> > $hash = ( $hash + ord($ltr)) %10000;
>
> > }
>
> > printf "%s: %0.4d\n",$url,$hash
>
> >
>
> > }
>
> >
>
> >
>
> > which yields:
>
> > $ perl testMD5.pl
>
> > /index.html: 1066
>
> > /about/time.html: 1547
>
>
> $ cat clashes.pl
> use strict;
>
> foreach my $url(qw@
>     /public/fails.html
>     /large/cannot.html
>     /number/being.html
>     /hope/already.html
>     /being/really.html
>     /index/breath.html
>     /can/although.html
>     @){
>     hashit($url);
> }
>
> sub hashit {
>     my $url=shift;
>     my @ltrs=split(//,$url);
>     my $hash = 0;
>
>     foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
>     }
>     printf "%s: %0.4d\n",$url,$hash
> }
>
> $ perl clashes.pl
> /public/fails.html: 1743
> /large/cannot.html: 1743
> /number/being.html: 1743
> /hope/already.html: 1743
> /being/really.html: 1743
> /index/breath.html: 1743
> /can/although.html: 1743
>
> Hm, I must be holding it wrong...
sub hashit {
    my $url = shift;
    my @i = split(//, $url);    # put each letter in its own bin
    my $j = 0;                  # initialize our
    my $k = 1;                  # hashing increment values
    my @m = ();                 # workspace
    foreach my $n (@i) {
        my $q = ord($n);        # ASCII code for the character
        $k += $j;               # increment our hash offset
        $q += $k;               # add the offset to the character code
        $j = $k;                # store that
        push @m, $q;            # save the offset-adjusted value
    }
    my $hashval = 0;            # initialize our hash value
    # generate it from the offset-adjusted values
    $hashval = ($hashval + $_) % 10000 for @m;
    printf "%s: %0.4d\n", $url, $hashval;
}
Using that method, ABC.html and CBA.html now have different values, because each letter position's value gets bumped up increasingly from left to right.
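Ferrous later asks for a Python transliteration of this routine; a line-by-line sketch follows, assuming plain ASCII paths (where Perl's and Python's ord agree). Tracing the loop shows why it does not deliver what the post claims: the offset k runs 1, 2, 4, 8, ... regardless of which character sits at each position, so for an n-character path the offsets always add up to 2**n - 1. The result depends only on the character sum and the length, so ABC.html and CBA.html still collide, and so do all seven of Peter Otten's 18-character paths. The function names are illustrative.

```python
def positional_hash(url: str) -> int:
    """Line-by-line port of the Perl routine: each character's code is
    bumped by an offset k that starts at 1 and doubles on every step."""
    j, k = 0, 1
    bumped = []
    for ch in url:
        q = ord(ch)
        k += j          # offsets run 1, 2, 4, 8, ... independent of ch
        q += k
        j = k
        bumped.append(q)
    hashval = 0
    for value in bumped:
        hashval = (hashval + value) % 10000
    return hashval


def closed_form(url: str) -> int:
    """Equivalent formula: the offsets always sum to 2**len(url) - 1."""
    return (sum(ord(ch) for ch in url) + 2 ** len(url) - 1) % 10000


if __name__ == "__main__":
    for url in ("ABC.html", "CBA.html", "/public/fails.html", "/can/although.html"):
        print(f"{url}: {positional_hash(url):04d}")
```

Running it prints 0936 for both ABC.html and CBA.html, and 3886 for both of Peter's paths shown: bumping by position alone cannot separate strings that share the same letters and length.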
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:26 -0800 |
| Message-ID | <mailman.819.1358879187.2939.python-list@python.org> |
| In reply to | #37297 |
On Tuesday, January 22, 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
> On Wed, Jan 23, 2013 at 2:59 AM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
>
> Either you are deliberately trolling, or you have a major
> comprehension problem. Please go back and read, carefully, all the
> remarks you've been offered in this thread. Feel free to ask for
> clarification of anything that doesn't make sense, but be sure to read
> all of it. You are asking something that is fundamentally
> impossible[1]. There simply are not enough numbers to go around.
>
> ChrisA
>
> [1] Well, impossible in decimal. If you work in base 4294967296, you
> could do what you want in four "digits".
Fundamentally impossible?
Well....
OK: How about this in Perl:
$ cat testMD5.pl
use strict;
foreach my $url(qw@ /index.html /about/time.html @){
    hashit($url);
}
sub hashit {
    my $url=shift;
    my @ltrs=split(//,$url);
    my $hash = 0;
    foreach my $ltr(@ltrs){
        $hash = ( $hash + ord($ltr)) %10000;
    }
    printf "%s: %0.4d\n",$url,$hash
}
which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547
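For comparison, a direct Python port of hashit() from testMD5.pl, assuming ASCII paths. It reproduces the two numbers shown, and it also makes the weakness visible: a sum ignores order, so rearranging the letters of a path (the hypothetical /xedni.html below) lands on the same number. The function name is illustrative.

```python
def additive_hash(url: str) -> int:
    """Port of hashit() from testMD5.pl: a running sum of character codes,
    reduced modulo 10000."""
    h = 0
    for ch in url:
        h = (h + ord(ch)) % 10000
    return h


if __name__ == "__main__":
    for url in ("/index.html", "/about/time.html"):
        print(f"{url}: {additive_hash(url):04d}")
    # Addition is order-independent, so an anagram of a path collides with it.
    print(additive_hash("/index.html") == additive_hash("/xedni.html"))
```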
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 07:59 -0800 |
| Message-ID | <mailman.801.1358870401.2939.python-list@python.org> |
| In reply to | #37284 |
On Tuesday, January 22, 2013 5:25:42 PM UTC+2, Peter Otten wrote:
> Ferrous Cranus wrote:
>
> > I insist, perhaps compelled, on using a key to associate a number with a
> > filename. Would you help please?
> >
> > I don't know how this is supposed to be written. I just know I need this:
> >
> > number = function_that_returns_a_number_out_of_a_string(
> >     absolute_path_of_a_html_file)
> >
> > Would someone help me write that in Python? We are talking 1 line
> > of code here....
>
> Since you insist:
>
> >>> def function_that_returns_a_number_out_of_a_string(absolute_path_of_a_html_file):
> ...     return int(absolute_path_of_a_html_file.encode("hex"), 16)
> ...
> >>> function_that_returns_a_number_out_of_a_string("/foo/bar/baz")
> 14669632128886499728813089146L
>
> As a bonus here is how to turn the number back into a path:
>
> >>> x = 14669632128886499728813089146
> >>> "{:x}".format(x).decode("hex")
> '/foo/bar/baz'
>
> ;)
Thank you, but no... no, that would be unnecessarily complex.
I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
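Peter Otten's quoted session relies on Python 2's "hex" string codec, which Python 3 no longer offers through str.encode(). A sketch of the same reversible mapping in Python 3 uses int.from_bytes and int.to_bytes; both approaches read the path's bytes as one big-endian integer, so it yields the same number as the session above. The function names are illustrative. Note that the number grows by about 2.4 decimal digits per ASCII character, so a mapping that can be reversed, and therefore never collides, has no way to stay at four digits.

```python
def path_to_int(path: str) -> int:
    """Read the path's UTF-8 bytes as a single big-endian integer
    (the Python 3 counterpart of int(path.encode("hex"), 16))."""
    return int.from_bytes(path.encode("utf-8"), "big")


def int_to_path(number: int) -> str:
    """Invert path_to_int; assumes the path does not start with a NUL byte."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("utf-8")


if __name__ == "__main__":
    n = path_to_int("/foo/bar/baz")
    print(n)
    print(int_to_path(n))
```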
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-01-22 16:55 +0000 |
| Message-ID | <kdmg96$gl8$1@reader1.panix.com> |
| In reply to | #37295 |
In <mailman.801.1358870401.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
> I just need a way to CONVERT a string (absolute path) to a 4-digit unique
> number with INT!!! That's all I want!! But I cannot make it work :(
Given your requirements, I don't think it *can* work. There's just no
way to do it.
How can the computer guarantee that billions of possible inputs (file paths)
map to 10,000 unique outputs (4-digit numbers)? It's not possible.
It might be possible if you had control over the format of the input strings,
but it doesn't sound like you do.
Can you maintain a separate database which maps file paths to numbers?
If so, then this is an easy problem. Just keep a lookup table, like so:
    filepath                                        number
    --------                                        ------
    /home/files/bob/foo.html                        0001
    /home/files/bob/bar.html                        0002
    /home/files/steve/recipes/chocolate-cake.html   0003
    /home/files/mary/payroll.html                   0004
--
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"
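John's lookup-table suggestion can be sketched with Python's standard sqlite3 module. The table and column names (pages, id, filepath), the helper names, and the in-memory database standing in for a real file are all illustrative assumptions. SQLite assigns the next integer id on the first insert of a path, and INSERT OR IGNORE leaves existing rows alone, so each path keeps the number it was first given; four-digit formatting only holds while there are fewer than 10,000 pages.

```python
import sqlite3


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table mapping each file path to an integer id."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " id INTEGER PRIMARY KEY,"
        " filepath TEXT NOT NULL UNIQUE)"
    )
    return conn


def number_for(conn: sqlite3.Connection, filepath: str) -> int:
    """Return the number for filepath, assigning the next free one on first sight."""
    conn.execute("INSERT OR IGNORE INTO pages (filepath) VALUES (?)", (filepath,))
    row = conn.execute(
        "SELECT id FROM pages WHERE filepath = ?", (filepath,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_registry()
    for path in (
        "/home/files/bob/foo.html",
        "/home/files/bob/bar.html",
        "/home/files/steve/recipes/chocolate-cake.html",
        "/home/files/mary/payroll.html",
        "/home/files/bob/foo.html",  # seen before: gets its old number back
    ):
        print(f"{path:48} {number_for(conn, path):04d}")
    conn.commit()
```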
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:07 -0800 |
| Message-ID | <4847a0e3-aefa-4330-9252-db08f2e993df@googlegroups.com> |
| In reply to | #37305 |
On Tuesday, January 22, 2013 6:55:02 PM UTC+2, John Gordon wrote:
> In <mailman.801.1358870401.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
>
> > I just need a way to CONVERT a string (absolute path) to a 4-digit unique
> > number with INT!!! That's all I want!! But I cannot make it work :(
>
> Given your requirements, I don't think it *can* work. There's just no
> way to do it.
>
> How can the computer guarantee that billions of possible inputs (file paths)
> map to 10,000 unique outputs (4-digit numbers)? It's not possible.
>
> It might be possible if you had control over the format of the input strings,
> but it doesn't sound like you do.
>
> Can you maintain a separate database which maps file paths to numbers?
> If so, then this is an easy problem. Just keep a lookup table, like so:
>
>     filepath                                        number
>     --------                                        ------
>     /home/files/bob/foo.html                        0001
>     /home/files/bob/bar.html                        0002
>     /home/files/steve/recipes/chocolate-cake.html   0003
>     /home/files/mary/payroll.html                   0004
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> gordon@panix.com              B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"

No, because I DO NOT WANT to store LOTS OF BIG absolute paths in the database. And the .html files are not even close to 10,000.
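If the objection is the size of the stored paths, one hedged option is to store a fixed-size digest of each path instead and keep a sequential number alongside it. The sketch below keys an in-memory dict on a 16-byte hashlib.blake2b digest; the class and method names are hypothetical, and a real site would need to persist the mapping (for example in a database table like the one John describes) rather than rebuild it on every run. With 128-bit digests an accidental clash among a few thousand paths is astronomically unlikely, though not strictly impossible.

```python
import hashlib


class PageNumbers:
    """Hypothetical helper: hands out sequential numbers to paths while
    remembering only a 16-byte digest of each path, not the path itself."""

    def __init__(self) -> None:
        self._numbers = {}  # digest (bytes) -> assigned number (int)

    @staticmethod
    def digest(path: str) -> bytes:
        # Always 16 bytes per entry, however long the path is.
        return hashlib.blake2b(path.encode("utf-8"), digest_size=16).digest()

    def number_for(self, path: str) -> int:
        key = self.digest(path)
        if key not in self._numbers:
            self._numbers[key] = len(self._numbers) + 1
        return self._numbers[key]


if __name__ == "__main__":
    pages = PageNumbers()
    for path in ("/index.html", "/about/time.html", "/index.html"):
        print(f"{path}: {pages.number_for(path):04d}")
```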