| Started by | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| First post | 2013-01-22 08:15 -0800 |
| Last post | 2013-01-22 12:04 -0500 |
| Articles | 20 on this page of 26 — 11 participants |
Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 08:15 -0800
RE: Converting a string to a number by using INT (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 16:27 +0000
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:02 -0800
RE: Converting a string to a number by using INT (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 17:24 +0000
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:39 -0800
Re: Converting a string to a number by using INT (no hash method) John Gordon <gordon@panix.com> - 2013-01-22 18:36 +0000
Re: Converting a string to a number by using INT (no hash method) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 17:22 -0500
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:39 -0800
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:37 -0800
Re: Converting a string to a number by using INT (no hash method) Michael Torrie <torriem@gmail.com> - 2013-01-22 12:02 -0700
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 11:28 -0800
Re: Converting a string to a number by using INT (no hash method) Alan Spence <alan.spence@ntlworld.com> - 2013-01-22 20:00 +0000
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 11:28 -0800
Re: Converting a string to a number by using INT (no hash method) John Gordon <gordon@panix.com> - 2013-01-22 20:40 +0000
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 00:44 -0800
Re: Converting a string to a number by using INT (no hash method) Dave Angel <d@davea.name> - 2013-01-22 14:08 -0500
RE: Converting a string to a number by using INT (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 20:30 +0000
Re: Converting a string to a number by using INT (no hash method) Dave Angel <d@davea.name> - 2013-01-22 15:43 -0500
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:37 -0800
Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:02 -0800
Re: Converting a string to a number by using INT (no hash method) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-01-22 16:27 +0000
Re: Converting a string to a number by using INT (no hash method) Dave Angel <d@davea.name> - 2013-01-22 11:40 -0500
Re: Converting a string to a number by using INT (no hash method) alex23 <wuwei23@gmail.com> - 2013-01-22 17:32 -0800
Re: Converting a string to a number by using INT (no hash method) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-23 03:02 +0000
Re: Converting a string to a number by using INT (no hash method) alex23 <wuwei23@gmail.com> - 2013-01-22 19:59 -0800
Re: Converting a string to a number by using INT (no hash method) "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-01-22 12:04 -0500
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 08:15 -0800 |
| Subject | Converting a string to a number by using INT (no hash method) |
| Message-ID | <e74ae1aa-f760-4123-8770-3b8a4f9a2f91@googlegroups.com> |
I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
And the best part is that "that" number must be able to turn back into a path.
This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
| From | "Leonard, Arah" <Arah.Leonard@bruker-axs.com> |
|---|---|
| Date | 2013-01-22 16:27 +0000 |
| Message-ID | <mailman.806.1358872054.2939.python-list@python.org> |
| In reply to | #37299 |
> I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
>
> And the best part is that "that" number must be able to turn back into a path.
>
> This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
>
> 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page)
> 2. I turn the path into a 4-digit number
> 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible. If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path. Nor would it be guaranteed to be unique.
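For illustration, the lookup table mentioned above is what makes a short number reversible: the path is still stored, just once. A minimal sketch, assuming an in-memory dictionary (the class `PathRegistry` and its method names are hypothetical; a real site would more likely use a database table with an auto-increment key):

```python
class PathRegistry:
    """Hypothetical two-way lookup table: path <-> small integer id."""

    def __init__(self):
        self._id_by_path = {}    # path -> id
        self._path_by_id = []    # id -> path (the list index is the id)

    def pin_for(self, path):
        """Return the id for path, assigning the next free id if it is new."""
        if path not in self._id_by_path:
            self._id_by_path[path] = len(self._path_by_id)
            self._path_by_id.append(path)
        return self._id_by_path[path]

    def path_for(self, pin):
        """Reverse lookup; possible only because the path itself is stored."""
        return self._path_by_id[pin]


registry = PathRegistry()
pin = registry.pin_for("/home/nikos/public_html/index.html")
print(pin, registry.path_for(pin))
```

The ids are unique and reversible here, but only because nothing is being computed from the string; the table does all the work.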
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 09:02 -0800 |
| Message-ID | <4339f8d7-2d78-450f-ad0e-91da35615e6d@googlegroups.com> |
| In reply to | #37301 |
On Tuesday, January 22, 2013 6:27:32 PM UTC+2, Leonard, Arah wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
> >
> > And the best part is that "that" number must be able to turn back into a path.
> >
> > This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
> >
> > 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page)
> > 2. I turn the path into a 4-digit number
> > 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
>
> Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible. If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path. Nor would it be guaranteed to be unique.
Now that iam thinking of it more and more, i don't have to turn the 'number' back to a 'path'
So, what i want is a function foo() that does this:
foo( "some long string" ) --> 1234
=====================
1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page)
2. turn the 'path' to 4-digit number and save it as 'pin' (how?)
3. i store that number to the database. I DONT EVEN HAVE TO STORE THE HTML PAGE'S PATH TO THE DATABASE ANYMORE!!! this is just great!
At some later time i want to check the weblog of that .html page
1. request the page as: http://mydomain.gr/index.html?show=log
2. .htaccess gives my script the absolute path of the requested .html file
3. turn the 'path' to 4-digit number and save it as 'pin' (this is what i'am asking)
4. select all log records for that specific .html page (based on the 'pin' column)
Since i have the requested 'path' which has been converted to a database stored 4-digit number, i'am aware for which page i'am requesting detailed data from, so i look upon the 'pin' column in the database and thus i know which records i want to select.
No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
Can this be done?
| From | "Leonard, Arah" <Arah.Leonard@bruker-axs.com> |
|---|---|
| Date | 2013-01-22 17:24 +0000 |
| Message-ID | <mailman.812.1358875475.2939.python-list@python.org> |
| In reply to | #37307 |
> No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
>
> Can this be done?
Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.
Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
pin = int( htmlpage.encode("hex"), 16 ) % 10000
It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.
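The "maximum 10000 unique file paths" point can be checked directly. The sketch below is a Python 3 rendering of the one-liner, under the assumption that `bytes.hex()` stands in for Python 2's `str.encode("hex")`; the function name `pin_for` and the generated paths are made up for the demonstration:

```python
def pin_for(path, limit=10000):
    """Python 3 rendering of int(path.encode("hex"), 16) % limit."""
    return int(path.encode("utf-8").hex(), 16) % limit


# 10001 distinct (made-up) paths cannot all receive different pins
# when only the values 0..9999 are available.
paths = ["/home/user/page%d.html" % n for n in range(10001)]
pins = {pin_for(p) for p in paths}
print(len(paths), "paths ->", len(pins), "distinct pins")
```

However the pins happen to be distributed, the second number can never exceed 10000, so at least two of the paths must share a pin.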
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 09:39 -0800 |
| Message-ID | <592233bd-3fc1-4e13-97f8-e11f89fbb0ba@googlegroups.com> |
| In reply to | #37311 |
On Tuesday, January 22, 2013 7:24:26 PM UTC+2, Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.
>
> Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.
Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!
NOW, if you please explain it to me from the innermost parenthesis please, because i do want to understand it!!!
And since i'am sure it works, and i just used it on http://superhost.gr
please view my domain and help me understand why its producing errors for me.
Your 1-line code surely works but somethings not letting my webpage load normally.
Please take a look....
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-01-22 18:36 +0000 |
| Subject | Re: Converting a string to a number by using INT (no hash method) |
| Message-ID | <kdmm6l$hba$1@reader1.panix.com> |
| In reply to | #37313 |
In <592233bd-3fc1-4e13-97f8-e11f89fbb0ba@googlegroups.com> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
> > pin = int( htmlpage.encode("hex"), 16 ) % 10000
> >
> > It'll give you your number, but there are no guarantees of uniqueness.
> > You're looking at more blind random luck using that.
> Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!
No it isn't; you said you wanted a unique 4-digit number. This method
can return the same 4-digit number for lots of different file paths.
> NOW, if you please explain it to me from the innermost parenthesis please,
> because i do want to understand it!!!
1. Transform the html path string into a (large) hexadecimal number
using the encode() function.
2. Convert the hexadecimal number into a decimal integer using the
int() function.
3. Shrink the integer into the range 0-9999 by using the % operator.
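The three steps above can be sketched one per line. This is a Python 3 rendering, not the poster's code: Python 3 has no `"hex"` string codec, so `bytes.hex()` is assumed as the equivalent, and the function name `pin_steps` is hypothetical:

```python
def pin_steps(htmlpage):
    """Return the intermediate value of each step, innermost first."""
    hex_text = htmlpage.encode("utf-8").hex()  # step 1: path -> hex digit string
    big_number = int(hex_text, 16)             # step 2: hex digits -> integer
    pin = big_number % 10000                   # step 3: keep the remainder, 0..9999
    return hex_text, big_number, pin


print(pin_steps("/index.html"))
```

Only step 3 throws information away, which is why the result cannot be turned back into the path and why different paths can end up with the same pin.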
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2013-01-22 17:22 -0500 |
| Message-ID | <mailman.842.1358893382.2939.python-list@python.org> |
| In reply to | #37313 |
On Tue, 22 Jan 2013 09:39:11 -0800 (PST), Ferrous Cranus
<nikos.gr33k@gmail.com> declaimed the following in
gmane.comp.python.general:
>
> Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!
>
> NOW, if you please explain it to me from the innermost parenthesis please, because i do want to understand it!!!
>
> And since i'am sure it works, and i just used it on http://superhost.gr
Really: Try running:
-=-=-=-=-=-=-
# Python 2 script (print statement, str.encode("hex"))
import os
import collections

PATH = "c:/windows"  # change this to your need
fids = os.listdir(PATH)
print "Number of files in directory %s: %s" % (PATH, len(fids))

# Count how many file names map to each 4-digit pin
cntr = collections.Counter()
for fid in fids:
    pin = int( fid.encode("hex"), 16 ) % 10000
    cntr.update([pin])

# Report every pin that is shared by more than one file name
for pin, cnt in cntr.items():
    if cnt > 1:
        print "%10s %10s COLLISION" % (pin, cnt)
-=-=-=-=-=-
Number of files in directory c:/windows: 290
9799 2 COLLISION
7209 2 COLLISION
8900 2 COLLISION
6985 2 COLLISION
1972 2 COLLISION
2532 2 COLLISION
3559 2 COLLISION
-=-=-=-=-=-
Number of files in directory e:/userdata/wulfraed/my documents/python progs: 157
1369 2 COLLISION
6041 3 COLLISION
3945 3 COLLISION
8489 2 COLLISION
7417 2 COLLISION
9076 2 COLLISION
6249 2 COLLISION
2457 2 COLLISION
937 2 COLLISION
8249 2 COLLISION
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
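The same experiment can be repeated on Python 3. The version below is a sketch rather than a line-for-line port: it also reports which names collide, uses `os.fsencode` plus `bytes.hex()` in place of `str.encode("hex")`, and scans the current directory by default. The function name `find_collisions` and the choice of directory are assumptions made here, not part of the original script:

```python
import collections
import os


def find_collisions(names, limit=10000):
    """Group names by pin and keep only the pins shared by several names."""
    by_pin = collections.defaultdict(list)
    for name in names:
        # os.fsencode turns the file name into bytes using the
        # filesystem encoding; bytes.hex() gives the hex digit string.
        pin = int(os.fsencode(name).hex(), 16) % limit
        by_pin[pin].append(name)
    return {pin: group for pin, group in by_pin.items() if len(group) > 1}


if __name__ == "__main__":
    names = os.listdir(".")  # assumption: scan the current directory
    print("Number of files in directory: %s" % len(names))
    for pin, group in sorted(find_collisions(names).items()):
        print("%10s %10s COLLISION %s" % (pin, len(group), group))
```

Passing the names in as a list keeps the counting logic separate from the directory scan, so it can be tried on any set of strings.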
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:37 -0800 |
| Message-ID | <2de57cf7-4a8f-4304-91cf-0024963315d7@googlegroups.com> |
| In reply to | #37311 |
On Tuesday, January 22, 2013 7:24:26 PM UTC+2, Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.
>
> Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.
==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000
==============================================
Can you please explain the differences to what you have posted opposed to this perl coding?
==============================================
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
==============================================
I want to understand this and see it implemented in Python.
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-01-22 12:02 -0700 |
| Message-ID | <mailman.826.1358881376.2939.python-list@python.org> |
| In reply to | #37329 |
On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
> ==============================================
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
> ==============================================
>
> Can you please explain the differences to what you have posted
> opposed to this perl coding?
>
> ==============================================
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> ==============================================
>
> I want to understand this and see it implemented in Python.
It isn't quite the same thing. The perl code is merely a checksum of the
ascii value of the characters in the file name, that is then chopped
down to a number < 10000. The Python code is taking the ascii value of
each character in the file name, converting it to a hexadecimal pair of
digits, stringing them all out into a long string, then converting that
to a number using the hexadecimal number parser. This results in a
*very* large number, 8-bits per letter in the original file name, and
then chops that down to a number < 10000. Technically neither method is
a hash and neither will generate unique numbers.
Here's the python algorithm used on a short word:
'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
=> 0x68656c6c6f => 448378203247
mod that with 10000 and you get 3247
If you would simply run the python interpreter and try these things out
you could see how and why they work or not work. What is stopping you
from doing this?
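For anyone who does want to try this at the interpreter, here is a sketch of both methods side by side in Python 3. It assumes the Perl array `@ltrs` holds the individual characters of the file name (the snippet in the question does not show how it is filled), and the function names are hypothetical:

```python
def perl_style_checksum(name, limit=10000):
    """Port of the Perl loop: add up the character codes, modulo limit."""
    total = 0
    for letter in name:  # assumption: @ltrs is the list of characters
        total = (total + ord(letter)) % limit
    return total


def hex_pin(name, limit=10000):
    """Python 3 rendering of int(name.encode("hex"), 16) % limit."""
    return int(name.encode("utf-8").hex(), 16) % limit


print(perl_style_checksum("hello"))  # 104 + 101 + 108 + 108 + 111 = 532
print(hex_pin("hello"))              # 448378203247 % 10000 = 3247
```

The two functions give different numbers for the same word because the checksum ignores where each character sits, while the hex method treats the whole name as one big base-256 number before taking the remainder.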
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 11:28 -0800 |
| Message-ID | <97e31693-5928-4e43-bc97-c449118f2de0@googlegroups.com> |
| In reply to | #37335 |
On Tuesday, January 22, 2013 9:02:48 PM UTC+2, Michael Torrie wrote:
> On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
> > ==============================================
> > pin = int( htmlpage.encode("hex"), 16 ) % 10000
> > ==============================================
> >
> > Can you please explain the differences to what you have posted
> > opposed to this perl coding?
> >
> > ==============================================
> > foreach my $ltr(@ltrs){
> > $hash = ( $hash + ord($ltr)) %10000;
> > ==============================================
> >
> > I want to understand this and see it implemented in Python.
>
> It isn't quite the same thing. The perl code is merely a checksum of the
> ascii value of the characters in the file name, that is then chopped
> down to a number < 10000. The Python code is taking the ascii value of
> each character in the file name, converting it to a hexadecimal pair of
> digits, stringing them all out into a long string, then converting that
> to a number using the hexadecimal number parser. This results in a
> *very* large number, 8-bits per letter in the original file name, and
> then chops that down to a number < 10000. Technically neither method is
> a hash and neither will generate unique numbers.
>
> Here's the python algorithm used on a short word:
> 'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
> => 0x68656c6c6f => 448378203247
> mod that with 10000 and you get 3247
>
> If you would simply run the python interpreter and try these things out
> you could see how and why they work or not work. What is stopping you
> from doing this?
May i sent you my code by mail so for you see whats wrong and http://superhost.gr produces error?
1. this is not a script that iam being paid for.
2. this is not a class assignment
I just want to use that method of getting this to work.
| From | Alan Spence <alan.spence@ntlworld.com> |
|---|---|
| Date | 2013-01-22 20:00 +0000 |
| Message-ID | <mailman.832.1358885133.2939.python-list@python.org> |
| In reply to | #37338 |
On 22 Jan 2013, at 19:28, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> On Tuesday, January 22, 2013 9:02:48 PM UTC+2, Michael Torrie wrote:
>> On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
>>> ==============================================
>>> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>>> ==============================================
>>>
>>> Can you please explain the differences to what you have posted
>>> opposed to this perl coding?
>>>
>>> ==============================================
>>> foreach my $ltr(@ltrs){
>>> $hash = ( $hash + ord($ltr)) %10000;
>>> ==============================================
>>>
>>> I want to understand this and see it implemented in Python.
>>
>> It isn't quite the same thing. The perl code is merely a checksum of the
>> ascii value of the characters in the file name, that is then chopped
>> down to a number < 10000. The Python code is taking the ascii value of
>> each character in the file name, converting it to a hexadecimal pair of
>> digits, stringing them all out into a long string, then converting that
>> to a number using the hexadecimal number parser. This results in a
>> *very* large number, 8-bits per letter in the original file name, and
>> then chops that down to a number < 10000. Technically neither method is
>> a hash and neither will generate unique numbers.
>>
>> Here's the python algorithm used on a short word:
>> 'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
>> => 0x68656c6c6f => 448378203247
>> mod that with 10000 and you get 3247
>>
>> If you would simply run the python interpreter and try these things out
>> you could see how and why they work or not work. What is stopping you
>> from doing this?
>
> May i sent you my code by mail so for you see whats wrong and http://superhost.gr produces error?
>
> 1. this is not a script that iam being paid for.
> 2. this is not a class assignment
>
> I just want to use that method of getting this to work.
All pages, strings and objects map to:
http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm
Alan
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-01-22 20:40 +0000 |
| Subject | Re: Converting a string to a number by using INT (no hash method) |
| Message-ID | <kdmtg7$t4i$1@reader1.panix.com> |
| In reply to | #37339 |
In <mailman.829.1358882945.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
> May i sent you my code by mail so for you see whats wrong and
> http://superhost.gr produces error?
I tried going to that address and got some error output. I noticed this
in the error dump:
186 if cursor.rowcount == 0:
187 cursor.execute( '''INSERT INTO visitors(pin, host, hits, useros, browser, date) VALUES(%s, %s, %s, %s, %s)''', (pin, host, 1, useros, browser, date) )
The INSERT statement gives six column names but only five placeholders (%s)
in the VALUES clause.
Perhaps that's the problem?
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
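One way to avoid this kind of mismatch is to derive the placeholders from the column list instead of typing them by hand. A minimal sketch, assuming the database driver uses `%s` placeholders as in the error dump; the helper `build_insert` is hypothetical, and the table and column names must come from the program itself, never from user input:

```python
def build_insert(table, columns):
    """Build an INSERT statement with exactly one %s placeholder per column."""
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO %s(%s) VALUES(%s)" % (table, ", ".join(columns), placeholders)


columns = ["pin", "host", "hits", "useros", "browser", "date"]
sql = build_insert("visitors", columns)
print(sql)
```

The resulting string would be passed to `cursor.execute()` together with a tuple of six values, so adding a column later changes the list in one place only.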
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-23 00:44 -0800 |
| Subject | Re: Converting a string to a number by using INT (no hash method) |
| Message-ID | <1ffaadd0-f700-4172-84a4-7c1d73745f83@googlegroups.com> |
| In reply to | #37346 |
On Tuesday, January 22, 2013 10:40:39 PM UTC+2, John Gordon wrote:
> In <mailman.829.1358882945.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
>
> > May i sent you my code by mail so for you see whats wrong and
> > http://superhost.gr produces error?
>
> I tried going to that address and got some error output. I noticed this
> in the error dump:
>
> 186 if cursor.rowcount == 0:
> 187 cursor.execute( '''INSERT INTO visitors(pin, host, hits, useros, browser, date) VALUES(%s, %s, %s, %s, %s)''', (pin, host, 1, useros, browser, date) )
>
> The INSERT statement gives six column names but only five placeholders (%s)
> in the VALUES clause.
>
> Perhaps that's the problem?
Exactly Gordon, i missed the extra placeholder(%s) when i was adding a new useros column.
I also used a 5-digit number. Now my website finally works as intended.
Just visit the following links plz.
------------------------------------------------------------------------------
1. http://superhost.gr
2. http://superhost.gr/?show=log
3. http://i.imgur.com/3Hcz1uP.png
(this displays the database's column 'pin', a 5-digit number acting as a filepath indicator. I guess i won't be needing column 'page' anymore)
4. http://i.imgur.com/kRwzLp3.png
(this is the detailed page information associated to 'pin' column indicator instead of something like '/home/nikos/public_html/index.html')
Isn't it a nice solution?
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2013-01-22 14:08 -0500 |
| Message-ID | <mailman.827.1358881715.2939.python-list@python.org> |
| In reply to | #37329 |
On 01/22/2013 01:37 PM, Ferrous Cranus wrote:
>
>> <snip>
>>
>
> ==============================================
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
> ==============================================
>
> Can you please explain the differences to what you have posted opposed to this perl coding?
>
> ==============================================
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> ==============================================
>
> I want to understand this and see it implemented in Python.
>
The perl code will produce the same hash for "abc.html" as for
"bca.html". That's probably one reason Leonard didn't try to
transliterate the buggy code.
In any case, the likelihood of a hash collision for any non-trivial
website is substantial. As I said elsewhere, if you hash 100 files you
have about a 40% chance of a collision.
If you hash 220 files, the likelihood is about 90%
--
DaveA
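Both claims above can be checked with a few lines of Python 3. The sketch assumes the pins behave like uniformly random draws from 10000 values (the usual birthday-problem model, which the real formula only approximates), and the function names are hypothetical:

```python
def collision_probability(n_files, n_pins=10000):
    """Chance that at least two of n_files uniformly random pins coincide."""
    p_all_distinct = 1.0
    for i in range(n_files):
        # The (i+1)-th file must avoid the i pins already taken.
        p_all_distinct *= (n_pins - i) / n_pins
    return 1.0 - p_all_distinct


def ord_sum(name, limit=10000):
    """The Perl-style checksum: sum of character codes, modulo limit."""
    return sum(ord(ch) for ch in name) % limit


for n in (100, 220):
    print(n, "files:", round(collision_probability(n), 2))

# Anagrams always collide under the checksum, whatever the limit is.
print(ord_sum("abc.html") == ord_sum("bca.html"))
```

Under that model the results come out at roughly 39% for 100 files and 91% for 220, in line with the figures quoted.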
| From | "Leonard, Arah" <Arah.Leonard@bruker-axs.com> |
|---|---|
| Date | 2013-01-22 20:30 +0000 |
| Message-ID | <mailman.834.1358886627.2939.python-list@python.org> |
| In reply to | #37329 |
> The perl code will produce the same hash for "abc.html" as for "bca.html" That's probably one reason Leonard didn't try to transliterate the buggy code.
>
Actually, to give credit where it's due, it wasn't me. I just modified someone else's interesting solution in this thread and added the silly limit of 10000 to it.
> In any case, the likelihood of a hash collision for any non-trivial website is substantial.
>
Exactly. Four digits is hardly enough range for it to be even remotely safe. And even then range isn't really the issue as technically it just improves your odds.
The results of a modulus operator are still non-unique no matter how many digits are there to work with ... within reason. Statistically anyone who buys a ticket could potentially win the lottery no matter how bad the odds are. ;)
And now back to the OP, I'm still confused on this four-digit limitation. Why isn't the limitation at least adhering to a bytelength like byte/short/long? Is this database storing a string of characters instead of an actual number? (And if so, then why not just block out 255 characters instead of 4 to store a whole path? Or at the very least treat 4 characters as 4 bytes to greatly increase the numeric range?)
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2013-01-22 15:43 -0500 |
| Message-ID | <mailman.835.1358887428.2939.python-list@python.org> |
| In reply to | #37329 |
On 01/22/2013 03:30 PM, Leonard, Arah wrote:
>> The perl code will produce the same hash for "abc.html" as for "bca.html" That's probably one reason Leonard didn't try to transliterate the buggy code.
>>
>
> Actually, to give credit where it's due, it wasn't me. I just modified someone else's interesting solution in this thread and added the silly limit of 10000 to it.
>
That's okay. The OP doesn't seem to know anything about programming, or about information theory, so the fact you gave a single line that actually "works" must be extraordinarily valuable to him.
When he was trying to use the md5 module, I gave him the hints about his five programming errors, and was about to expand on it when i noticed his 4 digit limitation.
>> In any case, the likelihood of a hash collision for any non-trivial website is substantial.
>>
>
> Exactly. Four digits is hardly enough range for it to be even remotely safe. And even then range isn't really the issue as technically it just improves your odds.
>
> The results of a modulus operator are still non-unique no matter how many digits are there to work with ... within reason. Statistically anyone who buys a ticket could potentially win the lottery no matter how bad the odds are. ;)
>
> And now back to the OP, I'm still confused on this four-digit limitation. Why isn't the limitation at least adhering to a bytelength like byte/short/long? Is this database storing a string of characters instead of an actual number? (And if so, then why not just block out 255 characters instead of 4 to store a whole path? Or at the very least treat 4 characters as 4 bytes to greatly increase the numeric range?)
>
I wish I had done the internet search earlier. This name 'ferrous cranus' is a pseudonym of various trolls, and anybody who'd adopt it isn't worth our time. Thanks to Alan Spence for spotting that.
I'll plonk 'ferrous cranus' now.
--
DaveA
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 10:37 -0800 |
| Message-ID | <mailman.821.1358879867.2939.python-list@python.org> |
| In reply to | #37311 |
On Tuesday, 22 January 2013 7:24:26 PM UTC+2, user Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.
>
> Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.
==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000
==============================================
Can you please explain how what you posted differs from this Perl code?
==============================================
foreach my $ltr(@ltrs){
    $hash = ( $hash + ord($ltr)) %10000;
}
==============================================
I want to understand this and see it implemented in Python.
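One way to see the difference is to write both side by side. This is a sketch under stated assumptions: it targets Python 3, where str.encode("hex") no longer exists, so the one-liner is spelled with int.from_bytes; the UTF-8 encoding is an assumption; and the Perl loop is assumed to run over the individual characters of the path, since the code that fills @ltrs was not shown. The function names are invented for illustration.

```python
def pin_from_bytes(htmlpage):
    """Python 3 spelling of int(htmlpage.encode("hex"), 16) % 10000:
    read the whole string as one big base-256 number, then reduce it."""
    return int.from_bytes(htmlpage.encode("utf-8"), "big") % 10000


def pin_from_ord_sum(htmlpage):
    """Transliteration of the Perl loop: add up the character codes,
    reducing modulo 10000 after every step."""
    total = 0
    for ltr in htmlpage:
        total = (total + ord(ltr)) % 10000
    return total


for name in ("abc.html", "bca.html"):
    print(name, pin_from_bytes(name), pin_from_ord_sum(name))
```

The sum ignores the order of the characters, so any two paths made of the same letters get the same value. The base-256 version is position-sensitive and tells those two apart, but it still squeezes every path into 10000 values, so neither is unique.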
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-22 09:02 -0800 |
| Message-ID | <mailman.810.1358874186.2939.python-list@python.org> |
| In reply to | #37301 |
On Tuesday, 22 January 2013 6:27:32 PM UTC+2, user Leonard, Arah wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
> >
> > And the best part is that "that" number must be able to turn back into a path.
> >
> > This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
> >
> > 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page)
> > 2. I turn the path into a 4-digit number
> > 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
>
> Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible. If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path. Nor would it be guaranteed to be unique.

Now that I am thinking about it more and more, I don't have to turn the 'number' back into a 'path'.

So, what I want is a function foo() that does this:

foo( "some long string" ) --> 1234

=====================
1. User requests a specific html page ( .htaccess gives my script the absolute path for that .html page)
2. turn the 'path' into a 4-digit number and save it as 'pin' (how?)
3. I store that number in the database. I DONT EVEN HAVE TO STORE THE HTML PAGE'S PATH TO THE DATABASE ANYMORE!!! this is just great!

At some later time I want to check the weblog of that .html page:

1. request the page as: http://mydomain.gr/index.html?show=log
2. .htaccess gives my script the absolute path of the requested .html file
3. turn the 'path' into a 4-digit number and save it as 'pin' (this is what I am asking)
4. select all log records for that specific .html page (based on the 'pin' column)

Since the requested 'path' converts to the same 4-digit number that is stored in the database, I know which page I am requesting detailed data for, so I look it up in the 'pin' column and thus know which records to select.

No need to turn the number back to a path anymore, just the path to a number, to identify the specific .html page.

Can this be done?
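For comparison, the lookup-table approach mentioned in the quoted reply could be sketched roughly as follows. It does store the path, which is exactly what the poster wants to avoid, but it is the only variant here that cannot hand two pages the same pin. The class name PinRegistry and the example path are hypothetical, and a real site would presumably keep the mapping in a database table rather than in memory.

```python
class PinRegistry:
    """Hands out 4-digit pins (0 to 9999) in order of first appearance.
    Hypothetical helper; the in-memory dict stands in for a database table."""

    LIMIT = 10000  # four decimal digits allow at most 10000 distinct pins

    def __init__(self):
        self._pins = {}

    def pin_for(self, path):
        if path not in self._pins:
            if len(self._pins) >= self.LIMIT:
                raise OverflowError("all 10000 four-digit pins are in use")
            # The next unused number is simply the count of paths seen so far.
            self._pins[path] = len(self._pins)
        return self._pins[path]


registry = PinRegistry()
print("%04d" % registry.pin_for("/home/user/public_html/index.html"))
```

Asking for the same path twice returns the same pin, and the hard stop at 10000 paths is the best-case limit described in the reply.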