Converting a string to a number by using INT (no hash method)

Started by: Ferrous Cranus <nikos.gr33k@gmail.com>
First post: 2013-01-22 08:15 -0800
Last post: 2013-01-22 12:04 -0500
Articles: 20 on this page of 26 (11 participants)


Contents

  Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 08:15 -0800
    RE: Converting a string to a number by using INT  (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 16:27 +0000
      Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:02 -0800
        RE: Converting a string to a number by using INT  (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 17:24 +0000
          Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:39 -0800
            Re: Converting a string to a number by using INT (no hash method) John Gordon <gordon@panix.com> - 2013-01-22 18:36 +0000
            Re: Converting a string to a number by using INT  (no hash method) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 17:22 -0500
          Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:39 -0800
          Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:37 -0800
            Re: Converting a string to a number by using INT  (no hash method) Michael Torrie <torriem@gmail.com> - 2013-01-22 12:02 -0700
              Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 11:28 -0800
                Re: Converting a string to a number by using INT  (no hash method) Alan Spence <alan.spence@ntlworld.com> - 2013-01-22 20:00 +0000
              Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 11:28 -0800
                Re: Converting a string to a number by using INT (no hash method) John Gordon <gordon@panix.com> - 2013-01-22 20:40 +0000
                  Re: Converting a string to a number by using INT (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 00:44 -0800
            Re: Converting a string to a number by using INT  (no hash method) Dave Angel <d@davea.name> - 2013-01-22 14:08 -0500
            RE: Converting a string to a number by using INT  (no hash method) "Leonard, Arah" <Arah.Leonard@bruker-axs.com> - 2013-01-22 20:30 +0000
            Re: Converting a string to a number by using INT  (no hash method) Dave Angel <d@davea.name> - 2013-01-22 15:43 -0500
          Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 10:37 -0800
      Re: Converting a string to a number by using INT  (no hash method) Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-22 09:02 -0800
    Re: Converting a string to a number by using INT  (no hash method) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-01-22 16:27 +0000
    Re: Converting a string to a number by using INT  (no hash method) Dave Angel <d@davea.name> - 2013-01-22 11:40 -0500
      Re: Converting a string to a number by using INT (no hash method) alex23 <wuwei23@gmail.com> - 2013-01-22 17:32 -0800
        Re: Converting a string to a number by using INT (no hash method) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-23 03:02 +0000
          Re: Converting a string to a number by using INT (no hash method) alex23 <wuwei23@gmail.com> - 2013-01-22 19:59 -0800
    Re: Converting a string to a number by using INT  (no hash method) "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-01-22 12:04 -0500


#37299 — Converting a string to a number by using INT (no hash method)

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 08:15 -0800
Subject: Converting a string to a number by using INT (no hash method)
Message-ID: <e74ae1aa-f760-4123-8770-3b8a4f9a2f91@googlegroups.com>
I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(

And the best part is that "that" number must be able to turn back into a path.

This way I DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!

1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. I store that number to the database. I DON'T EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! This is just great!



#37301

From: "Leonard, Arah" <Arah.Leonard@bruker-axs.com>
Date: 2013-01-22 16:27 +0000
Message-ID: <mailman.806.1358872054.2939.python-list@python.org>
In reply to: #37299
> I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
> 
> And the best part is that "that" number must be able to turn back into a path.
> 
> This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
> 
> 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page) 2. I turn the path into a 4-digitnumber 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!

Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible.  If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path.  Nor would it be guaranteed to be unique.



#37307

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 09:02 -0800
Message-ID: <4339f8d7-2d78-450f-ad0e-91da35615e6d@googlegroups.com>
In reply to: #37301
On Tuesday, 22 January 2013 6:27:32 PM UTC+2, user Leonard, Arah wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
> >
> > And the best part is that "that" number must be able to turn back into a path.
> >
> > This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
> >
> > 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page) 2. I turn the path into a 4-digitnumber 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
>
> Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible.  If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path.  Nor would it be guaranteed to be unique.

Now that I am thinking about it more, I don't have to turn the 'number' back into a 'path'.

So, what I want is a function foo() that does this:

foo( "some long string" )  -->  1234

=====================
1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. Turn the 'path' into a 4-digit number and save it as 'pin'  (how?)
3. I store that number to the database. I DON'T EVEN HAVE TO STORE THE HTML PAGE'S PATH TO THE DATABASE ANYMORE!!! This is just great!


At some later time I want to check the weblog of that .html page:


1. Request the page as:  http://mydomain.gr/index.html?show=log
2. .htaccess gives my script the absolute path of the requested .html file
3. Turn the 'path' into a 4-digit number and save it as 'pin' (this is what I am asking)
4. Select all log records for that specific .html page  (based on the 'pin' column)


Since the requested 'path' has been converted to a 4-digit number stored in the database, I know which page I am requesting detailed data for, so I look it up in the 'pin' column of the database and thus I know which records to select.

No need to turn the number back into a path anymore, just the path into a number, to identify the specific .html page.

Can this be done?



#37311

From: "Leonard, Arah" <Arah.Leonard@bruker-axs.com>
Date: 2013-01-22 17:24 +0000
Message-ID: <mailman.812.1358875475.2939.python-list@python.org>
In reply to: #37307
> No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> 
> Can this be done?

Guaranteed to be unique?  Not even remotely possible.  Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers.  And that's the best-case scenario.  Anything else would be worse.
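
For illustration, here is a minimal Python 3 sketch of the lookup-table approach mentioned above. The class name `PinTable`, its methods, and the second sample path are hypothetical, introduced only for this example. It shows why uniqueness requires storing the path somewhere, and that the table is exhausted after 10000 distinct paths.

```python
class PinTable:
    """Hypothetical lookup table mapping paths to unique pins 0..limit-1."""

    def __init__(self, limit=10000):
        self.limit = limit
        self._pin_by_path = {}
        self._path_by_pin = []

    def pin_for(self, path):
        """Return the existing pin for path, or assign the next free one."""
        if path in self._pin_by_path:
            return self._pin_by_path[path]
        if len(self._path_by_pin) >= self.limit:
            raise OverflowError("all %d pins are in use" % self.limit)
        pin = len(self._path_by_pin)
        self._pin_by_path[path] = pin
        self._path_by_pin.append(path)
        return pin

    def path_for(self, pin):
        """Reverse lookup: only possible because the path was stored."""
        return self._path_by_pin[pin]


table = PinTable()
first = table.pin_for("/home/nikos/public_html/index.html")
second = table.pin_for("/home/nikos/public_html/about.html")  # hypothetical path
print("%04d %04d" % (first, second))
print(table.path_for(first))
```

The reverse lookup works only because every path is kept in the table, which is exactly the storage the original question hoped to avoid.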

Not guaranteed to be unique?  Easy.  Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
pin = int( htmlpage.encode("hex"), 16 ) % 10000
It'll give you your number, but there are no guarantees of uniqueness.  You're looking at more blind random luck using that.



#37313

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 09:39 -0800
Message-ID: <592233bd-3fc1-4e13-97f8-e11f89fbb0ba@googlegroups.com>
In reply to: #37311
On Tuesday, 22 January 2013 7:24:26 PM UTC+2, user Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique?  Not even remotely possible.  Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers.  And that's the best-case scenario.  Anything else would be worse.
>
> Not guaranteed to be unique?  Easy.  Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness.  You're looking at more blind random luck using that.

Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!

NOW, please explain it to me starting from the innermost parentheses, because I do want to understand it!!!

And since I am sure it works, and I just used it on http://superhost.gr,
please view my domain and help me understand why it's producing errors for me.
Your one-line code surely works, but something is not letting my webpage load normally.

Please take a look....



#37326 — Re: Converting a string to a number by using INT (no hash method)

From: John Gordon <gordon@panix.com>
Date: 2013-01-22 18:36 +0000
Subject: Re: Converting a string to a number by using INT (no hash method)
Message-ID: <kdmm6l$hba$1@reader1.panix.com>
In reply to: #37313
In <592233bd-3fc1-4e13-97f8-e11f89fbb0ba@googlegroups.com> Ferrous Cranus <nikos.gr33k@gmail.com> writes:

> > pin = int( htmlpage.encode("hex"), 16 ) % 10000
> >
> > It'll give you your number, but there are no guarantees of uniqueness.
> > You're looking at more blind random luck using that.

> Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!

No it isn't; you said you wanted a unique 4-digit number.  This method
can return the same 4-digit number for lots of different file paths.

> NOW, if you please explain it to me from the innermost parenthesis please,
> because i do want to understand it!!!

1. Transform the html path string into a (large) hexadecimal number
using the encode() function.

2. Convert the hexadecimal number into a decimal integer using the
int() function.

3. Shrink the integer into the range 0-9999 by using the % operator.
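
The three steps can be traced one at a time. Below is a minimal Python 3 sketch, not the code from the thread: the one-liner quoted above relies on the Python 2 `"hex"` string codec, which Python 3 does not offer on `str`, so `bytes.hex()` is used here as an assumed equivalent. The function name `path_to_pin` and the UTF-8 encoding are choices made for this illustration.

```python
def path_to_pin(path, limit=10000):
    # Step 1: encode the path as bytes and render them as hexadecimal digits.
    hex_digits = path.encode("utf-8").hex()
    # Step 2: parse that digit string as one (large) base-16 integer.
    big_number = int(hex_digits, 16)
    # Step 3: reduce the integer into the range 0..limit-1.
    return big_number % limit


print(path_to_pin("hello"))  # prints 3247
```

An empty string has no hex digits, so `int("", 16)` raises `ValueError`; a real script would need to handle that case.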



#37356

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2013-01-22 17:22 -0500
Message-ID: <mailman.842.1358893382.2939.python-list@python.org>
In reply to: #37313
On Tue, 22 Jan 2013 09:39:11 -0800 (PST), Ferrous Cranus
<nikos.gr33k@gmail.com> declaimed the following in
gmane.comp.python.general:


> 
> Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!
> 
> NOW, if you please explain it to me from the innermost parenthesis please, because i do want to understand it!!!
> 
> And since i'am sure it works, and i just used it on http://superhost.gr

	Really: Try running:

-=-=-=-=-=-=-
import os
import collections

PATH = "c:/windows"	#change this to your need

fids = os.listdir(PATH)

print "Number of files in directory %s: %s" % (PATH, len(fids))

cntr = collections.Counter()

for fid in fids:
    pin = int( fid.encode("hex"), 16 ) % 10000
    cntr.update([pin])

for pin, cnt in cntr.items():
    if cnt > 1:
        print "%10s %10s COLLISION" % (pin, cnt)


-=-=-=-=-=-
Number of files in directory c:/windows: 290
      9799          2 COLLISION
      7209          2 COLLISION
      8900          2 COLLISION
      6985          2 COLLISION
      1972          2 COLLISION
      2532          2 COLLISION
      3559          2 COLLISION

-=-=-=-=-=-
Number of files in directory e:/userdata/wulfraed/my documents/python
progs: 157
      1369          2 COLLISION
      6041          3 COLLISION
      3945          3 COLLISION
      8489          2 COLLISION
      7417          2 COLLISION
      9076          2 COLLISION
      6249          2 COLLISION
      2457          2 COLLISION
       937          2 COLLISION
      8249          2 COLLISION
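
The script above is Python 2. For readers on Python 3, a rough equivalent might look like the sketch below; it is an adaptation, not the code that produced the listings above. The function name `find_collisions` and the sample names are made up for illustration. `"Aa"` and `"hq"` were chosen because their integer values (16737 and 26737) differ by exactly 10000.

```python
import collections


def find_collisions(names, limit=10000):
    """Map each colliding pin to the list of names that produced it."""
    names_by_pin = collections.defaultdict(list)
    for name in names:
        pin = int(name.encode("utf-8").hex(), 16) % limit
        names_by_pin[pin].append(name)
    return {pin: group for pin, group in names_by_pin.items() if len(group) > 1}


# Hypothetical names; pass os.listdir(some_directory) to test real files.
sample = ["Aa", "hq", "hello"]
print(find_collisions(sample))  # prints {6737: ['Aa', 'hq']}
```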
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#37329

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 10:37 -0800
Message-ID: <2de57cf7-4a8f-4304-91cf-0024963315d7@googlegroups.com>
In reply to: #37311
On Tuesday, 22 January 2013 7:24:26 PM UTC+2, user Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique?  Not even remotely possible.  Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers.  And that's the best-case scenario.  Anything else would be worse.
>
> Not guaranteed to be unique?  Easy.  Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness.  You're looking at more blind random luck using that.

==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000 
==============================================

Can you please explain the difference between what you have posted and this Perl code?

==============================================
foreach my $ltr (@ltrs) {
        $hash = ( $hash + ord($ltr) ) % 10000;
}
==============================================

I want to understand this and see it implemented in Python.



#37335

From: Michael Torrie <torriem@gmail.com>
Date: 2013-01-22 12:02 -0700
Message-ID: <mailman.826.1358881376.2939.python-list@python.org>
In reply to: #37329
On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
> ==============================================
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
> ==============================================
>
> Can you please explain the differences to what you have posted
> opposed to this perl coding?
>
> ==============================================
> foreach my $ltr(@ltrs){
>         $hash = ( $hash + ord($ltr)) %10000;
> ==============================================
>
> I want to understand this and see it implemented in Python.

It isn't quite the same thing.  The Perl code is merely a checksum of the
ASCII values of the characters in the file name, reduced after each
addition to a number < 10000.  The Python code takes the ASCII value of
each character in the file name, converts it to a hexadecimal pair of
digits, strings them all out into one long string, then converts that
to a number using the hexadecimal number parser. This results in a
*very* large number, 8 bits per letter in the original file name, which
is then reduced modulo 10000.  Technically neither method is a hash and
neither will generate unique numbers.

Here's the Python algorithm used on a short word:
'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
=> 0x68656c6c6f => 448378203247
mod that with 10000 and you get 3247

If you would simply run the python interpreter and try these things out
you could see how and why they work or not work.  What is stopping you
from doing this?
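
To make the comparison concrete, here is a small Python 3 sketch of both methods side by side. The function names `additive_checksum` and `hex_mod` are invented for this example, and `hex_mod` uses `bytes.hex()` as an assumed Python 3 stand-in for the Python 2 `encode("hex")` call quoted above.

```python
def additive_checksum(name, limit=10000):
    """Rough Python rendering of the Perl loop: add each character code,
    wrapping at `limit` after every addition."""
    total = 0
    for ch in name:
        total = (total + ord(ch)) % limit
    return total


def hex_mod(name, limit=10000):
    """Treat the bytes of the name as one big base-16 number, then reduce."""
    return int(name.encode("utf-8").hex(), 16) % limit


print(additive_checksum("hello"))  # prints 532
print(hex_mod("hello"))            # prints 3247
```

Because addition ignores order, `additive_checksum` returns the same value for any rearrangement of the same characters, for example "abc.html" and "bca.html".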



#37338

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 11:28 -0800
Message-ID: <97e31693-5928-4e43-bc97-c449118f2de0@googlegroups.com>
In reply to: #37335
On Tuesday, 22 January 2013 9:02:48 PM UTC+2, user Michael Torrie wrote:
> On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
> > ==============================================
> > pin = int( htmlpage.encode("hex"), 16 ) % 10000
> > ==============================================
> >
> > Can you please explain the differences to what you have posted
> > opposed to this perl coding?
> >
> > ==============================================
> > foreach my $ltr(@ltrs){
> >         $hash = ( $hash + ord($ltr)) %10000;
> > ==============================================
> >
> > I want to understand this and see it implemented in Python.
>
> It isn't quite the thing.  The perl code is merely a checksum of the
> ascii value of the characters in the file name, that is then chopped
> down to a number < 10000.  The Python code is taking the ascii value of
> each character in the file name, converting it to a hexadecimal pair of
> digits, stringing them all out into a long string, then converting that
> to a number using the hexadecimal number parser. This results in a
> *very* large number, 8-bits per letter in the original file name, and
> then chops that down to 10000.  Technically neither method is a hash and
> neither will generate unique numbers.
>
> Here's the python algorithm used on a short word:
> 'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
> => 0x68656c6c6f => 448378203247
> mod that with 10000 and you get 3247
>
> If you would simply run the python interpreter and try these things out
> you could see how and why they work or not work.  What is stopping you
> from doing this?


May I send you my code by mail so you can see what's wrong and why http://superhost.gr produces an error?

1. This is not a script that I am being paid for.
2. This is not a class assignment.

I just want to get this method to work.



#37343

From: Alan Spence <alan.spence@ntlworld.com>
Date: 2013-01-22 20:00 +0000
Message-ID: <mailman.832.1358885133.2939.python-list@python.org>
In reply to: #37338
On 22 Jan 2013, at 19:28, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:

> On Tuesday, 22 January 2013 9:02:48 PM UTC+2, user Michael Torrie wrote:
>> On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
>>> ==============================================
>>> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>>> ==============================================
>>>
>>> Can you please explain the differences to what you have posted
>>> opposed to this perl coding?
>>>
>>> ==============================================
>>> foreach my $ltr(@ltrs){
>>>         $hash = ( $hash + ord($ltr)) %10000;
>>> ==============================================
>>>
>>> I want to understand this and see it implemented in Python.
>>
>> It isn't quite the thing.  The perl code is merely a checksum of the
>> ascii value of the characters in the file name, that is then chopped
>> down to a number < 10000.  The Python code is taking the ascii value of
>> each character in the file name, converting it to a hexadecimal pair of
>> digits, stringing them all out into a long string, then converting that
>> to a number using the hexadecimal number parser. This results in a
>> *very* large number, 8-bits per letter in the original file name, and
>> then chops that down to 10000.  Technically neither method is a hash and
>> neither will generate unique numbers.
>>
>> Here's the python algorithm used on a short word:
>> 'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
>> => 0x68656c6c6f => 448378203247
>> mod that with 10000 and you get 3247
>>
>> If you would simply run the python interpreter and try these things out
>> you could see how and why they work or not work.  What is stopping you
>> from doing this?
>
> May i sent you my code by mail so for you see whats wrong and http://superhost.gr produces error?
>
> 1. this is not a script that iam being paid for.
> 2, this is not a class assignemnt
>
> I just want to use that method of gettign this to work.
> -- 
> http://mail.python.org/mailman/listinfo/python-list

All pages, strings and objects map to:

http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm

Alan



#37339

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 11:28 -0800
Message-ID: <mailman.829.1358882945.2939.python-list@python.org>
In reply to: #37335
On Tuesday, 22 January 2013 9:02:48 PM UTC+2, user Michael Torrie wrote:
> On 01/22/2013 11:37 AM, Ferrous Cranus wrote:
> > ==============================================
> > pin = int( htmlpage.encode("hex"), 16 ) % 10000
> > ==============================================
> >
> > Can you please explain the differences to what you have posted
> > opposed to this perl coding?
> >
> > ==============================================
> > foreach my $ltr(@ltrs){
> >         $hash = ( $hash + ord($ltr)) %10000;
> > ==============================================
> >
> > I want to understand this and see it implemented in Python.
>
> It isn't quite the thing.  The perl code is merely a checksum of the
> ascii value of the characters in the file name, that is then chopped
> down to a number < 10000.  The Python code is taking the ascii value of
> each character in the file name, converting it to a hexadecimal pair of
> digits, stringing them all out into a long string, then converting that
> to a number using the hexadecimal number parser. This results in a
> *very* large number, 8-bits per letter in the original file name, and
> then chops that down to 10000.  Technically neither method is a hash and
> neither will generate unique numbers.
>
> Here's the python algorithm used on a short word:
> 'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
> => 0x68656c6c6f => 448378203247
> mod that with 10000 and you get 3247
>
> If you would simply run the python interpreter and try these things out
> you could see how and why they work or not work.  What is stopping you
> from doing this?


May I send you my code by mail so you can see what's wrong and why http://superhost.gr produces an error?

1. This is not a script that I am being paid for.
2. This is not a class assignment.

I just want to get this method to work.



#37346 — Re: Converting a string to a number by using INT (no hash method)

From: John Gordon <gordon@panix.com>
Date: 2013-01-22 20:40 +0000
Subject: Re: Converting a string to a number by using INT (no hash method)
Message-ID: <kdmtg7$t4i$1@reader1.panix.com>
In reply to: #37339
In <mailman.829.1358882945.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:

> May i sent you my code by mail so for you see whats wrong and
> http://superhost.gr produces error?

I tried going to that address and got some error output.  I noticed this
in the error dump:

     if cursor.rowcount == 0:
             cursor.execute( '''INSERT INTO visitors(pin, host, hits, useros, browser, date) VALUES(%s, %s, %s, %s, %s)''', (pin, host, 1, useros, browser, date) )

The INSERT statement gives six column names but only five placeholders (%s)
in the VALUES clause.

Perhaps that's the problem?
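
One way to keep the two counts from drifting apart is to derive the placeholder list from the column list. The sketch below is only an illustration under stated assumptions: it uses Python 3 and an in-memory `sqlite3` database as a stand-in for the poster's actual database, so the placeholder is `?` rather than the `%s` seen in the error dump, and the table definition and row values are made up.

```python
import sqlite3

columns = ("pin", "host", "hits", "useros", "browser", "date")
values = (3247, "example.org", 1, "Linux", "Firefox", "2013-01-22")  # made-up row

# Build one placeholder per column so the two counts always match.
placeholders = ", ".join("?" for _ in columns)
sql = "INSERT INTO visitors (%s) VALUES (%s)" % (", ".join(columns), placeholders)

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute(
    "CREATE TABLE visitors (pin INTEGER, host TEXT, hits INTEGER, "
    "useros TEXT, browser TEXT, date TEXT)"
)
conn.execute(sql, values)
row_count = conn.execute("SELECT COUNT(*) FROM visitors").fetchone()[0]
print(sql)
print(row_count)
```

Only the column names and placeholders are formatted into the SQL text; the values themselves are still passed separately as parameters.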

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"



#37425 — Re: Converting a string to a number by using INT (no hash method)

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-23 00:44 -0800
Subject: Re: Converting a string to a number by using INT (no hash method)
Message-ID: <1ffaadd0-f700-4172-84a4-7c1d73745f83@googlegroups.com>
In reply to: #37346
On Tuesday, 22 January 2013 10:40:39 PM UTC+2, user John Gordon wrote:
> In <mailman.829.1358882945.2939.python-list@python.org> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
>
> > May i sent you my code by mail so for you see whats wrong and
> > http://superhost.gr produces error?
>
> I tried going to that address and got some error output.  I noticed this
> in the error dump:
>
>      if cursor.rowcount == 0:
>              cursor.execute( '''INSERT INTO visitors(pin, host, hits, useros, browser, date) VALUES(%s, %s, %s, %s, %s)''', (pin, host, 1, useros, browser, date) )
>
> The INSERT statement gives six column names but only five placeholders (%s)
> in the VALUES clause.
>
> Perhaps that's the problem?

Exactly, Gordon, I missed the extra placeholder (%s) when I was adding the new useros column. I also used a 5-digit number.

Now my website finally works as intended. Just visit the following links, please.
------------------------------------------------------------------------------
1. http://superhost.gr

2. http://superhost.gr/?show=log

3. http://i.imgur.com/3Hcz1uP.png  (this displays the database's 'pin' column, a 5-digit number acting as a file path indicator. I guess I won't be needing the 'page' column anymore)

4. http://i.imgur.com/kRwzLp3.png   (this is the detailed page information associated with the 'pin' column indicator instead of something like '/home/nikos/public_html/index.html')

Isn't it a nice solution?



#37336

From: Dave Angel <d@davea.name>
Date: 2013-01-22 14:08 -0500
Message-ID: <mailman.827.1358881715.2939.python-list@python.org>
In reply to: #37329
On 01/22/2013 01:37 PM, Ferrous Cranus wrote:
>
>>  <snip>
>>
>
> ==============================================
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
> ==============================================
>
> Can you please explain the differences to what you have posted opposed to this perl coding?
>
> ==============================================
> foreach my $ltr(@ltrs){
>          $hash = ( $hash + ord($ltr)) %10000;
> ==============================================
>
> I want to understand this and see it implemented in Python.
>

The Perl code will produce the same hash for "abc.html" as for
"bca.html".  That's probably one reason Leonard didn't try to
transliterate the buggy code.

In any case, the likelihood of a hash collision for any non-trivial 
website is substantial.  As I said elsewhere, if you hash 100 files you 
have about a 40% chance of a collision.

If you hash 220 files, the likelihood is about 90%
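
These figures follow from the birthday problem. A minimal Python 3 sketch of the calculation is given below; it assumes each file lands on one of the 10000 pins uniformly at random, which a modulus over real file names only approximates. The function name `collision_probability` is made up for this example.

```python
def collision_probability(n_items, n_slots=10000):
    """Chance that at least two of n_items share a slot, assuming uniform pins."""
    p_all_distinct = 1.0
    for i in range(n_items):
        # The (i+1)-th item must avoid the i slots already taken.
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


for n in (100, 220):
    print(n, round(collision_probability(n), 2))
```

Under that assumption the results come out close to the 40% and 90% quoted above.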

-- 
DaveA



#37345

From: "Leonard, Arah" <Arah.Leonard@bruker-axs.com>
Date: 2013-01-22 20:30 +0000
Message-ID: <mailman.834.1358886627.2939.python-list@python.org>
In reply to: #37329
> The perl code will produce the same hash for  "abc.html" as for "bca.html"  That's probably one reason Leonard didn't try to transliterate the buggy code.
> 

Actually, to give credit where it's due, it wasn't me.  I just modified someone else's interesting solution in this thread and added the silly limit of 10000 to it.

> In any case, the likelihood of a hash collision for any non-trivial website is substantial.
> 

Exactly.  Four digits is hardly enough range for it to be even remotely safe.  And even then range isn't really the issue as technically it just improves your odds.

The results of a modulus operator are still non-unique no matter how many digits are there to work with ... within reason.  Statistically anyone who buys a ticket could potentially win the lottery no matter how bad the odds are.  ;)

And now back to the OP, I'm still confused on this four-digit limitation.  Why isn't the limitation at least adhering to a bytelength like byte/short/long?  Is this database storing a string of characters instead of an actual number?  (And if so, then why not just block out 255 characters instead of 4 to store a whole path?  Or at the very least treat 4 characters as 4 bytes to greatly increase the numeric range?)
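
To put rough numbers on the range question, the short Python 3 sketch below compares four decimal digits with four bytes read as an unsigned integer. The variable names and the sample value are illustrative only.

```python
decimal_pins = 10 ** 4        # four decimal digits: 0000..9999
four_byte_values = 2 ** 32    # four bytes read as an unsigned integer

# Reading the four characters "9999" as raw bytes instead of decimal digits.
as_bytes = int.from_bytes(b"9999", "big")

print(decimal_pins, four_byte_values, as_bytes)
```

A larger range lowers the odds of a collision but, as noted above, does not make a modulus result unique.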



#37347

From: Dave Angel <d@davea.name>
Date: 2013-01-22 15:43 -0500
Message-ID: <mailman.835.1358887428.2939.python-list@python.org>
In reply to: #37329
On 01/22/2013 03:30 PM, Leonard, Arah wrote:
>> The perl code will produce the same hash for  "abc.html" as for "bca.html"  That's probably one reason Leonard didn't try to transliterate the buggy code.
>>
>
> Actually, to give credit where it's due, it wasn't me.  I just modified someone else's interesting solution in this thread and added the silly limit of 10000 to it.
>

That's okay.  The OP doesn't seem to know anything about programming, or
about information theory, so the fact you gave a single line that
actually "works" must be extraordinarily valuable to him.  When he was
trying to use the md5 module, I gave him the hints about his five
programming errors, and was about to expand on it when I noticed his
4-digit limitation.

>> In any case, the likelihood of a hash collision for any non-trivial website is substantial.
>>
>
> Exactly.  Four digits is hardly enough range for it to be even remotely safe.  And even then range isn't really the issue as technically it just improves your odds.
>
> The results of a modulus operator are still non-unique no matter how many digits are there to work with ... within reason.  Statistically anyone who buys a ticket could potentially win the lottery no matter how bad the odds are.  ;)
>
> And now back to the OP, I'm still confused on this four-digit limitation.  Why isn't the limitation at least adhering to a bytelength like byte/short/long?  Is this database storing a string of characters instead of an actual number?  (And if so, then why not just block out 255 characters instead of 4 to store a whole path?  Or at the very least treat 4 characters as 4 bytes to greatly increase the numeric range?)
>

I wish I had done the internet search earlier.  This name 'ferrous 
cranus' is a pseudonym of various trolls, and anybody who'd adopt it 
isn't worth our time.

Thanks to Alan Spence for spotting that.  I'll plonk 'ferrous cranus' now.


-- 
DaveA

[toc] | [prev] | [next] | [standalone]


#37330

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 10:37 -0800
Message-ID: <mailman.821.1358879867.2939.python-list@python.org>
In reply to: #37311
On Tuesday, 22 January 2013 7:24:26 PM UTC+2, Leonard, Arah wrote:
> > No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
> >
> > Can this be done?
>
> Guaranteed to be unique?  Not even remotely possible.  Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers.  And that's the best-case scenario.  Anything else would be worse.
>
> Not guaranteed to be unique?  Easy.  Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
>
> pin = int( htmlpage.encode("hex"), 16 ) % 10000
>
> It'll give you your number, but there are no guarantees of uniqueness.  You're looking at more blind random luck using that.

==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000 
==============================================

Can you please explain the difference between what you posted and this Perl code?

==============================================
foreach my $ltr(@ltrs){
        $hash = ( $hash + ord($ltr)) %10000;
}
==============================================

I want to understand this and see it implemented in Python.
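
For comparison, here is a rough Python 3 sketch of both approaches. The function names are made up for illustration, and `int.from_bytes` is used as a stand-in for the Python 2 spelling `int( htmlpage.encode("hex"), 16 )`, which reads the same bytes as one large integer. The Perl loop simply adds up the character codes, so the order of the letters does not matter; the hex version weights each character by its position.

```python
def additive_hash(path: str, limit: int = 10000) -> int:
    """Transliteration of the Perl loop: add up the character codes,
    reducing modulo limit at every step."""
    total = 0
    for ch in path:
        total = (total + ord(ch)) % limit
    return total


def hex_int_hash(path: str, limit: int = 10000) -> int:
    """Read the encoded string as one big integer, then reduce modulo limit.
    Same value as int(path.encode().hex(), 16) % limit for non-empty input."""
    return int.from_bytes(path.encode("utf-8"), "big") % limit


for name in ("abc.html", "bca.html"):
    # The additive hash gives 777 for both names (same letters, same sum);
    # the positional version tells them apart.
    print(name, additive_hash(name), hex_int_hash(name))
```

Both functions still fold every possible path into only 10000 values, so neither one gives a unique number. The positional version just avoids the most obvious collisions between rearranged names.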

[toc] | [prev] | [next] | [standalone]


#37308

From: Ferrous Cranus <nikos.gr33k@gmail.com>
Date: 2013-01-22 09:02 -0800
Message-ID: <mailman.810.1358874186.2939.python-list@python.org>
In reply to: #37301
On Tuesday, 22 January 2013 6:27:32 PM UTC+2, Leonard, Arah wrote:
> > I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(
> >
> > And the best part is that "that" number must be able to turn back into a path.
> >
> > This way i DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!
> >
> > 1. User requests a specific html page( .htaccess gives my script the absolute path for that .html page) 2. I turn the path into a 4-digit number 3. i store that number to the database. I DONT EVEN HAVE TO STORE THE PATH TO THE DATABASE ANYMORE!!! this is just great!
>
> Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible.  If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path.  Nor would it be guaranteed to be unique.

Now that I am thinking about it more and more, I don't have to turn the 'number' back into a 'path'.

So, what I want is a function foo() that does this:

foo( "some long string" )  -->  1234

=====================
1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. Turn the 'path' into a 4-digit number and save it as 'pin'  (how?)
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE HTML PAGE'S PATH IN THE DATABASE ANYMORE!!! This is just great!


At some later time I want to check the weblog of that .html page:


1. request the page as:  http://mydomain.gr/index.html?show=log
2. .htaccess gives my script the absolute path of the requested .html file
3. turn the 'path' into a 4-digit number and save it as 'pin' (this is what I am asking)
4. select all log records for that specific .html page  (based on the 'pin' column)


Since I have the requested 'path', which has been converted to a 4-digit number stored in the database, I know which page I am requesting detailed data for, so I look up the 'pin' column in the database and thus know which records to select.

There is no need to turn the number back into a path anymore, just the path into a number, to identify the specific .html page.

Can this be done?
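
For reference, the lookup-table approach mentioned in the quoted reply is the one way to get a pin that really is unique per page, at the cost of storing each path once. A minimal sketch with Python's `sqlite3` module follows; the table name `pages`, the helper `get_pin`, and the example paths are all hypothetical and not taken from the OP's setup.

```python
import sqlite3


def get_pin(conn: sqlite3.Connection, path: str) -> int:
    """Return the pin for path, assigning a new one the first time it is seen."""
    # The UNIQUE constraint makes this a no-op when the path is already stored.
    conn.execute("INSERT OR IGNORE INTO pages (path) VALUES (?)", (path,))
    row = conn.execute("SELECT pin FROM pages WHERE path = ?", (path,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute(
    "CREATE TABLE pages (pin INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)"
)

first = get_pin(conn, "/home/user/www/index.html")
second = get_pin(conn, "/home/user/www/about.html")
again = get_pin(conn, "/home/user/www/index.html")
print(first, second, again)  # the first and third values are the same pin
```

The same path always maps to the same pin and two different paths never share one, which no fixed-width hash of the path can promise. The log records can then be selected by the 'pin' column exactly as in the steps above.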

[toc] | [prev] | [next] | [standalone]

