Groups > comp.lang.python > #37026 > unrolled thread

Uniquely identifying each & every html template

Started by	Ferrous Cranus <nikos.gr33k@gmail.com>
First post	2013-01-18 12:48 -0800
Last post	2013-01-19 00:39 -0800
Articles	20 on this page of 62 — 14 participants


  Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-18 12:48 -0800
    Re: Uniquely identifying each & every html template John Gordon <gordon@panix.com> - 2013-01-18 20:59 +0000
      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-18 14:12 -0800
    Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-18 17:09 -0500
      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-19 00:39 -0800
        Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-19 04:00 -0500
          Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-20 23:08 -0800
            Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-21 18:20 +1100
              Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 01:19 -0800
                Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-21 20:31 +1100
                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 04:06 -0800
                    Re: Uniquely identifying each & every html template Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-21 12:39 +0000
                      Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-21 04:55 -0800
                        Re: Uniquely identifying each & every html template rusi <rustompmody@gmail.com> - 2013-01-21 19:24 -0800
                          Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-22 15:39 +1100
                      Re: Uniquely identifying each & every html template Tom P <werotizy@freent.dd> - 2013-01-22 00:01 +0100
                        Re: Uniquely identifying each & every html template Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-21 23:43 +0000
                        Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-22 11:04 +1100
                          Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-22 17:36 -0800
                    Re: Uniquely identifying each & every html template Joel Goldstick <joel.goldstick@gmail.com> - 2013-01-21 07:47 -0500
                      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 07:00 -0800
                        Re: Uniquely identifying each & every html template Michael Torrie <torriem@gmail.com> - 2013-01-22 16:55 -0700
                          Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 01:12 -0800
                            Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-23 01:37 -0800
                              Re: Uniquely identifying each & every html template Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-01-23 09:49 +0000
                              Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 02:29 -0800
                                Re: Uniquely identifying each & every html template Joel Goldstick <joel.goldstick@gmail.com> - 2013-01-23 07:03 -0500
                                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 04:26 -0800
                                Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-23 06:25 -0800
                                Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-23 07:38 -0500
                                Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-24 10:25 +1100
                                Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-23 19:09 -0500
                                Re: Uniquely identifying each & every html template Chris Angelico <rosuav@gmail.com> - 2013-01-24 11:39 +1100
                                Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-23 19:53 -0500
                            Re: Uniquely identifying each & every html template Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-23 16:01 -0500
                          Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 01:12 -0800
                      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 07:00 -0800
                    Re: Uniquely identifying each & every html template Dave Angel <d@davea.name> - 2013-01-21 17:26 -0500
                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 04:06 -0800
                    Re: Uniquely identifying each & every html template Tim Roberts <timr@probo.com> - 2013-01-21 19:57 -0800
                Re: Uniquely identifying each & every html template Tim Roberts <timr@probo.com> - 2013-01-21 20:04 -0800
                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 22:49 -0800
                    Re: Uniquely identifying each & every html template Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-22 16:08 -0500
                      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-23 01:15 -0800
              Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 01:19 -0800
                Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-21 04:56 -0800
                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 07:03 -0800
                    Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-21 15:35 -0800
                Re: Uniquely identifying each & every html template Piet van Oostrum <piet@vanoostrum.org> - 2013-01-21 21:48 +0100
                  Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 22:38 -0800
              Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 07:07 -0800
                Re: Uniquely identifying each & every html template alex23 <wuwei23@gmail.com> - 2013-01-21 15:36 -0800
                Re: Uniquely identifying each & every html template rusi <rustompmody@gmail.com> - 2013-01-21 20:18 -0800
              Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-21 07:07 -0800
          Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-20 23:08 -0800
        Re: Uniquely identifying each & every html template Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-01-19 16:32 -0500
          Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-20 22:52 -0800
        Re: Uniquely identifying each & every html template John Gordon <gordon@panix.com> - 2013-01-22 16:35 +0000
      Re: Uniquely identifying each & every html template Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-01-19 00:39 -0800


#37026 — Uniquely identifying each & every html template

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-18 12:48 -0800
Subject	Uniquely identifying each & every html template
Message-ID	<8deb6f5d-ff10-4b36-bdd6-36f9eed58e1e@googlegroups.com>

I use this .htaccess file to rewrite every .html request to counter.py

# =================================================================================================================
RewriteEngine On 
RewriteCond %{REQUEST_FILENAME} -f 
RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
# =================================================================================================================



The counter.py script creates, stores, increments and displays a counter for each webpage of every website I have.
It's supposed to identify each webpage by a <!-- Number --> comment and then do its database work from there.

# =================================================================================================================
# open current html template and get the page ID number
# =================================================================================================================
import re

# 'page' is the value of the htmlpage parameter passed in by the rewrite rule
with open( '/home/nikos/public_html/' + page ) as f:
    # read first line of the file
    firstline = f.readline()

# find the ID of the file and store it; tolerate missing spaces such as <!-- 2-->
match = re.match( r'<!--\s*(\d+)\s*-->', firstline )
# pin stays None when the first line carries no ID comment
pin = match.group(1) if match else None
# =================================================================================================================
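
The snippet above uses a `page` variable without showing where it comes from. A minimal sketch of one way it might be derived from the `htmlpage` query parameter set by the rewrite rule, assuming a CGI environment. The helper name `resolve_page` and the check that the path stays inside the document root are illustrative additions, not part of the original script:

```python
import os
from urllib.parse import parse_qs

# Assumption: the document root used in the original snippet.
DOCUMENT_ROOT = '/home/nikos/public_html'

def resolve_page(query_string, root=DOCUMENT_ROOT):
    """Return the absolute path of the requested page, or None if the
    'htmlpage' parameter is missing or points outside the document root."""
    values = parse_qs(query_string).get('htmlpage')
    if not values:
        return None
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, values[0].lstrip('/')))
    # Reject requests such as ../../etc/passwd that escape the root.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate

# In a CGI script the query string arrives through the environment.
page_path = resolve_page(os.environ.get('QUERY_STRING', ''))
```

Because the rewrite rule uses the QSA flag, other parameters may be present in the query string; `parse_qs` simply ignores them here.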

It works as expected, and you can see it working at http://superhost.gr (the counter is at the bottom of the page).

What is the problem, you ask?
The problem is that I have to insert, as the very first line of every .html template of mine, a unique string containing a number, like:

index.html      <!-- 1 -->
somefile.html   <!-- 2-->
other.html      <!-- 3 -->
nikos.html      <!-- 4 -->
cool.html       <!-- 5 -->

to help counter.py identify each webpage uniquely.

Well, there are about 1000 .html files inside my DocumentRoot and I cannot edit all of them by hand, of course!
Some of them were created with Notepad++, some with Dreamweaver and some others with the Joomla CMS.
Even if I could embed a number in every html page, it would be a very tedious task, and what if a change were needed later? Edit them all over again? Of course not.

My question is: how am I supposed to identify each and every html webpage I have without editing them to embed a string containing a number? In other words, without altering their contents.

Or perhaps by modifying them a bit, but in an automatic way?

Thank you ALL in advance.


#37029

From	John Gordon <gordon@panix.com>
Date	2013-01-18 20:59 +0000
Message-ID	<kdcd35$pri$1@reader1.panix.com>
In reply to	#37026

In <8deb6f5d-ff10-4b36-bdd6-36f9eed58e1e@googlegroups.com> Ferrous Cranus <nikos.gr33k@gmail.com> writes:

> The problem is that I have to insert, as the very first line of every .html template of mine, a unique string containing a number, like:

> index.html      <!-- 1 -->
> somefile.html   <!-- 2-->
> other.html      <!-- 3 -->
> nikos.html      <!-- 4 -->
> cool.html       <!-- 5 -->

> to help counter.py identify each webpage uniquely.

Instead of inserting unique content in every page, can you use the
document path itself as the identifier?

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"


#37035

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-18 14:12 -0800
Message-ID	<56802884-9a7d-4aae-b080-115a300b1023@googlegroups.com>
In reply to	#37029

On Friday, 18 January 2013 10:59:17 PM UTC+2, John Gordon wrote:

> Instead of inserting unique content in every page, can't you use the 
> document path itself as the identifier?

No, I cannot, because it would mess things up later when I, for example:

1. mv name.html othername.html   (document's filename altered)
2. mv name.html /subfolder/name.html   (document's path altered)

Hence, new database counters will be created for each of the above cases.


#37034

From	Dave Angel <d@davea.name>
Date	2013-01-18 17:09 -0500
Message-ID	<mailman.655.1358546994.2939.python-list@python.org>
In reply to	#37026

On 01/18/2013 03:48 PM, Ferrous Cranus wrote:
> I use this .htaccess file to rewrite every .html request to counter.py
>
> # =================================================================================================================
> RewriteEngine On
> RewriteCond %{REQUEST_FILENAME} -f
> RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
> # =================================================================================================================
>
>
>
> The counter.py script creates, stores, increments and displays a counter for each webpage of every website I have.
> It's supposed to identify each webpage by a <!-- Number --> comment and then do its database work from there.
>
> # =================================================================================================================
> # open current html template and get the page ID number
> # =================================================================================================================
> f = open( '/home/nikos/public_html/' + page )
>
> # read first line of the file
> firstline = f.readline()
>
> # find the ID of the file and store it
> pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
> # =================================================================================================================
>
> It works as expected, and you can see it working at http://superhost.gr (the counter is at the bottom of the page).
>
> What is the problem, you ask?
> The problem is that I have to insert, as the very first line of every .html template of mine, a unique string containing a number, like:
>
> index.html      <!-- 1 -->
> somefile.html   <!-- 2-->
> other.html      <!-- 3 -->
> nikos.html      <!-- 4 -->
> cool.html       <!-- 5 -->
>
> to help counter.py identify each webpage uniquely.
>
> Well, there are about 1000 .html files inside my DocumentRoot and I cannot edit all of them by hand, of course!
> Some of them were created with Notepad++, some with Dreamweaver and some others with the Joomla CMS.
> Even if I could embed a number in every html page, it would be a very tedious task, and what if a change were needed later? Edit them all over again? Of course not.
>
> My question is: how am I supposed to identify each and every html webpage I have without editing them to embed a string containing a number? In other words, without altering their contents.
>
> Or perhaps by modifying them a bit, but in an automatic way?
>
> Thank you ALL in advance.
>
>

I don't understand the problem.  A trivial Python script could scan 
through all the files in the directory, checking which ones are missing 
the identifier, and rewriting the file with the identifier added.
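
The kind of script described here could look roughly like the sketch below. It assumes the `<!-- N -->` first-line convention from the original post; the function name `tag_untagged_pages` and the numbering policy (continue after the highest number already in use) are illustrative choices, not something the thread specifies:

```python
import os
import re

# Same marker as in the original post, tolerant of missing spaces.
ID_PATTERN = re.compile(rb'<!--\s*(\d+)\s*-->')

def tag_untagged_pages(root):
    """Prepend '<!-- N -->' to every .html file under root that lacks one.
    Returns a dict mapping each newly tagged path to its assigned number."""
    existing_ids, untagged = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith('.html'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                match = ID_PATTERN.match(f.readline())
            if match:
                existing_ids.append(int(match.group(1)))
            else:
                untagged.append(path)

    # Continue numbering after the highest identifier already in use.
    next_id = max(existing_ids, default=0) + 1
    assigned = {}
    for path in untagged:
        with open(path, 'rb') as f:
            content = f.read()
        # Files are handled as bytes so encodings and line endings survive.
        with open(path, 'wb') as f:
            f.write(b'<!-- %d -->\n' % next_id + content)
        assigned[path] = next_id
        next_id += 1
    return assigned
```

Running it a second time changes nothing, because every page already carries a marker. It does not help when an external tool regenerates the files and drops the marker again.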

So, since you didn't come to that conclusion, there must be some other 
reason you don't want to edit the files.  Is it that the real sources 
are elsewhere (e.g. Dreamweaver), and whenever one recompiles those 
sources, these files get replaced (without identifiers)?

If that's the case, then I figure you have about 3 choices:

1) use the file path as your key, instead of requiring a number
2) use a hash of the page  (eg. md5) as your key.  of course this could 
mean that you get a new value whenever the page is updated.  That's good 
in many situations, but you don't give enough information to know if 
that's desirable for you or not.
3) Keep an external list of filenames, and their associated id numbers. 
  The database would be a good place to store such a list, in a separate 
table.

-- 
DaveA
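
Choice 3 can be sketched as a small lookup table that maps each file path to an id and its counter. The thread does not say which database counter.py uses, so this sketch assumes SQLite through the standard `sqlite3` module; the table name `pages` and the helper names are hypothetical:

```python
import sqlite3

def open_db(path=':memory:'):
    """Open the database and make sure the lookup table exists."""
    db = sqlite3.connect(path)
    db.execute('''CREATE TABLE IF NOT EXISTS pages (
                      id   INTEGER PRIMARY KEY,
                      path TEXT UNIQUE NOT NULL,
                      hits INTEGER NOT NULL DEFAULT 0)''')
    return db

def count_hit(db, path):
    """Register the page on first sight, then increment its counter.
    Returns the page's (id, hits) pair."""
    db.execute('INSERT OR IGNORE INTO pages (path) VALUES (?)', (path,))
    db.execute('UPDATE pages SET hits = hits + 1 WHERE path = ?', (path,))
    db.commit()
    return db.execute(
        'SELECT id, hits FROM pages WHERE path = ?', (path,)).fetchone()
```

The id lives in the database rather than in the html file, so regenerating a page does not lose it. A rename still needs the `path` column to be updated by someone or something.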


#37073

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-19 00:39 -0800
Message-ID	<5dd4babd-716d-4542-ad36-e6a841b73ec3@googlegroups.com>
In reply to	#37034

On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:

> I don't understand the problem.  A trivial Python script could scan
> through all the files in the directory, checking which ones are missing
> the identifier, and rewriting the file with the identifier added.
>
> So, since you didn't come to that conclusion, there must be some other
> reason you don't want to edit the files.  Is it that the real sources
> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
> sources, these files get replaced (without identifiers)?

Exactly. The files get modified or regenerated, so the embedded identifier will be missing each time. Relying on an identifier embedded in the html template's content is therefore not practical.


> If that's the case, then I figure you have about 3 choices:
> 1) use the file path as your key, instead of requiring a number

No, I cannot, because it would mess things up later when I, for example:

1. mv name.html othername.html   (document's filename altered)
2. mv name.html /subfolder/name.html   (document's filepath altered)

Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the newer one will start from zero.

Pros: If the file's contents get updated, that won't affect the counter.
Cons: If the filepath is altered, then duplication will happen.


> 2) use a hash of the page  (eg. md5) as your key.  of course this could  
> mean that you get a new value whenever the page is updated.  That's good  
> in many situations, but you don't give enough information to know if 
> that's desirable for you or not.

That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).

Pros: If the filepath is altered, that won't affect the counter.
Cons: If the file's contents get updated, then duplication will happen.
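
The trade-off described here is easy to see with the standard `hashlib` module. In this small sketch the helper name `content_key` is hypothetical, and the two byte strings are the example page contents used later in the thread. Strictly speaking a hash is not guaranteed to be unique, since two different inputs can collide, but for this purpose it behaves like a fingerprint of the contents:

```python
import hashlib

def content_key(data):
    """Return the MD5 hex digest of a page's bytes, usable as a lookup key."""
    return hashlib.md5(data).hexdigest()

before = content_key(b'<html> Hello </html>')
after = content_key(b'<html> Hello, people </html>')

# The key depends only on the bytes, not on the file's name or location,
# so a rename keeps it while any edit to the contents changes it.
same_after_edit = (before == after)   # False
```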


> 3) Keep an external list of filenames, and their associated id numbers. 
> The database would be a good place to store such a list, in a separate table.

I did not understand that solution.


We need to find a way so that even IF:

(filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.


#37076

From	Dave Angel <d@davea.name>
Date	2013-01-19 04:00 -0500
Message-ID	<mailman.684.1358586035.2939.python-list@python.org>
In reply to	#37073

On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
> On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:
>
>> I don't understand the problem.  A trivial Python script could scan
>> through all the files in the directory, checking which ones are missing
>> the identifier, and rewriting the file with the identifier added.
>>
>> So, since you didn't come to that conclusion, there must be some other
>> reason you don't want to edit the files.  Is it that the real sources
>> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
>> sources, these files get replaced (without identifiers)?
>
> Exactly. The files get modified or regenerated, so the embedded identifier will be missing each time. Relying on an identifier embedded in the html template's content is therefore not practical.
>
>
>> If that's the case, then I figure you have about 3 choices:
>> 1) use the file path as your key, instead of requiring a number
>
> No, I cannot, because it would mess things up later when I, for example:
>
> 1. mv name.html othername.html   (document's filename altered)
> 2. mv name.html /subfolder/name.html   (document's filepath altered)
>
> Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the newer one will start from zero.
>
> Pros: If the file's contents get updated, that won't affect the counter.
> Cons: If the filepath is altered, then duplication will happen.
>
>
>> 2) use a hash of the page  (eg. md5) as your key.  of course this could
>> mean that you get a new value whenever the page is updated.  That's good
>> in many situations, but you don't give enough information to know if
>> that's desirable for you or not.
>
> That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).
>
> Pros: If the filepath is altered, that won't affect the counter.
> Cons: If the file's contents get updated, then duplication will happen.
>
>
>> 3) Keep an external list of filenames, and their associated id numbers.
>> The database would be a good place to store such a list, in a separate table.
>
> I did not understand that solution.
>
>
> We need to find a way so that even IF:
>
> (filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.
>

You don't yet have a programming problem, you have a specification 
problem.  Somehow, you want a file to be considered "the same" even when 
it's moved, renamed and/or modified.  So all files are the same, and you 
only need one id.

Don't pick a mechanism until you have a self-consistent spec.

-- 
DaveA


#37161

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-20 23:08 -0800
Message-ID	<03581a24-9330-4019-bde9-61a607000d3d@googlegroups.com>
In reply to	#37076

On Saturday, 19 January 2013 11:00:15 AM UTC+2, Dave Angel wrote:
> On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
> > On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:
> >
> >> I don't understand the problem.  A trivial Python script could scan
> >> through all the files in the directory, checking which ones are missing
> >> the identifier, and rewriting the file with the identifier added.
> >>
> >> So, since you didn't come to that conclusion, there must be some other
> >> reason you don't want to edit the files.  Is it that the real sources
> >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
> >> sources, these files get replaced (without identifiers)?
> >
> > Exactly. The files get modified or regenerated, so the embedded identifier will be missing each time. Relying on an identifier embedded in the html template's content is therefore not practical.
> >
> >> If that's the case, then I figure you have about 3 choices:
> >> 1) use the file path as your key, instead of requiring a number
> >
> > No, I cannot, because it would mess things up later when I, for example:
> >
> > 1. mv name.html othername.html   (document's filename altered)
> > 2. mv name.html /subfolder/name.html   (document's filepath altered)
> >
> > Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the newer one will start from zero.
> >
> > Pros: If the file's contents get updated, that won't affect the counter.
> > Cons: If the filepath is altered, then duplication will happen.
> >
> >> 2) use a hash of the page  (eg. md5) as your key.  of course this could
> >> mean that you get a new value whenever the page is updated.  That's good
> >> in many situations, but you don't give enough information to know if
> >> that's desirable for you or not.
> >
> > That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).
> >
> > Pros: If the filepath is altered, that won't affect the counter.
> > Cons: If the file's contents get updated, then duplication will happen.
> >
> >> 3) Keep an external list of filenames, and their associated id numbers.
> >> The database would be a good place to store such a list, in a separate table.
> >
> > I did not understand that solution.
> >
> > We need to find a way so that even IF:
> >
> > (filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.
>
> You don't yet have a programming problem, you have a specification
> problem.  Somehow, you want a file to be considered "the same" even when
> it's moved, renamed and/or modified.  So all files are the same, and you
> only need one id.
>
> Don't pick a mechanism until you have a self-consistent spec.


I do have the specification.

An .html page must retain its database counter value even if it is:

(renamed && moved && contents altered)


[original attributes of the file]:

filename: index.html
filepath: /home/nikos/public_html/
contents: <html> Hello </html>

[get modified to]:

filename: index2.html
filepath: /home/nikos/public_html/folder/subfolder/
contents: <html> Hello, people </html>


The file is still the same, even though its attributes got modified.
We want the counter.py script to still be able to "identify" the .html page, and hence its counter, so that the value gets incremented properly.


#37163

From	Chris Angelico <rosuav@gmail.com>
Date	2013-01-21 18:20 +1100
Message-ID	<mailman.729.1358752818.2939.python-list@python.org>
In reply to	#37161

On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> An .html page must retain its database counter value even if it is:
>
> (renamed && moved && contents altered)

Then you either need to tag them in some external way, or have some
kind of tracking operation - for instance, if you require that all
renames/moves be done through a script, that script can update its
pointer. Otherwise, you need magic, and lots of it.

ChrisA
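
The tracking operation suggested here might be sketched as a rename helper that moves the file and repoints its database row in one step. This assumes SQLite through the standard `sqlite3` module and a hypothetical `pages` table keyed by path; neither comes from the thread, and the approach only works if every rename really goes through the helper:

```python
import os
import sqlite3

def create_schema(db):
    """Create the hypothetical lookup table: one row per page, keyed by path."""
    db.execute('''CREATE TABLE IF NOT EXISTS pages (
                      id   INTEGER PRIMARY KEY,
                      path TEXT UNIQUE NOT NULL,
                      hits INTEGER NOT NULL DEFAULT 0)''')

def tracked_rename(db, old_path, new_path):
    """Move a page on disk and repoint its database row at the new path,
    so the page keeps its id and its counter."""
    target_dir = os.path.dirname(new_path)
    if target_dir:
        os.makedirs(target_dir, exist_ok=True)
    # os.rename only works within one filesystem.
    os.rename(old_path, new_path)
    with db:  # commits on success, rolls back on error
        db.execute('UPDATE pages SET path = ? WHERE path = ?',
                   (new_path, old_path))
```

A plain `mv` from a shell or a move through cPanel bypasses the helper, which is exactly the situation raised in the follow-up posts.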


#37172

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-21 01:19 -0800
Message-ID	<187d77e0-e948-46bf-acc5-668c446cf3aa@googlegroups.com>
In reply to	#37163

On Monday, 21 January 2013 9:20:15 AM UTC+2, Chris Angelico wrote:
> On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > An .html page must retain its database counter value even if it is:
> >
> > (renamed && moved && contents altered)
>
> Then you either need to tag them in some external way, or have some
> kind of tracking operation - for instance, if you require that all
> renames/moves be done through a script, that script can update its
> pointer. Otherwise, you need magic, and lots of it.
>
> ChrisA

This python script acts upon websites that other people use, and the html templates have been written by different methods (Notepad++, Dreamweaver, Joomla).

Renames and moves are performed by the website owners, either through shell access or through cPanel.

That being said, I have no control over HOW and WHEN users alter their html pages.


#37173

From	Chris Angelico <rosuav@gmail.com>
Date	2013-01-21 20:31 +1100
Message-ID	<mailman.733.1358760692.2939.python-list@python.org>
In reply to	#37172

On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> This python script acts upon websites that other people use, and the html templates have been written by different methods (Notepad++, Dreamweaver, Joomla).
>
> Renames and moves are performed by the website owners, either through shell access or through cPanel.
>
> That being said, I have no control over HOW and WHEN users alter their html pages.

Then I recommend investing in some magic. There's an old-established
business JW Wells & Co, Family Sorcerers. They've a first-rate
assortment of magic, and for raising a posthumous shade with effects
that are comic, or tragic, there's no cheaper house in the trade! If
anyone anything lacks, he'll find it all ready in stacks, if he'll
only look in on the resident Djinn, number seventy, Simmery Axe!

Seriously, you're asking for something that's beyond the power of
humans or computers. You want to identify that something's the same
file, without tracking the change or having any identifiable tag.
That's a fundamentally impossible task.

ChrisA


#37181

From	Ferrous Cranus <nikos.gr33k@gmail.com>
Date	2013-01-21 04:06 -0800
Message-ID	<239abe33-fa5b-41a9-ae80-5260b9b1bd9c@googlegroups.com>
In reply to	#37173

On Monday, 21 January 2013 11:31:24 AM UTC+2, Chris Angelico wrote:
> On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > This python script acts upon websites that other people use, and the html templates have been written by different methods (Notepad++, Dreamweaver, Joomla).
> >
> > Renames and moves are performed by the website owners, either through shell access or through cPanel.
> >
> > That being said, I have no control over HOW and WHEN users alter their html pages.
>
> Then I recommend investing in some magic. There's an old-established
> business JW Wells & Co, Family Sorcerers. They've a first-rate
> assortment of magic, and for raising a posthumous shade with effects
> that are comic, or tragic, there's no cheaper house in the trade! If
> anyone anything lacks, he'll find it all ready in stacks, if he'll
> only look in on the resident Djinn, number seventy, Simmery Axe!
>
> Seriously, you're asking for something that's beyond the power of
> humans or computers. You want to identify that something's the same
> file, without tracking the change or having any identifiable tag.
> That's a fundamentally impossible task.

No, it is difficult but not impossible.
It just cannot be done by tagging the file by:

1. filename
2. filepath
3. hash (math algorithm producing a string based on the file's contents)

We need another way to identify the file WITHOUT using the above attributes.
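
One attribute outside that list is the inode number the filesystem assigns to a file. The sketch below is only a partial answer under stated assumptions: a POSIX-style filesystem, moves that stay within one filesystem, and edits that modify the file in place. The helper name `file_identity` is hypothetical. An editor or upload tool that writes a fresh file and replaces the old one, a copy to another disk, or a restore from backup all produce a new inode, so this does not meet the full requirement stated above:

```python
import os

def file_identity(path):
    """Return (device, inode) for path: a key that does not depend on
    the file's name, directory or contents."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

A counter keyed on this pair would follow a page through a plain `mv` on the same filesystem and through in-place edits, but it would silently start a new counter whenever the file is recreated.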


#37185

From	Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date	2013-01-21 12:39 +0000
Message-ID	<mailman.742.1358771967.2939.python-list@python.org>
In reply to	#37181

On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> On Monday, 21 January 2013 11:31:24 AM UTC+2, Chris Angelico wrote:
>>
>> Seriously, you're asking for something that's beyond the power of
>> humans or computers. You want to identify that something's the same
>> file, without tracking the change or having any identifiable tag.
>>
>> That's a fundamentally impossible task.
>
> No, it is difficult but not impossible.
> It just cannot be done by tagging the file by:
>
> 1. filename
> 2. filepath
> 3. hash (math algorithm producing a string based on the file's contents)
>
> We need another way to identify the file WITHOUT using the above attributes.

This is a very old problem (still unsolved I believe):
http://en.wikipedia.org/wiki/Ship_of_Theseus


Oscar

#37186

From	alex23 <wuwei23@gmail.com>
Date	2013-01-21 04:55 -0800
Message-ID	<d8eadd42-599c-4d2b-9111-8f39f59dfe66@t6g2000pba.googlegroups.com>
In reply to	#37185

On Jan 21, 10:39 pm, Oscar Benjamin <oscar.j.benja...@gmail.com>
wrote:
> This is a very old problem (still unsolved I believe): http://en.wikipedia.org/wiki/Ship_of_Theseus

+1 internets for referencing my most favourite thought experiment
ever :)

#37243

From	rusi <rustompmody@gmail.com>
Date	2013-01-21 19:24 -0800
Message-ID	<34a47f4b-af49-49df-b0f2-f275c4b99f36@ui9g2000pbc.googlegroups.com>
In reply to	#37186

On Jan 21, 5:55 pm, alex23 <wuwe...@gmail.com> wrote:
> On Jan 21, 10:39 pm, Oscar Benjamin <oscar.j.benja...@gmail.com>
> wrote:
>
> > This is a very old problem (still unsolved I believe): http://en.wikipedia.org/wiki/Ship_of_Theseus
>
> +1 internets for referencing my most favourite thought experiment
> ever :)

+2 Oscar for giving me this name.

A more apposite (to computers) experience:

I have a computer whose OS I wanted to upgrade without disturbing the
existing setup. I decided to fit a new hard disk with a new OS:
installed the OS on the new hard disk, fitted the new hard disk into
the old computer and rebooted.

The messages that started coming were: New Hardware detected: monitor,
mouse, network card etc etc. but not new disk!

Strange! The only one thing new is not seen as new but all the old
things are seen as new.

So…
Ask a layman what a computer is and he'll point to the box and call it
'CPU'.
Ask a more computer-literate person and he'll point to the chip inside
the box and say 'CPU'.
Ask the computer itself and it says 'Disk'.

Moral:
Object identity is at best hard -- usually unsolvable

#37249

From	Chris Angelico <rosuav@gmail.com>
Date	2013-01-22 15:39 +1100
Message-ID	<mailman.777.1358829588.2939.python-list@python.org>
In reply to	#37243

On Tue, Jan 22, 2013 at 2:24 PM, rusi <rustompmody@gmail.com> wrote:
> I have a computer whose OS I wanted to upgrade without disturbing the
> existing setup. I decided to fit a new hard disk with a new OS:
> installed the OS on the new hard disk, fitted the new hard disk into
> the old computer and rebooted.
>
> The messages that started coming were: New Hardware detected: monitor,
> mouse, network card etc etc. but not new disk!
>
> Strange! The only one thing new is not seen as new but all the old
> things are seen as new.

That's because you asked the OS to look at the computer, and the OS
was on the disk. So in that sense, you did give it a whole lot of new
hardware but not a new disk. However, Windows Product Activation would
probably have called that a new computer, meaning that Microsoft deems
it to be new. (I've no idea about other non-free systems. Free systems
don't care about new computer vs same computer, of course.)

ChrisA

#37230

From	Tom P <werotizy@freent.dd>
Date	2013-01-22 00:01 +0100
Message-ID	<am5vmjFc7geU1@mid.individual.net>
In reply to	#37185

On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>> On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris Angelico wrote:
>>>
>>> Seriously, you're asking for something that's beyond the power of
>>> humans or computers. You want to identify that something's the same
>>> file, without tracking the change or having any identifiable tag.
>>>
>>> That's a fundamentally impossible task.
>>
>> No, it is difficult but not impossible.
>> It just cannot be done by tagging the file by:
>>
>> 1. filename
>> 2. filepath
>> 3. hash (math algorithm producing a string based on the file's contents)
>>
>> We need another way to identify the file WITHOUT using the above attributes.
>
> This is a very old problem (still unsolved I believe):
> http://en.wikipedia.org/wiki/Ship_of_Theseus
>
>
> Oscar
>
That wiki article gives a hint to a possible solution: use a timestamp
to determine which key is valid when.

#37236

From	Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date	2013-01-21 23:43 +0000
Message-ID	<mailman.770.1358811802.2939.python-list@python.org>
In reply to	#37230

On 21 January 2013 23:01, Tom P <werotizy@freent.dd> wrote:
> On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
>>
>> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>>>
>>> On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris
>>> Angelico wrote:
>>>>
>>>>
>>>> Seriously, you're asking for something that's beyond the power of
>>>> humans or computers. You want to identify that something's the same
>>>> file, without tracking the change or having any identifiable tag.
>>>>
>>>> That's a fundamentally impossible task.
>>>
>>>
>>> No, it is difficult but not impossible.
>>> It just cannot be done by tagging the file by:
>>>
>>> 1. filename
>>> 2. filepath
>>> 3. hash (math algorithm producing a string based on the file's contents)
>>>
>>> We need another way to identify the file WITHOUT using the above
>>> attributes.
>>
>>
>> This is a very old problem (still unsolved I believe):
>> http://en.wikipedia.org/wiki/Ship_of_Theseus
>>
> That wiki article gives a hint to a possible solution: use a timestamp to
> determine which key is valid when.

In the Ship of Theseus, it is only argued that it is the same ship
because people were aware of the incremental changes that took place
along the way. The same applies here: if you don't track the
incremental changes and the two files have nothing concrete in common,
what does it mean to say that a file is "the same file" as some older
file?

That being said, I've always been impressed with the way that git can
understand when I think that a file is the same as some older file
(though it does sometimes go wrong):

~/tmp$ git init
Initialized empty Git repository in /home/oscar/tmp/.git/
~/tmp$ vim old.py
~/tmp$ cat old.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
~/tmp$ git add old.py
~/tmp$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   old.py
#
~/tmp$ git commit
[master (root-commit) 8e91665] First commit
 1 file changed, 4 insertions(+)
 create mode 100644 old.py
~/tmp$ ls
old.py
~/tmp$ cat old.py > new.py
~/tmp$ rm old.py
~/tmp$ vim new.py
~/tmp$ cat new.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")

print("Although, I've edited it somewhat, it's still useless")
~/tmp$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   deleted:    old.py
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   new.py
no changes added to commit (use "git add" and/or "git commit -a")
~/tmp$ git add -A .
~/tmp$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    old.py -> new.py
#

So it *is* Theseus' ship!

Oscar

#37237

From	Chris Angelico <rosuav@gmail.com>
Date	2013-01-22 11:04 +1100
Message-ID	<mailman.772.1358813081.2939.python-list@python.org>
In reply to	#37230

On Tue, Jan 22, 2013 at 10:43 AM, Oscar Benjamin
<oscar.j.benjamin@gmail.com> wrote:
> On 21 January 2013 23:01, Tom P <werotizy@freent.dd> wrote:
>> On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
>>> This is a very old problem (still unsolved I believe):
>>> http://en.wikipedia.org/wiki/Ship_of_Theseus
>>>
>> That wiki article gives a hint to a possible solution: use a timestamp to
>> determine which key is valid when.
>
> In the Ship of Theseus, it is only argued that it is the same ship
> because people were aware of the incremental changes that took place
> along the way. The same applies here: if you don't track the
> incremental changes and the two files have nothing concrete in common,
> what does it mean to say that a file is "the same file" as some older
> file?
>
> That being said, I've always been impressed with the way that git can
> understand when I think that a file is the same as some older file
> (though it does sometimes go wrong):

Yeah, git's awesome like that :) It looks at file similarity, though,
so if you completely rewrite a file and simultaneously rename/move it,
git will lose track of it. And as you say, sometimes it gets things
wrong - if you merge a large file into a small one, git will report it
as a deletion and rename. (Of course, it doesn't make any difference.
It's just a matter of reporting.) Mercurial, if I understand
correctly, actually _tracks_ moves (and copies), but git just records
a deletion and a creation.
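
To make the similarity idea concrete, here is a rough sketch in Python. It is not git's actual algorithm; it just pairs each deleted file with the most similar added file using the standard library's `difflib`. The function names and the 0.5 threshold are my own assumptions for illustration:

```python
import difflib


def similarity(old_text, new_text):
    """Line-based similarity in [0, 1]; 1.0 means identical line sequences."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines()
    )
    return matcher.ratio()


def guess_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents look alike.

    `deleted` and `added` map path -> file contents. The threshold is an
    arbitrary illustrative cut-off, not a value taken from any real tool.
    """
    pairs = []
    remaining = dict(added)
    for old_path, old_text in deleted.items():
        scored = [
            (similarity(old_text, text), path)
            for path, text in remaining.items()
        ]
        if not scored:
            break
        score, best_path = max(scored)
        if score >= threshold:
            pairs.append((old_path, best_path))
            del remaining[best_path]  # each added file is claimed at most once
    return pairs
```

With Oscar's old.py and new.py this pairs them up, because the four original lines are all still there. Rewrite the file completely while renaming it and the score drops below any sensible threshold, which is exactly the case where the link is lost.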

My family in fact has a literal "grandfather's axe" (except that I
don't think either of my grandfathers actually owned it, but it's my
Dad's old axe) that has had many new handles and a couple of new
heads. Bringing it back to computers, we have on our network two
computers "Stanley" and "Ollie" that have been there ever since we
first set up that network. Back then, it was coax cable, 10base2, no
routers/switches/etc, and the computers were I think early Pentiums.
We installed the database on one of them, and set the other in Dad's
office. Today, we have a modern Ethernet setup with modern hardware
and cat-5 cable; we still have Stanley with the database and Ollie in
the office. The name/identity of the computer is mostly associated
with its roles; but those roles can shift too (there was a time when
Ollie was the internet gateway, but that's no longer the case).
Identity is its own attribute.

The problem isn't that identity can't exist. It's that it can't be
discovered. That takes external knowledge. Dave's analogy is accurate.
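
As an illustration of what "external knowledge" could look like in practice, here is a minimal sketch of the identifiable-tag approach: embed an ID in the HTML itself so it travels with the file through renames, moves and edits. The comment format `template-id` and the helper names are assumptions made up for this example, and it only works as long as users leave the comment in place:

```python
import re
import uuid

# Hypothetical marker format: <!-- template-id: <uuid> -->
TAG_RE = re.compile(r"<!--\s*template-id:\s*([0-9a-f-]{36})\s*-->")


def read_id(html):
    """Return the embedded template ID, or None if the page has no tag."""
    match = TAG_RE.search(html)
    return match.group(1) if match else None


def ensure_id(html):
    """Return (html, template_id), prepending a fresh tag if none exists."""
    existing = read_id(html)
    if existing is not None:
        return html, existing
    new_id = str(uuid.uuid4())
    return f"<!-- template-id: {new_id} -->\n{html}", new_id
```

Calling `ensure_id` a second time on an already tagged page returns the same ID and leaves the page unchanged, so a counter keyed on that ID keeps counting no matter what the file is called or where it lives.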

ChrisA

#37395

From	alex23 <wuwei23@gmail.com>
Date	2013-01-22 17:36 -0800
Message-ID	<45ef0f07-505e-4986-bc99-bd0ce86c101c@q16g2000pbt.googlegroups.com>
In reply to	#37237

On Jan 22, 10:04 am, Chris Angelico <ros...@gmail.com> wrote:
> My family in fact has a literal "grandfather's axe" (except that I
> don't think either of my grandfathers actually owned it, but it's my
> Dad's old axe) that has had many new handles and a couple of new
> heads.

Ah, that's brilliant, I hadn't heard that term before, and it'll be a
lot easier to explain to people than the Theseus example.

How we think of identity is _awesome_ :)

#37194

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-01-21 07:47 -0500
Message-ID	<mailman.743.1358772477.2939.python-list@python.org>
In reply to	#37181

This is trolling, Ferrous. You are a troll. Go away.


On Mon, Jan 21, 2013 at 7:39 AM, Oscar Benjamin
<oscar.j.benjamin@gmail.com> wrote:

> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris Angelico wrote:
> >>
> >> Seriously, you're asking for something that's beyond the power of
> >> humans or computers. You want to identify that something's the same
> >> file, without tracking the change or having any identifiable tag.
> >>
> >> That's a fundamentally impossible task.
> >
> > No, it is difficult but not impossible.
> > It just cannot be done by tagging the file by:
> >
> > 1. filename
> > 2. filepath
> > 3. hash (math algorithm producing a string based on the file's contents)
> >
> > We need another way to identify the file WITHOUT using the above attributes.
>
> This is a very old problem (still unsolved I believe):
> http://en.wikipedia.org/wiki/Ship_of_Theseus
>
>
> Oscar



-- 
Joel Goldstick
http://joelgoldstick.com
