| Started by | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| First post | 2013-01-18 12:48 -0800 |
| Last post | 2013-01-19 00:39 -0800 |
| Articles | 20 on this page of 62 — 14 participants |
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-18 12:48 -0800 |
| Subject | Uniquely identifying each & every html template |
| Message-ID | <8deb6f5d-ff10-4b36-bdd6-36f9eed58e1e@googlegroups.com> |
I use this .htaccess file to rewrite every .html request to counter.py
# =================================================================================================================
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
# =================================================================================================================
The counter.py script creates, stores, increments and displays a counter for each webpage of every website I have.
It's supposed to identify each webpage by a <!-- Number --> comment and then do its database work from there.
# =================================================================================================================
# open current html template and get the page ID number
# =================================================================================================================
f = open( '/home/nikos/public_html/' + page )
# read first line of the file
firstline = f.readline()
# find the ID of the file and store it
pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
# =================================================================================================================
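A slightly more defensive take on the lookup above, offered only as a sketch: `read_page_id` and `PAGE_ID_RE` are made-up names, and the relaxed pattern (optional spaces inside the comment) is an assumption about what should count as a valid marker. It returns `None` instead of raising when the first line carries no ID.

```python
import re

# Hypothetical pattern: like the original, but tolerant of missing spaces.
PAGE_ID_RE = re.compile(r'<!--\s*(\d+)\s*-->')


def read_page_id(path):
    """Return the integer ID found on the first line of `path`, or None."""
    # The context manager closes the file even if reading fails.
    with open(path, encoding='utf-8', errors='replace') as f:
        firstline = f.readline()
    match = PAGE_ID_RE.match(firstline)
    # re.match returns None when the marker is absent, so guard before .group().
    return int(match.group(1)) if match else None
```

The caller then has to decide what a missing ID means (skip the page, log it, or assign a new number).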
It works as expected, and you can see it working by viewing http://superhost.gr (the counter is at the bottom of the page).
What is the problem, you ask?!
The problem is that I have to insert, at the very first line of every .html template of mine, a unique string containing a number, like:
index.html <!-- 1 -->
somefile.html <!-- 2 -->
other.html <!-- 3 -->
nikos.html <!-- 4 -->
cool.html <!-- 5 -->
to HELP counter.py identify each webpage in a unique way.
Well... there are about 1000 .html files inside my DocumentRoot and I cannot edit ALL of them, of course!
Some of them were created with Notepad++, some with Dreamweaver and some others with the Joomla CMS.
Even if I could embed a number in every html page, it would be a very tedious task, and what if a change was in order? Edit them ALL over again? Of course not.
My question is HOW am I supposed to identify each and every html webpage I have, without editing them to embed a string containing a number? In other words, without altering their contents.
Or perhaps by modifying them a bit... but in an automatic way?
Thank you ALL in advance.
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-01-18 20:59 +0000 |
| Message-ID | <kdcd35$pri$1@reader1.panix.com> |
| In reply to | #37026 |
In <8deb6f5d-ff10-4b36-bdd6-36f9eed58e1e@googlegroups.com> Ferrous Cranus <nikos.gr33k@gmail.com> writes:
> Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:
> index.html <!-- 1 -->
> somefile.html <!-- 2-->
> other.html <!-- 3 -->
> nikos.html <!-- 4 -->
> cool.html <!-- 5 -->
> to HELP counter.py identify each webpage at a unique way.
Instead of inserting unique content in every page, can you use the
document path itself as the identifier?
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-18 14:12 -0800 |
| Message-ID | <56802884-9a7d-4aae-b080-115a300b1023@googlegroups.com> |
| In reply to | #37029 |
On Friday, 18 January 2013 10:59:17 PM UTC+2, John Gordon wrote:
> Instead of inserting unique content in every page, can't you use the
> document path itself as the identifier?
No, I cannot, because it would mess things up later when, for example, I:
1. mv name.html othername.html (document's filename altered)
2. mv name.html /subfolder/name.html (document's path altered)
Hence, new database counters will be created for each of the above cases.
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2013-01-18 17:09 -0500 |
| Message-ID | <mailman.655.1358546994.2939.python-list@python.org> |
| In reply to | #37026 |
On 01/18/2013 03:48 PM, Ferrous Cranus wrote:
> I use this .htaccess file to rewrite every .html request to counter.py
>
> # =================================================================================================================
> RewriteEngine On
> RewriteCond %{REQUEST_FILENAME} -f
> RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
> # =================================================================================================================
>
>
>
> counter.py script is created for creating, storing, increasing, displaying a counter for each webpage for every website i have.
> It's supposed to identify each webpage by a <!-- Number --> and then do it's database stuff from there
>
> # =================================================================================================================
> # open current html template and get the page ID number
> # =================================================================================================================
> f = open( '/home/nikos/public_html/' + page )
>
> # read first line of the file
> firstline = f.readline()
>
> # find the ID of the file and store it
> pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
> # =================================================================================================================
>
> It works as expected and you can see it works normally by viewing: http//superhost.gr (bottom down its the counter)
>
> What is the problem you ask?!
> Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:
>
> index.html <!-- 1 -->
> somefile.html <!-- 2-->
> other.html <!-- 3 -->
> nikos.html <!-- 4 -->
> cool.html <!-- 5 -->
>
> to HELP counter.py identify each webpage at a unique way.
>
> Well.... its about 1000 .html files inside my DocumentRoot and i cannot edit ALL of them of course!
> Some of them created by Notepad++, some with the use of Dreamweaver and some others with Joomla CMS
> Even if i could embed a number to every html page, it would have been a very tedious task, and what if a change was in order? Edit them ALL back again? Of course not.
>
> My question is HOW am i suppose to identify each and every html webpage i have, without the need of editing and embedding a string containing a number for them. In other words by not altering their contents.
>
> or perhaps by modifying them a bit..... but in an automatic way....?
>
> Thank you ALL in advance.
>
>
I don't understand the problem. A trivial Python script could scan
through all the files in the directory, checking which ones are missing
the identifier, and rewriting the file with the identifier added.
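The kind of script described here could be sketched roughly as follows. This is an illustration only: `tag_html_files` is a hypothetical name, the relaxed marker pattern is an assumption, and numbering new files from one past the highest existing ID is just one possible policy.

```python
import os
import re

# Assumed marker format: an HTML comment holding only digits on line one.
ID_RE = re.compile(r'<!--\s*(\d+)\s*-->')


def tag_html_files(root):
    """Prepend '<!-- N -->' to every .html file under `root` that lacks one.

    Returns a dict mapping each newly tagged path to the ID it received.
    """
    existing, untagged = {}, []
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            if not name.endswith('.html'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                match = ID_RE.match(f.readline())
            if match:
                existing[path] = int(match.group(1))
            else:
                untagged.append(path)

    next_id = max(existing.values(), default=0) + 1
    assigned = {}
    for path in untagged:
        # Work in bytes so the rest of the page is written back untouched.
        with open(path, 'rb') as f:
            body = f.read()
        with open(path, 'wb') as f:
            f.write(('<!-- %d -->\n' % next_id).encode('ascii') + body)
        assigned[path] = next_id
        next_id += 1
    return assigned
```

Running it a second time over the same tree should tag nothing, since every page already carries a marker.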
So, since you didn't come to that conclusion, there must be some other
reason you don't want to edit the files. Is it that the real sources
are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
sources, these files get replaced (without identifiers)?
If that's the case, then I figure you have about 3 choices:
1) use the file path as your key, instead of requiring a number
2) use a hash of the page (e.g. md5) as your key. Of course this could
mean that you get a new value whenever the page is updated. That's good
in many situations, but you don't give enough information to know if
that's desirable for you or not.
3) Keep an external list of filenames, and their associated id numbers.
The database would be a good place to store such a list, in a separate
table.
--
DaveA
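To make options 2 and 3 concrete, here is a minimal sketch. Everything named in it is an assumption made for illustration: the thread never says which database is in use, so `sqlite3` stands in for it, and the `pages` table and the functions `content_key`, `open_db`, `page_id` and `count_hit` are hypothetical.

```python
import hashlib
import sqlite3


def content_key(path):
    """Option 2: an MD5 hex digest of the file's bytes.

    The key survives renames and moves but changes on any content edit.
    """
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(dbfile=':memory:'):
    """Option 3: a separate table mapping each path to an ID and a counter."""
    conn = sqlite3.connect(dbfile)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS pages ('
        ' id INTEGER PRIMARY KEY,'
        ' path TEXT UNIQUE NOT NULL,'
        ' hits INTEGER NOT NULL DEFAULT 0)'
    )
    return conn


def page_id(conn, path):
    """Return the ID stored for `path`, creating a row the first time."""
    conn.execute('INSERT OR IGNORE INTO pages (path) VALUES (?)', (path,))
    conn.commit()
    row = conn.execute('SELECT id FROM pages WHERE path = ?', (path,)).fetchone()
    return row[0]


def count_hit(conn, path):
    """Increment and return the counter for `path`."""
    pid = page_id(conn, path)
    conn.execute('UPDATE pages SET hits = hits + 1 WHERE id = ?', (pid,))
    conn.commit()
    return conn.execute('SELECT hits FROM pages WHERE id = ?', (pid,)).fetchone()[0]
```

With the table in place the HTML files need no embedded marker at all. The trade-off is the one already discussed: a rename that nobody records still looks like a brand-new page.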
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-19 00:39 -0800 |
| Message-ID | <5dd4babd-716d-4542-ad36-e6a841b73ec3@googlegroups.com> |
| In reply to | #37034 |
On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:
> I don't understand the problem. A trivial Python script could scan
> through all the files in the directory, checking which ones are missing
> the identifier, and rewriting the file with the identifier added.
>
> So, since you didn't come to that conclusion, there must be some other
> reason you don't want to edit the files. Is it that the real sources
> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
> sources, these files get replaced (without identifiers)?
Exactly. Files get modified/updated, so the embedded identifier will be missing each time. So relying on embedding code in the html template content is not practical.
> If that's the case, then I figure you have about 3 choices:
> 1) use the file path as your key, instead of requiring a number
No, I cannot, because it would mess things up later when, for example, I:
1. mv name.html othername.html (document's filename altered)
2. mv name.html /subfolder/name.html (document's filepath altered)
Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the latter one will start from zero.
Pros: If the file's contents get updated, that won't affect the counter.
Cons: If the filepath is altered, then duplication will happen.
> 2) use a hash of the page (e.g. md5) as your key. Of course this could
> mean that you get a new value whenever the page is updated. That's good
> in many situations, but you don't give enough information to know if
> that's desirable for you or not.
That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).
Pros: If the filepath is altered, that won't affect the counter.
Cons: If the file's contents get updated, then duplication will happen.
> 3) Keep an external list of filenames, and their associated id numbers.
> The database would be a good place to store such a list, in a separate table.
I did not understand that solution.
We need to find a way so that even IF (filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2013-01-19 04:00 -0500 |
| Message-ID | <mailman.684.1358586035.2939.python-list@python.org> |
| In reply to | #37073 |
On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
> On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:
>> I don't understand the problem. A trivial Python script could scan
>> through all the files in the directory, checking which ones are missing
>> the identifier, and rewriting the file with the identifier added.
>>
>> So, since you didn't come to that conclusion, there must be some other
>> reason you don't want to edit the files. Is it that the real sources
>> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
>> sources, these files get replaced (without identifiers)?
>
> Exactly. Files get modified/updated, so the embedded identifier will be missing each time. So relying on embedding code in the html template content is not practical.
>
>> If that's the case, then I figure you have about 3 choices:
>> 1) use the file path as your key, instead of requiring a number
>
> No, I cannot, because it would mess things up later when, for example, I:
>
> 1. mv name.html othername.html (document's filename altered)
> 2. mv name.html /subfolder/name.html (document's filepath altered)
>
> Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the latter one will start from zero.
>
> Pros: If the file's contents get updated, that won't affect the counter.
> Cons: If the filepath is altered, then duplication will happen.
>
>> 2) use a hash of the page (e.g. md5) as your key. Of course this could
>> mean that you get a new value whenever the page is updated. That's good
>> in many situations, but you don't give enough information to know if
>> that's desirable for you or not.
>
> That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).
>
> Pros: If the filepath is altered, that won't affect the counter.
> Cons: If the file's contents get updated, then duplication will happen.
>
>> 3) Keep an external list of filenames, and their associated id numbers.
>> The database would be a good place to store such a list, in a separate table.
>
> I did not understand that solution.
>
> We need to find a way so that even IF (filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.
You don't yet have a programming problem, you have a specification problem. Somehow, you want a file to be considered "the same" even when it's moved, renamed and/or modified. So all files are the same, and you only need one id.
Don't pick a mechanism until you have a self-consistent spec.
--
DaveA
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-20 23:08 -0800 |
| Message-ID | <03581a24-9330-4019-bde9-61a607000d3d@googlegroups.com> |
| In reply to | #37076 |
On Saturday, 19 January 2013 11:00:15 AM UTC+2, Dave Angel wrote:
> On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
> > On Saturday, 19 January 2013 12:09:28 AM UTC+2, Dave Angel wrote:
> >> I don't understand the problem. A trivial Python script could scan
> >> through all the files in the directory, checking which ones are missing
> >> the identifier, and rewriting the file with the identifier added.
> >>
> >> So, since you didn't come to that conclusion, there must be some other
> >> reason you don't want to edit the files. Is it that the real sources
> >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
> >> sources, these files get replaced (without identifiers)?
> >
> > Exactly. Files get modified/updated, so the embedded identifier will be missing each time. So relying on embedding code in the html template content is not practical.
> >
> >> If that's the case, then I figure you have about 3 choices:
> >> 1) use the file path as your key, instead of requiring a number
> >
> > No, I cannot, because it would mess things up later when, for example, I:
> >
> > 1. mv name.html othername.html (document's filename altered)
> > 2. mv name.html /subfolder/name.html (document's filepath altered)
> >
> > Hence, a new database counter will be created for each of the above actions, so I will have 2 counters for the same file, and the latter one will start from zero.
> >
> > Pros: If the file's contents get updated, that won't affect the counter.
> > Cons: If the filepath is altered, then duplication will happen.
> >
> >> 2) use a hash of the page (e.g. md5) as your key. Of course this could
> >> mean that you get a new value whenever the page is updated. That's good
> >> in many situations, but you don't give enough information to know if
> >> that's desirable for you or not.
> >
> > That sounds nice! A hash is a mathematical algorithm that produces a unique number after analyzing each file's contents? But then again, what if the html template gets updated? That update will create a new hash for the file, hence another counter will be created for the same file: the same end result as solution (1).
> >
> > Pros: If the filepath is altered, that won't affect the counter.
> > Cons: If the file's contents get updated, then duplication will happen.
> >
> >> 3) Keep an external list of filenames, and their associated id numbers.
> >> The database would be a good place to store such a list, in a separate table.
> >
> > I did not understand that solution.
> >
> > We need to find a way so that even IF (filepath gets modified && file contents get modified) simultaneously, the counter will STILL retain its value.
>
> You don't yet have a programming problem, you have a specification
> problem. Somehow, you want a file to be considered "the same" even when
> it's moved, renamed and/or modified. So all files are the same, and you
> only need one id.
>
> Don't pick a mechanism until you have a self-consistent spec.
I do have the specification.
An .html page must retain its database counter value even if it is:
(renamed && moved && contents altered)
[original attributes of the file]:
filename: index.html
filepath: /home/nikos/public_html/
contents: <html> Hello </html>
[get modified to]:
filename: index2.html
filepath: /home/nikos/public_html/folder/subfolder/
contents: <html> Hello, people </html>
The file is still the same, even though its attributes got modified.
We want the counter.py script to still be able to "identify" the .html page, and hence its counter value, so that it gets increased properly.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-21 18:20 +1100 |
| Message-ID | <mailman.729.1358752818.2939.python-list@python.org> |
| In reply to | #37161 |
On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> An .html page must retain its database counter value even if its:
>
> (renamed && moved && contents altered)
Then you either need to tag them in some external way, or have some kind of tracking operation - for instance, if you require that all renames/moves be done through a script, that script can update its pointer. Otherwise, you need magic, and lots of it.
ChrisA
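The "all renames/moves go through a script" idea might look something like the sketch below. It assumes a hypothetical `pages` table with a `path` column (one row per counted page) in an `sqlite3` database; neither the table nor the function name `tracked_move` comes from the thread.

```python
import os
import shutil
import sqlite3


def tracked_move(conn, src, dst):
    """Move `src` to `dst` and repoint its counter row at the new path.

    Assumes a table `pages(path TEXT UNIQUE, ...)` already exists.
    Returns 1 if a counter row followed the file, 0 if the file had none.
    """
    # Create the destination directory if needed, then move the file.
    os.makedirs(os.path.dirname(dst) or '.', exist_ok=True)
    shutil.move(src, dst)
    # `with conn` commits on success and rolls back if the UPDATE fails.
    with conn:
        cur = conn.execute(
            'UPDATE pages SET path = ? WHERE path = ?', (dst, src)
        )
    return cur.rowcount
```

This only helps if every move really does go through the script. A plain `mv` from a shell leaves the table pointing at the old path, and if the file move succeeds but the database update fails, the two are out of step until someone repairs the row.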
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-21 01:19 -0800 |
| Message-ID | <187d77e0-e948-46bf-acc5-668c446cf3aa@googlegroups.com> |
| In reply to | #37163 |
On Monday, 21 January 2013 9:20:15 AM UTC+2, Chris Angelico wrote:
> On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > An .html page must retain its database counter value even if its:
> >
> > (renamed && moved && contents altered)
>
> Then you either need to tag them in some external way, or have some
> kind of tracking operation - for instance, if you require that all
> renames/moves be done through a script, that script can update its
> pointer. Otherwise, you need magic, and lots of it.
>
> ChrisA
This Python script acts upon websites other people use, and the html templates have been written with different tools (Notepad++, Dreamweaver, Joomla).
Renames and moves are performed by the website owners, either through shell access or through cPanel.
That being said, I have no control over HOW and WHEN users alter their html pages.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-21 20:31 +1100 |
| Message-ID | <mailman.733.1358760692.2939.python-list@python.org> |
| In reply to | #37172 |
On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> This python script acts upon websites other people use and
> every html templates has been written by different methods(notepad++, dreamweaver, joomla).
>
> Renames and moves are performed, either by shell access or either by cPanel access by website owners.
>
> That being said i have no control on HOW and WHEN users alter their html pages.
Then I recommend investing in some magic. There's an old-established business JW Wells & Co, Family Sorcerers. They've a first-rate assortment of magic, and for raising a posthumous shade with effects that are comic, or tragic, there's no cheaper house in the trade! If anyone anything lacks, he'll find it all ready in stacks, if he'll only look in on the resident Djinn, number seventy, Simmery Axe!
Seriously, you're asking for something that's beyond the power of humans or computers. You want to identify that something's the same file, without tracking the change or having any identifiable tag. That's a fundamentally impossible task.
ChrisA
| From | Ferrous Cranus <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-01-21 04:06 -0800 |
| Message-ID | <239abe33-fa5b-41a9-ae80-5260b9b1bd9c@googlegroups.com> |
| In reply to | #37173 |
On Monday, 21 January 2013 11:31:24 AM UTC+2, Chris Angelico wrote:
> On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > This python script acts upon websites other people use and
> > every html templates has been written by different methods(notepad++, dreamweaver, joomla).
> >
> > Renames and moves are performed, either by shell access or either by cPanel access by website owners.
> >
> > That being said i have no control on HOW and WHEN users alter their html pages.
>
> Then I recommend investing in some magic. There's an old-established
> business JW Wells & Co, Family Sorcerers. They've a first-rate
> assortment of magic, and for raising a posthumous shade with effects
> that are comic, or tragic, there's no cheaper house in the trade! If
> anyone anything lacks, he'll find it all ready in stacks, if he'll
> only look in on the resident Djinn, number seventy, Simmery Axe!
>
> Seriously, you're asking for something that's beyond the power of
> humans or computers. You want to identify that something's the same
> file, without tracking the change or having any identifiable tag.
> That's a fundamentally impossible task.
No, it is difficult but not impossible.
It just cannot be done by tagging the file by:
1. filename
2. filepath
3. hash (math algorithm producing a string based on the file's contents)
We need another way to identify the file WITHOUT using the above attributes.
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2013-01-21 12:39 +0000 |
| Message-ID | <mailman.742.1358771967.2939.python-list@python.org> |
| In reply to | #37181 |
On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> On Monday, 21 January 2013 11:31:24 AM UTC+2, Chris Angelico wrote:
>> Seriously, you're asking for something that's beyond the power of
>> humans or computers. You want to identify that something's the same
>> file, without tracking the change or having any identifiable tag.
>>
>> That's a fundamentally impossible task.
>
> No, it is difficult but not impossible.
> It just cannot be done by tagging the file by:
>
> 1. filename
> 2. filepath
> 3. hash (math algorithm producing a string based on the file's contents)
>
> We need another way to identify the file WITHOUT using the above attributes.
This is a very old problem (still unsolved I believe):
http://en.wikipedia.org/wiki/Ship_of_Theseus
Oscar
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2013-01-21 04:55 -0800 |
| Message-ID | <d8eadd42-599c-4d2b-9111-8f39f59dfe66@t6g2000pba.googlegroups.com> |
| In reply to | #37185 |
On Jan 21, 10:39 pm, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> This is a very old problem (still unsolved I believe):
> http://en.wikipedia.org/wiki/Ship_of_Theseus
+1 internets for referencing my most favourite thought experiment ever :)
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-01-21 19:24 -0800 |
| Message-ID | <34a47f4b-af49-49df-b0f2-f275c4b99f36@ui9g2000pbc.googlegroups.com> |
| In reply to | #37186 |
On Jan 21, 5:55 pm, alex23 <wuwe...@gmail.com> wrote: > On Jan 21, 10:39 pm, Oscar Benjamin <oscar.j.benja...@gmail.com> > wrote: > > > This is a very old problem (still unsolved I believe):http://en.wikipedia.org/wiki/Ship_of_Theseus > > +1 internets for referencing my most favourite thought experiment > ever :) +2 Oscar for giving me this name. A more apposite (to computers) experience: Ive a computer whose OS I wanted to upgrade without disturbing the existing setup. Decided to fit a new hard disk with a new OS. Installed the OS on a new hard disk, fitted the new hard disk into the old computer and rebooted. The messages that started coming were: New Hardware detected: monitor, mouse, network card etc etc. but not new disk! Strange! The only one thing new is not seen as new but all the old things are seen as new. So… Ask a layman whats a computer and he'll point to the box and call it 'CPU'. Ask a more computer literate person and he'll point to the chip inside the box and say 'CPU' Ask the computer itself and it says 'Disk'. Moral: Object identity is at best hard -- usually unsolvable
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-22 15:39 +1100 |
| Message-ID | <mailman.777.1358829588.2939.python-list@python.org> |
| In reply to | #37243 |
On Tue, Jan 22, 2013 at 2:24 PM, rusi <rustompmody@gmail.com> wrote:
> I've a computer whose OS I wanted to upgrade without disturbing the
> existing setup. Decided to fit a new hard disk with a new OS.
> Installed the OS on a new hard disk, fitted the new hard disk into the
> old computer and rebooted.
>
> The messages that started coming were: New Hardware detected: monitor,
> mouse, network card etc etc. but not new disk!
>
> Strange! The only one thing new is not seen as new but all the old
> things are seen as new.

That's because you asked the OS to look at the computer, and the OS
was on the disk. So in that sense, you did give it a whole lot of new
hardware but not a new disk.

However, Windows Product Activation would probably have called that a
new computer, meaning that Microsoft deems it to be new. (I've no idea
about other non-free systems. Free systems don't care about new
computer vs same computer, of course.)

ChrisA
| From | Tom P <werotizy@freent.dd> |
|---|---|
| Date | 2013-01-22 00:01 +0100 |
| Message-ID | <am5vmjFc7geU1@mid.individual.net> |
| In reply to | #37185 |
On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>> On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris Angelico wrote:
>>>
>>> Seriously, you're asking for something that's beyond the power of
>>> humans or computers. You want to identify that something's the same
>>> file, without tracking the change or having any identifiable tag.
>>>
>>> That's a fundamentally impossible task.
>>
>> No, it is difficult but not impossible.
>> It just cannot be done by tagging the file by:
>>
>> 1. filename
>> 2. filepath
>> 3. hash (math algorithm producing a string based on the file's contents)
>>
>> We need another way to identify the file WITHOUT using the above attributes.
>
> This is a very old problem (still unsolved I believe):
> http://en.wikipedia.org/wiki/Ship_of_Theseus
>
>
> Oscar
>

That wiki article gives a hint to a possible solution: use a timestamp to
determine which key is valid when.
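
One way to read the timestamp suggestion is as a history of keys, each valid from the moment it was recorded. The following is a minimal Python sketch under that reading; the `KeyHistory` class, its method names, and the placeholder key strings are hypothetical and not taken from the thread. It still depends on someone recording every change, so it is a form of tracking rather than a way around it.

```python
import bisect
from datetime import datetime


class KeyHistory:
    """Which key (for example a content hash) identified one logical file, and from when."""

    def __init__(self):
        self._times = []  # sorted timestamps at which a new key became valid
        self._keys = []   # key that became valid at the matching timestamp

    def record(self, when, key):
        # Assumption: changes are recorded in chronological order.
        if self._times and when < self._times[-1]:
            raise ValueError("records must be added in chronological order")
        self._times.append(when)
        self._keys.append(key)

    def key_at(self, when):
        # Find the last recorded key whose start time is <= when.
        index = bisect.bisect_right(self._times, when)
        if index == 0:
            return None  # nothing was recorded yet at that time
        return self._keys[index - 1]


history = KeyHistory()
history.record(datetime(2013, 1, 18), "hash-of-first-version")
history.record(datetime(2013, 1, 21), "hash-of-edited-version")
```

Calling `history.key_at(...)` with a date between the two records returns the first key, and with a later date the second, which is the "which key is valid when" lookup.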
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2013-01-21 23:43 +0000 |
| Message-ID | <mailman.770.1358811802.2939.python-list@python.org> |
| In reply to | #37230 |
On 21 January 2013 23:01, Tom P <werotizy@freent.dd> wrote:
> On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
>>
>> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
>>>
>>> On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris
>>> Angelico wrote:
>>>>
>>>>
>>>> Seriously, you're asking for something that's beyond the power of
>>>> humans or computers. You want to identify that something's the same
>>>> file, without tracking the change or having any identifiable tag.
>>>>
>>>> That's a fundamentally impossible task.
>>>
>>>
>>> No, it is difficult but not impossible.
>>> It just cannot be done by tagging the file by:
>>>
>>> 1. filename
>>> 2. filepath
>>> 3. hash (math algorithm producing a string based on the file's contents)
>>>
>>> We need another way to identify the file WITHOUT using the above
>>> attributes.
>>
>>
>> This is a very old problem (still unsolved I believe):
>> http://en.wikipedia.org/wiki/Ship_of_Theseus
>>
> That wiki article gives a hint to a possible solution: use a timestamp to
> determine which key is valid when.
In the Ship of Theseus, it is only argued that it is the same ship
because people were aware of the incremental changes that took place
along the way. The same applies here: if you don't track the
incremental changes and the two files have nothing concrete in common,
what does it mean to say that a file is "the same file" as some older
file?
That being said, I've always been impressed with the way that git can
understand when I think that a file is the same as some older file
(though it does sometimes go wrong):
~/tmp$ git init
Initialized empty Git repository in /home/oscar/tmp/.git/
~/tmp$ vim old.py
~/tmp$ cat old.py
#!/usr/bin/env python
print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
~/tmp$ git add old.py
~/tmp$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: old.py
#
~/tmp$ git commit
[master (root-commit) 8e91665] First commit
1 file changed, 4 insertions(+)
create mode 100644 old.py
~/tmp$ ls
old.py
~/tmp$ cat old.py > new.py
~/tmp$ rm old.py
~/tmp$ vim new.py
~/tmp$ cat new.py
#!/usr/bin/env python
print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
print("Although, I've edited it somewhat, it's still useless")
~/tmp$ git status
# On branch master
# Changes not staged for commit:
# (use "git add/rm <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# deleted: old.py
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# new.py
no changes added to commit (use "git add" and/or "git commit -a")
~/tmp$ git add -A .
~/tmp$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# renamed: old.py -> new.py
#
So it *is* Theseus' ship!
Oscar
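
The rename report above comes from comparing file contents, not from any stored identity. A minimal Python sketch of that idea follows, assuming a line-based comparison with the standard `difflib` module. This is not git's actual algorithm; the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
import difflib


def similarity(old_text, new_text):
    """Return a ratio in [0, 1] describing how alike two texts are, line by line."""
    matcher = difflib.SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio()


def guess_renames(deleted, added, threshold=0.5):
    """Pair each deleted file with its most similar added file, if similar enough.

    `deleted` and `added` map file names to file contents.
    The threshold is an assumption picked for this sketch.
    """
    renames = {}
    for old_name, old_text in deleted.items():
        best_name, best_score = None, threshold
        for new_name, new_text in added.items():
            score = similarity(old_text, new_text)
            if score >= best_score:
                best_name, best_score = new_name, score
        if best_name is not None:
            renames[old_name] = best_name
    return renames


# The two files from the session above.
old_py = (
    "#!/usr/bin/env python\n"
    "print('This is a fairly useless script.')\n"
    "print(\"Maybe I'll improve it later...\")\n"
)
new_py = old_py + "print(\"Although, I've edited it somewhat, it's still useless\")\n"

renames = guess_renames({"old.py": old_py}, {"new.py": new_py})
```

Because three of the four lines in `new.py` also appear in `old.py`, the pair clears the threshold and is reported as a rename. A file rewritten from scratch would share no lines and would be reported as unrelated, which is the failure mode the thread describes.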
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-22 11:04 +1100 |
| Message-ID | <mailman.772.1358813081.2939.python-list@python.org> |
| In reply to | #37230 |
On Tue, Jan 22, 2013 at 10:43 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> On 21 January 2013 23:01, Tom P <werotizy@freent.dd> wrote:
>> On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
>>> This is a very old problem (still unsolved I believe):
>>> http://en.wikipedia.org/wiki/Ship_of_Theseus
>>>
>> That wiki article gives a hint to a possible solution: use a timestamp to
>> determine which key is valid when.
>
> In the Ship of Theseus, it is only argued that it is the same ship
> because people were aware of the incremental changes that took place
> along the way. The same applies here: if you don't track the
> incremental changes and the two files have nothing concrete in common,
> what does it mean to say that a file is "the same file" as some older
> file?
>
> That being said, I've always been impressed with the way that git can
> understand when I think that a file is the same as some older file
> (though it does sometimes go wrong):

Yeah, git's awesome like that :) It looks at file similarity, though,
so if you completely rewrite a file and simultaneously rename/move it,
git will lose track of it. And as you say, sometimes it gets things
wrong - if you merge a large file into a small one, git will report it
as a deletion and rename. (Of course, it doesn't make any difference.
It's just a matter of reporting.) Mercurial, if I understand
correctly, actually _tracks_ moves (and copies), but git just records
a deletion and a creation.

My family in fact has a literal "grandfather's axe" (except that I
don't think either of my grandfathers actually owned it, but it's my
Dad's old axe) that has had many new handles and a couple of new
heads. Bringing it back to computers, we have on our network two
computers "Stanley" and "Ollie" that have been there ever since we
first set up that network. Back then, it was coax cable, 10base2, no
routers/switches/etc, and the computers were I think early Pentiums.
We installed the database on one of them, and set the other in Dad's
office. Today, we have a modern Ethernet setup with modern hardware
and cat-5 cable; we still have Stanley with the database and Ollie in
the office. The name/identity of the computer is mostly associated
with its roles; but those roles can shift too (there was a time when
Ollie was the internet gateway, but that's no longer the case).
Identity is its own attribute.

The problem isn't that identity can't exist. It's that it can't be
discovered. That takes external knowledge. Dave's analogy is accurate.

ChrisA
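
The "external knowledge" can be supplied by writing an identity into each template once and carrying it along from then on. A minimal Python sketch of such an identifiable tag follows; the `<!-- template-id: ... -->` comment format and the `ensure_tag` helper are hypothetical, invented here for illustration. The identity survives renames, moves, and full rewrites only as long as the tag line itself is preserved.

```python
import re
import uuid

# Hypothetical tag format: an HTML comment carrying a UUID.
TAG_PATTERN = re.compile(r"<!-- template-id: ([0-9a-f-]{36}) -->")


def ensure_tag(html):
    """Return (template_id, html), adding a tag comment if none is present."""
    match = TAG_PATTERN.search(html)
    if match:
        return match.group(1), html
    template_id = str(uuid.uuid4())
    return template_id, "<!-- template-id: %s -->\n%s" % (template_id, html)


# Tag a template once...
first_id, tagged = ensure_tag("<html><body>Old content</body></html>")

# ...then rewrite everything except the tag line.
tag_line = tagged.splitlines()[0]
rewritten = tag_line + "\n<html><body>Entirely new content</body></html>"
second_id, _ = ensure_tag(rewritten)
```

Here `first_id` and `second_id` are equal even though no other content is shared, and the file's name and path never enter into it. A template whose tag line is deleted gets a fresh identity, which matches the point that identity cannot be rediscovered once the tracking is lost.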
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2013-01-22 17:36 -0800 |
| Message-ID | <45ef0f07-505e-4986-bc99-bd0ce86c101c@q16g2000pbt.googlegroups.com> |
| In reply to | #37237 |
On Jan 22, 10:04 am, Chris Angelico <ros...@gmail.com> wrote:
> My family in fact has a literal "grandfather's axe" (except that I
> don't think either of my grandfathers actually owned it, but it's my
> Dad's old axe) that has had many new handles and a couple of new
> heads.

Ah, that's brilliant, I hadn't heard that term before, and it'll be a
lot easier to explain to people than the Theseus example.

How we think of identity is _awesome_ :)
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-01-21 07:47 -0500 |
| Message-ID | <mailman.743.1358772477.2939.python-list@python.org> |
| In reply to | #37181 |
This is trolling, Ferrous. You are a troll. Go away.

On Mon, Jan 21, 2013 at 7:39 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k@gmail.com> wrote:
> > On Monday, 21 January 2013 11:31:24 AM UTC+2, user Chris Angelico wrote:
> >>
> >> Seriously, you're asking for something that's beyond the power of
> >> humans or computers. You want to identify that something's the same
> >> file, without tracking the change or having any identifiable tag.
> >>
> >> That's a fundamentally impossible task.
> >
> > No, it is difficult but not impossible.
> > It just cannot be done by tagging the file by:
> >
> > 1. filename
> > 2. filepath
> > 3. hash (math algorithm producing a string based on the file's contents)
> >
> > We need another way to identify the file WITHOUT using the above attributes.
>
> This is a very old problem (still unsolved I believe):
> http://en.wikipedia.org/wiki/Ship_of_Theseus
>
>
> Oscar
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Joel Goldstick
http://joelgoldstick.com