


Why do the URLs of posts here change?

Started by: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post: 2015-01-09 21:56 +1100
Last post: 2015-01-10 07:57 -0600
Articles: 15 (8 participants)



Contents

  Why do the URLs of posts here change? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-01-09 21:56 +1100
    Re: Why do the URLs of posts here change? Skip Montanaro <skip.montanaro@gmail.com> - 2015-01-09 06:04 -0600
      Re: Why do the URLs of posts here change? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-01-09 23:28 +1100
      Re: Why do the URLs of posts here change? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-01-10 16:21 +1300
        Re: Why do the URLs of posts here change? Chris Angelico <rosuav@gmail.com> - 2015-01-10 14:53 +1100
          Re: Why do the URLs of posts here change? albert@spenarnc.xs4all.nl (Albert van der Horst) - 2015-01-17 16:39 +0000
            Re: Why do the URLs of posts here change? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-01-18 11:31 +1300
    Re: Why do the URLs of posts here change? Peter Otten <__peter__@web.de> - 2015-01-09 13:04 +0100
    Re: Why do the URLs of posts here change? Rustom Mody <rustompmody@gmail.com> - 2015-01-09 06:09 -0800
      Re: Why do the URLs of posts here change? Skip Montanaro <skip.montanaro@gmail.com> - 2015-01-09 08:15 -0600
        Re: Why do the URLs of posts here change? Rustom Mody <rustompmody@gmail.com> - 2015-01-09 08:52 -0800
          Re: Why do the URLs of posts here change? Chris Angelico <rosuav@gmail.com> - 2015-01-10 03:57 +1100
            Re: Why do the URLs of posts here change? Rustom Mody <rustompmody@gmail.com> - 2015-01-09 09:19 -0800
    Re: Why do the URLs of posts here change? Terry Reedy <tjreedy@udel.edu> - 2015-01-10 04:41 -0500
    Re: Why do the URLs of posts here change? Skip Montanaro <skip.montanaro@gmail.com> - 2015-01-10 07:57 -0600

#83425 — Why do the URLs of posts here change?

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-01-09 21:56 +1100
Subject: Why do the URLs of posts here change?
Message-ID: <54afb3ed$0$12995$c3e8da3$5496439d@news.astraweb.com>

I have come across this in the past, but today it annoyed me enough that I'm
asking for an explanation.

Posts on this newsgroup/mailing list are archived on the web, but the URLs
seem to change, which leaves dead links if you search for things.

For example, today I searched for a quote about floating point equality by
William Kahan, and I came across this post by me:

https://mail.python.org/pipermail/python-list/2008-February/468598.html

But that's a dead link! Here's Google's cache of it:

http://webcache.googleusercontent.com/search?client=opera&rls=en&q=cache:i0cWb0Tjxe0J:https://mail.python.org/pipermail/python-list/2008-February/468598.html%2Bfloating+point+superstition+equality&oe=utf-8&channel=suggest&gws_rd=ssl&hl=en&&ct=clnk

And here is the actual URL, as it appears today:

https://mail.python.org/pipermail/python-list/2008-February/481374.html


Why has the URL changed? Surely this is a bug? Where can I report it?




-- 
Steven



#83430

From: Skip Montanaro <skip.montanaro@gmail.com>
Date: 2015-01-09 06:04 -0600
Message-ID: <mailman.17520.1420805082.18130.python-list@python.org>
In reply to: #83425

> Posts on this newsgroup/mailing list are archived on the web, but the URLs
> seem to change, which leaves dead links if you search for things.

Steven,

It's a known issue, but one which appears to be somewhat unavoidable,
at least in Mailman 2.x. The problem is that every now and then,
postmaster@python.org gets a legitimate request from someone for a
message to be deleted from the list archive. The way this is done is
that the message is removed from the underlying mbox file, and the
archive regenerated. That changes the counter for every message after
that point - or maybe every message in the generated archive. (I have
no idea why the numerical basename of your subject message would have
changed so much. Maybe there is just a single ever-incrementing
counter for a given Mailman installation.)

From a technical standpoint, these sorts of requests are pretty
futile, since comp.lang.python/python-list@python.org is archived in
so many places, but that doesn't make the requests any less
legitimate. Consequently, when they arrive at the postmaster address,
they are generally taken care of in short order.

In my experience, they have generally fallen into two categories:

1. Safety. I recall one request where a woman accidentally posted
using an otherwise private email address. She was being stalked by her
ex-husband, and that address was unknown to him.

2. Defamation. There was a spate of recent messages (in Italian)
defaming a couple people, accusing them of being Nazis or pedophiles.

I will point out one class of messages which aren't deleted: those
which demonstrate people's stupidity. People do dumb things - e.g.,
fly off the handle during a flame war - which they sometimes later
realize reflects rather poorly on them (in future job searches and so
forth). Those sorts of message deletion requests are rejected.

That all said, I don't know if Mailman 3 (or some other archiver than
pipermail) will improve on this problem. I suggest a post to
mailman-users@python.org if you're curious about the Mailman
state-of-the-art in this area.

Skip
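
To make the renumbering described above concrete, here is a toy model. It is not Mailman's code: it simply assumes that each article's filename comes from its position in the mbox, which is one of the possibilities Skip raises. The function name `assign_urls` and the example Message-IDs are hypothetical.

```python
def assign_urls(message_ids, start=1):
    """Map each message ID to a sequential archive filename (toy model)."""
    return {mid: f"{start + i:06d}.html" for i, mid in enumerate(message_ids)}


mbox = ["<a@example>", "<b@example>", "<c@example>", "<d@example>"]
before = assign_urls(mbox)

# Remove one message, then "regenerate" the archive from what is left.
mbox_after = [m for m in mbox if m != "<b@example>"]
after = assign_urls(mbox_after)

# Every message that came after the deleted one now has a new filename.
moved = [m for m in mbox_after if before[m] != after[m]]
print(moved)  # ['<c@example>', '<d@example>']
```

Under this assumption, links to anything posted before the deleted message survive, while links to everything after it go dead.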



#83434

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-01-09 23:28 +1100
Message-ID: <54afc96b$0$12985$c3e8da3$5496439d@news.astraweb.com>
In reply to: #83430

Skip Montanaro wrote:

>> Posts on this newsgroup/mailing list are archived on the web, but the
>> URLs seem to change, which leaves dead links if you search for things.
[...]
> That all said, I don't know if Mailman 3 (or some other archiver than
> pipermail) will improve on this problem. I suggest a post to
> mailman-users@python.org if you're curious about the Mailman
> state-of-the-art in this area.

Thanks for the explanation!


-- 
Steven



#83487

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2015-01-10 16:21 +1300
Message-ID: <chbk5cF7dv9U1@mid.individual.net>
In reply to: #83430

Skip Montanaro wrote:
> The way this is done, is
> that the message is removed from the underlying mbox file, and the
> archive regenerated. That changes the counter for every message after
> that point

Would it help to replace the message with a stub
instead of deleting it altogether?

-- 
Greg



#83489

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-01-10 14:53 +1100
Message-ID: <mailman.17551.1420862015.18130.python-list@python.org>
In reply to: #83487

On Sat, Jan 10, 2015 at 2:21 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Skip Montanaro wrote:
>>
>> The way this is done, is
>> that the message is removed from the underlying mbox file, and the
>> archive regenerated. That changes the counter for every message after
>> that point
>
>
> Would it help to replace the message with a stub
> instead of deleting it altogether?

I had the same thought, but apparently not, according to the page
Peter Otten linked to:

http://wiki.list.org/display/DEV/Stable+URLs

ChrisA



#83931

From: albert@spenarnc.xs4all.nl (Albert van der Horst)
Date: 2015-01-17 16:39 +0000
Message-ID: <54ba9041$0$6961$e4fe514c@dreader36.news.xs4all.nl>
In reply to: #83489

In article <mailman.17551.1420862015.18130.python-list@python.org>,
Chris Angelico  <rosuav@gmail.com> wrote:
>On Sat, Jan 10, 2015 at 2:21 PM, Gregory Ewing
><greg.ewing@canterbury.ac.nz> wrote:
>> Skip Montanaro wrote:
>>>
>>> The way this is done, is
>>> that the message is removed from the underlying mbox file, and the
>>> archive regenerated. That changes the counter for every message after
>>> that point
>>
>>
>> Would it help to replace the message with a stub
>> instead of deleting it altogether?
>
>I had the same thought, but apparently not, according to the page
>Peter Otten linked to:
>
>http://wiki.list.org/display/DEV/Stable+URLs

Knowing that the source is an mbox file, I don't need to follow
that link to conclude that one is not very inventive.
It suffices to replace the content of the message by
a repetition of 'xxxx\n'. Maybe also the sender and the subject.

>
>ChrisA
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
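
Albert's suggestion could be sketched with Python's standard `mailbox` module. This is only a sketch, assuming the archive source is a plain mbox file and the target is a single-part text message. The function name `stub_out` and its parameters are hypothetical. It covers only the mbox side and does nothing about how the archiver numbers the regenerated pages.

```python
import mailbox


def stub_out(mbox_path, message_id, filler="xxxx\n", repeat=3):
    """Overwrite one message's body in place, keeping its slot in the mbox.

    Returns True if a message with the given Message-ID was found.
    """
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        for key in box.keys():
            msg = box[key]
            if msg["Message-ID"] == message_id:
                # Assumption: single-part text body with no transfer encoding.
                msg.set_payload(filler * repeat)
                box[key] = msg  # write the modified message back
                return True
        return False
    finally:
        box.close()  # flushes pending changes and releases the lock
```

A call such as `stub_out("python-list.mbox", "<some-id@example>")` (both arguments are placeholders) would leave the message count and order unchanged. Sender and subject could be blanked the same way with `msg.replace_header(...)`.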



#83956

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2015-01-18 11:31 +1300
Message-ID: <ci0662Fp9a3U1@mid.individual.net>
In reply to: #83931

Albert van der Horst wrote:
> Knowing that the source is an mbox file, I don't need to follow
> that link to conclude that one is not very inventive.
> It suffices to replace the content of the message by
> a repetition of 'xxxx\n'.

Editing the mbox file isn't the problem. From what I
gather, telling mailman to regenerate the web pages
from the mbox file causes all the messages to be given
new ID numbers, even if they remain in the same places
in the mbox.

So the web pages as well as the mbox would have to
be edited by hand, instead of using the auto regen
process.

-- 
Greg



#83431

From: Peter Otten <__peter__@web.de>
Date: 2015-01-09 13:04 +0100
Message-ID: <mailman.17521.1420805425.18130.python-list@python.org>
In reply to: #83425

Steven D'Aprano wrote:

> I have come across this in the past, but today it annoyed me enough that
> I'm asking for an explanation.
> 
> Posts on this newsgroup/mailing list are archived on the web, but the URLs
> seem to change, which leaves dead links if you search for things.
> 
> For example, today I searched for a quote about floating point equality by
> William Kahan, and I came across this post by me:
> 
> https://mail.python.org/pipermail/python-list/2008-February/468598.html
> 
> But that's a dead link! Here's Google's cache of it:
> 
> http://webcache.googleusercontent.com/search?client=opera&rls=en&q=cache:i0cWb0Tjxe0J:https://mail.python.org/pipermail/python-list/2008-February/468598.html%2Bfloating+point+superstition+equality&oe=utf-8&channel=suggest&gws_rd=ssl&hl=en&&ct=clnk
> 
> And here is the actual URL, as it appears today:
> 
> https://mail.python.org/pipermail/python-list/2008-February/481374.html
> 
> 
> Why has the URL changed? Surely this is a bug? Where can I report it?

This is a flaw of the mailman software.

http://wiki.list.org/display/DEV/Stable+URLs

suggests that the developers are aware of it. 

I don't know if there is a version available that has stable URLs...



#83440

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-01-09 06:09 -0800
Message-ID: <28f970bd-188d-4455-b2b2-3320c546ad65@googlegroups.com>
In reply to: #83425

On Friday, January 9, 2015 at 4:26:58 PM UTC+5:30, Steven D'Aprano wrote:
> I have come across this in the past, but today it annoyed me enough that I'm
> asking for an explanation.
> 
> Posts on this newsgroup/mailing list are archived on the web, but the URLs
> seem to change, which leaves dead links if you search for things.
> 
> For example, today I searched for a quote about floating point equality by
> William Kahan, and I came across this post by me:
> 
> https://mail.python.org/pipermail/python-list/2008-February/468598.html
> 
> But that's a dead link! Here's Google's cache of it:
> 
> http://webcache.googleusercontent.com/search?client=opera&rls=en&q=cache:i0cWb0Tjxe0J:https://mail.python.org/pipermail/python-list/2008-February/468598.html%2Bfloating+point+superstition+equality&oe=utf-8&channel=suggest&gws_rd=ssl&hl=en&&ct=clnk
> 
> And here is the actual URL, as it appears today:
> 
> https://mail.python.org/pipermail/python-list/2008-February/481374.html
> 
> 
> Why has the URL changed? Surely this is a bug? Where can I report it?

There's a new app/service that should solve your problem:
It's from Google... and called Groups <wink>



#83441

From: Skip Montanaro <skip.montanaro@gmail.com>
Date: 2015-01-09 08:15 -0600
Message-ID: <mailman.17526.1420812949.18130.python-list@python.org>
In reply to: #83440


On Fri, Jan 9, 2015 at 8:09 AM, Rustom Mody <rustompmody@gmail.com> wrote:

> Theres a new app/service that should solve your problem:
> Its from google... and called groups <wink>
>

It solves one problem (moving archive URLs) by, I think, ignoring the other
(archive posts which should really be removed).

Skip



#83457

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-01-09 08:52 -0800
Message-ID: <f7e24119-fd65-4525-b10c-063a7eef4ad1@googlegroups.com>
In reply to: #83441

On Friday, January 9, 2015 at 7:46:42 PM UTC+5:30, Skip Montanaro wrote:
> On Fri, Jan 9, 2015 at 8:09 AM, Rustom Mody wrote:
> > Theres a new app/service that should solve your problem:
> > Its from google... and called groups <wink>
>
> It solves one problem (moving archive URLs) by, I think, ignoring the other
> (archive posts which should really be removed).
>
> Skip

Is it?
OK, let's test that.
This is posted from Google Groups.
After posting I shall remove it.



#83458

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-01-10 03:57 +1100
Message-ID: <mailman.17537.1420822660.18130.python-list@python.org>
In reply to: #83457

On Sat, Jan 10, 2015 at 3:52 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> Is it?
> Ok lets test that.
> This is posted from google-groups.
> After posting I shall remove it

Remove it from GG, maybe, but I doubt very much it'll be removed from
the python.org archive. It's virtually impossible to remove something
from everywhere... you have to find every copy and hope none have been
web-archived yet.

ChrisA



#83459

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-01-09 09:19 -0800
Message-ID: <a79802e4-dd05-4886-ba24-25567f782d84@googlegroups.com>
In reply to: #83458

On Friday, January 9, 2015 at 10:27:53 PM UTC+5:30, Chris Angelico wrote:
> On Sat, Jan 10, 2015 at 3:52 AM, Rustom Mody wrote:
> > Is it?
> > Ok lets test that.
> > This is posted from google-groups.
> > After posting I shall remove it
> 
> Remove it from GG, maybe, but I doubt very much it'll be removed from
> the python.org archive. It's virtually impossible to remove something
> from everywhere... you have to find every copy and hope none have been
> web-archived yet.
> 
> ChrisA

Precisely my point.
Removing something from the web is really a meaningless activity
(apart from some moral feel-good factor).
If that gesture means something to you, GG provides it.

And to the best of my knowledge it does not screw up links like Mailman does.
We can test that in some limited way, but I don't know how to do any test
that would be reasonably exhaustive.



#83497

From: Terry Reedy <tjreedy@udel.edu>
Date: 2015-01-10 04:41 -0500
Message-ID: <mailman.17561.1420883105.18130.python-list@python.org>
In reply to: #83425

On 1/9/2015 7:04 AM, Skip Montanaro wrote:
>> Posts on this newsgroup/mailing list are archived on the web, but the URLs
>> seem to change, which leaves dead links if you search for things.
>
> Steven,
>
> It's a known issue, but one which appears to be somewhat unavoidable,
> at least in Mailman 2.x. The problem is that every now and then,
> postmaster@python.org gets a legitimate request from someone for a
> message to be deleted from the list archive. The way this is done, is
> that the message is removed from the underlying mbox file,

The post could be replaced by a placeholder "This message deleted".

> and the
> archive regenerated. That changes the counter for every message after

A placeholder should avoid that.

> that point - or maybe every message in the generated archive. (I have
> no idea why the numerical basename of your subject message would have
> changed so much. Maybe there is just a single ever incrementing
> counter for a given Mailman installation.)

-- 
Terry Jan Reedy



#83501

From: Skip Montanaro <skip.montanaro@gmail.com>
Date: 2015-01-10 07:57 -0600
Message-ID: <mailman.17564.1420898282.18130.python-list@python.org>
In reply to: #83425

On Sat, Jan 10, 2015 at 3:41 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> The post could be replaced by a placeholder "This message deleted"
>
>  and the
>>
>> archive regenerated. That changes the counter for every message after
>
>
> A placeholder should avoid that.

I suspect (though don't know for certain) that just regenerating the
archive without touching the mbox file will change the numbering.
Steven's original post mentioned two very different basenames (468598
and 481374). As I indicated in an earlier response, those might be
generated from an ever-growing counter, not just a shift as articles
slide one closer to the first one of the month.

So, you'd have to edit the mbox file carefully (might need to edit
headers) and also edit the generated HTML for the message. Neither is
an insurmountable task, but both are going to be more error-prone than
just cutting out an entire message and regenerating the archive.

I will pass along your suggestion to the postmaster folks, though (I don't get
involved at that level - it's mostly the folks who directly maintain
the Postfix setup who do this). They are a generally pretty
responsive bunch.

Skip
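
Skip's conjecture about an ever-growing counter can be illustrated with a second toy model. Again, this is not Mailman's implementation. It assumes a single article counter that is never reset, so that regenerating the archive hands out fresh numbers even when the mbox is untouched. The class name `ToyArchiver` is hypothetical.

```python
import itertools


class ToyArchiver:
    """Hypothetical archiver with one never-reset article counter."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def regenerate(self, message_ids):
        # Each regeneration draws new numbers, even for unchanged messages.
        return {mid: next(self._counter) for mid in message_ids}


archiver = ToyArchiver()
mbox = ["<a@example>", "<b@example>", "<c@example>"]

first = archiver.regenerate(mbox)
second = archiver.regenerate(mbox)  # the mbox itself was not edited

print(first["<a@example>"], second["<a@example>"])  # 1 4
```

If something like this holds, a placeholder in the mbox would not be enough on its own, which fits the large jump between the two basenames in the original post.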
