Re: archive.org index of fermilab's archive of ps docs

From Anthk <anthk@disroot.org>
Newsgroups comp.lang.postscript
Subject Re: archive.org index of fermilab's archive of ps docs
Date 2022-12-13 01:54 +0000
Organization A noiseless patient Spider
Message-ID <slrntpfljj.1198.anthk@openbsd.home.local> (permalink)
References <c1804a43-da02-44d1-8c5b-cfcb522828a0n@googlegroups.com> <f4af945e-0372-4791-8279-43e7e842bcacn@googlegroups.com>


On 2022-04-29, Ross Presser <rpresser@gmail.com> wrote:
> On Monday, April 11, 2022 at 9:57:17 PM UTC-4, luser droog wrote:
>> https://web.archive.org/web/*/http://www-cdf.fnal.gov/offline/PostScript/* 
>> 
>> Get 'em while they're hot. It can still be removed if fnal changes its robots 
>> restrictions.
>
> Word to the wise:
> If the "MIME TYPE" column on the listing page luser droog linked to has "unk",
> the capture is going to be a 404 page. Mostly this appears to be when it crawled
> the wrong URL, e.g. 
> http://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf<p>
> instead of
> http://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf
>
> Some of the incorrect URLs are incredibly long and appear to be several URLs
> run together. Try splitting them into separate URLs.
>
> Sometimes the corrected URL is also captured; download that instead.

Sorry if I'm late, but here's a better URL.

http://theoldnet.com/get?url=http%3A%2F%2Fwww-cdf.fnal.gov%2Foffline%2FPostScript&year=2010&scripts=false&decode=false
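
For anyone cleaning up the bad capture URLs described in the quoted post, here is a rough Python sketch of the two fixes Ross mentions: dropping trailing HTML debris such as `<p>`, and splitting a string that looks like several URLs run together. The function name `split_captured_urls` is made up for illustration, and the sketch assumes every real URL starts with `http://` or `https://` and that a `<` never belongs to the URL itself.

```python
import re

def split_captured_urls(raw):
    """Return candidate URLs found in a mis-crawled string (hypothetical helper)."""
    # Each match starts at a scheme and runs until the next scheme or end of line,
    # so URLs that were glued together come back as separate items.
    candidates = re.findall(r'https?://(?:(?!https?://).)*', raw)
    cleaned = []
    for candidate in candidates:
        # Assumption: anything from the first '<' onward is leftover HTML, e.g. '<p>'.
        url = candidate.split('<', 1)[0].strip()
        if url:
            cleaned.append(url)
    return cleaned

if __name__ == "__main__":
    bad = "http://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf<p>"
    for url in split_captured_urls(bad):
        print(url)
```

Each cleaned URL can then be looked up again in the archive listing to see whether a correct capture exists.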



Thread

archive.org index of fermilab's archive of ps docs luser droog <luser.droog@gmail.com> - 2022-04-11 18:57 -0700
  Re: archive.org index of fermilab's archive of ps docs Ross Presser <rpresser@gmail.com> - 2022-04-29 07:00 -0700
    Re: archive.org index of fermilab's archive of ps docs Anthk <anthk@disroot.org> - 2022-12-13 01:54 +0000
