| Date | 2011-09-09 12:09 +1000 |
|---|---|
| From | Simon Cropper <simoncropper@fossworkflowguides.com> |
| Subject | Re: Create an index from a webpage [RANT, DNFTT] |
| References | <mailman.874.1315484806.27778.python-list@python.org> <1537032.qVoOGUtdWV@PointedEars.de> <4e68db21$0$30002$c3e8da3$5496439d@news.astraweb.com> <mailman.886.1315525252.27778.python-list@python.org> <op.v1imgsvpa8ncjz@gnudebst> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.891.1315534207.27778.python-list@python.org> (permalink) |
On 09/09/11 10:32, Rhodri James wrote:
> On Fri, 09 Sep 2011 00:40:42 +0100, Simon Cropper
>
> Ahem. You should expect a certain amount of ribbing after admitting that
> your Google-fu is weak. So is mine, but hey.
I did not admit anything. I consider my ability to find things quite good,
actually. Others assumed that my "Google-fu is weak".
>
>> 4. If someone is willing to help me, rather than lecture me (or poke
>> me to see if they get a response), I would appreciate it.
>
> The Google Python Sitemap Generator
> (http://www.smart-it-consulting.com/article.htm?node=166&page=128,
> fourth offering when you google "map a website with Python") looks like
> a promising start. At least it produces something in XML -- filtering
> that and turning it into HTML should be fairly straightforward.
>
I saw this in my original search. My conclusions were:
1. The last update was in 2005. That is 6 years ago. In that time we
have had numerous upgrades to HTML, Logs, etc.
2. The script expects to run on the webserver. I don't have the ability
to run python on my webserver.
3. There are also a number of dead links and redirects to Google
Webmaster Central / Tools, which then ask you to submit a sitemap (as I
alluded to earlier, this becomes a confusing, circular cross-referencing
situation).
4. The ultimate product - if you can get the package to work - would be
an XML file you would need to massage to extract what you needed.
To me this seems like overkill.
I assume you could import the parent HTML file, scrape all the links on
the same domain, dump these to a hierarchical list and represent this in
HTML using BeautifulSoup or something similar. Certainly doable but
considering the sheer commonality of this task I don't understand why a
simple script does not already exist - hence my original request for
assistance.
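As a rough illustration of what I have in mind, here is a minimal sketch
of those steps. It is only an assumption of how such a script might look:
it uses the standard library's html.parser in place of BeautifulSoup so
that it runs without extra packages, the function and class names are
made up for this example, and www.example.com is a placeholder domain.
Fetching the page itself (for instance with urllib.request.urlopen) is
left out.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_links(page_html, base_url):
    """Return sorted, de-duplicated absolute links on base_url's host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(base_url, href).split("#")[0]
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return sorted(links)


def build_tree(links):
    """Nest links by path segment, e.g. {'guides': {'gis.html': {}}}."""
    tree = {}
    for link in links:
        node = tree
        for part in urlparse(link).path.strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
    return tree


def render(tree, prefix="/"):
    """Render the nested dict as nested <ul> lists of links."""
    if not tree:
        return ""
    items = []
    for name in sorted(tree):
        path = prefix + name
        items.append(
            '<li><a href="%s">%s</a>%s</li>'
            % (escape(path), escape(name), render(tree[name], path + "/"))
        )
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    # Sample page content; a real run would download the parent page first.
    sample = (
        '<a href="/guides/gis.html">GIS</a> '
        '<a href="guides/bash.html#top">Bash</a> '
        '<a href="http://other.example.org/x">Elsewhere</a>'
    )
    found = same_domain_links(sample, "http://www.example.com/")
    print(render(build_tree(found)))
```

This only indexes the links found on a single page. Following each link
in turn to build a full site map would need a queue of pages to visit and
a set of pages already seen, which I have not attempted here.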
It would appear from the feedback so far that this 'forum' is not the most
appropriate place to ask this question. Consequently, I will take your advice
and keep looking... and if I don't find something within a reasonable
time frame, just write something myself.
--
Cheers Simon
Simon Cropper - Open Content Creator / Website Administrator
Free and Open Source Software Workflow Guides
------------------------------------------------------------
Introduction http://www.fossworkflowguides.com
GIS Packages http://gis.fossworkflowguides.com
bash / Python http://scripting.fossworkflowguides.com