From: Alec Taylor
Newsgroups: comp.lang.postscript
Subject: Comparing geometric layout information across "pages"
Date: Tue, 11 Oct 2011 02:23:35 -0700 (PDT)

Good afternoon,

Do you have any recommendations and/or sample code [in C or C++] for comparing textual and geometric layout information across pages?

Basically, I'm trying to recognise patterns within documents (e.g. page numbers, headers and footers, titles, column information, etc.) using regexes to match the patterns and a boost::bimap to store them.

Thanks for all suggestions,

Alec Taylor
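
To make the question concrete, here is a minimal C++ sketch of one way the cross-page comparison might work. It assumes the text runs and their positions have already been extracted from each page by some other tool. All names in it (TextRun, Page, normalise, layout_key, recurring_elements) are hypothetical, and it returns a plain sorted list of keys where a boost::bimap could be used instead. Digit runs are collapsed with a regex so that "Page 3" and "Page 12" compare equal, and positions are snapped to a tolerance grid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical input type: one run of text and where it sits on the page.
struct TextRun {
    double x;          // horizontal position in points (assumed units)
    double y;          // vertical position in points (assumed units)
    std::string text;  // the characters of the run
};
using Page = std::vector<TextRun>;

// Collapse every run of digits to '#', so numbered items share one key.
std::string normalise(const std::string& s) {
    static const std::regex digits("[0-9]+");
    return std::regex_replace(s, digits, "#");
}

// Build a key from the normalised text and the position snapped to a grid
// of size tol, so small placement differences between pages are ignored.
std::string layout_key(const TextRun& r, double tol = 2.0) {
    assert(tol > 0.0);
    const long gx = std::lround(r.x / tol);
    const long gy = std::lround(r.y / tol);
    return normalise(r.text) + "@" + std::to_string(gx) + "," + std::to_string(gy);
}

// Return the keys that occur on at least min_fraction of the pages.
// These are candidates for headers, footers and page numbers.
std::vector<std::string> recurring_elements(const std::vector<Page>& pages,
                                            double min_fraction = 0.8) {
    std::map<std::string, std::size_t> pages_with_key;
    for (const Page& page : pages) {
        std::set<std::string> seen;  // count each key once per page
        for (const TextRun& run : page) {
            seen.insert(layout_key(run));
        }
        for (const std::string& key : seen) {
            ++pages_with_key[key];
        }
    }
    std::vector<std::string> result;  // sorted, because std::map is ordered
    for (const auto& entry : pages_with_key) {
        if (entry.second >= min_fraction * pages.size()) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

Snapping to a grid is only a rough tolerance: two runs that straddle a cell boundary get different keys, so a real implementation would probably compare distances directly. Column detection would need a different approach, such as clustering the x positions of line starts.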