| Path | csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail |
|---|---|
| From | Helge Blischke <h.blischke@acm.org> |
| Newsgroups | comp.lang.postscript |
| Subject | Re: Magnifying pdf cleans irregularities? |
| Followup-To | comp.lang.postscript |
| Date | Sun, 23 Oct 2011 10:29:15 +0200 |
| Lines | 40 |
| Message-ID | <9gi1itF8diU1@mid.individual.net> |
| References | <j7vda5$2kk$1@dont-email.me> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset="us-ascii" |
| Content-Transfer-Encoding | 7Bit |
| X-Trace | individual.net 08OMqFPcB/Zfwh7jZ1X6KQHfkpXuzFbb3JfxvV0OuNgrZAqaRS |
| Cancel-Lock | sha1:hyMhBrVST57hcKrZjKYyku5N7J4= |
| User-Agent | KNode/0.99.01 |
| Xref | x330-a1.tempe.blueboxinc.net comp.lang.postscript:406 |
no.top.post@gmail.com wrote:
> By using gocr on:
> http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf
> I've been trying to extract the ASCII.
>
> So far, using:
> pdftoppm -f 13 -l 13 -r 300 sun.clarion2005.pdf | gocr -o ppm13.300
> gives the best Optical Character Recognition results.
> But it sees "k" as "h".
>
> What confuses me is that when I view with xpdf, the text
> looks as if it was printed by a bad-condition 1950 typewriter.
>
> I especially remember "2004" where the 'bottoms' were
> badly un-aligned. But if I set xpdf to 'magnify' a section of
> the text, it looks clean, and of course gocr decodes perfectly.
>
> I don't know exactly how the rendering works, but imagine
> that if the 'normal size' uses a bad quality font, and the
> magnified version uses a good quality font, that could
> explain what I'm seeing.
>
> Since the information that 'the char IS a "k" and not
> a "h"' is in the *.pdf file, and quite independent of ANY
> rendering, and gocr can correctly decode BIG font,
> should I not expect to be able to get gocr to decode
> correctly, by <filtering it through a suitable font>?
>
> Thanks,
>
> == Chris Glur.

If you look at the PDF properties, you'll see that the fonts used are bitmapped Type 3 fonts (at a fairly high resolution, though). That degrades the rendering whenever the bitmaps have to be recalculated for a canvas of a different resolution.

Helge
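To illustrate the bitmap recalculation Helge describes, here is a minimal Python sketch. It rests on several assumptions. The renderer is taken to fill each device pixel from the glyph-bitmap pixel under its centre (nearest-neighbour sampling), although xpdf and pdftoppm may well use a different, smoother method. The function names `resample_row` and `stem_width` are hypothetical, and so is the 5-pixel stem. At half the bitmap's resolution, the same stem lands as 2 or 3 device pixels depending on its fractional position. That swing is 40% of its ideal 2.5-pixel width. At 2.5x magnification the swing is still one pixel, but now out of 12.5. One-pixel rounding errors of this kind, hitting each glyph differently, are one plausible source of the ragged "typewriter" look and the misaligned bottoms. They would also explain why magnified text looks clean.

```python
import math


def resample_row(row, scale, offset):
    """Nearest-neighbour resample of one scanline of a glyph bitmap.

    row    -- list of 0/1 source pixels (1 = ink)
    scale  -- device pixels per source pixel (canvas dpi / bitmap dpi)
    offset -- fractional device-pixel position of the row's left edge
    Each device pixel copies the source pixel lying under its centre.
    This sampling rule is an assumption, not necessarily what xpdf does.
    """
    n = math.ceil(len(row) * scale + offset)  # device pixels up to the row's right edge
    out = []
    for i in range(n):
        src = math.floor((i + 0.5 - offset) / scale)  # source index under pixel centre
        out.append(row[src] if 0 <= src < len(row) else 0)  # outside the bitmap = no ink
    return out


def stem_width(row, scale, offset):
    """Number of inked device pixels after resampling one scanline."""
    return sum(resample_row(row, scale, offset))


if __name__ == "__main__":
    stem = [1] * 5  # hypothetical 5-pixel-wide stem in the font's bitmap
    for scale in (0.5, 2.5):  # reduced resolution vs. magnified view
        widths = sorted({stem_width(stem, scale, k / 10) for k in range(10)})
        print(f"scale {scale}: ideal width {len(stem) * scale}, "
              f"rendered widths {widths}")
```

At both scales the error is at most one device pixel. Only at the small scale is that pixel a large fraction of the stroke.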
Thread:
- Magnifying pdf cleans irregularities? no.top.post@gmail.com - 2011-10-22 21:41 +0000
- Re: Magnifying pdf cleans irregularities? luser- -droog <mijoryx@yahoo.com> - 2011-10-22 23:19 -0700
- Re: Magnifying pdf cleans irregularities? Helge Blischke <h.blischke@acm.org> - 2011-10-23 10:29 +0200