Re: Magnifying pdf cleans irregularities?

Path	csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From	Helge Blischke <h.blischke@acm.org>
Newsgroups	comp.lang.postscript
Subject	Re: Magnifying pdf cleans irregularities?
Followup-To	comp.lang.postscript
Date	Sun, 23 Oct 2011 10:29:15 +0200
Lines	40
Message-ID	<9gi1itF8diU1@mid.individual.net>
References	<j7vda5$2kk$1@dont-email.me>
Mime-Version	1.0
Content-Type	text/plain; charset="us-ascii"
Content-Transfer-Encoding	7Bit
X-Trace	individual.net 08OMqFPcB/Zfwh7jZ1X6KQHfkpXuzFbb3JfxvV0OuNgrZAqaRS
Cancel-Lock	sha1:hyMhBrVST57hcKrZjKYyku5N7J4=
User-Agent	KNode/0.99.01
Xref	x330-a1.tempe.blueboxinc.net comp.lang.postscript:406



no.top.post@gmail.com wrote:

> By using gocr on:
> http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf
> I've been trying to extract the ASCII.
> 
> So far, using:
> pdftoppm -f 13 -l 13 -r 300 sun.clarion2005.pdf | gocr -o ppm13.300
> gives the best Optical Character recognition results.
> But it sees "k" as "h".
> 
> What confuses me, is that when I view with xpdf, the text
> looks as if it was printed by a bad-condition 1950 typewriter.
> 
> I especially remember "2004" where the 'bottoms' were
> badly un-aligned. But if I set xpdf to 'magnify' a section of
> the text, it looks clean, and of course gocr decodes perfectly.
> 
> I don't know exactly how the rendering works, but imagine
> that if the 'normal size' uses a bad quality font, and the
> magnified version uses a good quality font, that could
> explain what I'm seeing.
> 
> Since the information that the char IS a "k" and not
> an "h" is in the *.pdf file, and quite independent of ANY
> rendering, and gocr can correctly decode BIG font,
> should I not expect to be able to get gocr to decode
> correctly, by <filtering it through a suitable font>?
> 
> Thanks,
> 
> == Chris Glur.

If you look at the PDF properties, you'll see that the fonts used are 
bitmapped Type 3 fonts (at a fairly high resolution, though). That leads to 
degraded rendering whenever the glyph bitmaps have to be resampled because 
the resolution of the canvas differs from theirs.
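
To illustrate the effect, here is a minimal Python sketch. It is not taken
from xpdf or any other renderer; it assumes plain nearest-neighbour
resampling, and the glyph row (four stems, each 3 pixels wide) is made up
for the example. Real renderers may filter or anti-alias instead, but the
underlying problem is the same: at a non-integer scale factor, equal stems
in the source bitmap no longer map to equal numbers of device pixels.

```python
def resample_row(row, new_width):
    """Nearest-neighbour resample of one row of 1-bit pixels to new_width."""
    old_width = len(row)
    return [row[(i * old_width) // new_width] for i in range(new_width)]


def stroke_widths(row):
    """Lengths of the consecutive runs of set pixels (the 'stems') in a row."""
    widths, run = [], 0
    for px in row:
        if px:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Hypothetical glyph row: four stems, each 3 pixels wide, 3 pixels apart.
source = ([1] * 3 + [0] * 3) * 4  # 24 pixels in total

# 48 is an exact 2x enlargement; 17 and 10 are non-integer reductions.
for new_width in (48, 17, 10):
    print(new_width, stroke_widths(resample_row(source, new_width)))
# 48 [6, 6, 6, 6]
# 17 [3, 2, 2, 2]
# 10 [2, 1, 2, 1]
```

At the exact 2x enlargement all stems stay the same width; at the two
reduced sizes they come out uneven, which would match the "bad typewriter"
look at normal size and the clean look when magnified.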

Helge


Thread

Magnifying pdf cleans irregularities? no.top.post@gmail.com - 2011-10-22 21:41 +0000
  Re: Magnifying pdf cleans irregularities? luser- -droog <mijoryx@yahoo.com> - 2011-10-22 23:19 -0700
  Re: Magnifying pdf cleans irregularities? Helge Blischke <h.blischke@acm.org> - 2011-10-23 10:29 +0200
