| From | Helge Blischke <h.blischke@acm.org> |
|---|---|
| Newsgroups | comp.lang.postscript |
| Subject | Re: Magnifying pdf cleans irregularities? |
| Followup-To | comp.lang.postscript |
| Date | 2011-10-23 10:29 +0200 |
| Message-ID | <9gi1itF8diU1@mid.individual.net> |
| References | <j7vda5$2kk$1@dont-email.me> |
no.top.post@gmail.com wrote:

> By using gocr on:
> http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf
> I've been trying to extract the ASCII.
>
> So far, using:
> pdftoppm -f 13 -l 13 -r 300 sun.clarion2005.pdf | gocr -o ppm13.300
> gives the best Optical Character recognition results.
> But it sees "k" as "h".
>
> What confuses me, is that when I view with xpdf, the text
> looks as if it was printed by a bad-condition 1950 typewriter.
>
> I especially remember "2004" where the 'bottoms' were
> badly un-aligned. But if I set xpdf to 'magnify' a section of
> the text, it looks clean, and of course gocr decodes perfectly.
>
> I don't know exactly how the rendering works, but imagine
> that if the 'normal size' uses a bad quality font, and the
> magnified version uses a good quality font, that could
> explain what I'm seeing.
>
> Since the information that 'the char IS a "k" and not
> a "h"' is in the *.pdf file, and quite independent of ANY
> rendering, and gocr can correctly decode BIG font,
> should I not expect to be able to get gocr to decode
> correctly, by <filtering it through a suitable font>?
>
> Thanks,
>
> == Chris Glur.

If you look at the PDF properties, you'll see that the fonts used
are bitmapped Type 3 fonts (at a fairly high resolution, though). That
leads to degraded rendering whenever the bitmaps have to be resampled
because the resolution of the canvas differs from that of the bitmaps.

Helge
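
To illustrate why resampling a bitmap glyph can make evenly drawn strokes come out uneven, here is a minimal sketch in Python. It is not how xpdf or pdftoppm actually resample (real renderers may filter or anti-alias); it assumes plain nearest-neighbour sampling, and the glyph row, the helper names `resample_row` and `stroke_widths`, and the target widths are all made up for the example.

```python
def resample_row(row, new_width):
    """Nearest-neighbour resample of one row of 0/1 pixels (assumed method)."""
    old_width = len(row)
    return [row[(i * old_width) // new_width] for i in range(new_width)]


def stroke_widths(row):
    """Return the lengths of consecutive runs of 1s, i.e. the stroke widths."""
    widths, run = [], 0
    for px in row:
        if px:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Hypothetical glyph row: three strokes of 3 pixels, each followed by 4 blank pixels.
native = ([1] * 3 + [0] * 4) * 3          # 21 pixels wide

doubled = resample_row(native, 42)        # integer scale factor (2x)
shrunk = resample_row(native, 8)          # non-integer scale factor (8/21)

print(stroke_widths(native))    # [3, 3, 3]
print(stroke_widths(doubled))   # [6, 6, 6]  strokes stay equal
print(stroke_widths(shrunk))    # [2, 1, 1]  strokes become unequal
```

In this toy case the strokes stay identical when the target size is an integer multiple of the bitmap's own size, and come out with different widths otherwise, which is the kind of irregularity described above. Outline fonts do not have this problem because the glyph is rasterised afresh for each target resolution.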
Magnifying pdf cleans irregularities? no.top.post@gmail.com - 2011-10-22 21:41 +0000
  Re: Magnifying pdf cleans irregularities? luser- -droog <mijoryx@yahoo.com> - 2011-10-22 23:19 -0700
  Re: Magnifying pdf cleans irregularities? Helge Blischke <h.blischke@acm.org> - 2011-10-23 10:29 +0200