| From | no.top.post@gmail.com |
|---|---|
| Newsgroups | comp.lang.postscript |
| Subject | Magnifying pdf cleans irregularities? |
| Date | 2011-10-22 21:41 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <j7vda5$2kk$1@dont-email.me> (permalink) |
I've been trying to extract the ASCII text from http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf using gocr. So far, the best optical character recognition results come from:

    pdftoppm -f 13 -l 13 -r 300 sun.clarion2005.pdf | gocr -o ppm13.300

But it sees "k" as "h".

What confuses me is that when I view the file with xpdf, the text looks as if it was printed by a 1950s typewriter in bad condition. I especially remember "2004", where the 'bottoms' of the digits were badly misaligned. But if I set xpdf to 'magnify' a section of the text, it looks clean, and of course gocr decodes it perfectly.

I don't know exactly how the rendering works, but I imagine that if the 'normal size' uses a bad-quality font and the magnified version uses a good-quality font, that could explain what I'm seeing.

The information that the char IS a "k" and not an "h" is in the *.pdf file, quite independent of ANY rendering, and gocr can correctly decode a BIG font. So should I not expect to be able to get gocr to decode correctly by <filtering it through a suitable font>?

Thanks,

== Chris Glur.
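One way to test the magnification idea is to repeat the same pipeline at higher values of `-r`. The sketch below is only a guess at how that comparison could be scripted: it assumes that raising the pdftoppm resolution has roughly the same effect as magnifying in xpdf, which the post does not confirm. The function name `ocr_pipeline` and the resolutions 600 and 1200 are illustrative choices, not taken from the post. It only builds the command strings and does not run them.

```python
import shlex

# File name and page number as given in the post.
PDF = "sun.clarion2005.pdf"
PAGE = 13


def ocr_pipeline(resolution, pdf=PDF, page=PAGE):
    """Build the pdftoppm | gocr pipeline from the post for one resolution.

    Hypothetical helper: it mirrors the original command and changes only
    the -r value and the matching output file name.
    """
    render = ["pdftoppm", "-f", str(page), "-l", str(page),
              "-r", str(resolution), pdf]
    recognise = ["gocr", "-o", f"ppm{page}.{resolution}"]
    # shlex.join quotes arguments where needed so the line is shell-safe.
    return " | ".join(shlex.join(cmd) for cmd in (render, recognise))


if __name__ == "__main__":
    # 300 is the value from the post; the higher values are assumptions.
    for dpi in (300, 600, 1200):
        print(ocr_pipeline(dpi))
```

Each printed line can be pasted into a shell, and the resulting `ppm13.*` files compared to see whether the "k"/"h" confusion goes away at a larger rendering size.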
- Magnifying pdf cleans irregularities? no.top.post@gmail.com - 2011-10-22 21:41 +0000
- Re: Magnifying pdf cleans irregularities? luser- -droog <mijoryx@yahoo.com> - 2011-10-22 23:19 -0700
- Re: Magnifying pdf cleans irregularities? Helge Blischke <h.blischke@acm.org> - 2011-10-23 10:29 +0200