| From | no.top.post@gmail.com |
|---|---|
| Newsgroups | comp.lang.postscript, comp.sources.postscript, comp.text.pdf |
| Subject | Can't <pdf to text> |
| Date | 2012-08-12 08:06 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <k07o6l$mvs$1@dont-email.me> |
Cross-posted to 3 groups.
What's with these *.pdf files that <pdf to text> can't handle? E.g. http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf

Is the idea to prevent them from being <copied>? Or was a photo/pixel-grab of the paper the source?

Is it that PDF & PostScript render [to the VDU] a rectangle of pixels for each single char/glyph/image, with the pixels for a char obtained from the bitmap font? And that for these problematic/un-decodable 'texts', each page is one full-page rectangle 'photo' of the original text?

Can someone recommend an OCR utility for Linux?

== TIA.
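One rough way to tell the two cases apart (a page that carries real text versus a page that is only a scanned image) is to look at what the extractor returns. The sketch below assumes the `pdftotext` command from Poppler/Xpdf is installed; the function names and the 20-character threshold are made up for illustration, not taken from any tool. An empty result only hints at an image-only file, since copy restrictions or unusual font encodings can also yield no usable text.

```python
import shutil
import subprocess


def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic: True if the extractor produced a meaningful amount of text.

    An image-only PDF typically yields nothing but whitespace and the
    form feeds that separate pages. The threshold is an arbitrary assumption.
    """
    visible = "".join(extracted.split())  # drop spaces, newlines, form feeds
    return len(visible) >= min_chars


def extract_text(pdf_path: str) -> str:
    """Run `pdftotext file.pdf -` and return what it writes to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `has_text_layer(extract_text("sun.clarion2005.pdf"))` on a local copy would then suggest whether an OCR pass is needed at all.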
Can't <pdf to text> no.top.post@gmail.com - 2012-08-12 08:06 +0000
Re: Can't <pdf to text> Ross Presser <rpresser@gmail.com> - 2012-08-12 21:16 -0700
Re (2): Can't <pdf to text> no.top.post@gmail.com - 2012-08-13 21:55 +0000
Re: Can't <pdf to text> tlvp <mPiOsUcB.EtLlLvEp@att.net> - 2012-08-13 01:48 -0400
Re: Can't <pdf to text> Joe Beanfish <joebeanfish@nospam.duh> - 2012-08-13 14:13 +0000