
Can't <pdf to text>

From no.top.post@gmail.com
Newsgroups comp.lang.postscript, comp.sources.postscript, comp.text.pdf
Subject Can't <pdf to text>
Date 2012-08-12 08:06 +0000
Organization A noiseless patient Spider
Message-ID <k07o6l$mvs$1@dont-email.me> (permalink)


What's with these *.pdf files that <pdf to text> doesn't work on?
e.g. http://www.cogsci.rpi.edu/~rsun/sun.clarion2005.pdf

Is the idea to prevent them being <copied>?

Or is it that a photo/pixel-grab of the paper was the source?

Is it that PDF & PostScript render [to the VDU] a rectangle
of pixels for each single char/glyph/image, and that for a
single char the pixels are obtained from the bit-map/font?

And for these problematic/un-decodable 'texts', is it
a full-page rectangle 'photo' of the original text?
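
One rough way to test the full-page 'photo' guess above might look like the sketch below. It assumes the `pdftotext` command (from poppler-utils or xpdf) is installed; the function names and the 20-character threshold are made up for illustration. An empty result only suggests scanned pages, since text extraction can also fail for other reasons, such as fonts with no usable character mapping.

```python
import subprocess
import sys


def extract_text(pdf_path: str) -> str:
    """Run `pdftotext <file> -` and return whatever it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def looks_image_only(extracted: str, min_chars: int = 20) -> bool:
    """Guess 'scanned pages' when almost no non-whitespace text came out.

    min_chars is an arbitrary cut-off chosen for this sketch.
    """
    visible = [c for c in extracted if not c.isspace()]
    return len(visible) < min_chars


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_text(sys.argv[1])
    if looks_image_only(text):
        print("almost no text layer: probably page images, try OCR")
    else:
        print("text layer found")
```

Saved under a hypothetical name such as check_pdf.py, it would be run as `python3 check_pdf.py sun.clarion2005.pdf`.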

Can someone recommend an OCR utility for Linux?

== TIA.


