| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!us.feeder.erje.net!newsfeed.fsmpi.rwth-aachen.de!newsfeed.straub-nv.de!eternal-september.org!feeder.eternal-september.org!mx02.eternal-september.org!.POSTED!not-for-mail |
|---|---|
| From | Unknown <dog@gmail.com> |
| Newsgroups | comp.os.linux.misc |
| Subject | Re: pdf & O.C.R ? |
| Date | Sat, 13 Jun 2015 13:29:22 +0000 (UTC) |
| Organization | A noiseless patient Spider |
| Lines | 43 |
| Message-ID | <pan.2015.06.13.13.30.56@gmail.com> (permalink) |
| References | <pan.2015.05.23.07.50.46@gmail.com> <bjj73c-fim.ln1@mail.binaryfoundry.ca> <mk1sam$r2l$1@dont-email.me> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8 |
| Content-Transfer-Encoding | 8bit |
| Injection-Date | Sat, 13 Jun 2015 13:29:22 +0000 (UTC) |
| Injection-Info | mx02.eternal-september.org; posting-host="14721bec97c85d45e51d42a6b98c7030"; logging-data="26765"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/25ySso8S429ONqeWggH8V0VSsFx1osQA=" |
| User-Agent | Pan/0.133 (House of Butterflies) |
| Cancel-Lock | sha1:4DPQbEJikdCz/ycM79yhRzo+uqc= |
| Xref | csiph.com comp.os.linux.misc:14921 |
On Tue, 26 May 2015 13:26:46 +0000, Joe Beanfish wrote:

> On Sat, 23 May 2015 20:46:03 -0400, John-Paul Stewart wrote:
>
>> On 23/05/15 03:49 AM, Unknown wrote:
>>> I'm confused and disturbed that xpdf of:
>>> http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.Computer.pdf
>>> is perfect to the pixel, with maximum magnification [400%], which
>>> is expected, since it's computer-font generated, whereas:
>>> http://www.northernlaw.co.za/images/stories/files/actsandbills/COMPANY%20LAW%20ACT.pdf
>>> shows blotchy and fibers as if it's a photo-of-a-paper-copy.
>>>
>>> And scanned copies of papers are apparently normal.
>>>
>>> BUT!! How is it that xpdf allows me to extract the text, via
>>> mouse-copy from COMPANY%20LAW%20ACT.pdf ?
>>> That would mean that the mouse-driver is doing O.C.R. ?!
>>
>> Why would you think the mouse driver is doing OCR?
>>
>> A PDF file can contain both text and images. It is common when
>> scanning paper documents to turn them into a so-called "searchable PDF"
>> that contains the scanned image of the page overlaid on top of the
>> (OCRed) text. So what you see visually is the (possibly blurry)
>> picture, while what the mouse is copying (and pdftotext is extracting)
>> is the text that's hidden underneath.
>>
>> Adobe's own Acrobat software can create such "searchable PDF" files.
>> I'm sure there are other tools, too.
>

This is TOO-MUCH!!
You mean they send the original-keyed-in-pdftotextable, AND the graphical
image of the crumpled-paper-version <overlaid>.
What's the aim of such expensive deception?

> Yeah, It's kinda interesting when your workstation's bogged down and the
> pdf is big you might see the OCR text render first, then the image will
> render, covering it up. Or maybe that only happens in the browser when
> it's downloading and hasn't gotten to the image yet? Haven't seen it
> happen in a while.
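For anyone wondering what "text hidden underneath the image" looks like inside the file, here is a rough sketch of the page content stream a searchable PDF might carry. It is only an illustration, not what Acrobat or any particular tool emits: the function name `searchable_page_stream` and the resource names `/F1` and `/Im0` are made up for the example, and the `3 Tr` (invisible text rendering mode) line is an extra assumption on top of the "image painted over the text" layering described in the quoted reply.

```python
def searchable_page_stream(ocr_text, width=612, height=792):
    """Return a PDF content stream: OCR text first, full-page scan on top.

    Hypothetical helper for illustration; /F1 and /Im0 are assumed
    resource names that a real file would define in the page dictionary.
    """
    # Backslash and parentheses are special inside a PDF literal string.
    escaped = (ocr_text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
    ops = [
        "BT",                             # begin text object
        "/F1 12 Tf",                      # select font F1 at 12 pt
        "3 Tr",                           # rendering mode 3: invisible (assumption)
        f"72 {height - 72} Td",           # move to one inch from the top-left
        f"({escaped}) Tj",                # the OCR text that copy/pdftotext sees
        "ET",                             # end text object
        "q",                              # save graphics state
        f"{width} 0 0 {height} 0 0 cm",   # scale the unit square to the page
        "/Im0 Do",                        # paint the scanned image over the text
        "Q",                              # restore graphics state
    ]
    return "\n".join(ops)


if __name__ == "__main__":
    print(searchable_page_stream("COMPANIES ACT (sample OCR line)"))
```

The viewer paints the operators in order, so the scan ends up covering the text, while text selection and extraction only look at the `Tj` strings. That is why copying works even though the page looks like a photo.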