| From | ken <ken@spamcop.net> |
|---|---|
| Newsgroups | comp.lang.postscript |
| Subject | Re: Re (2): Subject: techniques of extracting the original ASCII? |
| Date | 2011-08-28 07:24 +0100 |
| Message-ID | <MPG.28c3ef44d4c061cb98985d@usenet.plus.net> |
| References | <MPG.28b19188fb469bc0989859@usenet.plus.net> <j3b0dm$qma$1@dont-email.me> |
In article <j3b0dm$qma$1@dont-email.me>, no.top.post@gmail.com says...
> How many of these <f-something> ligatures are there?
In Latin languages, I think ff, ffi, ffl, are the most common, but see:
http://en.wikipedia.org/wiki/Typographic_ligature
> Do they have a common ID, for the different fonts?
I'm not sure what you mean by a common ID. The glyphs are named things
like '/ffi'.
> Can they be easily 'extracted' from a *.pdf?
The same as any other text, yes.
> Can the renderer be modified to do:
> IF <fi-ligature> THEN put("f); put("i") ?
No. Also, why would you care about rendering?
> > You could try using the new experimental 'txtwrite' device in the latest
> > version of Ghostscript (9.04), which will produce UTF-16 (NOT ASCII)
> > output from a file. I plan to add UTF-8 later, which would be ASCII
> > output if the input is ASCII. I'm not planning to add ligature
> > conversion but you could do it yourself easily enough.
> >
> Isn't ASCII to UTF-16 a one-to-one-mapping?
The content of the PDF file may have characters encoded in some fashion
other than ASCII, and almost certainly not UTF-16. Why would it be a
one-way map? If I know it's ASCII, then I can convert it to something
else (e.g. UTF-16 or UTF-8) and vice versa.
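As a rough illustration of that round trip, here is a minimal Python sketch. It is not anything Ghostscript does internally, and the sample string is made up. For pure ASCII input the UTF-8 bytes are identical to the ASCII bytes, UTF-16 uses two bytes per character, and both convert back without loss.

```python
# Sketch only: the sample text is a made-up example.
text = "office"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")  # big-endian, no byte order mark

# UTF-8 is a superset of ASCII, so the bytes are identical.
same_as_ascii = (utf8_bytes == ascii_bytes)

# UTF-16 uses one 16-bit code unit per ASCII character.
utf16_len = len(utf16_bytes)

# Both encodings decode back to the original text.
round_trip_ok = (utf8_bytes.decode("utf-8") == text
                 and utf16_bytes.decode("utf-16-be") == text)
```

This only holds when the input really is ASCII. Text taken from a PDF may use some other font encoding, in which case it has to be decoded correctly first.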
> Does "ligature conversion" mean eg. converting glyph(fi)
> to chars("fi")
Yes, exactly.
> and if so why don't the converters do it?
Because it's not actually the same thing.
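For anyone who does want the ligature conversion discussed above, a minimal Python sketch follows. It assumes the extracted text has already been decoded to Unicode and that the ligatures arrive as the Unicode presentation-form code points U+FB00 to U+FB04, which depends on the font and on how the text was extracted. The names LIGATURES and expand_ligatures are hypothetical, chosen only for this example.

```python
# Assumption: ligatures appear as Unicode "Alphabetic Presentation Forms"
# code points. Each one is mapped to the plain letters it stands for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# Build the translation table once; values may be multi-character strings.
_TABLE = str.maketrans(LIGATURES)

def expand_ligatures(text: str) -> str:
    """Replace each ligature character with its component letters."""
    return text.translate(_TABLE)
```

Text without any of these code points passes through unchanged. A font that maps its ligature glyphs to private or nonstandard codes would need its own table.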
> Even Win7's adobe <pdf to speech> can't handle "ff".
Well, speech output is a little different.
> Where's the basic *.pdf renderer algorithm explained?
Everything about PDF is explained in the PDF Reference Manual. It's not
(IMO) as good a document as the PostScript Language Reference Manual,
but it isn't too bad. One of the biggest problems is that Adobe Acrobat
doesn't actually stick to it, and will open many files which are
technically illegal.
There are few details of rendering, because (with the exception of
things like pixel coverage) it doesn't matter how you render it; this is
left up to the rasteriser. If you really want to know more about
rendering graphical objects in PDF, then you should also read the
PostScript Language Reference Manual, which has more details.
Ken