Re: Subject: techniques of extracting the original ASCII?

From ken <ken@spamcop.net>
Newsgroups comp.lang.postscript
Subject Re: Subject: techniques of extracting the original ASCII?
Date 2011-08-14 09:03 +0100
Message-ID <MPG.28b19188fb469bc0989859@usenet.plus.net>
References <j25v3u$2i0$3@dont-email.me>


In article <j25v3u$2i0$3@dont-email.me>, UsePlainText@dog.edu says...

> I previously asked here:
> Why is ps/pdf quirky with "ff" ?
> 
> and got the answers that "it's rendered by a glyph".
> Well of course, but WHY? Why then isn't "a"
> "rendered by a glyph"?

It is.
 
> Since char("f") was originally entered by a keyboard as
> ASCII, why should *IT*, and not other chars be transformed?

Because, typographically, certain character combinations (ff, 
fi, fl, ffi etc.) look 'poor' if rendered by drawing the individual 
glyphs, that is, 'f' then 'i', 'f' then 'f', and so on.

So you aren't entering 'f', you are entering 'f' followed by another 
character.

Typographers like their fonts to look their best and so, for those 
specific combinations, and others in other languages, they create what 
are called 'ligatures'. These are glyphs which comprise the individual 
letters, but drawn in a more pleasing way. Often this means drawing (for 
example) the f and i close together and dropping the dot above the 'i' 
which would otherwise collide with the f.

Applications which recognise the combinations, and detect the existence 
of the ligature in the font, can use the ligature instead of the 
individual glyphs, to give a more pleasing result.

Note that this is the *application* making the decision, NOT PostScript 
or PDF.
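
As a rough illustration of that application-side decision, here is a minimal Python sketch. It is not taken from any real layout program: the function name choose_glyphs and the font_glyphs set are hypothetical, and the ligature glyph names (ff, fi, fl, ffi, ffl) simply follow the common Adobe naming. It only uses a ligature when the font claims to contain it.

```python
# Hypothetical sketch of an application choosing ligature glyphs.
# Longest candidates are listed first so 'ffi' is preferred over 'ff' + 'i'.
LIGATURES = ("ffi", "ffl", "ff", "fi", "fl")

def choose_glyphs(text, font_glyphs):
    """Return glyph names for text, using a ligature only if the font has it."""
    glyphs = []
    i = 0
    while i < len(text):
        for lig in LIGATURES:
            # Use the ligature only when the text matches here AND the
            # font actually provides a glyph with that name.
            if text.startswith(lig, i) and lig in font_glyphs:
                glyphs.append(lig)
                i += len(lig)
                break
        else:
            glyphs.append(text[i])  # fall back to the single-letter glyph
            i += 1
    return glyphs

# A font with an 'ffi' ligature draws "office" with four glyphs:
print(choose_glyphs("office", {"ffi", "fi"}))
# A font with no ligatures draws it letter by letter:
print(choose_glyphs("office", set()))
```

Once this substitution has happened, the file only records which glyphs were drawn, which is why the original 'f' 'f' 'i' keystrokes are not directly recoverable from it.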
 
> How do you solve this problem of not being able to get
> as ascii version of such 'texts' ?

You could try using the new experimental 'txtwrite' device in the latest 
version of Ghostscript (9.04), which will produce UTF-16 (NOT ASCII) 
output from a file. I plan to add UTF-8 later, which would be ASCII 
output if the input is ASCII. I'm not planning to add ligature 
conversion but you could do it yourself easily enough.
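
A do-it-yourself ligature conversion along those lines might look like the Python sketch below. It is not part of Ghostscript; the function names are made up for illustration, and it assumes the extracted text uses the Unicode presentation-form code points U+FB00 to U+FB06 for ligatures and that the UTF-16 data starts with a byte order mark (pass an explicit encoding such as "utf-16-le" otherwise).

```python
# Unicode Latin ligature code points and their plain-letter expansions.
LIGATURE_EXPANSIONS = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}

def expand_ligatures(text):
    """Replace each ligature character with its component letters."""
    return "".join(LIGATURE_EXPANSIONS.get(ch, ch) for ch in text)

def utf16_to_ascii(data, encoding="utf-16"):
    """Decode UTF-16 bytes, expand ligatures, and force the result to ASCII.

    Characters with no ASCII equivalent are replaced by '?'.
    """
    text = expand_ligatures(data.decode(encoding))
    return text.encode("ascii", errors="replace").decode("ascii")

sample = "o\ufb03ce".encode("utf-16")  # "office" written with an ffi ligature
print(utf16_to_ascii(sample))
```

An explicit table is used rather than a general Unicode normalisation so that only ligatures are touched and every other character passes through unchanged.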

 
> Does ps & pdf render characters sequentially, or pixels,
> or columns or glyphs sequentially; 

None of the above; they are rendered in the order the document describes 
them, which may be completely arbitrary. The character code is converted 
to a glyph name or CID and that is used to find the outline in the font 
program, which is then rendered.


> and if by glyphs: do 
> they have variable positions on the screen?

They can have different positions (obviously) on the output and certain 
operators can even alter the spacing between glyphs.



			Ken



Thread

Subject: techniques of extracting the original ASCII? "NoHtmlMailsPlease" <UsePlainText@dog.edu> - 2011-08-12 20:59 +0200
  Re: Subject: techniques of extracting the original ASCII? ken <ken@spamcop.net> - 2011-08-14 09:03 +0100
    Re (2): Subject: techniques of extracting the original ASCII? no.top.post@gmail.com - 2011-08-27 14:53 +0000
      Re: Re (2): Subject: techniques of extracting the original ASCII? ken <ken@spamcop.net> - 2011-08-28 07:24 +0100
  Re: Subject: techniques of extracting the original ASCII? bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-08-15 10:09 +0100
  Re: Subject: techniques of extracting the original ASCII? John Reiser <jreiserfl@comcast.net> - 2011-08-15 06:10 -0700
