Groups > comp.lang.c > #77629 > unrolled thread

unicode is a fail

Started by	fir <profesor.fir@gmail.com>
First post	2015-12-02 08:01 -0800
Last post	2015-12-06 13:45 +0000
Articles	20 on this page of 158 — 25 participants


  unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
    Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
                Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
                  Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
              Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
                Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
                  Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
                    Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
              Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
                Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
              Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
        Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
      Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
        Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
      Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
                  Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
        Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
        Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
            Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
          Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
          Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
              Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
                  Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
                    Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
              Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
                  Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
                        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
                          Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
                            Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
                      Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
                        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
                    Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
                        Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
                          Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
                              Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
                                Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
                                    Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
                                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
                                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
                                      Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
                                        Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
                                          Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
                                            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
                                            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
                                              Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
                                                    Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
                                                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
                                                        Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
                                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
                                              Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
                                              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
                                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
                                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
                                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
                                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
                                                  Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
                                            Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
                              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
                                    Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
                                      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
              Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
                  Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
                    Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
                      Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
                        Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
                          Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
                      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
                Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
                Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
                  Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
          Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
      Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
        OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
          Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
        Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
          OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
            Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000


#78178

From	Lowell Gilbert <lgusenet@be-well.ilk.org>
Date	2015-12-08 11:40 -0500
Message-ID	<44bna0lwqh.fsf@be-well.ilk.org>
In reply to	#78138

Stephen Sprunk <stephen@sprunk.org> writes:

> On 07-Dec-15 08:31, Malcolm McLean wrote:
>> Ben Bacarisse wrote:
>>> You say "You've got to consider the users" but you are not
considering them.  You are classifying texts by language, not by
>>> what texts users want to read or write.  Users in Western Europe
>>> often want to use non-Latin scripts.
>> 
>> Only Greek, and in the special case where the non-Latin script 
>> language or text is itself the subject of the material.
>
> Western Europeans haven't discovered emojis yet?  They don't use
> mathematical or scientific symbols?  There are no translators,
> immigrants or diplomats who know a non-Latin language?  There are no
> schools that teach non-Latin languages?

This has now devolved into an argument over the word "often."

I suggest it may be time to take a break; go see "Pirates of Penzance."
We'll still be here when you get back.

-- 
"... I am not an orphan. And what's more, I never was one!"


#78187

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2015-12-08 17:18 +0000
Message-ID	<87a8pk7tb8.fsf@bsb.me.uk>
In reply to	#78178

Lowell Gilbert <lgusenet@be-well.ilk.org> writes:

> Stephen Sprunk <stephen@sprunk.org> writes:
>
>> On 07-Dec-15 08:31, Malcolm McLean wrote:
>>> Ben Bacarisse wrote:
>>>> You say "You've got to consider the users" but you are not
>>>> considering them.  You are classifying texts by language, not by
>>>> what texts users want to read or write.  Users in Western Europe
>>>> often want to use non-Latin scripts.
>>> 
>>> Only Greek, and in the special case where the non-Latin script 
>>> language or text is itself the subject of the material.
>>
>> Western Europeans haven't discovered emojis yet?  They don't use
>> mathematical or scientific symbols?  There are no translators,
>> immigrants or diplomats who know a non-Latin language?  There are no
>> schools that teach non-Latin languages?
>
> This has now devolved into an argument over the word "often."

Has it?  I don't see anyone objecting to my use of the word, and I'd be
happy to retract it if they did as I'm not a fan of arguing over vague
quantities.  The disagreement was over a statement about what "the rest
of Western Europe" uses.  It seems likely that the statement was just a
poor choice of words which Malcolm felt obliged to defend.  Had he said:

  "For simple English text you need ascii. The other Western European
  languages need extended Latin, and annoyingly those characters won't
  all quite fit into 8 bits."

I don't think there would be much to argue over.

<snip>
-- 
Ben.


#78240

From	"Osmium" <r124c4u102@comcast.net>
Date	2015-12-09 08:36 -0600
Message-ID	<dcqsk5FoqslU1@mid.individual.net>
In reply to	#78187

"Ben Bacarisse" wrote:

> Has it?  I don't see anyone objecting to my use of the word, and I'd be
> happy to retract it if they did as I'm not a fan of arguing over vague
> quantities.  The disagreement was over a statement about what "the rest
> of Western Europe" uses.  It seems likely that the statement was just a
> poor choice of words which Malcolm felt obliged to defend.  Had he said:
>
>  "For simple English text you need ascii. The other Western European
>  languages need extended Latin, and annoyingly those characters won't
>  all quite fit into 8 bits."
>
> I don't think there would be much to argue over.

I haven't been watching this thread but that brings up a pet peeve of mine. 
If you didn't throw away about 32x2 = 64 characters for control characters, 
most of which are unused in the real world, I suspect Europeans could live
very comfortably with the result.  ASCII was forced down our throats by 
AT&T and their Teletype® division. I would estimate at least 57 characters 
could be reclaimed.


#78252

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-09 10:06 -0600
Message-ID	<n49jdn$5g4$1@dont-email.me>
In reply to	#78240

On 09-Dec-15 08:36, Osmium wrote:
> "Ben Bacarisse" wrote:
>> "For simple English text you need ascii. The other Western
>> European languages need extended Latin, and annoyingly those
>> characters won't all quite fit into 8 bits."
>> 
>> I don't think there would be much to argue over.
> 
> I haven't been watching this thread but that brings up a pet peeve
> of mine. If you didn't throw away about 32x2 = 64 characters for
> control characters, most of which are unused in the real world, I
> suspect Europeans could live very comfortably with the result.
> ASCII was forced down our throats by AT&T and their Teletype®
> division. I would estimate at least 57 characters could be
> reclaimed.

The only difference between ISO-8859-1 and Windows-1252 is that the
latter replaces the C1 control codes with useful characters.  It still
isn't quite enough to cover all Western European languages, much less
all Latin-based scripts due to all the additional diacritics needed in
Central/Eastern Europe.

I haven't counted characters to see if replacing most of the C0 control
codes would do it, but I highly doubt it.

OTOH, if you accept combining rather than precomposed characters, it
becomes trivial to fit them all in; however, that brings in a host of
other "Unicode" problems that folks have been complaining about.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking


#78262

From	Keith Thompson <kst-u@mib.org>
Date	2015-12-09 09:35 -0800
Message-ID	<ln8u53msog.fsf@kst-u.example.com>
In reply to	#78240

"Osmium" <r124c4u102@comcast.net> writes:
> "Ben Bacarisse" wrote:
>> Has it?  I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities.  The disagreement was over a statement about what "the rest
>> of Western Europe" uses.  It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend.  Had he said:
>>
>>  "For simple English text you need ascii. The other Western European
>>  languages need extended Latin, and annoyingly those characters won't
>>  all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
> I haven't been watching this thread but that brings up a pet peeve of mine. 
> If you didn't throw away about 32x2 = 64 characters for control characters, 
> most of which are unused in the real world, I suspect Europeans could live
> very comfortably with the result.  ASCII was forced down our throats by 
> AT&T and their Teletype® division. I would estimate at least 57 characters 
> could be reclaimed. 

ASCII is a 7-bit character code.  It has only 33 control characters
(0..31 and 127, DEL).

The various ISO-8859-N character sets, as well as Unicode, add
another 32 control characters from 128 to 159 (character 160,
"NO-BREAK SPACE", isn't considered a control character, but it's
"controlesque").

The control characters from 128 to 159 are very rarely used as far
as I know, and I agree that it would probably have made more sense
to use that range for printable characters.  On the other hand,
I don't know the issues that led to those control characters being
added; there must have been *some* valid reason for it.

But the ASCII control characters 0..31 and 127 are *very* useful
and necessary.  Neither vi nor emacs would work without them.
There might be alternatives for accepting similar keystrokes without
mapping them to 7-bit character codes, but I can't think of a scheme
that would work well when running over something that works like
a serial connection (like the one I'm using now to write this).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


#78264

From	supercat@casperkitty.com
Date	2015-12-09 10:07 -0800
Message-ID	<448ac191-4d48-4160-bc44-e8ff696ca284@googlegroups.com>
In reply to	#78262

On Wednesday, December 9, 2015 at 11:35:32 AM UTC-6, Keith Thompson wrote:
> But the ASCII control characters 0..31 and 127 are *very* useful
> and necessary.  Neither vi nor emacs would work without them.

Codes 127/255 are an interesting case.  The purpose of 127/255 was not to
perform an action, but rather to be a nop alternative to 0.  A blank row
of punch-tape reads as zero; an all-holes-punched row reads as FF.  If the
operator of an ASR-33 was typing a story and made a mistake, the procedure
for making a correction was to push the back-one-row button on the punch
(which mechanically moved the paper back one row without sending any sort
of code) and then punch the "rub-out" button which sent code 127/255.  The
existence of the rub-out character on the tape would increase transmission
time by a tenth of a second, but not have any other adverse consequences.

As for codes 0x80-0x9F, those were set aside I think because some terminals
regard 0x80-0xFF as synonymous with 0x00-0x7F on reception, which meant that
if a terminal was being used for display-only purposes there was no need to
worry about parity settings.  If one sent a document with 8-bit character
data to a terminal configured for 7 bits ignore parity, characters beyond
0xA0 would show up as alternative characters, but everything else would
appear as it should.  If the document used characters 0x80-0x9F as printable
characters, they could cause the appearance of other characters to be
garbled.


#78280

From	Keith Thompson <kst-u@mib.org>
Date	2015-12-09 12:04 -0800
Message-ID	<lnvb87l76m.fsf@kst-u.example.com>
In reply to	#78264

supercat@casperkitty.com writes:
> On Wednesday, December 9, 2015 at 11:35:32 AM UTC-6, Keith Thompson wrote:
>> But the ASCII control characters 0..31 and 127 are *very* useful
>> and necessary.  Neither vi nor emacs would work without them.
>
> Codes 127/255 are an interesting case.  The purpose of 127/255 was not to
> perform an action, but rather to be a nop alternative to 0.  A blank row
> of punch-tape reads as zero; an all-holes-punched row reads as FF.  If the
> operator of an ASR-33 was typing a story and made a mistake, the procedure
> for making a correction was to push the back-one-row button on the punch
> (which mechanically moved the paper back one row without sending any sort
> of code) and then punch the "rub-out" button which sent code 127/255.  The
> existence of the rub-out character on the tape would increase transmission
> time by a tenth of a second, but not have any other adverse consequences.

Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
control character used in interactive input.  It commonly denotes
deleting a character, but only because of the mnemonic name, not because
it has 7 bits set to 1.  And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
The history of the all-rows-punched semantics is interesting, but it
doesn't directly affect modern usage.

> As for codes 0x80-0x9F, those were set aside I think because some terminals
> regard 0x80-0xFF as synonymous with 0x00-0x7F on reception, which meant that
> if a terminal was being used for display-only purposes there was no need to
> worry about parity settings.  If one sent a document with 8-bit character
> data to a terminal configured for 7 bits ignore parity, characters beyond
> 0xA0 would show up as alternative characters, but everything else would
> appear as it should.  If the document used characters 0x80-0x9F as printable
> characters, they could cause the appearance of other characters to be
> garbled.

I don't know why they were *originally* set aside, but certainly Latin-N
and Unicode don't treat them as equivalent to the 0..31 control
characters.  For example, U+0006 is ACKNOWLEDGE or ACK, and U+0086 is
START OF SELECTED AREA.  And Windows-1252 has printable characters in
(most of) the range 128..159; as far as I know that hasn't caused any
problems other than incompatibility with non-Windows character sets.
(Windows-1252 apparently was originally intended to be an ANSI standard,
but ISO 8859 went in a different direction for some reason.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


#78288

From	supercat@casperkitty.com
Date	2015-12-09 12:35 -0800
Message-ID	<533391a5-5a24-4a57-b756-d0a4c46a8396@googlegroups.com>
In reply to	#78280

On Wednesday, December 9, 2015 at 2:04:59 PM UTC-6, Keith Thompson wrote:
> Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
> control character used in interactive input.  It commonly denotes
> deleting a character, but only because of the mnemonic name, not because
> it has 7 bits set to 1.  And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
> The history of the all-rows-punched semantics is interesting, but it
> doesn't directly affect modern usage.

I'm not sure why "rub-out" was changed to "delete", but the purpose of the
character code was to act as an all-bits-set NOP.  Later on, someone who
wanted a key to delete a character from the middle of some text saw that
there was a key marked "Delete" and decided to use it for that.  Likewise
someone wanted a key to leave certain interactive modes and thought
"Escape" seemed like a good choice even though the purpose was not to
allow a user to escape from a certain mode, but rather to escape the
meaning of succeeding characters.  The fact that on many terminals the
only difference between cursor up and (IIRC) escape-leftbracket-A is
timing means that when using "vi" with such a terminal over TCP/IP, a
communications hiccup can cause a cursor key that's typed within insert
mode to be mistaken for an attempt to leave edit mode, then use "[" and
"A" commands.

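The timing ambiguity can be illustrated with a small C sketch. The decoder, its 50 ms threshold and the gap values are all hypothetical, and the sketch is not how vi or any particular terminal library decides; it only shows that identical bytes decode differently once a delay exceeds the threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum key { KEY_OTHER, KEY_ESCAPE, KEY_CURSOR_UP };

#define ESC 0x1B

/* Hypothetical decoder. bytes[] holds n received bytes and gap_ms[i] is the
   delay between bytes[i] and bytes[i + 1]. ESC [ A counts as a cursor key
   only if the whole sequence arrives with no gap longer than timeout_ms. */
static enum key decode(const unsigned char *bytes, const unsigned *gap_ms,
                       size_t n, unsigned timeout_ms)
{
    if (n == 0 || bytes[0] != ESC)
        return KEY_OTHER;
    if (n >= 3 && bytes[1] == '[' && bytes[2] == 'A' &&
        gap_ms[0] <= timeout_ms && gap_ms[1] <= timeout_ms)
        return KEY_CURSOR_UP;
    return KEY_ESCAPE;  /* a lone Escape; later bytes are ordinary input */
}

int main(void)
{
    const unsigned char seq[] = { ESC, '[', 'A' };
    const unsigned prompt[] = { 1, 1 };    /* bytes arrive back to back */
    const unsigned hiccup[] = { 250, 1 };  /* network stall after ESC */
    const unsigned timeout_ms = 50;        /* arbitrary threshold */

    assert(decode(seq, prompt, 3, timeout_ms) == KEY_CURSOR_UP);
    assert(decode(seq, hiccup, 3, timeout_ms) == KEY_ESCAPE);
    puts("same bytes, different keys, depending only on timing");
    return 0;
}
```
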
> I don't know why they were *originally* set aside, but certainly Latin-N
> and Unicode don't treat them as equivalent to the 0..31 control
> characters.  For example, U+0006 is ACKNOWLEDGE or ACK, and U+0086 is
> START OF SELECTED AREA.  And Windows-1252 has printable characters in
> (most of) the range 128..159; as far as I know that hasn't caused any
> problems other than incompatibility with non-Windows character sets.
> (Windows-1252 apparently was originally intended to be an ANSI standard,
> but ISO 8859 went in a different direction for some reason.)

Printable characters were chosen to avoid "shadowing" control codes received
with the parity bit set; then new control codes were chosen to avoid
conflicts with new (or old) codes.


#78313

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2015-12-09 23:46 +0000
Message-ID	<n4aeh7$54m$1@speranza.aioe.org>
In reply to	#78288

supercat@casperkitty.com wrote:
> On Wednesday, December 9, 2015 at 2:04:59 PM UTC-6, Keith Thompson wrote:
>> Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
>> control character used in interactive input.  It commonly denotes
>> deleting a character, but only because of the mnemonic name, not because
>> it has 7 bits set to 1.  And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
>> The history of the all-rows-punched semantics is interesting, but it
>> doesn't directly affect modern usage.

> I'm not sure why "rub-out" was changed to "delete", but the purpose of the
> character code was to act as an all-bits-set NOP.  

As previously noted, convenient for erasing characters on paper tape.

I suspect this goes back to ASR-33 days, when people would punch
messages (not yet programs) onto tape for later transmission.
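
A rough C sketch of why an all-holes code is convenient on tape, assuming a 7-bit tape image held in memory. Punching can only add holes, so overpunching any code with all seven holes always gives 0x7F, and a reader that skips 0x7F sees the character as gone. The names `overpunch` and `read_tape` are made up for illustration.

```c
#include <stddef.h>

#define RUBOUT 0x7F  /* all seven holes punched (ASCII DEL) */

/* Punching can only add holes, so overpunching is a bitwise OR. */
unsigned char overpunch(unsigned char old_code, unsigned char new_code)
{
    return (unsigned char)((old_code | new_code) & 0x7F);
}

/* Copy a tape image to out, skipping rubbed-out frames.
   Returns the number of codes kept. out must hold at least n bytes. */
size_t read_tape(const unsigned char *tape, size_t n, unsigned char *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if ((tape[i] & 0x7F) != RUBOUT)
            out[kept++] = tape[i];
    }
    return kept;
}
```

Overpunching the 'X' in "CAXT" with RUBOUT and then reading the tape back yields "CAT". The tape itself still has four frames, which is the sense in which this is a rub-out and not a delete.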

> Later on, someone who
> wanted a key to delete a character from the middle of some text saw that
> there was a key marked "Delete" and decided to use it for that.  

As far as I know, it is DEC's fault. Again it seems likely ASR33
related. The ASR33 can't backspace, and doesn't have a backspace key.

DEC systems I used to use would consider the previous character
erased when DEL was typed. They would print the deleted characters
between slashes, so that you would know which ones they were.

With DECwriters, printing terminals that could backspace, they
still used the slash system, as otherwise you would write over the
previously printed characters.

Unix lets you set the erase character, with 0x08 and 0x7f being
the two popular choices, with about equal probability.
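
For reference, a minimal sketch of how a program could change the erase character through POSIX termios. The helper names are hypothetical; `tcgetattr`, `tcsetattr`, `VERASE` and `TCSANOW` are the standard POSIX names.

```c
#include <termios.h>
#include <unistd.h>

/* Set the erase character in a termios structure (does not touch the tty). */
void set_erase_char(struct termios *t, cc_t erase)
{
    t->c_cc[VERASE] = erase;
}

/* Apply a new erase character to the terminal on fd.
   Returns 0 on success, -1 on failure (for example if fd is not a tty). */
int apply_erase_char(int fd, cc_t erase)
{
    struct termios t;
    if (tcgetattr(fd, &t) == -1)
        return -1;
    set_erase_char(&t, erase);
    return tcsetattr(fd, TCSANOW, &t);
}
```

Calling `apply_erase_char(STDIN_FILENO, 0x7F)` selects DEL and `apply_erase_char(STDIN_FILENO, 0x08)` selects backspace, much as `stty erase` does from the shell.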

> Likewise
> someone wanted a key to leave certain interactive modes and thought
> "Escape" seemed like a good choice even though the purpose was not to
> allow a user to escape from a certain mode, but rather to escape the
> meaning of succeeding characters.  

Well, it does escape the meaning of the following characters...

> The fact that on many terminals the
> only difference between cursor up and (IIRC) escape-leftbracket-A is
> timing means that when using "vi" with such a terminal over TCP/IP, a
> communications hiccup can cause a cursor key that's typed within insert
> mode to be mistaken for an attempt to leave edit mode, then use "[" and
> "A" commands.

This is a problem, but the cursor movement keys are supposed to
be h, j, k, and l, not the arrows.

(snip)

> Printable characters were chosen to avoid "shadowing" control codes received
> with the parity bit set; then new control codes were chosen to avoid
> conflicts with new (or old) codes.

-- glen

#78317

From	supercat@casperkitty.com
Date	2015-12-09 16:15 -0800
Message-ID	<aa5b9efe-833c-4be4-8bcf-a39f62dd1b4f@googlegroups.com>
In reply to	#78313

On Wednesday, December 9, 2015 at 5:47:02 PM UTC-6, glen herrmannsfeldt wrote:
> supercat wrote:
> > I'm not sure why "rub-out" was changed to "delete", but the purpose of the
> > character code was to act as an all-bits-set NOP.  
> 
> As previously noted, convenient for erasing characters on paper tape.

That was the purpose of the "RUB OUT" key.  Saying "delete" would seem to
imply that it could cause the preceding and following characters on the tape
to become adjacent to each other.

> I suspect this goes back to ASR-33 days, when people would punch
> messages (not yet programs) onto tape for later transmission.

I've used an ASR-33.  The key to generate all-bits-set is labeled "RUB OUT"
(two lines).

> > Later on, someone who
> > wanted a key to delete a character from the middle of some text saw that
> > there was a key marked "Delete" and decided to use it for that.  
> 
> As far as I know, it is DEC's fault. Again it seems likely ASR33
> related. The ASR33 can't backspace, and doesn't have a backspace key.

Interestingly, Altair BASIC uses an underscore character to behave in the
fashion usually associated with backspace; underscore has the most bits set
of any character other than rub-out which the ASR-33 is capable of
generating.

> DEC systems I used to use would consider the previous character
> erased when DEL was typed. They would print the deleted characters
> between slashes, so that you would know which ones they were.
> 
> With DECwriters, printing terminals that could backspace, they 
> still used the slash system, as otherwise you write over the
> previous printed characters.

HP-basic used backspace, but would echo an underscore.  Deleting one
character would thus mean that the character would get replaced by an
underscore, but hitting backspace more than once would simply cause the
last character typed to get underlined multiple times.

> > The fact that on many terminals the
> > only difference between cursor up and (IIRC) escape-leftbracket-A is
> > timing means that when using "vi" with such a terminal over TCP/IP, a
> > communications hiccup can cause a cursor key that's typed within insert
> > mode to be mistaken for an attempt to leave edit mode, then use "[" and
> > "A" commands.
> 
> This is a problem, but the cursor movement keys are supposed to
> be h, j, k, and l, not the arrows.

The vi implementations I've used try to support the arrows as well, though
I don't use them because they're not reliable.  The problem is fundamentally
that use of keys which send escape sequences interferes with use of the
escape key as a key.
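
To make the ambiguity concrete, here is a simplified sketch of the kind of timing test an editor has to apply. It is not taken from any real vi; the struct, the enum and the timeout parameter are all assumptions for illustration. Each received byte carries the delay since the previous one, and ESC followed quickly by "[A" is taken as cursor up, while a slow arrival is taken as a lone Escape.

```c
#include <stddef.h>

enum key { KEY_NONE, KEY_ESCAPE, KEY_UP };

/* One received byte plus the delay (ms) since the previous byte arrived. */
struct rx_byte {
    unsigned char ch;
    unsigned gap_ms;
};

/* Classify input starting at in[0], which is expected to be ESC (0x1B).
   If "[A" follows with each gap under timeout_ms, it is treated as cursor up
   and *used is 3; otherwise ESC is a lone Escape and *used is 1. */
enum key classify_escape(const struct rx_byte *in, size_t n,
                         unsigned timeout_ms, size_t *used)
{
    if (n == 0 || in[0].ch != 0x1B) {
        *used = 0;
        return KEY_NONE;
    }
    if (n >= 3 &&
        in[1].ch == '[' && in[1].gap_ms < timeout_ms &&
        in[2].ch == 'A' && in[2].gap_ms < timeout_ms) {
        *used = 3;
        return KEY_UP;
    }
    *used = 1;
    return KEY_ESCAPE;
}
```

With a 50 ms timeout, the same three bytes are read as cursor up when they arrive 1 ms apart, and as Escape followed by the commands "[" and "A" when a network hiccup delays the "[" by 250 ms. No choice of timeout removes the problem; it only trades one kind of misreading for the other.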

#78321

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2015-12-10 03:49 +0000
Message-ID	<n4asp6$641$1@speranza.aioe.org>
In reply to	#78317

supercat@casperkitty.com wrote:

(snip on DELete, RUB OUT, and similar ASCII characters)
(then I wrote)
>> As far as I know, it is DEC's fault. Again it seems likely ASR33
>> related. The ASR33 can't backspace, and doesn't have a backspace key.
 
> Interestingly, Altair BASIC uses an underscore character to behave in the
> fashion usually associated with backspace; underscore has the most bits set
> of any character other than rub-out which the ASR-33 is capable of
> generating.

There was an actual change to ASCII somewhere along the way.

The ASR33 ASCII has a backwards pointing arrow where the underscore
is now, and an upward pointing arrow where caret is now.

The caret sort of looks like an arrow without a stem.

Underscore sort of looks like the stem of a left pointing arrow.
 
>> DEC systems I used to use would consider the previous character
>> erased when DEL was typed. They would print the deleted characters
>> between slashes, so that you would know which ones they were.
 
>> With DECwriters, printing terminals that could backspace, they 
>> still used the slash system, as otherwise you write over the
>> previous printed characters.
 
> HP-basic used backspace, but would echo an underscore.  Deleting one
> character would thus mean that the character would get replaced by an
> underscore, but hitting backspace more than once would simply cause the
> last character typed to get underlined multiple times.

I remember HP Basic, but not that one.  As I remember, it uses the
ASR33 back arrow for backspace, which echoes itself. 
 
(snip, I wrote)

>> This is a problem, but the cursor movement keys are supposed to
>> be h, j, k, and l, not the arrows.
 
> The vi implementations I've used try to support the arrows as well, though
> I don't use them because they're not reliable.  The problem is fundamentally
> that use of keys which send escape sequences interferes with use of the
> escape key as a key.

I mostly got used to a vi that didn't have those. It might be that I
use them in command mode, but not in insert mode.

-- glen

#78316

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-09 18:12 -0600
Message-ID	<n4afsg$val$1@dont-email.me>
In reply to	#78280

On 09-Dec-15 14:04, Keith Thompson wrote:
> supercat@casperkitty.com writes:
>> As for codes 0x80-0x9F, those were set aside I think because some
>> terminals regard 0x80-0xFF as synonymous with 0x00-0x7F on
>> reception, which meant that if a terminal was being used for
>> display-only purposes there was no need to worry about parity
>> settings.  If one sent a document with 8-bit character data to a
>> terminal configured for 7 bits ignore parity, characters beyond 
>> 0xA0 would show up as alternative characters, but everything else
>> would appear as it should.  If the document used characters
>> 0x80-0x9F as printable characters, they could cause the appearance
>> of other characters to be garbled.
> 
> I don't know why they were *originally* set aside,

ISO 646 (aka ASCII) divided the 7-bit coding space into:
00-1F control characters
20-7F graphical characters

ISO 2022 extended this to 8 bits by repeating the same division
00-1F control characters (C0)
20-7F graphical characters (GL)
80-9F control characters (C1)
A0-FF graphical characters (GR)
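
The four-way split above can be written as a short C function. This is only a sketch; the enum and function names are invented here. Note that the table counts 0x7F with the GL range, although DEL itself is a control character.

```c
/* The four areas of the 8-bit code table, as listed above. */
enum code_area { AREA_C0, AREA_GL, AREA_C1, AREA_GR };

/* Classify a byte by range only; 0x7F (DEL) lands in AREA_GL
   because it sits in that range, even though it is a control. */
enum code_area classify_byte(unsigned char b)
{
    if (b < 0x20) return AREA_C0;
    if (b < 0x80) return AREA_GL;
    if (b < 0xA0) return AREA_C1;
    return AREA_GR;
}
```

Bit 7 selects the left or right half, and within each half the first 32 codes are controls, which is why the same test repeats 0x80 higher.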

There are also four working sets, named G0 to G3, which can each have
multiple planes of 94^N or (except for G0) 96^N characters.  By default,
GL is mapped to G0 and GR is mapped to G1, but C0 or C1 codes can be
used to remap either to a different set or to select between planes
within a set.

Most of that complexity was only actually used for three encodings:
ISO-2022-CN, ISO-2022-JP and ISO-2022-KR, i.e. the CJK family.

ISO 8859-X built on ISO 2022 and defined G0 as ASCII and G1 as one
96-entry plane of something else; they do not define G2 or G3.  There
was no need for the C1 codes, yet there was also no way to assign them
to graphical characters within the ISO 2022 framework.

> And Windows-1252 has printable characters in (most of) the range
> 128..160; as far as I know that hasn't caused any problems other
> than incompatibility with non-Windows character sets.

HTML5 goes so far as to specify that pages labeled "iso-8859-1" should
instead be interpreted as "windows-1252" due to the prevalence of that
specific issue.  Since the former will never (in practice) have any C1
codes present, this is mostly harmless.

> (Windows-1252 apparently was originally intended to be an ANSI
> standard, but ISO 8859 went in a different direction for some reason.)

Windows-1252 was initially identical to ISO 8859-1; it wasn't until
nearly a decade later that Microsoft reassigned the C1 codes to
additional graphical characters, breaking compatibility.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#78265

From	James Kuyper <jameskuyper@verizon.net>
Date	2015-12-09 13:12 -0500
Message-ID	<56686F2A.6050201@verizon.net>
In reply to	#78262

On 12/09/2015 12:35 PM, Keith Thompson wrote:
...
> The various ISO-8859-N character sets, as well as Unicode, add
> another 32 control characters from 128 to 159 (character 160,
> "NO-BREAK SPACE", isn't considered a control character, but it's
> "controlesque").
> 
> The control characters from 128 to 159 are very rarely used as far
> as I know, and I agree that it would probably have made more sense
> to use that range for printable characters.  On the other hand,
> I don't know the issues that led to those control characters being
> added; there must have been *some* valid reason for it.

That's kinder than necessary: there must have been some reason for it
that the designers thought was valid, but they weren't necessarily
correct in that belief.

#78282

From	Keith Thompson <kst-u@mib.org>
Date	2015-12-09 12:12 -0800
Message-ID	<lnr3ivl6t9.fsf@kst-u.example.com>
In reply to	#78265

James Kuyper <jameskuyper@verizon.net> writes:
> On 12/09/2015 12:35 PM, Keith Thompson wrote:
> ...
>> The various ISO-8859-N character sets, as well as Unicode, add
>> another 32 control characters from 128 to 159 (character 160,
>> "NO-BREAK SPACE", isn't considered a control character, but it's
>> "controlesque").
>> 
>> The control characters from 128 to 159 are very rarely used as far
>> as I know, and I agree that it would probably have made more sense
>> to use that range for printable characters.  On the other hand,
>> I don't know the issues that led to those control characters being
>> added; there must have been *some* valid reason for it.
>
> That's kinder than necessary: there must have been some reason for it
> that the designers thought was valid, but they weren't necessarily
> correct in that belief.

The 0..31 and 128..159 sets of control characters are referred to as C0
and C1, respectively.  The C1 control codes are more commonly
represented via escape sequences, such as ESC _ for 0x9F.

https://en.wikipedia.org/wiki/C0_and_C1_control_codes
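
The 7-bit form is mechanical: ESC followed by the C1 code minus 0x40, which lands in the range '@' to '_'. A small sketch, with a function name chosen just for this example:

```c
/* Write the 7-bit escape-sequence form of a C1 control code into out[2].
   Returns 1 on success, 0 if c1 is not in 0x80..0x9F. */
int c1_to_escape(unsigned char c1, unsigned char out[2])
{
    if (c1 < 0x80 || c1 > 0x9F)
        return 0;
    out[0] = 0x1B;                        /* ESC */
    out[1] = (unsigned char)(c1 - 0x40);  /* 0x40..0x5F, '@'..'_' */
    return 1;
}
```

So 0x9F becomes ESC _ as mentioned above, and 0x9B (CSI) becomes the familiar ESC [ that starts cursor-key sequences.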

It would IMHO have been better to use that range for printable
characters, but it's too late to change it now.  Any characters that
could be put in that range already have other representations.  For
example, Windows-1252 has the trademark sign as code 0x99; Unicode uses
0x2122 for the same character.
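
As an illustration, here is a sketch of a Windows-1252 to Unicode conversion for that range. The table is my transcription of the usual Windows-1252 assignments for 0x80..0x9F, with 0 standing in for the five bytes that have no assignment; the function name is made up.

```c
/* Unicode code points for Windows-1252 bytes 0x80..0x9F.
   0 marks the five bytes Windows-1252 leaves unassigned. */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map one Windows-1252 byte to a Unicode code point (0 if unassigned). */
unsigned long cp1252_to_unicode(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];
    return b;  /* 0x00..0x7F and 0xA0..0xFF match ISO 8859-1 and Unicode */
}
```

Outside 0x80..0x9F the byte value and the code point are the same, so the whole difference from Latin-1 fits in a 32-entry table.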

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

#78370

From	raltbos@xs4all.nl (Richard Bos)
Date	2015-12-10 20:48 +0000
Message-ID	<5669e503.30067453@news.xs4all.nl>
In reply to	#78262

Keith Thompson <kst-u@mib.org> wrote:

> But the ASCII control characters 0..31 and 127 are *very* useful
> and necessary.  Neither vi nor emacs would work without them.

All the more reason for getting rid of them!

*mutter*bleepingmonsters*grumble*

Richard

#78312

From	BartC <bc@freeuk.com>
Date	2015-12-09 23:44 +0000
Message-ID	<n4ae93$q4s$1@dont-email.me>
In reply to	#78240

On 09/12/2015 14:36, Osmium wrote:
> "Ben Bacarisse" wrote:
>
>> Has it?  I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities.  The disagreement was over a statement about what "the rest
>> of Western Europe" uses.  It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend.  Had he said:
>>
>>  "For simple English text you need ascii. The other Western European
>>  languages need extended Latin, and annoyingly those characters won't
>>  all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
> I haven't been watching this thread but that brings up a pet peeve of
> mine. If you didn't throw away about 32x2 = 64 characters for control
> characters, most of which are unused in the real world, I suspect
> Europeans could live very comfortably with the result.  ASCII was
> forced down our throats by AT&T and their Teletype® division. I would
> estimate at least 57 characters could be reclaimed.

I also found it odd that out of 256 1-byte codes, a very valuable
commodity which means most western European text could be expressed as
strings of 8-bit characters, 1/4 of them would be wasted on control codes.

Why not waste just one code as an escape for introducing control codes?

Most of the first 32 could be reclaimed too: the only ones I regularly
see in use are nul, tab, cr, lf, and sometimes ff and etx.

Then there are controls such as bs and del, but they don't belong in a
text storage format. So we would have been using 250 characters instead
of 192.

-- 
Bartc

#78325

From	Robert Wessel <robertwessel2@yahoo.com>
Date	2015-12-10 01:13 -0600
Message-ID	<cp8i6btma8kuurqhpf44mtbsj73uo0lr3b@4ax.com>
In reply to	#78312

On Wed, 9 Dec 2015 23:44:36 +0000, BartC <bc@freeuk.com> wrote:

>On 09/12/2015 14:36, Osmium wrote:
>> "Ben Bacarisse" wrote:
>>
>>> Has it?  I don't see anyone objecting to my use of the word, and I'd be
>>> happy to retract it if they did as I'm not a fan of arguing over vague
>>> quantities.  The disagreement was over a statement about what "the rest
>>> of Western Europe" uses.  It seems likely that the statement was just a
>>> poor choice of words which Malcolm felt obliged to defend.  Had he said:
>>>
>>>  "For simple English text you need ascii. The other Western European
>>>  languages need extended Latin, and annoyingly those characters won't
>>>  all quite fit into 8 bits."
>>>
>>> I don't think there would be much to argue over.
>>
>> I haven't been watching this thread but that brings up a pet peeve of
>> mine. If you didn't throw away about 32x2 = 64 characters for control
>> characters, most of which are unused in the real world, I suspect
>> Europeans could live very comfortably with the result.  ASCII was
>> forced down our throats by AT&T and their Teletype® division. I would
>> estimate at least 57 characters could be reclaimed.
>
>I also found it odd that out of 256 1-byte codes, a very valuable
>commodity which means most western European text could be expressed as
>strings of 8-bit characters, 1/4 of them would be wasted on control codes.
>
>Why not waste just one code as an escape for introducing control codes?

In the days of non-transparent links, there were often *many* control
characters on the link.  A typical bisync frame* would have 4-6
control characters marking the start and end of each frame (and more
are possible).  Making those multi-byte sequences would make them
harder to detect** and considerably increase overhead.  Several other
control characters are needed for normal (and frequent) control
messages (although there are *in*frequent control messages that could
have been expanded with little difficulty).  And that's just for the
*link*, ignoring any use of control characters as part of the payload.

Out-of-band signaling for much of that was a big improvement, but it
was basically *not* an option when these character sets were defined.

*Admitting that the (modern) term 'frame' is slightly abused when
applied to bisync.

**And given that SYN had special properties in terms of detectability,
it would have required major surgery to support a two-byte form.  Even
now, XON/XOFF is supported as flow control on many async links -
multi-byte forms of that would be problematic.
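
For a feel of the overhead, here is a deliberately simplified framing sketch in C. It is not a faithful bisync implementation: real frames can also carry headers, block check characters and DLE-based transparency, all omitted here, and the function name is invented. It only wraps a payload as SYN SYN STX ... ETX using the ASCII control codes.

```c
#include <stddef.h>
#include <string.h>

/* ASCII values of the control characters used for framing below. */
enum { STX = 0x02, ETX = 0x03, SYN = 0x16 };

/* Build SYN SYN STX <payload> ETX into out.
   Returns the frame length, or 0 if out is too small. */
size_t build_frame(const unsigned char *payload, size_t n,
                   unsigned char *out, size_t cap)
{
    if (cap < 4 || n > cap - 4)
        return 0;
    out[0] = SYN;
    out[1] = SYN;
    out[2] = STX;
    memcpy(out + 3, payload, n);
    out[3 + n] = ETX;
    return n + 4;
}
```

Even this stripped-down form spends four single-byte controls per frame, so making each of them a two-byte escape sequence would double that cost and require the receiver to hunt for pairs instead of single codes.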

#78330

From	BartC <bc@freeuk.com>
Date	2015-12-10 10:39 +0000
Message-ID	<n4bkk2$5ap$1@dont-email.me>
In reply to	#78325

On 10/12/2015 07:13, Robert Wessel wrote:
> On Wed, 9 Dec 2015 23:44:36 +0000, BartC <bc@freeuk.com> wrote:

>> I also found it odd that out of 256 1-byte codes, a very valuable
>> commodity which means most western European text could be expressed as
>> strings of 8-bit characters, 1/4 of them would be wasted on control codes.
>>
>> Why not waste just one code as an escape for introducing control codes?
>
> In the days of non-transparent links, there were often *many* control
> characters on the link.  A typical bisync frame* would have 4-6
> control characters marking the start and end of each frame (and more
> are possible).  Making those multi-byte sequences would make them
> harder to detect** and considerably increase overhead.  Several other
> control characters are needed for normal (and frequent) control
> messages (although there are *in*frequent control messages that could
> have been expanded with little difficulty).  And that's just for the
> *link*, ignoring any use of control characters as part of the payload.
>
> Out-of-band signaling for much of that was a big improvement, but it
> was basically *not* an option when these character sets were defined.

How about now? Are those transmission-related codes still relevant?

(I would separate out bit and byte sequences for framing data, from 
actual content.)

Actually, MS code page 437 seems to have ignored all the control codes 
by making all 256 codes represent some visible character (although codes 
0 and 255 are spaces).

I'm not sure how that works, but presumably if you write the sequence
"ABC\nDEF" to a pixel display (not one that emulates a terminal) it
would show "ABC◙DEF" (with the code-10 glyph, an inverse circle, in the
middle). But \n is still needed inside TXT files.

-- 
Bartc

#78332

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-10 03:33 -0800
Message-ID	<2a1cd182-a854-4c66-9043-2ba94cf16997@googlegroups.com>
In reply to	#78330

On Thursday, December 10, 2015 at 10:39:27 AM UTC, Bart wrote:
>
> Actually, MS code page 437 seems to have ignored all the control codes 
> by making all 256 codes represent some visible character (although codes 
> 0 and 255 are spaces).
> 
> I'm not sure how that works, but presumably if you write the sequence
> "ABC\nDEF" to a pixel display (not one that emulates a terminal) it
> would show "ABC◙DEF" (with the code-10 glyph, an inverse circle, in the
> middle). But \n is still needed inside TXT files.
> 
There's a terminal, and a character-mapped raster display.

The character-mapped raster display usually has one byte for character data per
cell. So control codes need to be mapped to something - a space, an error,
or an extra character.
The terminal is built on top of the character-mapped raster display. It accepts a
stream of data, and interprets the control codes, often scrolling the raster
at newline. Since the control codes are single bytes, if you want it to use
the code as a character identifier, you have to escape it somehow.
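
A toy version of the two layers in C, to make the distinction concrete. Everything here is an assumption for illustration: a 25x80 grid, a terminal layer that treats '\n' as carriage return plus line feed, and made-up function names. The raw layer stores any byte as a glyph index; the terminal layer interprets a few control codes and scrolls at the bottom.

```c
#include <string.h>

#define ROWS 25
#define COLS 80

struct screen {
    unsigned char cell[ROWS][COLS];  /* character-mapped raster: one byte per cell */
    int row, col;                    /* cursor position kept by the terminal layer */
};

void screen_init(struct screen *s)
{
    memset(s->cell, ' ', sizeof s->cell);
    s->row = s->col = 0;
}

/* Move every row up by one and blank the last row. */
static void scroll_up(struct screen *s)
{
    memmove(s->cell, s->cell + 1, sizeof s->cell - sizeof s->cell[0]);
    memset(s->cell + (ROWS - 1), ' ', sizeof s->cell[0]);
}

static void line_feed(struct screen *s)
{
    if (s->row == ROWS - 1)
        scroll_up(s);
    else
        s->row++;
}

/* Terminal layer: interpret a few control codes, store everything else. */
void term_putc(struct screen *s, unsigned char c)
{
    switch (c) {
    case '\n': s->col = 0; line_feed(s); break;  /* treated as CR+LF here */
    case '\r': s->col = 0; break;
    case '\b': if (s->col > 0) s->col--; break;
    default:
        s->cell[s->row][s->col] = c;
        if (++s->col == COLS) { s->col = 0; line_feed(s); }
        break;
    }
}

/* Raw layer: store the byte as a glyph index, control code or not. */
void raw_poke(struct screen *s, int row, int col, unsigned char c)
{
    s->cell[row][col] = c;
}
```

Feeding "ABC\nDEF" through `term_putc` puts ABC on the first row and DEF on the second, and the newline never reaches a cell. `raw_poke` with '\n' stores code 10 in a cell, where the font decides what it looks like.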

#78339

From	supercat@casperkitty.com
Date	2015-12-10 06:07 -0800
Message-ID	<cc1bf231-4822-4a72-8078-fd2538b95cb1@googlegroups.com>
In reply to	#78330

On Thursday, December 10, 2015 at 4:39:27 AM UTC-6, Bart wrote:
> Actually, MS code page 437 seems to have ignored all the control codes 
> by making all 256 codes represent some visible character (although codes 
> 0 and 255 are spaces).

The Color Graphics Adapter (CGA) includes 16Kbytes of RAM which are accessible to the
processor at address 0xB8000-0xBBFFF, and an 8Kbyte ROM which is not.  When
configured for text mode, the first 4000 bytes of RAM are used.  On the first
eight scan lines of the display (from a hardware perspective), the first 160 bytes
of RAM are fetched consecutively (the same 160 bytes each time).  Bytes are
grouped into pairs, and after fetching the first byte of each pair from RAM,
the board forms a ROM address by taking a 3-bit scan line counter, the eight
bits of character data, and a couple of hard-coded bits.  The byte of data
fetched from ROM is then fed through a parallel-to-serial shift register
which determines, for each pixel, whether it should be displayed using the
foreground color or the background color.

After the eighth scan line, the memory-address pointer is allowed to advance
so the next eight scan lines use the next 160 bytes of RAM.  This then
happens 23 more times until a total of 25 lines have been displayed.  At
that point, a blanking circuit kicks in for the remainder of the frame.
After about 220 scan lines total have been output, the board outputs a 3
line sync pulse and then continues outputting the remainder of the frame
with the blanking still enabled.

The display hardware knows nothing about newlines, carriage returns,
backspaces, or anything else.  It just fetches 160 bytes of RAM for each
character row and uses the fetched bytes as look-ups into a font-shape ROM.
All 256 values that could appear in RAM have eight bytes of ROM devoted to
holding their shape.
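
The addressing described above can be sketched in C as follows. The character/attribute pairing and the 160-bytes-per-row figure come from the description; the exact arrangement of bits in the ROM address and the pixel order within a font byte are assumptions made for the example, and the function names are invented.

```c
#include <stddef.h>

enum { TEXT_COLS = 80, TEXT_ROWS = 25, GLYPH_LINES = 8 };

/* Byte offset of the character byte for (row, col) in display RAM;
   the second byte of the pair (the attribute) follows at offset + 1. */
size_t text_offset(int row, int col)
{
    return ((size_t)row * TEXT_COLS + (size_t)col) * 2;
}

/* Index into a 256-glyph font with 8 bytes per glyph.
   Assumed layout: character code in the high bits, scan line in the low 3. */
size_t glyph_index(unsigned char ch, int scan_line)
{
    return (size_t)ch * GLYPH_LINES + (size_t)(scan_line & 7);
}

/* Expand one font byte into 8 pixels, most significant bit first
   (assumed order). Each output pixel is either fg or bg. */
void expand_glyph_row(unsigned char bits, unsigned char fg, unsigned char bg,
                      unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (bits & (0x80 >> i)) ? fg : bg;
}
```

The last cell of the last row sits at offset 3998, so 25 rows of 80 two-byte cells account for exactly the 4000 bytes mentioned above. Nothing in this path treats code 10 or 13 differently from 'A'.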
