Groups > comp.lang.c > #77629 > unrolled thread
| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2015-12-02 08:01 -0800 |
| Last post | 2015-12-06 13:45 +0000 |
| Articles | 20 on this page of 158 — 25 participants |
unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
| From | Lowell Gilbert <lgusenet@be-well.ilk.org> |
|---|---|
| Date | 2015-12-08 11:40 -0500 |
| Message-ID | <44bna0lwqh.fsf@be-well.ilk.org> |
| In reply to | #78138 |
Stephen Sprunk <stephen@sprunk.org> writes:
> On 07-Dec-15 08:31, Malcolm McLean wrote:
>> Ben Bacarisse wrote:
>>> You say "You've got to consider the users" but you are not
>>> considering them. You are classifying texts by language, not by
>>> what texts users want to read or write. Users in Western Europe
>>> often want to use non-Latin scripts.
>>
>> Only Greek, and in the special case where the non-Latin script
>> language or text is itself the subject of the material.
>
> Western Europeans haven't discovered emojis yet? They don't use
> mathematical or scientific symbols? There are no translators,
> immigrants or diplomats who know a non-Latin language? There are no
> schools that teach non-Latin languages?

This has now devolved into an argument over the word "often."

I suggest it may be time to take a break; go see "Pirates of Penzance."
We'll still be here when you get back.

--
"... I am not an orphan. And what's more, I never was one!"
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2015-12-08 17:18 +0000 |
| Message-ID | <87a8pk7tb8.fsf@bsb.me.uk> |
| In reply to | #78178 |
Lowell Gilbert <lgusenet@be-well.ilk.org> writes:
> Stephen Sprunk <stephen@sprunk.org> writes:
>
>> On 07-Dec-15 08:31, Malcolm McLean wrote:
>>> Ben Bacarisse wrote:
>>>> You say "You've got to consider the users" but you are not
>>>> considering them. You are classifying texts by language, not by
>>>> what texts users want to read or write. Users in Western Europe
>>>> often want to use non-Latin scripts.
>>>
>>> Only Greek, and in the special case where the non-Latin script
>>> language or text is itself the subject of the material.
>>
>> Western Europeans haven't discovered emojis yet? They don't use
>> mathematical or scientific symbols? There are no translators,
>> immigrants or diplomats who know a non-Latin language? There are no
>> schools that teach non-Latin languages?
>
> This has now devolved into an argument over the word "often."

Has it? I don't see anyone objecting to my use of the word, and I'd be
happy to retract it if they did as I'm not a fan of arguing over vague
quantities. The disagreement was over a statement about what "the rest
of Western Europe" uses. It seems likely that the statement was just a
poor choice of words which Malcolm felt obliged to defend. Had he said:

  "For simple English text you need ascii. The other Western European
  languages need extended Latin, and annoyingly those characters won't
  all quite fit into 8 bits."

I don't think there would be much to argue over.

<snip>
--
Ben.
| From | "Osmium" <r124c4u102@comcast.net> |
|---|---|
| Date | 2015-12-09 08:36 -0600 |
| Message-ID | <dcqsk5FoqslU1@mid.individual.net> |
| In reply to | #78187 |
"Ben Bacarisse" wrote: > Has it? I don't see anyone objecting to my use of the word, and I'd be > happy to retract it if they did as I'm not a fan of arguing over vague > quantities. The disagreement was over a statement about what "the rest > of Western Europe" uses. It seems likely that the statement was just a > poor choice of words which Malcolm felt obliged to defend. Had he said: > > "For simple English text you need ascii. The other Western European > languages need extended Latin, and annoyingly those characters won't > all quite fit into 8 bits." > > I don't think there would be much to argue over. I haven't been watching this thread but that brings up a pet peeve of mine. If you didn't throw away about 32x2 = 64 characters for control characters, most of which are unused in the real world, I suspect European's could live very comfortably with the result. ASCII was forced down our throats by AT&T and their Teletype® division. I would estimate at least 57 characters could be reclaimed.
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-09 10:06 -0600 |
| Message-ID | <n49jdn$5g4$1@dont-email.me> |
| In reply to | #78240 |
On 09-Dec-15 08:36, Osmium wrote:
> "Ben Bacarisse" wrote:
>> "For simple English text you need ascii. The other Western
>> European languages need extended Latin, and annoyingly those
>> characters won't all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
> I haven't been watching this thread but that brings up a pet peeve
> of mine. If you didn't throw away about 32x2 = 64 characters for
> control characters, most of which are unused in the real world, I
> suspect Europeans could live very comfortably with the result.
> ASCII was forced down our throats by AT&T and their Teletype®
> division. I would estimate at least 57 characters could be
> reclaimed.

The only difference between ISO-8859-1 and Windows-1252 is that the
latter replaces the C1 control codes with useful characters. It still
isn't quite enough to cover all Western European languages, much less
all Latin-based scripts due to all the additional diacritics needed in
Central/Eastern Europe.

I haven't counted characters to see if replacing most of the C0 control
codes would do it, but I highly doubt it.

OTOH, if you accept combining rather than precomposed characters, it
becomes trivial to fit them all in; however, that brings in a host of
other "Unicode" problems that folks have been complaining about.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
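To make the ISO-8859-1 versus Windows-1252 difference concrete, here is a minimal C sketch that decodes a single byte under each encoding. The function and table names are made up for this illustration, and returning U+FFFD for the five bytes that Windows-1252 leaves unassigned is an assumption of the sketch (other decoders make other choices).

```c
#include <stdint.h>

/* Unicode code points for Windows-1252 bytes 0x80..0x9F.
   A 0 entry marks one of the five bytes Windows-1252 leaves unassigned. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* ISO-8859-1: each byte maps to the code point of the same value,
   so 0x80..0x9F land on the C1 control codes U+0080..U+009F. */
uint32_t latin1_to_unicode(unsigned char b)
{
    return b;
}

/* Windows-1252: same as ISO-8859-1 except in 0x80..0x9F.
   Assumption of this sketch: unassigned bytes decode to U+FFFD. */
uint32_t cp1252_to_unicode(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F) {
        uint16_t cp = cp1252_high[b - 0x80];
        return cp ? cp : 0xFFFD;
    }
    return b;
}
```

With these definitions, byte 0x80 decodes to U+0080 (a control code) as ISO-8859-1 but to U+20AC (the euro sign) as Windows-1252, while a byte such as 0xE9 decodes to U+00E9 under both.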
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-09 09:35 -0800 |
| Message-ID | <ln8u53msog.fsf@kst-u.example.com> |
| In reply to | #78240 |
"Osmium" <r124c4u102@comcast.net> writes:
> "Ben Bacarisse" wrote:
>> Has it? I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities. The disagreement was over a statement about what "the rest
>> of Western Europe" uses. It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend. Had he said:
>>
>> "For simple English text you need ascii. The other Western European
>> languages need extended Latin, and annoyingly those characters won't
>> all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
> I haven't been watching this thread but that brings up a pet peeve of mine.
> If you didn't throw away about 32x2 = 64 characters for control characters,
> most of which are unused in the real world, I suspect Europeans could live
> very comfortably with the result. ASCII was forced down our throats by
> AT&T and their Teletype® division. I would estimate at least 57 characters
> could be reclaimed.
ASCII is a 7-bit character code. It has only 33 control characters
(0..31 and 127, DEL).
The various ISO-8859-N character sets, as well as Unicode, add
another 32 control characters from 128 to 159 (character 160,
"NO-BREAK SPACE", isn't considered a control character, but it's
"controlesque").
The control characters from 128 to 159 are very rarely used as far
as I know, and I agree that it would probably have made more sense
to use that range for printable characters. On the other hand,
I don't know the issues that led to those control characters being
added; there must have been *some* valid reason for it.
But the ASCII control characters 0..31 and 127 are *very* useful
and necessary. Neither vi nor emacs would work without them.
There might be alternatives for accepting similar keystrokes without
mapping them to 7-bit character codes, but I can't think of a scheme
that would work well when running over something that works like
a serial connection (like the one I'm using now to write this).
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
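The ranges described in the post above can be written down as a small C classifier. This is only a sketch: the enum and function names are hypothetical, and it looks at code values alone, with no locale or <ctype.h> involvement.

```c
/* Classes of code values in an 8-bit, ISO-8859-style layout. */
enum char_class {
    CLASS_C0,    /* 0..31: the original ASCII control codes */
    CLASS_DEL,   /* 127: DEL */
    CLASS_C1,    /* 128..159: the additional control range */
    CLASS_OTHER  /* everything else, including 160 (NO-BREAK SPACE) */
};

enum char_class classify(unsigned code)
{
    if (code <= 0x1F)
        return CLASS_C0;
    if (code == 0x7F)
        return CLASS_DEL;
    if (code >= 0x80 && code <= 0x9F)
        return CLASS_C1;
    return CLASS_OTHER;
}

/* Count the control codes in the inclusive range first..last. */
unsigned count_controls(unsigned first, unsigned last)
{
    unsigned n = 0;
    for (unsigned c = first; c <= last; c++) {
        if (classify(c) != CLASS_OTHER)
            n++;
    }
    return n;
}
```

Counting over 0..127 gives the 33 ASCII control characters mentioned above, and counting over 128..255 gives the 32 extra ones.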
| From | supercat@casperkitty.com |
|---|---|
| Date | 2015-12-09 10:07 -0800 |
| Message-ID | <448ac191-4d48-4160-bc44-e8ff696ca284@googlegroups.com> |
| In reply to | #78262 |
On Wednesday, December 9, 2015 at 11:35:32 AM UTC-6, Keith Thompson wrote:
> But the ASCII control characters 0..31 and 127 are *very* useful
> and necessary. Neither vi nor emacs would work without them.

Codes 127/255 are an interesting case. The purpose of 127/255 was not to
perform an action, but rather to be a nop alternative to 0. A blank row
of punch-tape reads as zero; an all-holes-punched row reads as FF. If the
operator of an ASR-33 was typing a story and made a mistake, the procedure
for making a correction was to push the back-one-row button on the punch
(which mechanically moved the paper back one row without sending any sort
of code) and then punch the "rub-out" button which sent code 127/255. The
existence of the rub-out character on the tape would increase transmission
time by a tenth of a second, but not have any other adverse consequences.

As for codes 0x80-0x9F, those were set aside I think because some terminals
regard 0x80-0xFF as synonymous with 0x00-0x7F on reception, which meant that
if a terminal was being used for display-only purposes there was no need to
worry about parity settings. If one sent a document with 8-bit character
data to a terminal configured for 7 bits ignore parity, characters beyond
0xA0 would show up as alternative characters, but everything else would
appear as it should. If the document used characters 0x80-0x9F as printable
characters, they could cause the appearance of other characters to be
garbled.
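The parity argument above can be illustrated with a few lines of C. This is a sketch under a stated assumption: a terminal set to "7 bits, ignore parity" is modelled as simply dropping bit 7 of every received byte. The function names are hypothetical.

```c
/* Model of a terminal configured for 7 bits, ignore parity:
   the top bit of each received byte is discarded. */
unsigned char strip_high_bit(unsigned char b)
{
    return b & 0x7F;
}

/* Non-zero if c is a control code in 7-bit ASCII (0..31 or DEL). */
int is_ascii_control(unsigned char c)
{
    return c < 0x20 || c == 0x7F;
}

/* Non-zero if an 8-bit byte would be acted on as a control code
   by such a terminal, i.e. it "shadows" a 7-bit control code. */
int shadows_control(unsigned char b)
{
    return b >= 0x80 && is_ascii_control(strip_high_bit(b));
}
```

Under this model, every byte in 0x80-0x9F shadows a C0 control code, so printable characters placed there would be interpreted as control codes. A byte such as 0xE9 merely shows up as the wrong letter (0x69, 'i'). Note that 0xFF also maps onto a control code, namely DEL.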
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-09 12:04 -0800 |
| Message-ID | <lnvb87l76m.fsf@kst-u.example.com> |
| In reply to | #78264 |
supercat@casperkitty.com writes:
> On Wednesday, December 9, 2015 at 11:35:32 AM UTC-6, Keith Thompson wrote:
>> But the ASCII control characters 0..31 and 127 are *very* useful
>> and necessary. Neither vi nor emacs would work without them.
>
> Codes 127/255 are an interesting case. The purpose of 127/255 was not to
> perform an action, but rather to be a nop alternative to 0. A blank row
> of punch-tape reads as zero; an all-holes-punched row reads as FF. If the
> operator of an ASR-33 was typing a story and made a mistake, the procedure
> for making a correction was to push the back-one-row button on the punch
> (which mechanically moved the paper back one row without sending any sort
> of code) and then punch the "rub-out" button which sent code 127/255. The
> existence of the rub-out character on the tape would increase transmission
> time by a tenth of a second, but not have any other adverse consequences.
Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
control character used in interactive input. It commonly denotes
deleting a character, but only because of the mnemonic name, not because
it has 7 bits set to 1. And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
The history of the all-rows-punched semantics is interesting, but it
doesn't directly affect modern usage.
> As for codes 0x80-0x9F, those were set aside I think because some terminals
> regard 0x80-0xFF as synonymous with 0x00-0x7F on reception, which meant that
> if a terminal was being used for display-only purposes there was no need to
> worry about parity settings. If one sent a document with 8-bit character
> data to a terminal configured for 7 bits ignore parity, characters beyond
> 0xA0 would show up as alternative characters, but everything else would
> appear as it should. If the document used characters 0x80-0x9F as printable
> characters, they could cause the appearance of other characters to be
> garbled.
I don't know why they were *originally* set aside, but certainly Latin-N
and Unicode don't treat them as equivalent to the 0..31 control
characters. For example, U+0006 is ACKNOWLEDGE or ACK, and U+0086 is
START OF SELECTED AREA. And Windows-1252 has printable characters in
(most of) the range 128..160; as far as I know that hasn't caused any
problems other than incompatibility with non-Windows character sets.
(Windows-1252 apparently was originally intended to be an ANSI standard,
but ISO 8859 went in a different direction for some reason.)
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | supercat@casperkitty.com |
|---|---|
| Date | 2015-12-09 12:35 -0800 |
| Message-ID | <533391a5-5a24-4a57-b756-d0a4c46a8396@googlegroups.com> |
| In reply to | #78280 |
On Wednesday, December 9, 2015 at 2:04:59 PM UTC-6, Keith Thompson wrote:
> Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
> control character used in interactive input. It commonly denotes
> deleting a character, but only because of the mnemonic name, not because
> it has 7 bits set to 1. And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
> The history of the all-rows-punched semantics is interesting, but it
> doesn't directly affect modern usage.

I'm not sure why "rub-out" was changed to "delete", but the purpose of the
character code was to act as an all-bits-set NOP. Later on, someone who
wanted a key to delete a character from the middle of some text saw that
there was a key marked "Delete" and decided to use it for that. Likewise
someone wanted a key to leave certain interactive modes and thought
"Escape" seemed like a good choice even though the purpose was not to
allow a user to escape from a certain mode, but rather to escape the
meaning of succeeding characters. The fact that on many terminals the
only difference between cursor up and (IIRC) escape-leftbracket-A is
timing means that when using "vi" with such a terminal over TCP/IP, a
communications hiccup can cause a cursor key that's typed within insert
mode to be mistaken for an attempt to leave edit mode, then use "[" and
"A" commands.

> I don't know why they were *originally* set aside, but certainly Latin-N
> and Unicode don't treat them as equivalent to the 0..31 control
> characters. For example, U+0006 is ACKNOWLEDGE or ACK, and U+0086 is
> START OF SELECTED AREA. And Windows-1252 has printable characters in
> (most of) the range 128..160; as far as I know that hasn't caused any
> problems other than incompatibility with non-Windows character sets.
> (Windows-1252 apparently was originally intended to be an ANSI standard,
> but ISO 8859 went in a different direction for some reason.)

Printable characters were chosen to avoid "shadowing" control codes received
with the parity bit set; then new control codes were chosen to avoid
conflicts with new (or old) codes.
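The "all-bits-set NOP" idea can be modelled in C. This is a toy sketch under stated assumptions: each tape row is treated as a 7-bit value (any eighth channel is ignored), punching over an existing row is modelled as a bitwise OR because holes cannot be un-punched, and the reader skips both blank and rubbed-out rows. All names are hypothetical.

```c
#include <stddef.h>

#define TAPE_NUL    0x00  /* blank row: no holes */
#define TAPE_RUBOUT 0x7F  /* all seven data holes punched */

/* Punching a code over an existing row can only add holes. */
unsigned char overpunch(unsigned char row, unsigned char code)
{
    return (unsigned char)(row | code);
}

/* Read n rows of tape into out, skipping blank and rubbed-out rows.
   Returns the number of characters stored (no terminator is added). */
size_t read_tape(const unsigned char *tape, size_t n, char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = tape[i] & 0x7F;
        if (c != TAPE_NUL && c != TAPE_RUBOUT)
            out[k++] = (char)c;
    }
    return k;
}
```

Overpunching any row with TAPE_RUBOUT always yields TAPE_RUBOUT, whatever was there before, which is why it works as a correction mark: a tape holding 'C', 'A', 'X', 'T' with the third row rubbed out reads back as "CAT".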
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2015-12-09 23:46 +0000 |
| Message-ID | <n4aeh7$54m$1@speranza.aioe.org> |
| In reply to | #78288 |
supercat@casperkitty.com wrote:
> On Wednesday, December 9, 2015 at 2:04:59 PM UTC-6, Keith Thompson wrote:
>> Sure -- but code 127 (in ASCII, Latin-1, and Unicode) is DEL, which is a
>> control character used in interactive input. It commonly denotes
>> deleting a character, but only because of the mnemonic name, not because
>> it has 7 bits set to 1. And 255 is LATIN SMALL LETTER Y WITH DIAERESIS.
>> The history of the all-rows-punched semantics is interesting, but it
>> doesn't directly affect modern usage.

> I'm not sure why "rub-out" was changed to "delete", but the purpose of the
> character code was to act as an all-bits-set NOP.

As previously noted, convenient for erasing characters on paper tape.
I suspect this goes back to ASR-33 days, when people would punch
messages (not yet programs) onto tape for later transmission.

> Later on, someone who
> wanted a key to delete a character from the middle of some text saw that
> there was a key marked "Delete" and decided to use it for that.

As far as I know, it is DEC's fault. Again it seems likely ASR33
related. The ASR33 can't backspace, and doesn't have a backspace key.

DEC systems I used to use, would consider the previous character
erased when DEL was typed. They would print the deleted characters
between slashes, so that you would know which ones they were.

With DECwriters, printing terminals that could backspace, they
still used the slash system, as otherwise you write over the
previous printed characters.

Unix lets you set the erase character, with 0x08 and 0x7f being the
two popular choices, with about equal probability.

> Likewise
> someone wanted a key to leave certain interactive modes and thought
> "Escape" seemed like a good choice even though the purpose was not to
> allow a user to escape from a certain mode, but rather to escape the
> meaning of succeeding characters.

Well, it does escape the meaning of the following characters...

> The fact that on many terminals the
> only difference between cursor up and (IIRC) escape-leftbracket-A is
> timing means that when using "vi" with such a terminal over TCP/IP, a
> communications hiccup can cause a cursor key that's typed within insert
> mode to be mistaken for an attempt to leave edit mode, then use "[" and
> "A" commands.

This is a problem, but the cursor movement keys are supposed to
be h, j, k, and l, not the arrows.

(snip)

> Printable characters were chosen to avoid "shadowing" control codes received
> with the parity bit set; then new control codes were chosen to avoid
> conflicts with new (or old) codes.

-- glen
| From | supercat@casperkitty.com |
|---|---|
| Date | 2015-12-09 16:15 -0800 |
| Message-ID | <aa5b9efe-833c-4be4-8bcf-a39f62dd1b4f@googlegroups.com> |
| In reply to | #78313 |
On Wednesday, December 9, 2015 at 5:47:02 PM UTC-6, glen herrmannsfeldt wrote:
> supercat wrote:
> > I'm not sure why "rub-out" was changed to "delete", but the purpose of the
> > character code was to act as an all-bits-set NOP.
>
> As previously noted, convenient for erasing characters on paper tape.

That was the purpose of the "RUB OUT" key. Saying "delete" would seem to
imply that it could cause the preceding and following characters on the
tape to become adjacent to each other.

> I suspect this goes back to ASR-33 days, when people would punch
> messages (not yet programs) onto tape for later transmission.

I've used an ASR-33. The key to generate all-bits-set is labeled
"RUB OUT" (two lines).

> > Later on, someone who
> > wanted a key to delete a character from the middle of some text saw that
> > there was a key marked "Delete" and decided to use it for that.
>
> As far as I know, it is DEC's fault. Again it seems likely ASR33
> related. The ASR33 can't backspace, and doesn't have a backspace key.

Interestingly, Altair BASIC uses an underscore character to behave in the
fashion usually associated with backspace; underscore has the most bits set
of any character other than rub-out which the ASR-33 is capable of
generating.

> DEC systems I used to use, would consider the previous character
> erased when DEL was typed. They would print the deleted characters
> between slashes, so that you would know which ones they were.
>
> With DECwriters, printing terminals that could backspace, they
> still used the slash system, as otherwise you write over the
> previous printed characters.

HP-basic used backspace, but would echo an underscore. Deleting one
character would thus mean that the character would get replaced by an
underscore, but hitting backspace more than once would simply cause the
last character typed to get underlined multiple times.

> > The fact that on many terminals the
> > only difference between cursor up and (IIRC) escape-leftbracket-A is
> > timing means that when using "vi" with such a terminal over TCP/IP, a
> > communications hiccup can cause a cursor key that's typed within insert
> > mode to be mistaken for an attempt to leave edit mode, then use "[" and
> > "A" commands.
>
> This is a problem, but the cursor movement keys are supposed to
> be h, j, k, and l, not the arrows.

The vi implementations I've used try to support the arrows as well, though
I don't use them because they're not reliable. The problem is fundamentally
that use of keys which send escape sequences interferes with use of the
escape key as a key.
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2015-12-10 03:49 +0000 |
| Message-ID | <n4asp6$641$1@speranza.aioe.org> |
| In reply to | #78317 |
supercat@casperkitty.com wrote:

(snip on DELete, RUB OUT, and similar ASCII characters)

(then I wrote)
>> As far as I know, it is DEC's fault. Again it seems likely ASR33
>> related. The ASR33 can't backspace, and doesn't have a backspace key.

> Interestingly, Altair BASIC uses an underscore character to behave in the
> fashion usually associated with backspace; underscore has the most bits set
> of any character other than rub-out which the ASR-33 is capable of
> generating.

There was an actual change to ASCII somewhere along the way.
The ASR33 ASCII has a backwards pointing arrow where the underscore
is now, and an upward pointing arrow where caret is now.

The caret sort of looks like an arrow without a stem.
Underscore sort of looks like the stem of a left pointing arrow.

>> DEC systems I used to use, would consider the previous character
>> erased when DEL was typed. They would print the deleted characters
>> between slashes, so that you would know which ones they were.

>> With DECwriters, printing terminals that could backspace, they
>> still used the slash system, as otherwise you write over the
>> previous printed characters.

> HP-basic used backspace, but would echo an underscore. Deleting one
> character would thus mean that the character would get replaced by an
> underscore, but hitting backspace more than once would simply cause the
> last character typed to get underlined multiple times.

I remember HP Basic, but not that one. As I remember, it uses
the ASR33 back arrow for backspace, which echoes itself.

(snip, I wrote)
>> This is a problem, but the cursor movement keys are supposed to
>> be h, j, k, and l, not the arrows.

> The vi implementations I've used try to support the arrows as well, though
> I don't use them because they're not reliable. The problem is fundamentally
> that use of keys which send escape sequences interferes with use of the
> escape key as a key.

I mostly got used to vi that didn't have those.
It might be that I use them when not in insert mode, but
not in insert mode.

-- glen
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-09 18:12 -0600 |
| Message-ID | <n4afsg$val$1@dont-email.me> |
| In reply to | #78280 |
On 09-Dec-15 14:04, Keith Thompson wrote:
> supercat@casperkitty.com writes:
>> As for codes 0x80-0x9F, those were set aside I think because some
>> terminals regard 0x80-0xFF as synonymous with 0x00-0x7F on
>> reception, which meant that if a terminal was being used for
>> display-only purposes there was no need to worry about parity
>> settings. If one sent a document with 8-bit character data to a
>> terminal configured for 7 bits ignore parity, characters beyond
>> 0xA0 would show up as alternative characters, but everything else
>> would appear as it should. If the document used characters
>> 0x80-0x9F as printable characters, they could cause the appearance
>> of other characters to be garbled.
>
> I don't know why they were *originally* set aside,

ISO 646 (aka ASCII) divided the 7-bit coding space into:

  00-1F control characters
  20-7F graphical characters

ISO 2022 extended this to 8 bits by repeating the same division:

  00-1F control characters (C0)
  20-7F graphical characters (GL)
  80-9F control characters (C1)
  A0-FF graphical characters (GR)

There are also four working sets, named G0 to G3, which can each have
multiple planes of 94^N or (except for G0) 96^N characters. By default,
GL is mapped to G0 and GR is mapped to G1, but C0 or C1 codes can be
used to remap either to a different set or to select between planes
within a set.

Most of that complexity was only actually used for three encodings:
ISO-2022-CN, ISO-2022-JP and ISO-2022-KR, i.e. the CJK family.

ISO 8859-X built on ISO 2022 and defined G0 as ASCII and G1 as one
96-entry plane of something else; they do not define G2 or G3. There
was no need for the C1 codes, yet there was also no way to assign them
to graphical characters within the ISO 2022 framework.

> And Windows-1252 has printable characters in (most of) the range
> 128..160; as far as I know that hasn't caused any problems other
> than incompatibility with non-Windows character sets.

HTML5 goes so far as to specify that pages labeled "iso-8859-1" should
instead be interpreted as "windows-1252" due to the prevalence of that
specific issue. Since the former will never (in practice) have any C1
codes present, this is mostly harmless.

> (Windows-1252 apparently was originally intended to be an ANSI
> standard, but ISO 8859 went in a different direction for some reason.)

Windows-1252 was initially identical to ISO 8859-1; it wasn't until
nearly a decade later that Microsoft reassigned the C1 codes to
additional graphical characters, breaking compatibility.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
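
As a rough illustration of the four-way ISO 2022 division described in this post, here is a minimal C sketch that classifies a byte by region. The names `iso2022_region` and `iso2022_classify` are hypothetical, chosen only for this example, and the ranges are taken directly from the table in the post (which treats SPACE at 0x20 and DEL at 0x7F as part of GL for simplicity).

```c
#include <stdio.h>

/* Regions of the 8-bit code space, named as in the post's table. */
enum iso2022_region { REGION_C0, REGION_GL, REGION_C1, REGION_GR };

/* Classify one byte using the ranges 00-1F, 20-7F, 80-9F, A0-FF.
 * Simplification: 0x20 and 0x7F are counted as GL, as in the post. */
enum iso2022_region iso2022_classify(unsigned char byte)
{
    if (byte <= 0x1F) return REGION_C0;
    if (byte <= 0x7F) return REGION_GL;
    if (byte <= 0x9F) return REGION_C1;
    return REGION_GR;
}

/* Human-readable name for a region, for diagnostics. */
const char *iso2022_region_name(enum iso2022_region r)
{
    switch (r) {
    case REGION_C0: return "C0 (control)";
    case REGION_GL: return "GL (graphic)";
    case REGION_C1: return "C1 (control)";
    case REGION_GR: return "GR (graphic)";
    }
    return "?";
}

/* Print the region of a byte, e.g. "0x99: C1 (control)". */
void iso2022_describe(unsigned char byte)
{
    printf("0x%02X: %s\n", (unsigned)byte,
           iso2022_region_name(iso2022_classify(byte)));
}
```

Calling `iso2022_describe(0x99)` reports the byte as C1, which is the range Windows-1252 later reused for graphical characters such as the trademark sign.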
| From | James Kuyper <jameskuyper@verizon.net> |
|---|---|
| Date | 2015-12-09 13:12 -0500 |
| Message-ID | <56686F2A.6050201@verizon.net> |
| In reply to | #78262 |
On 12/09/2015 12:35 PM, Keith Thompson wrote:
...
> The various ISO-8859-N character sets, as well as Unicode, add
> another 32 control characters from 128 to 159 (character 160,
> "NO-BREAK SPACE", isn't considered a control character, but it's
> "controlesque").
>
> The control characters from 128 to 159 are very rarely used as far
> as I know, and I agree that it would probably have made more sense
> to use that range for printable characters. On the other hand,
> I don't know the issues that led to those control characters being
> added; there must have been *some* valid reason for it.

That's kinder than necessary: there must have been some reason for it
that the designers thought was valid, but they weren't necessarily
correct in that belief.
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-09 12:12 -0800 |
| Message-ID | <lnr3ivl6t9.fsf@kst-u.example.com> |
| In reply to | #78265 |
James Kuyper <jameskuyper@verizon.net> writes:
> On 12/09/2015 12:35 PM, Keith Thompson wrote:
> ...
>> The various ISO-8859-N character sets, as well as Unicode, add
>> another 32 control characters from 128 to 159 (character 160,
>> "NO-BREAK SPACE", isn't considered a control character, but it's
>> "controlesque").
>>
>> The control characters from 128 to 159 are very rarely used as far
>> as I know, and I agree that it would probably have made more sense
>> to use that range for printable characters. On the other hand,
>> I don't know the issues that led to those control characters being
>> added; there must have been *some* valid reason for it.
>
> That's kinder than necessary: there must have been some reason for it
> that the designers thought was valid, but they weren't necessarily
> correct in that belief.
The 0..31 and 128..159 sets of control characters are referred to as C0
and C1, respectively. The C1 control codes are more commonly
represented via escape sequences, such as ESC _ for 0x9F.
https://en.wikipedia.org/wiki/C0_and_C1_control_codes
It would IMHO have been better to use that range for printable
characters, but it's too late to change it now. Any characters that
could be put in that range already have other representations. For
example, Windows-1252 has the trademark sign as code 0x99; Unicode uses
0x2122 for the same character.
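
The ESC-based representation mentioned above can be sketched in a few lines of C. This assumes the usual 7-bit convention in which a C1 byte in 0x80..0x9F is replaced by ESC followed by the byte minus 0x40, which is consistent with the ESC _ example for 0x9F. The function name `c1_to_7bit` is hypothetical and used only for illustration.

```c
#include <stddef.h>

#define ESC 0x1B

/* Convert one byte to its 7-bit form.
 * A C1 control (0x80..0x9F) becomes the two bytes ESC, byte - 0x40;
 * any other byte is copied unchanged.
 * Returns the number of bytes written to out (1 or 2). */
size_t c1_to_7bit(unsigned char byte, unsigned char out[2])
{
    if (byte >= 0x80 && byte <= 0x9F) {
        out[0] = ESC;
        out[1] = (unsigned char)(byte - 0x40);
        return 2;
    }
    out[0] = byte;
    return 1;
}
```

With this mapping, 0x9F yields ESC followed by '_' (0x5F), and 0x9B yields ESC followed by '[', the familiar start of many terminal escape sequences.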
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | raltbos@xs4all.nl (Richard Bos) |
|---|---|
| Date | 2015-12-10 20:48 +0000 |
| Message-ID | <5669e503.30067453@news.xs4all.nl> |
| In reply to | #78262 |
Keith Thompson <kst-u@mib.org> wrote:

> But the ASCII control characters 0..31 and 127 are *very* useful
> and necessary. Neither vi nor emacs would work without them.

All the more reason for getting rid of them!

*mutter*bleepingmonsters*grumble*

Richard
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-09 23:44 +0000 |
| Message-ID | <n4ae93$q4s$1@dont-email.me> |
| In reply to | #78240 |
On 09/12/2015 14:36, Osmium wrote:
> "Ben Bacarisse" wrote:
>
>> Has it? I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities. The disagreement was over a statement about what "the rest
>> of Western Europe" uses. It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend. Had he said:
>>
>> "For simple English text you need ascii. The other Western European
>> languages need extended Latin, and annoyingly those characters won't
>> all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
> I haven't been watching this thread but that brings up a pet peeve of
> mine. If you didn't throw away about 32x2 = 64 characters for control
> characters, most of which are unused in the real world, I suspect
> Europeans could live very comfortably with the result. ASCII was
> forced down our throats by AT&T and their Teletype® division. I would
> estimate at least 57 characters could be reclaimed.

I also found it odd that out of 256 1-byte codes, a very valuable
commodity which means most western European text could be expressed as
strings of 8-bit characters, 1/4 of them would be wasted on control codes.

Why not waste just one code as an escape for introducing control codes?

Most of the first 32 could be reclaimed too: the only ones I regularly
see in use are nul, tab, cr, lf, and sometimes ff and etx. Then there
are controls such as bs and del, but they don't belong in a txt storage
format.

So we would have been using 250 characters instead of 192.

--
Bartc
| From | Robert Wessel <robertwessel2@yahoo.com> |
|---|---|
| Date | 2015-12-10 01:13 -0600 |
| Message-ID | <cp8i6btma8kuurqhpf44mtbsj73uo0lr3b@4ax.com> |
| In reply to | #78312 |
On Wed, 9 Dec 2015 23:44:36 +0000, BartC <bc@freeuk.com> wrote:

>On 09/12/2015 14:36, Osmium wrote:
>> "Ben Bacarisse" wrote:
>>
>>> Has it? I don't see anyone objecting to my use of the word, and I'd be
>>> happy to retract it if they did as I'm not a fan of arguing over vague
>>> quantities. The disagreement was over a statement about what "the rest
>>> of Western Europe" uses. It seems likely that the statement was just a
>>> poor choice of words which Malcolm felt obliged to defend. Had he said:
>>>
>>> "For simple English text you need ascii. The other Western European
>>> languages need extended Latin, and annoyingly those characters won't
>>> all quite fit into 8 bits."
>>>
>>> I don't think there would be much to argue over.
>>
>> I haven't been watching this thread but that brings up a pet peeve of
>> mine. If you didn't throw away about 32x2 = 64 characters for control
>> characters, most of which are unused in the real world, I suspect
>> European's could live very comfortably with the result. ASCII was
>> forced down our throats by AT&T and their Teletype® division. I would
>> estimate at least 57 characters could be reclaimed.
>
>I also found it odd that out of 256 1-byte codes, a very valuable
>commodity which means most western European text could be expressed as
>strings of 8-bit characters, 1/4 of them would be wasted on control codes.
>
>Why not waste just one code as an escape for introducing control codes?

In the days of non-transparent links, there were often *many* control
characters on the link. A typical bisync frame* would have 4-6
control characters marking the start and end of each frame (and more
are possible). Making those multi-byte sequences would make them
harder to detect** and considerably increase overhead. Several other
control characters are needed for normal (and frequent) control
messages (although there are *in*frequent control messages that could
have been expanded with little difficulty). And that's just for the
*link*, ignoring any use of control characters as part of the payload.

Out-of-band signaling for much of that was a big improvement, but it
was basically *not* an option when these character sets were defined.

*Admitting that the (modern) term 'frame' is slightly abused when
applied to bisync.

**And given that SYN had special properties in terms of detectability,
it would have required major surgery to support a two-byte form.

Even now, XON/XOFF is supported as flow control on many async links -
multi-byte forms of that would be problematic.
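
To make the XON/XOFF point concrete, here is a minimal C sketch of the receive side of in-band software flow control. It assumes the conventional assignments XON = DC1 (0x11) and XOFF = DC3 (0x13); the `tx_state` structure and `flow_control_rx` function are hypothetical names for this illustration, not any particular driver's API.

```c
#include <stdbool.h>

#define XON  0x11  /* DC1: peer says "resume sending" */
#define XOFF 0x13  /* DC3: peer says "stop sending"   */

/* Transmitter state: are we currently asked to hold off? */
struct tx_state {
    bool paused;
};

/* Inspect one received byte.
 * Returns true if the byte was a flow-control code (and has been
 * consumed), false if it is ordinary data for the caller to handle. */
bool flow_control_rx(struct tx_state *s, unsigned char byte)
{
    if (byte == XOFF) {
        s->paused = true;
        return true;
    }
    if (byte == XON) {
        s->paused = false;
        return true;
    }
    return false;
}
```

Because each code is a single byte, the check needs no lookahead or buffering. A multi-byte form would require the receiver to hold back an escape byte until the next one arrives, which is the sort of complication the post alludes to.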
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-10 10:39 +0000 |
| Message-ID | <n4bkk2$5ap$1@dont-email.me> |
| In reply to | #78325 |
On 10/12/2015 07:13, Robert Wessel wrote:
> On Wed, 9 Dec 2015 23:44:36 +0000, BartC <bc@freeuk.com> wrote:

>> I also found it odd that out of 256 1-byte codes, a very valuable
>> commodity which means most western European text could be expressed as
>> strings of 8-bit characters, 1/4 of them would be wasted on control codes.
>>
>> Why not waste just one code as an escape for introducing control codes?
>
> In the days of non-transparent links, there were often *many* control
> characters on the link. A typical bisync frame* would have 4-6
> control characters marking the start and end of each frame (and more
> are possible). Making those multi-byte sequences would make them
> harder to detect** and considerably increase overhead. Several other
> control characters are needed for normal (and frequent) control
> messages (although there are *in*frequent control messages that could
> have been expanded with little difficulty). And that's just for the
> *link*, ignoring any use of control characters as part of the payload.
>
> Out-of-band signaling for much of that was a big improvement, but it
> was basically *not* an option when these character sets were defined.

How about now? Are those transmission-related codes still relevant? (I
would separate out bit and byte sequences for framing data, from actual
content.)

Actually, MS code page 437 seems to have ignored all the control codes
by making all 256 codes represent some visible character (although codes
0 and 255 are spaces).

I'm not sure how that works, but presumably if you write the sequence
"ABC\nDEF" to a pixel display (not one that emulates a terminal) it
would show "ABC♪DEF" (with a musical note symbol in the middle). But \n
is still needed inside TXT files.

--
Bartc
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-10 03:33 -0800 |
| Message-ID | <2a1cd182-a854-4c66-9043-2ba94cf16997@googlegroups.com> |
| In reply to | #78330 |
On Thursday, December 10, 2015 at 10:39:27 AM UTC, Bart wrote:
>
> Actually, MS code page 437 seems to have ignored all the control codes
> by making all 256 codes represent some visible character (although codes
> 0 and 255 are spaces).
>
> I'm not sure how that works, but presumably if you write the sequence
> "ABC\nDEF" to a pixel display (not one that emulates a terminal) it
> would show "ABC♪DEF" (with a musical note symbol in the middle). But \n
> is still needed inside TXT files.
>
There's a terminal, and a character-mapped raster display.

The character-mapped raster display usually has one byte for character data
per cell. So control codes need to be mapped to something - a space, an error,
or an extra character.

The terminal is built on top of the character-mapped raster display. It accepts
a stream of data, and interprets the control codes, often scrolling the raster
at newline. Since the control codes are single bytes, if you want it to use
the code as a character identifier, you have to escape it somehow.
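
The layering described here can be sketched as a toy terminal that sits on top of a small grid of character cells. Everything in this C sketch is an assumption made for illustration: the 2x8 grid size, the `term` structure, and the choice to treat '\n' as "return to column 0 and move down one row". Only the general idea (control codes are interpreted, all other bytes are stored in cells) comes from the post.

```c
#include <string.h>

#define ROWS 2
#define COLS 8

/* The "raster": one byte of character data per cell, plus a cursor. */
struct term {
    unsigned char cells[ROWS][COLS];
    int row, col;
};

/* Clear the grid to spaces and home the cursor. */
void term_init(struct term *t)
{
    memset(t->cells, ' ', sizeof t->cells);
    t->row = 0;
    t->col = 0;
}

/* Move down one row, scrolling the grid up when already on the last row. */
static void term_line_feed(struct term *t)
{
    if (t->row + 1 < ROWS) {
        t->row++;
        return;
    }
    memmove(t->cells[0], t->cells[1], (size_t)(ROWS - 1) * COLS);
    memset(t->cells[ROWS - 1], ' ', COLS);
}

/* Interpret one byte: a few control codes move the cursor,
 * every other value is stored in the current cell as-is. */
void term_put(struct term *t, unsigned char byte)
{
    switch (byte) {
    case '\n': t->col = 0; term_line_feed(t); break; /* assumed: LF implies CR */
    case '\r': t->col = 0; break;
    case '\b': if (t->col > 0) t->col--; break;
    default:
        if (t->col == COLS) {          /* wrap at the right edge */
            t->col = 0;
            term_line_feed(t);
        }
        t->cells[t->row][t->col++] = byte;
        break;
    }
}

/* Feed a NUL-terminated string through the terminal. */
void term_write(struct term *t, const char *s)
{
    while (*s)
        term_put(t, (unsigned char)*s++);
}
```

Writing "ABC\nDEF" through `term_write` leaves "ABC" on the first row and "DEF" on the second. Storing the same bytes directly into `cells` would instead put the value 0x0A into a cell, where the display hardware would draw whatever glyph its font assigns to that code.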
| From | supercat@casperkitty.com |
|---|---|
| Date | 2015-12-10 06:07 -0800 |
| Message-ID | <cc1bf231-4822-4a72-8078-fd2538b95cb1@googlegroups.com> |
| In reply to | #78330 |
On Thursday, December 10, 2015 at 4:39:27 AM UTC-6, Bart wrote:
> Actually, MS code page 437 seems to have ignored all the control codes
> by making all 256 codes represent some visible character (although codes
> 0 and 255 are spaces).

The Color Display Adapter includes 16Kbytes of RAM which are accessible to
the processor at address 0xB8000-0xBBFFF, and an 8Kbyte ROM which is not.

When configured for text mode, the first 4000 bytes of RAM are used. On
the first eight lines of the display (from a hardware perspective), the
first 160 bytes of RAM are fetched consecutively (the same 160 bytes each
time). Bytes are grouped into pairs, and after fetching each pair of
bytes from RAM, the board forms a ROM address by taking a 3-bit scan
line counter, the eight bits of character data, and a couple of hard-coded
bits. The byte of data fetched from ROM is then fed through a
parallel-to-serial shift register which determines, for each pixel, whether
it should be displayed using the foreground color or the background color.

After the eighth scan line, the memory-address pointer is allowed to advance
so the next eight scan lines use the next 160 bytes of RAM. This then happens
23 more times until a total of 25 lines have been displayed. At that point,
a blanking circuit kicks in for the remainder of the frame. After about 220
scan lines total have been output, the board outputs a 3 line sync pulse and
then continues outputting the remainder of the frame with the blanking still
enabled.

The display hardware knows nothing about newlines, carriage returns,
backspaces, or anything else. It just fetches 160 bytes of RAM for each
character row and uses the fetched bytes as look-ups into a font-shape ROM.
All 256 values that could appear in RAM have eight bytes of ROM devoted to
holding their shape.
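
The address arithmetic in this description can be sketched in C. The figures of 80 columns, 25 rows, 2 bytes per cell and 8 font bytes per character follow from the numbers given above (160 bytes per row, 4000 bytes in total). The exact bit layout of the ROM address and the most-significant-bit-first pixel order are assumptions made for this sketch, and the "couple of hard-coded bits" are left out. All function names are hypothetical.

```c
#include <stdint.h>

#define TEXT_COLS   80
#define TEXT_ROWS   25
#define GLYPH_LINES 8

/* Offset within display RAM of the character byte for cell (row, col).
 * Each cell is a pair of bytes: character code, then attribute. */
uint16_t text_cell_offset(unsigned row, unsigned col)
{
    return (uint16_t)((row * TEXT_COLS + col) * 2);
}

/* Font ROM address for one scan line of one character.
 * Assumed layout: character code in the upper bits, 3-bit scan line
 * counter in the low bits; the hard-coded bits are omitted. */
uint16_t font_rom_address(uint8_t ch, unsigned scan_line)
{
    return (uint16_t)(((unsigned)ch << 3) | (scan_line & 7u));
}

/* One pixel of a font byte as the shift register would emit it.
 * Assumed order: bit 7 is the leftmost pixel.
 * Returns 1 for foreground color, 0 for background color. */
int glyph_pixel(uint8_t rom_byte, unsigned x)
{
    return (rom_byte >> (7u - (x & 7u))) & 1;
}
```

Under these assumptions the last cell of the screen sits at offset 3998, just inside the 4000 bytes mentioned above, and the 256 characters times 8 lines occupy 2048 bytes of font data. Nothing in this path treats any character value specially, which is why codes such as 0x0A simply display as glyphs.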