| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2015-12-02 08:01 -0800 |
| Last post | 2015-12-06 13:45 +0000 |
| Articles | 20 on this page of 158 — 25 participants |
unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000
| From | "Osmium" <r124c4u102@comcast.net> |
|---|---|
| Date | 2015-12-10 08:21 -0600 |
| Message-ID | <dctg2eFeb57U1@mid.individual.net> |
| In reply to | #78330 |
"BartC" wrote:

> Actually, MS code page 437 seems to have ignored all the control codes by
> making all 256 codes represent some visible character (although codes 0
> and 255 are spaces).
>
> I'm not sure how that works, but presumably if you write the sequence
> "ABC\nDEF" to a pixel display (not one that emulates a terminal) it would
> show "ABC♪DEF" (with a musical note symbol in the middle). But \n is still
> needed inside TXT files.

Some clever guy decided to separate the message from the envelope that contained the message; that's how that works! The ASCII committee could have used this guy, if they had listened to him. As I said earlier, ASCII is based on a Teletype®-centric universe. The end result is a mish-mash of transmission protocols *and* data. The Teletype is gone but the aftereffects linger on.

----------------------

I don't know how anyone could look at the huge list of code pages for DOS and EBCDIC at the end of the code page in the link and not be absolutely horrified at what a mess has been made of a relatively simple situation. I estimate about *200* code pages for DOS alone!

https://en.wikipedia.org/wiki/Windows-1252
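For anyone who wants to try the "message versus envelope" idea, here is a minimal sketch of what a display with no terminal emulation would draw for each byte. The helper name `render_cp437_raw` is made up for this illustration, and the glyph string is typed in from the commonly published CP437 chart (Python's own `cp437` codec decodes codes 0 to 31 as control characters, so it cannot be used for that range). In that chart the musical note sits at code 13 (carriage return), while code 10 (line feed) is an inverse circle.

```python
# Glyphs drawn for codes 0-31 in code page 437 (code 0 is blank).
# Typed in from the commonly published CP437 chart; treat as an assumption.
LOW_GLYPHS = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def render_cp437_raw(data: bytes) -> str:
    """Map every byte to a visible glyph, ignoring its control meaning.

    Hypothetical helper for illustration; not a standard library function.
    """
    out = []
    for b in data:
        if b < 0x20:
            out.append(LOW_GLYPHS[b])        # control range shown as pictures
        elif b == 0x7F:
            out.append("⌂")                  # code 127 is a "house" glyph
        else:
            out.append(bytes([b]).decode("cp437"))  # printable range
    return "".join(out)


print(render_cp437_raw(b"ABC\nDEF"))  # line feed drawn as a glyph
print(render_cp437_raw(b"ABC\rDEF"))  # carriage return drawn as a glyph
```

The same bytes still mean "new line" to anything that treats the stream as a text file; only the renderer decides whether a code is a picture or an instruction.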
| From | Robert Wessel <robertwessel2@yahoo.com> |
|---|---|
| Date | 2015-12-10 00:59 -0600 |
| Message-ID | <2i8i6bhq3snbro74hup4l96spgqlqlk6ka@4ax.com> |
| In reply to | #78240 |
On Wed, 9 Dec 2015 08:36:54 -0600, "Osmium" <r124c4u102@comcast.net> wrote:

>"Ben Bacarisse" wrote:
>
>> Has it? I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities. The disagreement was over a statement about what "the rest
>> of Western Europe" uses. It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend. Had he said:
>>
>> "For simple English text you need ascii. The other Western European
>> languages need extended Latin, and annoyingly those characters won't
>> all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
>I haven't been watching this thread but that brings up a pet peeve of mine.
>If you didn't throw away about 32x2 = 64 characters for control characters,
>most of which are unused in the real world, I suspect European's could live
>very comfortably with the result. ASCII was forced down our throats by
>AT&T and their Teletype® division. I would estimate at least 57 characters
>could be reclaimed.

As others have pointed out, ASCII is a 7-bit code.

OTOH, EBCDIC is an 8-bit code, and also dedicates 64 code points to control characters. Certainly today, the vast majority of those are pointless (at least as control characters intended to control a device).
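The arithmetic being argued over can be checked with a short sketch. It assumes that "control character" means Unicode general category `Cc`, which is one reasonable definition but not the only one, and it says nothing about EBCDIC. The function name `control_codes` is made up for the example.

```python
import unicodedata


def control_codes(limit):
    """Code points below `limit` whose Unicode general category is Cc."""
    return [cp for cp in range(limit) if unicodedata.category(chr(cp)) == "Cc"]


ascii_controls = control_codes(128)    # the 7-bit ASCII range
latin1_controls = control_codes(256)   # the 8-bit ISO 8859-1 range

print(len(ascii_controls))             # 0x00-0x1F plus DEL (0x7F)
print(len(latin1_controls))            # the above plus the C1 block 0x80-0x9F
print(128 - len(ascii_controls))       # printable ASCII, space included
```

Under that definition the 7-bit range holds 33 control codes, and the figure only approaches 64 once the 8-bit C1 block is counted as well.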
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-07 14:33 +0000 |
| Message-ID | <n4458d$din$1@dont-email.me> |
| In reply to | #78028 |
On 07/12/2015 03:14, Malcolm McLean wrote:
> On Monday, December 7, 2015 at 1:56:13 AM UTC, Bart wrote:
>> On 07/12/2015 00:38, Martin Shobe wrote:
>>> On 12/6/2015 7:09 AM, BartC wrote:
>>
>>>> I spent 5 minutes thinking about an alternative to Unicode, and 10
>>>> minutes writing up a first draft, and 10 more minutes for a second draft
>>>> (I won't bore you with the details).
>>
>>>> In 32-bit form, the two schemes (Unicode, and mine), aren't that
>>>> different in that each character is allocated a dedicated code-point.
>>>> But in mine, the large alphabets are tidily partitioned out of the way.
>>>> A similar concept to code-pages, but 32K characters each and that can
>>>> co-exist in the same text.
>>
>>> Can you give a link to it?
>>
>> It was only a dozen or so lines of text!
>>
>> Anyway I thought about it for another ten or twenty minutes and I have a
>> revised scheme (the previous one included non-character escape codes
>> within a string which I didn't like). Here's version 3:
>>
> You've got to consider the users.
> For simple English text you need ascii. The rest of Western Europe
> uses extended Latin, and annoyingly it won't quite fit into 8 bits.

Not even if they make use of obsolete control codes? Unicode seems to have even more than ASCII did.

> Eastern Europe uses Greek characters. Complex English text includes
> ascii, extended Latin, and Greek, and a few special symbols not
> included in ascii. At that point, we start to have the issue of
> what is markup and what is content. Is 1/2 the same content as
> a half symbol?

There shouldn't be any mark-up. (Although even ASCII suffers from that a little with tab characters and other codes intended to control layout. But that is well-understood.)

The 1/2 can just be a special symbol like (C) and TM. It would be up to the text processing application to superimpose a user-friendly interface where a search for "1/2" might find "1/2" or the special symbol.

(That's one big problem with these symbols, having to go and look them up.)

> Then you've got minority scripts with small alphabets, and the
> Far Eastern languages with massive character sets, and the Indian
> languages. Again, virtually all of the symbols are meaningless
> to the average English reader, but it's not usually true the
> other way round - Far Eastern and Indian readers are likely to
> know the English characters and embed English text in their
> documents.

Exactly. They want their own language plus enough characters to represent international content, which usually means English. Or sometimes there is another official language that might be English, French or whatever depending on which colonial power invaded them in the past.

-- 
Bartc
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-06 22:45 -0600 |
| Message-ID | <n432p0$ol4$1@dont-email.me> |
| In reply to | #78023 |
On 06-Dec-15 19:55, BartC wrote:
> On 07/12/2015 00:38, Martin Shobe wrote:
>> Can you give a link to it?
>
> It was only a dozen or so lines of text!
>
> Anyway I thought about it for another ten or twenty minutes and I
> have a revised scheme (the previous one included non-character escape
> codes within a string which I didn't like). Here's version 3:
>
> * In-memory representation, 32-bit version
>
> * All large alphabets are organised into sets of 64K characters, each
> is given an alphabet code (similar to a code-page, but bigger)
>
> * ASCII, small alphabets and symbols fit into a single special
> alphabet of 64K characters, and itself has an alphabet code of zero

What do you consider a "large" vs "small" alphabet?

If you exclude CJK, then _every_ script--modern, historical or even fictional--would fit in 16 bits. OTOH, CJK alone is over 16 bits.

It sounds like what you're after is segregation: a 32-bit ghetto for CJK and a 16-bit suburb for everyone else.

S

-- 
Stephen Sprunk, CCIE #3723, K5SSS
"God does not play dice." --Albert Einstein
"God is an inveterate gambler, and He throws the dice at every possible opportunity." --Stephen Hawking
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-07 12:38 +0000 |
| Message-ID | <n43ugm$i6i$1@dont-email.me> |
| In reply to | #78029 |
On 07/12/2015 04:45, Stephen Sprunk wrote:
> On 06-Dec-15 19:55, BartC wrote:
>> On 07/12/2015 00:38, Martin Shobe wrote:
>>> Can you give a link to it?
>>
>> It was only a dozen or so lines of text!
>>
>> Anyway I thought about it for another ten or twenty minutes and I
>> have a revised scheme (the previous one included non-character escape
>> codes within a string which I didn't like). Here's version 3:
>>
>> * In-memory representation, 32-bit version
>>
>> * All large alphabets are organised into sets of 64K characters, each
>> is given an alphabet code (similar to a code-page, but bigger)
>>
>> * ASCII, small alphabets and symbols fit into a single special
>> alphabet of 64K characters, and itself has an alphabet code of zero
>
> What do you consider a "large" vs "small" alphabet?

If it's currently got an 8-bit code page, then I guess that's a small alphabet.

> If you exclude CJK, then _every_ script--modern, historical or even
> fictional--would fit in 16 bits.

That's good, then we can tidily put all those together.

Although one idea I considered was to separate out small alphabets too, with many characters duplicated across several alphabets, then each could be self-contained. (Perhaps with common characters retaining the same code-points.)

But this can introduce some extra problems with programming such text, and I wanted it as simple as possible.

> OTOH, CJK alone is over 16 bits.

Then that would occupy several 'alphabets'. (Probably, two, with consecutive codes. Then effectively it uses a 17-bit encoding.)

> It sounds like who you're after is segregation: a 32-bit ghetto for CJK
> and a 16-bit suburb for everyone else.

That's exactly the aim. If we wanted true integration then characters from all languages of the world would have had randomly assigned code-points. /There is already segregation./

-- 
Bartc
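A rough sketch of the "version 3" layout may help readers follow the argument. The fuller bullet list quoted in Robert Wessel's post on this page puts the alphabet code in the most significant 16 bits and the local code in the least significant 16 bits; the function names, the sample alphabet numbers and the range checks below are assumptions made only for this illustration, not anything the thread specifies.

```python
ALPHABET_BITS = 16
LOCAL_MASK = (1 << ALPHABET_BITS) - 1   # 0xFFFF


def pack(alphabet, local):
    """Combine an alphabet code (upper 16 bits) with a local code (lower 16)."""
    if not (0 <= alphabet <= LOCAL_MASK and 0 <= local <= LOCAL_MASK):
        raise ValueError("alphabet and local code must each fit in 16 bits")
    return (alphabet << ALPHABET_BITS) | local


def unpack(code):
    """Split a 32-bit code back into (alphabet, local)."""
    return code >> ALPHABET_BITS, code & LOCAL_MASK


def fits_in_16_bits(codes):
    """True if all codes share one alphabet, so the upper half can be dropped."""
    return len({code >> ALPHABET_BITS for code in codes}) <= 1


# Hypothetical assignment: alphabet 0 for ASCII and small scripts,
# alphabets 1 and 2 for a large script that needs more than 64K slots.
a = pack(0, ord("A"))
c = pack(2, 0x1234)
print(hex(a), hex(c), unpack(c))
print(fits_in_16_bits([a, a]), fits_in_16_bits([a, c]))
```

The same shift-and-mask split applied to a Unicode code point gives its plane number and its offset within the plane, which is one way to see how close the two layouts are in 32-bit form.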
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-07 13:55 -0600 |
| Message-ID | <n44o3n$sv6$1@dont-email.me> |
| In reply to | #78067 |
On 07-Dec-15 06:38, BartC wrote:
> On 07/12/2015 04:45, Stephen Sprunk wrote:
>> On 06-Dec-15 19:55, BartC wrote:
>>> * ASCII, small alphabets and symbols fit into a single special
>>> alphabet of 64K characters, and itself has an alphabet code of
>>> zero
>>
>> What do you consider a "large" vs "small" alphabet?
>
> If it's currently got an 8-bit code page, then I guess that's a
> small alphabet.

So, essentially everything except CJK.

>> If you exclude CJK, then _every_ script--modern, historical or
>> even fictional--would fit in 16 bits.
>
> That's good, then we can tidily put all those together.

But ... how?

New code points are assigned every year, and you don't know whether they'll be CJK or non-CJK, so are you proposing that every piece of software that uses your encoding will need to be updated before it can use new code points--unlike the purely algorithmic UTF-8/16/32 that can _already_ properly encode every valid code point?

> Although one idea I considered was to separate out small alphabets
> too, with many characters duplicated across several alphabets, then
> each could be self-contained. (Perhaps with common characters
> retaining the same code-points.)

If you do that, then you're just recreating the code page mess. The point of Unicode was to get _away_ from that!

> But this can introduce some extra problems with programming such
> text, and I wanted it as simple as possible.

Indeed.

>> OTOH, CJK alone is over 16 bits.
>
> Then that would occupy several 'alphabets'. (Probably, two, with
> consecutive codes. Then effectively it uses a 17-bit encoding.)

You're doubling the size of _every_ character just to get one more bit?

Consider that UTF-8 and UTF-16 can encode all of the most common CJK characters in just two bytes.

Also, what happens with a string that is mixed CJK and non-CJK? Does the size of every non-CJK character double just because one CJK character is present? How is this any better than UTF-32?

Or are you going to rob the non-CJK code page of one bit to indicate which encoding is used for each character, which means it can no longer hold all ~50k non-CJK characters, which then means some non-CJK scripts must be sent to the ghetto along with CJK?

>> It sounds like who you're after is segregation: a 32-bit ghetto for
>> CJK and a 16-bit suburb for everyone else.
>
> That's exactly the aim.

Try convincing the CJK countries to accept that. Even many non-CJK countries would reject such a plan due to your clear (and now admitted) discriminatory intent.

OTOH, other non-CJK countries that wouldn't care about that issue also happen to be the ones that benefit most from the status quo, so they would likely reject your plan too.

So, who would want this, other than you?

> If we wanted true integration then characters from all languages of
> the world would have had randomly assigned code-points. /There is
> already segregation./

De jure vs de facto makes a big difference.

S

-- 
Stephen Sprunk
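To make "purely algorithmic" concrete, here is a minimal sketch of a UTF-8 encoder that needs no table of assigned characters, only bit arithmetic on the code point value. The function name `utf8_encode` is made up for the example; it is checked against Python's built-in encoder rather than offered as a replacement for it.

```python
def utf8_encode(cp):
    """Encode one Unicode scalar value as UTF-8 using only bit arithmetic."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:                       # 7 bits: one byte
        return bytes([cp])
    if cp < 0x800:                      # 11 bits: two bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 16 bits: three bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 21 bits: four bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample each from ASCII, Latin-1, the BMP CJK block and Plane 2.
for ch in "A£中\U00020000":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

Because the byte count depends only on the numeric range, a code point assigned next year encodes with the same routine and no software update.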
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-07 21:14 +0000 |
| Message-ID | <n44snd$gsn$1@dont-email.me> |
| In reply to | #78123 |
On 07/12/2015 19:55, Stephen Sprunk wrote:
> On 07-Dec-15 06:38, BartC wrote:
>>> What do you consider a "large" vs "small" alphabet?
>>
>> If it's currently got an 8-bit code page, then I guess that's a
>> small alphabet.
>
> So, essentially everything except CJK.

Yes, CJK brings a big bunch of problems. It's different (well the C part is certainly different. The J bit I thought was just katakana, a phonetic alphabet).

>> That's good, then we can tidily put all those together.
>
> But ... how?

How are the code points assigned now?

> New code points are assigned every year,

How are new ones assigned now? How was it done when the alphabet in question had a dedicated code page of a fixed size?

> and you don't know whether
> they'll be CJK or non-CJK, so are you proposing that every piece of
> software that uses your encoding will need to be updated

I'm not proposing any changes, only looking at what could have been alternate approaches. However, considering the palaver involved just in getting £ to display properly (I've seen 4 or 5 different versions recently), even using the new official encoding schemes, my version can't be much worse.

>> Although one idea I considered was to separate out small alphabets
>> too, with many characters duplicated across several alphabets, then
>> each could be self-contained. (Perhaps with common characters
>> retaining the same code-points.)
>
> If you do that, then you're just recreating the code page mess. The
> point of Unicode was to get _away_ from that!

The problem with code pages I think was that you could only have one at a time. Otherwise it is useful to give an alphabet an identity. Like you did with CJK.

(My 1931 typewriter always prints "£" reliably. But then that supports only British-English which is perhaps why it's so reliable.)

>> Then that would occupy several 'alphabets'. (Probably, two, with
>> consecutive codes. Then effectively it uses a 17-bit encoding.)
>
> You're doubling the size of _every_ character just to get one more bit?

I understand that that is how Python works. A million-character string consisting entirely of 'A's apart from a single SMP character, would take 4MB instead of 1MB.

> Consider that UTF-8 and UTF-16 can encode all of the most common CJK
> characters in just two bytes.

UTF8 manages that in just two bytes? It takes two bytes just for "£"! And £ has the short code of 163.

-- 
Bartc
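Both claims in this post are easy to check from a Python prompt. The sketch below assumes CPython 3.3 or later, where the interpreter picks 1, 2 or 4 bytes per character for each string; `sys.getsizeof` reports the object size for that implementation only, and other Python implementations may store strings differently.

```python
import sys

pound = "£"
print(ord(pound))                    # the code point value of the pound sign
print(pound.encode("latin-1"))       # one byte in an 8-bit code page
print(pound.encode("utf-8"))         # two bytes in UTF-8

n = 1_000_000
narrow = "A" * n                     # ASCII only
wide = "A" * (n - 1) + "\U0001F600"  # same length, one Plane 1 (SMP) character

# Object sizes in bytes, including a small fixed header.
print(len(narrow), sys.getsizeof(narrow))
print(len(wide), sys.getsizeof(wide))
```

On a typical CPython build the second size comes out roughly four times the first, which matches the 4MB versus 1MB figure quoted above.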
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-07 16:50 -0600 |
| Message-ID | <n452bt$6vm$1@dont-email.me> |
| In reply to | #78134 |
On 07-Dec-15 15:14, BartC wrote:
> On 07/12/2015 19:55, Stephen Sprunk wrote:
>> On 07-Dec-15 06:38, BartC wrote:
>>> If it's currently got an 8-bit code page, then I guess that's a
>>> small alphabet.
>>
>> So, essentially everything except CJK.
>
> Yes, CJK brings a big bunch of problems. It's different (well the C
> part is certainly different. The J bit I thought was just katakana,
> a phonetic alphabet).

Japanese has _three_ scripts: kanji (CJK), hiragana and katakana.

Korean has _two_ scripts: hanja (CJK) and hangul.

Vietnamese switched to Latin characters during the French occupation; prior to that, they used chữ Nôm (CJKV).

Chinese has the added complication of both Simplified characters (used in PRC and Singapore) and Traditional characters (used in ROC, Macau, Hong Kong, Japan and Korea). For mostly political reasons, distinct code points are assigned for characters that vary between the two.

>>> That's good, then we can tidily put all those together.
>>
>> But ... how?
>
> How are the code points assigned now?

Plane 0 (BMP) was assigned first-come, first-serve as each script's working group reached consensus. There is no pattern except that each block's size is a multiple of 16.

That's also apparently how Plane 1 (SMP) is being assigned, except that new CJK characters are put in Plane 2 (SIP) instead.

>> New code points are assigned every year,
>
> How are new ones assigned now?

See above.

> How was it done when the alphabet in question had a dedicated code
> page of a fixed size?

Vendors and/or national standards bodies created them, so each has its own unique history. ISO tried to standardize them, but mostly they just made the mess even worse than it was, which is what led to Unicode.

>> and you don't know whether they'll be CJK or non-CJK, so are you
>> proposing that every piece of software that uses your encoding will
>> need to be updated
>
> I'm not proposing any changes, only looking what could have been
> alternate approaches. However, considering the palaver involved just
> in getting £ to display properly (I've seen 4 or 5 different
> versions recently), even using the new official encoding schemes, my
> version can't be much worse.

Mojibake would go extinct overnight if everyone just used UTF-8, and indeed the world _is_ slowly heading that way. The main thing holding us back is MS's refusal to allow it as a default code page.

>>> Then that would occupy several 'alphabets'. (Probably, two, with
>>> consecutive codes. Then effectively it uses a 17-bit encoding.)
>>
>> You're doubling the size of _every_ character just to get one more
>> bit?
>
> I understand that that is how Python works. A million-character
> string consisting entirely of 'A's apart from a single SMP character,
> would take 4MB instead of 1MB.

Yep. But at least it transparently uses shorter forms for strings known to contain only BMP or only ASCII (actually Latin-1) characters.

>> Consider that UTF-8 and UTF-16 can encode all of the most common
>> CJK characters in just two bytes.
>
> UTF8 manages that in just two bytes? It takes two bytes just for
> "£"! And £ has the short code of 163.

Editing error; UTF-8 needs three bytes for common CJK.

Despite that, UTF-8 is far more popular than UTF-16, GB18030/GB2312, Big5 and EUC-KR _combined_, which require only two bytes for CJK characters in the BMP. ShiftJIS alone is still clinging to life, but it's falling to UTF-8 too, just more slowly than the others.

S

-- 
Stephen Sprunk
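The byte counts and the "4 or 5 different versions of £" are both reproducible with Python's standard codecs. This is only an illustrative sketch: the sample ideograph and the particular codec pairs are choices made for the example, not anything the posters specified.

```python
han = "中"  # U+4E2D, a common ideograph present in all the encodings below

# Bytes needed for one common CJK character under each encoding.
for codec in ("utf-8", "utf-16-le", "gb2312", "big5", "shift_jis"):
    print(f"{codec:10} {len(han.encode(codec))} bytes")

# Mojibake: bytes written under one encoding and read back under another.
utf8_bytes = "£".encode("utf-8")        # b'\xc2\xa3'
latin1_bytes = "£".encode("latin-1")    # b'\xa3'

print(utf8_bytes.decode("latin-1"))     # UTF-8 bytes read as Latin-1
print(utf8_bytes.decode("cp437"))       # UTF-8 bytes read as DOS code page 437
print(latin1_bytes.decode("cp437"))     # Latin-1 byte read as code page 437
```

Each wrong decode succeeds without any error, which is why the garbled pound sign tends to go unnoticed until someone looks at the screen.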
| From | Robert Wessel <robertwessel2@yahoo.com> |
|---|---|
| Date | 2015-12-07 02:38 -0600 |
| Message-ID | <92ha6b15getnrn2in17o0j3lu61vgpuc6a@4ax.com> |
| In reply to | #78023 |
On Mon, 7 Dec 2015 01:55:49 +0000, BartC <bc@freeuk.com> wrote: >On 07/12/2015 00:38, Martin Shobe wrote: >> On 12/6/2015 7:09 AM, BartC wrote: > >>> I spent 5 minutes thinking about an alternative to Unicode, and 10 >>> minutes writing up a first draft, and 10 more minutes for a second draft >>> (I won't bore you with the details). > >>> In 32-bit form, the two schemes (Unicode, and mine), aren't that >>> different in that each character is allocated a dedicated code-point. >>> But in mine, the large alphabets are tidily partitioned out of the way. >>> A similar concept to code-pages, but 32K characters each and that can >>> co-exist in the same text. > >> Can you give a link to it? > >It was only a dozen or so lines of text! > >Anyway I thought about it for another ten or twenty minutes and I have a >revised scheme (the previous one included non-character escape codes >within a string which I didn't like). Here's version 3: > >* In-memory representation, 32-bit version > >* All large alphabets are organised into sets of 64K characters, each is >given an alphabet code (similar to a code-page, but bigger) Unicode CJK has something like 75K characters at the moment. >* ASCII, small alphabets and symbols fit into a single special alphabet >of 64K characters, and itself has an alphabet code of zero > >* Local character encodings for each alphabet are from 0 to 65535, which >form the lsw of the 32-bit code. > >* The msw of the 32-bit code is the alphabet code. The complete code >forms a unique identifier for the character (ignoring the possibilities >of duplicates). The set of all character codes is sparse (not all >alphabets will occupy 64K slots) > >* Where one only alphabet is known to be in use (alphabet 0 also counts >as just one), then a 16-bit in-memory encoding can be used. (With a >similar trick for 8-bit encoding when all character codes are 0 to 255.) 
>
>* (This can also be done on a per-string basis, with the alphabet in use
>being an attribute associated with the string.)
>
>* (Possibly, the first 256 codes of alphabet 0, which are really general
>purpose characters, could be repeated at the start of all alphabets. But
>this creates the problem of multiple encodings of these characters.)
You've ignored RTL/LTR issues, and languages, like Korean, for which
composing characters out of base pieces is pretty much a requirement
(although Unicode also includes over 10K of the most common pre-composed
Hangul). You also ignored byte order issues, and compatibility with
existing APIs.
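
The "version 3" scheme quoted above (alphabet code in the most significant 16 bits, local code in the least significant 16) could be sketched in C roughly as follows. This only illustrates the proposal as described in the post; it is not an existing encoding, and every name here (`alpha_char`, `alpha_make`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit character code: msw = alphabet code, lsw = local code. */
typedef uint32_t alpha_char;

alpha_char alpha_make(uint16_t alphabet, uint16_t local)
{
    return ((uint32_t)alphabet << 16) | local;
}

uint16_t alpha_alphabet(alpha_char c) { return (uint16_t)(c >> 16); }
uint16_t alpha_local(alpha_char c)    { return (uint16_t)(c & 0xFFFFu); }

/* Returns 1 and stores the shared alphabet code in *alphabet if every
 * character of s[0..n-1] belongs to the same alphabet, so the string
 * could use the proposed 16-bit in-memory form. Returns 0 otherwise.
 * An empty string is reported as alphabet 0. */
int alpha_single_alphabet(const alpha_char *s, size_t n, uint16_t *alphabet)
{
    assert(alphabet != NULL);
    assert(s != NULL || n == 0);
    uint16_t first = (n > 0) ? alpha_alphabet(s[0]) : 0;
    for (size_t i = 1; i < n; i++) {
        if (alpha_alphabet(s[i]) != first)
            return 0;                 /* mixed alphabets: needs the 32-bit form */
    }
    *alphabet = first;
    return 1;
}
```

Note that the sketch covers only the in-memory form. As the reply points out, any external form of these 16-bit or 32-bit units would still need a defined byte order, and nothing here addresses directionality or composed characters.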
| From | Steve Thompson <stevet810@gmail.com> |
|---|---|
| Date | 2015-12-06 07:34 +0000 |
| Message-ID | <g7DsLI.43F.mQuAF@gmail.com> |
| In reply to | #77879 |
On Sat, Dec 05, 2015 at 11:47:45AM +0000, BartC wrote:
> On 05/12/2015 01:04, Steve Thompson wrote:
> >On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
>
> >>Fine, then we move to 16 bits, which had long been anticipated anyway,
> >>and gives us plenty of room for special symbols. But not if we have to
> >>throw in every single alphabet and writing system that anybody has ever
> >>heard of (and apparently plenty that no one has heard of!).
> >
> >I rather suspect the Anthropologists will scream bloody murder if
> >Egyptian hieroglyphics, Linear B, and all the rest are excluded.
>
> They probably wouldn't notice. Whatever software they use to enter and
> display the characters would still work if a different encoding scheme
> was used.
>
> Or many might prefer just using mark-up to describe it:
> {snake}{bird}{water}.
It seems to me that the code positions for those two languages are
already assigned.
> >>(And then you have vast, sprawling 'alphabets' like Chinese which are
> >>words rather than the letters used to build the words.)
> >
> >So go tell the Chinese (and Japanese, and Thais, and ...) that they
> >should man-up and use a Western alphabet. Such schemes exist, after
> >all.
>
> No, they can use the same alphabets, but they don't put them all into
> one giant melting pot with every other.
>
> Now, I can no longer write what had been trivial string handling
> routines such as capitalise, toupper, reverse, compare, left, leftn,
> etc etc. All are very well defined in ASCII, but would no longer be
> guaranteed to work with Unicode because most of the alphabets are so weird.
I'm not sure what to say. As others have pointed out (or suggested)
the complexity of language conventions is a product of undirected
evolution throughout history. It may be a mess, but nevertheless it
has to be dealt with.
Sorting in particular is a problem if one requires case insensitivity.
I suppose the only solution is a good set of per-language tables which
can be put in arrays for quick access. The combining characters are
another problem.
From the "unicode" man-page on my system:
Implementation Levels
As not all systems are expected to support advanced mechanisms like
combining characters, ISO 10646-1 specifies the following three
implementation levels of UCS:
Level 1 Combining characters and Hangul Jamo (a variant encoding of
the Korean script, where a Hangul syllable glyph is coded as a
triplet or pair of vowel/consonant codes) are not supported.
Level 2 In addition to level 1, combining characters are now
allowed for some languages where they are essential (e.g., Thai,
Lao, Hebrew, Arabic, Devanagari, Malayalam).
Level 3 All UCS characters are supported.
The Unicode 3.0 Standard published by the Unicode Consortium
contains exactly the UCS Basic Multilingual Plane at implementation
level 3, as described in ISO 10646-1:2000. Unicode 3.1 added the
supplemental planes of ISO 10646-2. The Unicode standard and
technical reports published by the Unicode Consortium provide much
additional information on the semantics and recommended usages of
various characters. They provide guidelines and algorithms for
editing, sorting, comparing, normalizing, converting and displaying
Unicode strings.
I wonder what their algorithm hints are. Unfortunately something I
just don't have time to treat in depth at the moment.
Regards,
Steve Thompson
--
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden." -- MysteryDog in 24hoursupport.helpdesk.
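
To make the combining-character problem raised in this post concrete, here is a small sketch, assuming valid UTF-8 input. It shows that the same visible letter "é" can be spelled two ways: precomposed (U+00E9) or as "e" followed by the combining acute accent (U+0301). The function and variable names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated string assumed to be valid UTF-8:
 * every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t utf8_count_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Plain byte-wise equality, as an ASCII-era compare routine would do it. */
int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Two spellings of the same visible letter. */
const char e_acute_precomposed[] = "\xC3\xA9";   /* U+00E9 */
const char e_acute_combining[]   = "e\xCC\x81";  /* U+0065 U+0301 */
```

The precomposed form is 2 bytes and 1 code point; the combining form is 3 bytes and 2 code points, and `bytes_equal` reports them as different. Routines such as reverse, compare or "left n characters" therefore need either normalization first (Unicode defines normalization forms for this) or an awareness of combining sequences, which is beyond this sketch.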
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-06 00:24 -0800 |
| Message-ID | <137e7850-c535-49ae-9594-618c56576ab3@googlegroups.com> |
| In reply to | #77944 |
On Sunday, December 6, 2015 at 7:40:36 AM UTC, Steve Thompson wrote:
>
> Level 2 In addition to level 1, combining characters are now
> allowed for some languages where they are essential (e.g., Thai,
> Lao, Hebrew, Arabic, Devanagari, Malayalam).
>
Depends what you mean by essential. Everyday Hebrew is written without
vowels or hardening dots (eg to make F into P). However religious text
is printed with vowels. But it's agreed that the vowels are man-supplied,
they're not considered part of the text given to Moses (for those who
take the traditional view).
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-04 19:49 -0600 |
| Message-ID | <n3tfm3$ok8$1@dont-email.me> |
| In reply to | #77857 |
On 04-Dec-15 17:46, BartC wrote: > On 04/12/2015 19:17, Steve Thompson wrote: >> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote: >>> So that is something about Unicode I'm not comfortable with. Our >>> nice tidy little alphabet (perhaps one of the reasons the West >>> has been ahead technologically) is swamped by these huge >>> character sets from around the world, which still don't like >>> being marshalled into neat little units. >> >> The West? Are you forgetting the Europe is also part of "the >> West"? > > No. But western Europe at least still uses small alphabets, and > mostly they are based around A-Z. Yes, aka Latin scripts, but unless you're willing to accept combining characters, even _those_ won't all fit in 256 slots. Adding Cyrillic and Greek seems only fair, but then you're past 256 even if you _do_ accept combining characters. And that's just Europe! Well, "Europe" had a lot of colonies, so their scripts cover nearly everyone in North America, South America, Australia, and sub-Saharan Africa who is likely to be using a computer. That leaves Asia and Northern Africa, but Asia is a _serious_ problem due to CJK. >> The technological lead of the West is another matter, and I am >> sorry if you are inconvenienced by the catch-up game underway in >> other parts of the world. Greek, APL, formal logic, mathematics, >> etc. are all sufficiently pervasive that their symbols merit >> inclusion in any reasonable general-use character set, and on that >> basis any fixation on English is bound to be terribly >> short-sighted. > > Fine, then we move to 16 bits, which had long been anticipated > anyway, and gives us plenty of room for special symbols. But not if > we have to throw in every single alphabet and writing system that > anybody has ever heard of (and apparently plenty that no one has > heard of!). CJK alone has >70,000 characters, so a 16-bit system was doomed from the very start. 
Once you break that barrier, you might as well include everything else--not because they're all important but because your code space is infinite for all practical purposes, which means it's tough to justify _not_ giving some of it to everyone who asks. We blew ~12 bits (99.974%) of UCS-4's space on the UTF-16 hack alone, so a few code points for emoji or Klingon silliness ain't nothin'. > (Imagine you were in the position of creating a new font, with a > hundreds of thousands of to design! I've done that, but for only 100 > characters.) Most fonts only target a specific script, and it's not surprising that CJK has only a handful of fonts available while smaller scripts have thousands of different fonts available. Also, most CJK characters are so detailed that there's really not much room for font variations in the first place. The simplest characters can be stylized, sure, but then you might as well just fall back to an existing font for the remainder. (This also means you could do them in small batches, rather than have to do the entire script in one go.) >> Again which languages? Software I use would be prudent to include >> the capacity to render English, French, German, Swedish >> (Scandinavian language generally), Greek, Latin, > > What's special about Latin? 4.6 billion people use Latin scripts; that is rather special. Latin itself is dead, but it costs nothing extra to include, so why not. S -- Stephen Sprunk "God does not play dice." --Albert Einstein CCIE #3723 "God is an inveterate gambler, and He throws the K5SSS dice at every possible opportunity." --Stephen Hawking
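
The "UTF-16 hack" referred to in this post is the surrogate-pair mechanism: two 16-bit units carry 10 bits each, so the highest reachable code point is 0x10000 + 2^20 - 1 = U+10FFFF. That gives 17 planes of 65,536 code points, or 1,114,112 in total, which is roughly 0.026% of a 32-bit space of 4,294,967,296 values. This matches the 99.974% figure if that figure assumes a full 32 bits. A minimal sketch of the encoding follows; the name `utf16_encode` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu   /* highest code point reachable via UTF-16 */

/* Encode one code point as UTF-16 into out[0..1].
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is a
 * surrogate value or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp > UNICODE_MAX || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: one unit, value unchanged */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00, and U+10FFFF becomes DBFF DFFF, the last pair available.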
| From | Richard Heathfield <rjh@cpax.org.uk> |
|---|---|
| Date | 2015-12-05 21:32 +0000 |
| Message-ID | <n3vl0t$c9u$1@dont-email.me> |
| In reply to | #77847 |
On 04/12/15 19:17, Steve Thompson wrote:
> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
<snip>
>> So that is something about Unicode I'm not comfortable with. Our nice
>> tidy little alphabet (perhaps one of the reasons the West has been ahead
>> technologically) is swamped by these huge character sets from around the
>> world, which still don't like being marshalled into neat little units.
>
> The West? Are you forgetting the Europe is also part of "the West"?
Much of it isn't.
Some of Spain, most of France, and all of Belgium, the Netherlands,
Germany, Italy, and so on, are in the East.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-05 13:50 -0800 |
| Message-ID | <c81a832a-454a-4829-9371-b2cb22e479ca@googlegroups.com> |
| In reply to | #77919 |
On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
> On 04/12/15 19:17, Steve Thompson wrote:
> > On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>
> <snip>
>
> >> So that is something about Unicode I'm not comfortable with. Our
> >> nice tidy little alphabet (perhaps one of the reasons the West has
> >> been ahead technologically) is swamped by these huge character sets
> >> from around the world, which still don't like being marshalled into
> >> neat little units.
> >
> > The West? Are you forgetting the Europe is also part of "the West"?
>
> Much of it isn't.
>
> Some of Spain, most of France, and all of Belgium, the Netherlands,
> Germany, Italy, and so on, are in the East.
>
Depends if you regard Greenwich or Jerusalem as the centre of the world.
The latter is more traditional, but the former makes more sense if you
want a line that doesn't hit any land round the back.
| From | Richard Heathfield <rjh@cpax.org.uk> |
|---|---|
| Date | 2015-12-05 22:15 +0000 |
| Message-ID | <n3vnhh$lum$1@dont-email.me> |
| In reply to | #77920 |
On 05/12/15 21:50, Malcolm McLean wrote:
> On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
>> On 04/12/15 19:17, Steve Thompson wrote:
>>> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>>
>> <snip>
>>
>>>> So that is something about Unicode I'm not comfortable with. Our
>>>> nice tidy little alphabet (perhaps one of the reasons the West has
>>>> been ahead technologically) is swamped by these huge character sets
>>>> from around the world, which still don't like being marshalled into
>>>> neat little units.
>>>
>>> The West? Are you forgetting the Europe is also part of "the West"?
>>
>> Much of it isn't.
>>
>> Some of Spain, most of France, and all of Belgium, the Netherlands,
>> Germany, Italy, and so on, are in the East.
>>
> Depends if you regard Greenwich or Jerusalem as the centre of the
> world.
Neither. The centre of the /world/ is around 4000 miles straight down.
And of course I (perfectly correctly) regard my current location as the
centre of the observable universe.
As for East and West, I am observing the convention that the 0 degree
longitude line divides the Eastern hemisphere from the Western hemisphere.
> The latter is more traditional, but the former makes
> more sense if you want a line that doesn't hit any land round the
> back.
I'm just abiding by existing conventions. I do that a lot, even when I
don't necessarily agree with them 100%.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
| From | James Kuyper <jameskuyper@verizon.net> |
|---|---|
| Date | 2015-12-05 17:27 -0500 |
| Message-ID | <566364D9.3030907@verizon.net> |
| In reply to | #77924 |
On 12/05/2015 05:15 PM, Richard Heathfield wrote:
> On 05/12/15 21:50, Malcolm McLean wrote:
>> On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
>>> On 04/12/15 19:17, Steve Thompson wrote:
>>>> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>>>
>>> <snip>
>>>
>>>>> So that is something about Unicode I'm not comfortable with. Our
>>>>> nice tidy little alphabet (perhaps one of the reasons the West has
>>>>> been ahead technologically) is swamped by these huge character sets
>>>>> from around the world, which still don't like being marshalled into
>>>>> neat little units.
>>>>
>>>> The West? Are you forgetting the Europe is also part of "the West"?
>>>
>>> Much of it isn't.
>>>
>>> Some of Spain, most of France, and all of Belgium, the Netherlands,
>>> Germany, Italy, and so on, are in the East.
>>>
>> Depends if you regard Greenwich or Jerusalem as the centre of the
>> world.
>
> Neither. The centre of the /world/ is around 4000 miles straight down.
> And of course I (perfectly correctly) regard my current location as the
> centre of the observable universe.
>
> As for East and West, I am observing the convention that the 0 degree
> longitude line divides the Eastern hemisphere from the Western hemisphere.
>
>> The latter is more traditional, but the former makes
>> more sense if you want a line that doesn't hit any land round the
>> back.
>
> I'm just abiding by existing conventions. I do that a lot, even when I
> don't necessarily agree with them 100%.
Existing conventions do NOT equate "The West" with "The western
hemisphere". The closest match is meaning number 6 at
<https://en.wiktionary.org/wiki/West>, where "any region" could, in this
case, be a region centered on yourself - but that's clearly not the
intended meaning. If it had been, "your West" rather than "the West"
would have been a more appropriate way of expressing that meaning.
| From | Richard Heathfield <rjh@cpax.org.uk> |
|---|---|
| Date | 2015-12-05 23:06 +0000 |
| Message-ID | <n3vqh7$4c6$1@dont-email.me> |
| In reply to | #77925 |
On 05/12/15 22:27, James Kuyper wrote:
> On 12/05/2015 05:15 PM, Richard Heathfield wrote:
<snip>
>> I'm just abiding by existing conventions. I do that a lot, even when I
>> don't necessarily agree with them 100%.
>
> Existing conventions do NOT equate "The West" with "The western
> hemisphere".
I don't abide by existing conventions *ALL* the time. :-)
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
| From | James Kuyper <jameskuyper@verizon.net> |
|---|---|
| Date | 2015-12-05 18:29 -0500 |
| Message-ID | <5663736C.2020406@verizon.net> |
| In reply to | #77926 |
On 12/05/2015 06:06 PM, Richard Heathfield wrote:
> On 05/12/15 22:27, James Kuyper wrote:
>> On 12/05/2015 05:15 PM, Richard Heathfield wrote:
>
> <snip>
>
>>> I'm just abiding by existing conventions. I do that a lot, even when I
>>> don't necessarily agree with them 100%.
>>
>> Existing conventions do NOT equate "The West" with "The western
>> hemisphere".
>
> I don't abide by existing conventions *ALL* the time. :-)
And in this case you are not "just abiding by existing conventions." as
claimed above.
| From | Richard Heathfield <rjh@cpax.org.uk> |
|---|---|
| Date | 2015-12-05 23:50 +0000 |
| Message-ID | <n3vt2r$cuh$1@dont-email.me> |
| In reply to | #77927 |
On 05/12/15 23:29, James Kuyper wrote:
> On 12/05/2015 06:06 PM, Richard Heathfield wrote:
>> On 05/12/15 22:27, James Kuyper wrote:
>>> On 12/05/2015 05:15 PM, Richard Heathfield wrote:
>>
>> <snip>
>>
>>>> I'm just abiding by existing conventions. I do that a lot, even when I
>>>> don't necessarily agree with them 100%.
>>>
>>> Existing conventions do NOT equate "The West" with "The western
>>> hemisphere".
>>
>> I don't abide by existing conventions *ALL* the time. :-)
>
> And in this case you are not "just abiding by existing conventions." as
> claimed above.
I'd prefer to argue that I'm just choosing which conventions to observe.
But yes, I'm pushing a joke too hard, and it isn't even remotely about C,
so I'll drop it.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
| From | Steve Thompson <stevet810@gmail.com> |
|---|---|
| Date | 2015-12-06 06:38 +0000 |
| Message-ID | <U1VE4L.uuj.uPXu7@gmail.com> |
| In reply to | #77924 |
On Sat, Dec 05, 2015 at 10:15:31PM +0000, Richard Heathfield wrote:
> On 05/12/15 21:50, Malcolm McLean wrote:
> >On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
> >>On 04/12/15 19:17, Steve Thompson wrote:
> >>>On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
> >>
> >><snip>
> >>
> >>>>So that is something about Unicode I'm not comfortable with. Our
> >>>>nice tidy little alphabet (perhaps one of the reasons the West has
> >>>>been ahead technologically) is swamped by these huge character sets
> >>>>from around the world, which still don't like being marshalled into
> >>>>neat little units.
> >>>
> >>>The West? Are you forgetting the Europe is also part of "the West"?
> >>
> >>Much of it isn't.
> >>
> >>Some of Spain, most of France, and all of Belgium, the Netherlands,
> >>Germany, Italy, and so on, are in the East.
> >>
> >Depends if you regard Greenwich or Jerusalem as the centre of the
> >world.
>
> Neither. The centre of the /world/ is around 4000 miles straight down.
> And of course I (perfectly correctly) regard my current location as the
> centre of the observable universe.
Oh good. Now we can have a holy-war over who truly occupies the center
of the universe. Once I thought it was Toronto, Canada, but as I became
enlightened through meditation over C I realized that the One True Center
of the Universe is in fact two feet below my chair.
Prepare for jihad, infidel.
> As for East and West, I am observing the convention that the 0 degree
> longitude line divides the Eastern hemisphere from the Western hemisphere.
>
> >The latter is more traditional, but the former makes
> >more sense if you want a line that doesn't hit any land round the
> >back.
>
> I'm just abiding by existing conventions. I do that a lot, even when I
> don't necessarily agree with them 100%.
Belgium will be so unhappy to learn they can never join the West.
Regards,
Steve Thompson
--
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden." -- MysteryDog in 24hoursupport.helpdesk.