
unicode is a fail

Started by	fir <profesor.fir@gmail.com>
First post	2015-12-02 08:01 -0800
Last post	2015-12-06 13:45 +0000
Articles	20 on this page of 158 — 25 participants


  unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
    Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
                Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
                  Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
              Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
                Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
                  Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
                    Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
              Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
                Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
              Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
        Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
      Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
        Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
      Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
                  Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
        Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
        Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
            Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
          Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
          Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
              Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
                  Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
                    Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
              Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
                  Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
                        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
                          Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
                            Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
                      Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
                        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
                    Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
                        Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
                          Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
                              Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
                                Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
                                    Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
                                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
                                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
                                      Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
                                        Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
                                          Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
                                            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
                                            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
                                              Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
                                                    Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
                                                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
                                                        Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
                                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
                                              Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
                                              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
                                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
                                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
                                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
                                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
                                                  Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
                                            Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
                              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
                                    Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
                                      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
              Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
                  Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
                    Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
                      Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
                        Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
                          Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
                      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
                Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
                Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
                  Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
          Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
      Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
        OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
          Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
        Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
          OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
            Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000


#78340

From	"Osmium" <r124c4u102@comcast.net>
Date	2015-12-10 08:21 -0600
Message-ID	<dctg2eFeb57U1@mid.individual.net>
In reply to	#78330

"BartC" wrote:

> Actually, MS code page 437 seems to have ignored all the control codes by 
> making all 256 codes represent some visible character (although codes 0 
> and 255 are spaces).
>
> I'm not sure how that works, but presumably if you write the sequence 
> "ABC\nDEF" to a pixel display (not one that emulates a terminal) it would 
> show "ABC♪DEF" (with a musical note symbol in the middle). But \n is still 
> needed inside TXT files.

Some clever guy decided to separate the message from the envelope that 
contained the message, that's how that works! The ASCII committee could have 
used this guy, if they had listened to him.  As I said earlier, ASCII is 
based on a Teletype® centric universe.  The end result is a mish-mash of 
transmission protocols *and* data. The Teletype is gone but the after 
effects linger on.
----------------------
I don't know how anyone could look at the huge list of code pages for DOS 
and EBCDIC at the end of the page in the link and not be absolutely 
horrified at what a mess has been made of a relatively simple situation. I 
estimate about *200* code pages for DOS alone!

https://en.wikipedia.org/wiki/Windows-1252


#78323

From	Robert Wessel <robertwessel2@yahoo.com>
Date	2015-12-10 00:59 -0600
Message-ID	<2i8i6bhq3snbro74hup4l96spgqlqlk6ka@4ax.com>
In reply to	#78240

On Wed, 9 Dec 2015 08:36:54 -0600, "Osmium" <r124c4u102@comcast.net>
wrote:

>"Ben Bacarisse" wrote:
>
>> Has it?  I don't see anyone objecting to my use of the word, and I'd be
>> happy to retract it if they did as I'm not a fan of arguing over vague
>> quantities.  The disagreement was over a statement about what "the rest
>> of Western Europe" uses.  It seems likely that the statement was just a
>> poor choice of words which Malcolm felt obliged to defend.  Had he said:
>>
>>  "For simple English text you need ascii. The other Western European
>>  languages need extended Latin, and annoyingly those characters won't
>>  all quite fit into 8 bits."
>>
>> I don't think there would be much to argue over.
>
>I haven't been watching this thread but that brings up a pet peeve of mine. 
>If you didn't throw away about 32x2 = 64 characters for control characters, 
>most of which are unused in the real world, I suspect Europeans could live 
>very comfortably with the result.  ASCII was forced down our throats by 
>AT&T and their Teletype® division. I would estimate at least 57 characters 
>could be reclaimed. 


As others have pointed out, ASCII is a 7-bit code.  OTOH, EBCDIC is an
8-bit code, and also dedicates 64 code points to control characters.
Certainly today, the vast majority of those are pointless (at least as
control characters intended to control a device).
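
For a concrete count on the ASCII side, here is a minimal sketch. It assumes the default "C" locale and an ASCII execution character set, where iscntrl() is true for codes 0 to 31 and for 127 (DEL). The name count_controls is made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Count the control characters among the codes first..last (inclusive).
   Assumes the default "C" locale and an ASCII execution character set,
   so iscntrl() is true for 0..31 and for 127 (DEL). */
int count_controls(int first, int last)
{
    assert(0 <= first && first <= last && last <= 127);
    int n = 0;
    for (int c = first; c <= last; c++)
        if (iscntrl(c))
            n++;
    return n;
}
```

Under those assumptions, count_controls(0, 127) should come to 33: the 32 codes below space, plus DEL.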


#78089

From	BartC <bc@freeuk.com>
Date	2015-12-07 14:33 +0000
Message-ID	<n4458d$din$1@dont-email.me>
In reply to	#78028

On 07/12/2015 03:14, Malcolm McLean wrote:
> On Monday, December 7, 2015 at 1:56:13 AM UTC, Bart wrote:
>> On 07/12/2015 00:38, Martin Shobe wrote:
>>> On 12/6/2015 7:09 AM, BartC wrote:
>>
>>>> I spent 5 minutes thinking about an alternative to Unicode, and 10
>>>> minutes writing up a first draft, and 10 more minutes for a second draft
>>>> (I won't bore you with the details).
>>
>>>> In 32-bit form, the two schemes (Unicode, and mine), aren't that
>>>> different in that each character is allocated a dedicated code-point.
>>>> But in mine, the large alphabets are tidily partitioned out of the way.
>>>> A similar concept to code-pages, but 32K characters each and that can
>>>> co-exist in the same text.
>>
>>> Can you give a link to it?
>>
>> It was only a dozen or so lines of text!
>>
>> Anyway I thought about it for another ten or twenty minutes and I have a
>> revised scheme (the previous one included non-character escape codes
>> within a string which I didn't like). Here's version 3:
>>
> You've got to consider the users.
> For simple English text you need ascii. The rest of Western Europe
> uses extended Latin, and annoyingly it won't quite fit into 8 bits.

Not even if they make use of obsolete control codes? Unicode seems to 
have even more than ASCII did.

> Eastern Europe uses Greek characters. Complex English text includes
> ascii, extended Latin, and Greek, and a few special symbols not
> included in ascii. At that point, we start to have the issue of
> what is markup and what is content. Is 1/2 the same content as
> a half symbol?

There shouldn't be any mark-up. (Although even ASCII suffers from that a 
little with tab characters and other codes intended to control layout. 
But that is well-understood.)

The 1/2 can just be a special symbol like (C) and TM. It would be up to 
the text processing application to superimpose a user-friendly interface 
where a search for "1/2" might find "1/2" or the special symbol.

(That's one big problem with these symbols, having to go and look them up.)

> Then you've got minority scripts with small alphabets, and the
> Far Eastern languages with massive character sets, and the Indian
> languages. Again, virtually all of the symbols are meaningless
> to the average English reader, but it's not usually true the
> other way round - Far Eastern and Indian readers are likely to
> know the English characters and embed English text in their
> documents.

Exactly. They want their own language plus enough characters to 
represent international content, which usually means English. Or 
sometimes there is another official language that might be English, French 
or whatever depending on which colonial power invaded them in the past.

-- 
Bartc


#78029

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-06 22:45 -0600
Message-ID	<n432p0$ol4$1@dont-email.me>
In reply to	#78023

On 06-Dec-15 19:55, BartC wrote:
> On 07/12/2015 00:38, Martin Shobe wrote:
>> Can you give a link to it?
> 
> It was only a dozen or so lines of text!
> 
> Anyway I thought about it for another ten or twenty minutes and I
> have a revised scheme (the previous one included non-character escape
> codes within a string which I didn't like). Here's version 3:
> 
> * In-memory representation, 32-bit version
> 
> * All large alphabets are organised into sets of 64K characters, each
> is given an alphabet code (similar to a code-page, but bigger)
> 
> * ASCII, small alphabets and symbols fit into a single special
> alphabet of 64K characters, and itself has an alphabet code of zero

What do you consider a "large" vs "small" alphabet?

If you exclude CJK, then _every_ script--modern, historical or even
fictional--would fit in 16 bits.  OTOH, CJK alone is over 16 bits.

It sounds like what you're after is segregation: a 32-bit ghetto for CJK
and a 16-bit suburb for everyone else.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking


#78067

From	BartC <bc@freeuk.com>
Date	2015-12-07 12:38 +0000
Message-ID	<n43ugm$i6i$1@dont-email.me>
In reply to	#78029

On 07/12/2015 04:45, Stephen Sprunk wrote:
> On 06-Dec-15 19:55, BartC wrote:
>> On 07/12/2015 00:38, Martin Shobe wrote:
>>> Can you give a link to it?
>>
>> It was only a dozen or so lines of text!
>>
>> Anyway I thought about it for another ten or twenty minutes and I
>> have a revised scheme (the previous one included non-character escape
>> codes within a string which I didn't like). Here's version 3:
>>
>> * In-memory representation, 32-bit version
>>
>> * All large alphabets are organised into sets of 64K characters, each
>> is given an alphabet code (similar to a code-page, but bigger)
>>
>> * ASCII, small alphabets and symbols fit into a single special
>> alphabet of 64K characters, and itself has an alphabet code of zero
>
> What do you consider a "large" vs "small" alphabet?

If it's currently got an 8-bit code page, then I guess that's a small 
alphabet.

> If you exclude CJK, then _every_ script--modern, historical or even
> fictional--would fit in 16 bits.

That's good, then we can tidily put all those together.

Although one idea I considered was to separate out small alphabets too, 
with many characters duplicated across several alphabets, then each 
could be self-contained. (Perhaps with common characters retaining the 
same code-points.)

But this can introduce some extra problems with programming such text, 
and I wanted it as simple as possible.

>  OTOH, CJK alone is over 16 bits.

Then that would occupy several 'alphabets'. (Probably, two, with 
consecutive codes. Then effectively it uses a 17-bit encoding.)
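
A rough sketch of how such a 32-bit value might be laid out. The thread does not give a bit layout, so everything here is an assumption made for illustration: the alphabet code sits in the upper 16 bits, the index within the 64K alphabet in the lower 16, and all the names (alpha_char, alpha_make and so on) are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (not specified in the thread): the upper 16 bits
   hold the alphabet code, the lower 16 bits the character's index within
   that 64K alphabet. Alphabet 0 holds ASCII, small alphabets and symbols. */
typedef uint32_t alpha_char;

alpha_char alpha_make(uint16_t alphabet, uint16_t index)
{
    return ((uint32_t)alphabet << 16) | index;
}

uint16_t alpha_code(alpha_char c)  { return (uint16_t)(c >> 16); }
uint16_t alpha_index(alpha_char c) { return (uint16_t)(c & 0xFFFFu); }

/* A script given two consecutive alphabet codes, base and base + 1,
   behaves like one 17-bit space of 131072 characters. */
alpha_char alpha_make_wide(uint16_t base, uint32_t index17)
{
    assert(base < 0xFFFF);            /* base + 1 must still fit in 16 bits */
    assert(index17 < (1ul << 17));
    return ((uint32_t)base << 16) + index17;
}
```

With this layout, plain ASCII keeps its usual values (alphabet 0), and an index of 0x10000 in a two-alphabet script simply spills into the next alphabet code.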

> It sounds like what you're after is segregation: a 32-bit ghetto for CJK
> and a 16-bit suburb for everyone else.

That's exactly the aim. If we wanted true integration then characters 
from all languages of the world would have had randomly assigned 
code-points. /There is already segregation./

-- 
Bartc


#78123

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-07 13:55 -0600
Message-ID	<n44o3n$sv6$1@dont-email.me>
In reply to	#78067

On 07-Dec-15 06:38, BartC wrote:
> On 07/12/2015 04:45, Stephen Sprunk wrote:
>> On 06-Dec-15 19:55, BartC wrote:
>>> * ASCII, small alphabets and symbols fit into a single special 
>>> alphabet of 64K characters, and itself has an alphabet code of
>>> zero
>> 
>> What do you consider a "large" vs "small" alphabet?
> 
> If it's currently got an 8-bit code page, then I guess that's a
> small alphabet.

So, essentially everything except CJK.

>> If you exclude CJK, then _every_ script--modern, historical or
>> even fictional--would fit in 16 bits.
> 
> That's good, then we can tidily put all those together.

But ... how?

New code points are assigned every year, and you don't know whether
they'll be CJK or non-CJK, so are you proposing that every piece of
software that uses your encoding will need to be updated before it can
use new code points--unlike the purely algorithmic UTF-8/16/32 that can
_already_ properly encode every valid code point?
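
To illustrate "purely algorithmic", here is a minimal sketch of a UTF-16 encoder. The function name utf16_encode is invented for this example. It needs no table of assigned characters, so a code point assigned next year would encode the same way without any change to the code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Anything in the BMP comes out as a single 16-bit unit; everything above it becomes a surrogate pair.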

> Although one idea I considered was to separate out small alphabets
> too, with many characters duplicated across several alphabets, then
> each could be self-contained. (Perhaps with common characters
> retaining the same code-points.)

If you do that, then you're just recreating the code page mess.  The
point of Unicode was to get _away_ from that!

> But this can introduce some extra problems with programming such
> text, and I wanted it as simple as possible.

Indeed.

>> OTOH, CJK alone is over 16 bits.
> 
> Then that would occupy several 'alphabets'. (Probably, two, with 
> consecutive codes. Then effectively it uses a 17-bit encoding.)

You're doubling the size of _every_ character just to get one more bit?

Consider that UTF-8 and UTF-16 can encode all of the most common CJK
characters in just two bytes.

Also, what happens with a string that is mixed CJK and non-CJK?  Does
the size of every non-CJK character double just because one CJK
character is present?  How is this any better than UTF-32?

Or are you going to rob the non-CJK code page of one bit to indicate
which encoding is used for each character, which means it can no longer
hold all ~50k non-CJK characters, which then means some non-CJK scripts
must be sent to the ghetto along with CJK?

>> It sounds like what you're after is segregation: a 32-bit ghetto for
>> CJK and a 16-bit suburb for everyone else.
> 
> That's exactly the aim.

Try convincing the CJK countries to accept that.

Even many non-CJK countries would reject such a plan due to your clear
(and now admitted) discriminatory intent.

OTOH, other non-CJK countries that wouldn't care about that issue also
happen to be the ones that benefit most from the status quo, so they
would likely reject your plan too.

So, who would want this, other than you?

> If we wanted true integration then characters from all languages of
> the world would have had randomly assigned code-points. /There is
> already segregation./

De jure vs de facto makes a big difference.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking


#78134

From	BartC <bc@freeuk.com>
Date	2015-12-07 21:14 +0000
Message-ID	<n44snd$gsn$1@dont-email.me>
In reply to	#78123

On 07/12/2015 19:55, Stephen Sprunk wrote:
> On 07-Dec-15 06:38, BartC wrote:

>>> What do you consider a "large" vs "small" alphabet?
>>
>> If it's currently got an 8-bit code page, then I guess that's a
>> small alphabet.
>
> So, essentially everything except CJK.

Yes, CJK brings a big bunch of problems. It's different (well the C part 
is certainly different. The J bit I thought was just katakana, a 
phonetic alphabet).

>> That's good, then we can tidily put all those together.
>
> But ... how?

How are the code points assigned now?

> New code points are assigned every year,

How are new ones assigned now? How was it done when the alphabet in 
question had a dedicated code page of a fixed size?

> and you don't know whether
> they'll be CJK or non-CJK, so are you proposing that every piece of
> software that uses your encoding will need to be updated

I'm not proposing any changes, only looking at what could have been 
alternate approaches. However, considering the palaver involved just in 
getting £ to display properly (I've seen 4 or 5 different versions 
recently), even using the new official encoding schemes, my version 
can't be much worse.

>> Although one idea I considered was to separate out small alphabets
>> too, with many characters duplicated across several alphabets, then
>> each could be self-contained. (Perhaps with common characters
>> retaining the same code-points.)
>
> If you do that, then you're just recreating the code page mess.  The
> point of Unicode was to get _away_ from that!

The problem with code pages I think was that you could only have one at 
a time. Otherwise it is useful to give an alphabet an identity. Like you 
did with CJK. (My 1931 typewriter always prints "£" reliably. But then 
that supports only British-English which is perhaps why it's so reliable.)

>> Then that would occupy several 'alphabets'. (Probably, two, with
>> consecutive codes. Then effectively it uses a 17-bit encoding.)
>
> You're doubling the size of _every_ character just to get one more bit?

I understand that that is how Python works. A million-character string 
consisting entirely of 'A's apart from a single SMP character, would 
take 4MB instead of 1MB.
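
A minimal sketch of that rule, assuming it works the way CPython's flexible string representation (PEP 393) is described: the widest code point in the string sets the storage width for every character. The function name char_width is made up for this illustration, and object header overhead is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per character needed under a PEP 393-style scheme: the widest
   code point in the string decides the width used for all of them.
   s holds n code points; an empty string gets width 1. */
size_t char_width(const uint32_t *s, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] > max)
            max = s[i];
    if (max <= 0xFF)
        return 1;       /* Latin-1 range */
    if (max <= 0xFFFF)
        return 2;       /* rest of the BMP */
    return 4;           /* anything in the supplementary planes */
}
```

So a string of a million 'A's costs about n * 1 bytes, and one character above U+FFFF anywhere in it raises that to n * 4.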

> Consider that UTF-8 and UTF-16 can encode all of the most common CJK
> characters in just two bytes.

UTF8 manages that in just two bytes? It takes two bytes just for "£"! 
And £ has the short code of 163.
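
The byte counts are easy to check. A minimal sketch of a UTF-8 encoder follows; the name utf8_encode is invented for this illustration. It gives two bytes for £ (U+00A3) and three bytes for a BMP ideograph such as U+4E2D, since everything from U+0800 to U+FFFF falls in the three-byte range. In UTF-16 the same ideograph is a single 16-bit unit, i.e. two bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
   Writes 1 to 4 bytes to out and returns the count, or 0 if cp is a
   surrogate (U+D800..U+DFFF) or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                      /* U+0000..U+007F: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* U+0080..U+07FF: 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* U+0800..U+FFFF: 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 4 bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```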

-- 
Bartc


#78146

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-07 16:50 -0600
Message-ID	<n452bt$6vm$1@dont-email.me>
In reply to	#78134

On 07-Dec-15 15:14, BartC wrote:
> On 07/12/2015 19:55, Stephen Sprunk wrote:
>> On 07-Dec-15 06:38, BartC wrote:
>>> If it's currently got an 8-bit code page, then I guess that's a 
>>> small alphabet.
>> 
>> So, essentially everything except CJK.
> 
> Yes, CJK brings a big bunch of problems. It's different (well the C
> part is certainly different. The J bit I thought was just katakana,
> a phonetic alphabet).

Japanese has _three_ scripts: kanji (CJK), hiragana and katakana.

Korean has _two_ scripts: hanja (CJK) and hangul.

Vietnamese switched to Latin characters during the French occupation;
prior to that, they used chữ Nôm (CJKV).

Chinese has the added complication of both Simplified characters (used
in PRC and Singapore) and Traditional characters (used in ROC, Macau,
Hong Kong, Japan and Korea).  For mostly political reasons, distinct
code points are assigned for characters that vary between the two.

>>> That's good, then we can tidily put all those together.
>> 
>> But ... how?
> 
> How are the code points assigned now?

Plane 0 (BMP) was assigned first-come, first-served as each script's
working group reached consensus.  There is no pattern except that each
block's size is a multiple of 16.

That's also apparently how Plane 1 (SMP) is being assigned, except that
new CJK characters are put in Plane 2 (SIP) instead.

>> New code points are assigned every year,
> 
> How are new ones assigned now?

See above.

> How was it done when the alphabet in question had a dedicated code
> page of a fixed size?

Vendors and/or national standards bodies created them, so each has its
own unique history.  ISO tried to standardize them, but mostly they just
made the mess even worse than it was, which is what led to Unicode.

>> and you don't know whether they'll be CJK or non-CJK, so are you
>> proposing that every piece of software that uses your encoding will
>> need to be updated
> 
> I'm not proposing any changes, only looking at what could have been 
> alternative approaches. However, considering the palaver involved just
> in getting £ to display properly (I've seen 4 or 5 different
> versions recently), even using the new official encoding schemes, my
> version can't be much worse.

Mojibake would go extinct overnight if everyone just used UTF-8, and
indeed the world _is_ slowly heading that way.  The main thing holding
us back is MS's refusal to allow it as a default code page.

>>> Then that would occupy several 'alphabets'. (Probably, two, with 
>>> consecutive codes. Then effectively it uses a 17-bit encoding.)
>> 
>> You're doubling the size of _every_ character just to get one more
>> bit?
> 
> I understand that that is how Python works. A million-character
> string consisting entirely of 'A's apart from a single SMP character,
> would take 4MB instead of 1MB.

Yep.  But at least it transparently uses shorter forms for strings known
to contain only BMP or only ASCII (actually Latin-1) characters.
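
A rough sketch of that width selection in C, for anyone who wants to see
the arithmetic.  This is not CPython's actual code; the names char_width
and storage_bytes are made up for illustration, and object headers are
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest per-character width (in bytes) able to hold every code point
   in the string: 1 for Latin-1, 2 for the BMP, 4 otherwise. */
static unsigned char_width(const uint32_t *cps, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cps[i] <= 0x10FFFF);   /* outside Unicode's code space */
        if (cps[i] > max)
            max = cps[i];
    }
    if (max <= 0xFF)
        return 1;
    if (max <= 0xFFFF)
        return 2;
    return 4;
}

/* Bytes needed for the character data alone. */
static size_t storage_bytes(const uint32_t *cps, size_t n)
{
    return n * char_width(cps, n);
}
```

One character above U+FFFF anywhere in the string pushes every
character to 4 bytes, which is the 1MB versus 4MB case quoted above.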

>> Consider that UTF-8 and UTF-16 can encode all of the most common
>> CJK characters in just two bytes.
> 
> UTF8 manages that in just two bytes? It takes two bytes just for
> "£"! And £ has the short code of 163.

Editing error; UTF-8 needs three bytes for common CJK.

Despite that, UTF-8 is far more popular than UTF-16, GB18030/GB2312,
Big5 and EUC-KR _combined_, which require only two bytes for CJK
characters in the BMP.  ShiftJIS alone is still clinging to life, but
it's falling to UTF-8 too, just more slowly than the others.
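
To make the byte counts concrete, here is a minimal UTF-8 encoder
sketch in C.  The function name utf8_encode is just an illustration,
not a standard library call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[4].  Returns the number of
   bytes written, or 0 for a surrogate or a value above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* up to U+07FF: 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* rest of the BMP: 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* supplementary planes: 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this, U+00A3 (the pound sign, decimal 163) comes out as the two
bytes C2 A3, and a common CJK character such as U+4E2D comes out as the
three bytes E4 B8 AD.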

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking


#78033

From	Robert Wessel <robertwessel2@yahoo.com>
Date	2015-12-07 02:38 -0600
Message-ID	<92ha6b15getnrn2in17o0j3lu61vgpuc6a@4ax.com>
In reply to	#78023

On Mon, 7 Dec 2015 01:55:49 +0000, BartC <bc@freeuk.com> wrote:

>On 07/12/2015 00:38, Martin Shobe wrote:
>> On 12/6/2015 7:09 AM, BartC wrote:
>
>>> I spent 5 minutes thinking about an alternative to Unicode, and 10
>>> minutes writing up a first draft, and 10 more minutes for a second draft
>>> (I won't bore you with the details).
>
>>> In 32-bit form, the two schemes (Unicode, and mine), aren't that
>>> different in that each character is allocated a dedicated code-point.
>>> But in mine, the large alphabets are tidily partitioned out of the way.
>>> A similar concept to code-pages, but 32K characters each and that can
>>> co-exist in the same text.
>
>> Can you give a link to it?
>
>It was only a dozen or so lines of text!
>
>Anyway I thought about it for another ten or twenty minutes and I have a 
>revised scheme (the previous one included non-character escape codes 
>within a string which I didn't like). Here's version 3:
>
>* In-memory representation, 32-bit version
>
>* All large alphabets are organised into sets of 64K characters, each is 
>given an alphabet code (similar to a code-page, but bigger)


Unicode CJK has something like 75K characters at the moment.


>* ASCII, small alphabets and symbols fit into a single special alphabet 
>of 64K characters, and itself has an alphabet code of zero
>
>* Local character encodings for each alphabet are from 0 to 65535, which 
>form the lsw of the 32-bit code.
>
>* The msw of the 32-bit code is the alphabet code. The complete code 
>forms a unique identifier for the character (ignoring the possibilities 
>of duplicates). The set of all character codes is sparse (not all 
>alphabets will occupy 64K slots)
>
>* Where only one alphabet is known to be in use (alphabet 0 also counts 
>as just one), then a 16-bit in-memory encoding can be used. (With a 
>similar trick for 8-bit encoding when all character codes are 0 to 255.)
>
>* (This can also be done on a per-string basis, with the alphabet in use 
>being an attribute associated with the string.)
>
>* (Possibly, the first 256 codes of alphabet 0, which are really general 
>purpose characters, could be repeated at the start of all alphabets. But 
>this creates the problem of multiple encodings of these characters.)
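
For concreteness, the msw/lsw split quoted above might look like this in
C.  This is only a sketch of the scheme as described; the function names
are invented here, and any alphabet number other than 0 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for the quoted scheme: the high 16 bits (msw)
   hold the alphabet code, the low 16 bits (lsw) the local code. */
static uint32_t make_code(uint16_t alphabet, uint16_t local)
{
    return ((uint32_t)alphabet << 16) | local;
}

static uint16_t alphabet_of(uint32_t code) { return (uint16_t)(code >> 16); }
static uint16_t local_of(uint32_t code)    { return (uint16_t)(code & 0xFFFFu); }

/* Returns 1 and stores the shared alphabet code if every character in
   s[0..n-1] belongs to one alphabet, so the 16-bit in-memory form could
   be used; returns 0 otherwise.  An empty string counts as alphabet 0. */
static int single_alphabet(const uint32_t *s, size_t n, uint16_t *alphabet)
{
    assert(alphabet != NULL);
    *alphabet = n ? alphabet_of(s[0]) : 0;
    for (size_t i = 1; i < n; i++)
        if (alphabet_of(s[i]) != *alphabet)
            return 0;
    return 1;
}
```

Note that a string mixing alphabet 0 (ASCII and symbols) with any large
alphabet fails the single_alphabet test and needs the full 32-bit form.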


You've ignored RTL/LTR issues, and languages, like Korean, for which
composing characters out of base pieces is pretty much a requirement
(although Unicode also includes over 10K of the most common
pre-composed Hangul).  You also ignored byte order issues, and
compatibility with existing APIs.
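
As an illustration of the Hangul point: the pre-composed syllables are
laid out arithmetically from U+AC00, so a syllable can be computed from
the indices of its pieces.  A minimal sketch in C follows; the function
name hangul_compose is mine, and the indices are positions in the
standard jamo orderings (19 leading consonants, 21 vowels, 28 trailing
slots where 0 means no trailing consonant).

```c
#include <assert.h>
#include <stdint.h>

enum { S_BASE = 0xAC00, L_COUNT = 19, V_COUNT = 21, T_COUNT = 28 };

/* l: leading consonant index (0..18)
   v: vowel index (0..20)
   t: trailing consonant index (0..27, 0 = none)
   Returns the code point of the pre-composed syllable. */
static uint32_t hangul_compose(unsigned l, unsigned v, unsigned t)
{
    assert(l < L_COUNT && v < V_COUNT && t < T_COUNT);
    return S_BASE + (l * V_COUNT + v) * T_COUNT + t;
}
```

That gives 19 * 21 * 28 = 11172 syllables, running from U+AC00 to
U+D7A3.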


#77944

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-06 07:34 +0000
Message-ID	<g7DsLI.43F.mQuAF@gmail.com>
In reply to	#77879

On Sat, Dec 05, 2015 at 11:47:45AM +0000, BartC wrote:
> On 05/12/2015 01:04, Steve Thompson wrote:
> >On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
> 
> >>Fine, then we move to 16 bits, which had long been anticipated anyway,
> >>and gives us plenty of room for special symbols. But not if we have to
> >>throw in every single alphabet and writing system that anybody has ever
> >>heard of (and apparently plenty that no one has heard of!).
> >
> >I rather suspect the Anthropologists will scream bloody murder if
> >Egyptian hieroglyphics, Linear B, and all the rest are excluded.
> 
> They probably wouldn't notice. Whatever software they use to enter and 
> display the characters would still work if a different encoding scheme 
> was used.
> 
> Or many might prefer just using mark-up to describe it: 
> {snake}{bird}{water}.

It seems to me that the code positions for those two languages are
already assigned.

> >>(And then you have vast, sprawling 'alphabets' like Chinese which are
> >>words rather than the letters used to build the words.)
> >
> >So go tell the Chinese (and Japanese, and Thais, and ...) that they
> >should man-up and use a Western alphabet.  Such schemes exist, after
> >all.
> 
> No, they can use the same alphabets, but they don't put them all into 
> one giant melting pot with every other.
> 
> Now, I can no longer write what had been trivial string handling 
> routines such as capitalise, toupper, reverse, compare, left, leftn, 
> etc etc. All are very well defined in ASCII, but would no longer be 
> guaranteed to work with Unicode because most of the alphabets are so weird.

I'm not sure what to say.  As others have pointed out (or suggested)
the complexity of language conventions is a product of undirected
evolution throughout history.  It may be a mess, but nevertheless it
has to be dealt with.

Sorting in particular is a problem if one requires case insensitivity.
I suppose the only solution is a good set of per-language tables which
can be put in arrays for quick access.  The combining characters are
another problem.

From the "unicode" man-page on my system:

   Implementation Levels

   As not all systems are expected to support advanced mechanisms like
   combining characters, ISO 10646-1 specifies the following three
   implementation levels of UCS:

   Level 1 Combining characters and Hangul Jamo (a variant encoding of
   the Korean script, where a Hangul syllable glyph is coded as a
   triplet or pair of vowel/consonant codes) are not supported.

   Level 2 In addition to level 1, combining characters are now
   allowed for some languages where they are essential (e.g., Thai,
   Lao, Hebrew, Arabic, Devanagari, Malayalam).

   Level 3  All UCS characters are supported.

   The Unicode 3.0 Standard published by the Unicode Consortium
   contains exactly the UCS Basic Multilingual Plane at implementation
   level 3, as described in ISO 10646-1:2000.  Unicode 3.1 added the
   supplemental planes of ISO 10646-2.  The Unicode standard and
   technical reports published by the Unicode Consortium provide much
   additional information on the semantics and recommended usages of
   various characters.  They provide guidelines and algorithms for
   editing, sorting, comparing, normalizing, converting and displaying
   Unicode strings.

I wonder what their algorithm hints are.  Unfortunately something I
just don't have time to treat in depth at the moment.

Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.


#77946

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-06 00:24 -0800
Message-ID	<137e7850-c535-49ae-9594-618c56576ab3@googlegroups.com>
In reply to	#77944

On Sunday, December 6, 2015 at 7:40:36 AM UTC, Steve Thompson wrote:
>
>    Level 2 In addition to level 1, combining characters are now
>    allowed for some languages where they are essential (e.g., Thai,
>    Lao, Hebrew, Arabic, Devanagari, Malayalam).
>    
Depends what you mean by essential.
Everyday Hebrew is written without vowels or hardening dots (eg
to make F into P). However religious text is printed with vowels.
But it's agreed that the vowels are man-supplied, they're not
considered part of the text given to Moses (for those who take
the traditional view).


#77870

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-04 19:49 -0600
Message-ID	<n3tfm3$ok8$1@dont-email.me>
In reply to	#77857

On 04-Dec-15 17:46, BartC wrote:
> On 04/12/2015 19:17, Steve Thompson wrote:
>> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>>> So that is something about Unicode I'm not comfortable with. Our
>>> nice tidy little alphabet (perhaps one of the reasons the West
>>> has been ahead technologically) is swamped by these huge
>>> character sets from around the world, which still don't like
>>> being marshalled into neat little units.
>> 
>> The West?  Are you forgetting the Europe is also part of "the
>> West"?
> 
> No. But western Europe at least still uses small alphabets, and
> mostly they are based around A-Z.

Yes, aka Latin scripts, but unless you're willing to accept combining
characters, even _those_ won't all fit in 256 slots.  Adding Cyrillic
and Greek seems only fair, but then you're past 256 even if you _do_
accept combining characters.  And that's just Europe!

Well, "Europe" had a lot of colonies, so their scripts cover nearly
everyone in North America, South America, Australia, and sub-Saharan
Africa who is likely to be using a computer.  That leaves Asia and
Northern Africa, but Asia is a _serious_ problem due to CJK.

>> The technological lead of the West is another matter, and I am
>> sorry if you are inconvenienced by the catch-up game underway in
>> other parts of the world.  Greek, APL, formal logic, mathematics,
>> etc. are all sufficiently pervasive that their symbols merit
>> inclusion in any reasonable general-use character set, and on that
>> basis any fixation on English is bound to be terribly
>> short-sighted.
> 
> Fine, then we move to 16 bits, which had long been anticipated
> anyway, and gives us plenty of room for special symbols. But not if
> we have to throw in every single alphabet and writing system that
> anybody has ever heard of (and apparently plenty that no one has
> heard of!).

CJK alone has >70,000 characters, so a 16-bit system was doomed from the
very start.  Once you break that barrier, you might as well include
everything else--not because they're all important but because your code
space is infinite for all practical purposes, which means it's tough to
justify _not_ giving some of it to everyone who asks.

We blew ~12 bits (99.974%) of UCS-4's space on the UTF-16 hack alone, so
a few code points for emoji or Klingon silliness ain't nothin'.
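
For reference, the limit comes from UTF-16's surrogate pairs: each pair
carries 20 bits on top of the BMP, which caps the code space at U+10FFFF
(0x110000 code points in all).  A minimal encoder sketch in C, with
utf16_encode being an illustrative name rather than any standard API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 into out[2].  Returns the number of
   16-bit units written, or 0 for a surrogate or a value above U+10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* reserved for surrogates */
    if (cp < 0x10000) {                   /* BMP: one unit, as is */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                         /* not representable in UTF-16 */
    cp -= 0x10000;                        /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

The last encodable value, U+10FFFF, comes out as DBFF DFFF; anything
larger has no UTF-16 form, which is why the rest of the 32-bit space was
given up.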

> (Imagine you were in the position of creating a new font, with 
> hundreds of thousands of characters to design! I've done that, but
> for only 100 characters.)

Most fonts only target a specific script, and it's not surprising that
CJK has only a handful of fonts available while smaller scripts have
thousands of different fonts available.

Also, most CJK characters are so detailed that there's really not much
room for font variations in the first place.  The simplest characters
can be stylized, sure, but then you might as well just fall back to an
existing font for the remainder.  (This also means you could do them in
small batches, rather than have to do the entire script in one go.)

>> Again which languages?  Software I use would be prudent to include
>> the capacity to render English, French, German, Swedish
>> (Scandinavian language generally), Greek, Latin,
> 
> What's special about Latin?

4.6 billion people use Latin scripts; that is rather special.  Latin
itself is dead, but it costs nothing extra to include, so why not.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking


#77919

From	Richard Heathfield <rjh@cpax.org.uk>
Date	2015-12-05 21:32 +0000
Message-ID	<n3vl0t$c9u$1@dont-email.me>
In reply to	#77847

On 04/12/15 19:17, Steve Thompson wrote:
> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:

<snip>

>> So that is something about Unicode I'm not comfortable with. Our nice
>> tidy little alphabet (perhaps one of the reasons the West has been ahead
>> technologically) is swamped by these huge character sets from around the
>> world, which still don't like being marshalled into neat little units.
>
> The West?  Are you forgetting the Europe is also part of "the West"?

Much of it isn't.

Some of Spain, most of France, and all of Belgium, the Netherlands, 
Germany, Italy, and so on, are in the East.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


#77920

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-05 13:50 -0800
Message-ID	<c81a832a-454a-4829-9371-b2cb22e479ca@googlegroups.com>
In reply to	#77919

On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
> On 04/12/15 19:17, Steve Thompson wrote:
> > On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
> 
> <snip>
> 
> >> So that is something about Unicode I'm not comfortable with. Our
> >> nice tidy little alphabet (perhaps one of the reasons the West has 
> >> been ahead technologically) is swamped by these huge character sets
> >> from around the world, which still don't like being marshalled into
> >> neat little units.
> >
> > The West?  Are you forgetting the Europe is also part of "the West"?
> 
> Much of it isn't.
> 
> Some of Spain, most of France, and all of Belgium, the Netherlands, 
> Germany, Italy, and so on, are in the East.
> 
Depends if you regard Greenwich or Jerusalem as the centre of the
world. The latter is more traditional, but the former makes
more sense if you want a line that doesn't hit any land round the
back.


#77924

From	Richard Heathfield <rjh@cpax.org.uk>
Date	2015-12-05 22:15 +0000
Message-ID	<n3vnhh$lum$1@dont-email.me>
In reply to	#77920

On 05/12/15 21:50, Malcolm McLean wrote:
> On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
>> On 04/12/15 19:17, Steve Thompson wrote:
>>> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>>
>> <snip>
>>
>>>> So that is something about Unicode I'm not comfortable with. Our
>>>> nice tidy little alphabet (perhaps one of the reasons the West has
>>>> been ahead technologically) is swamped by these huge character sets
>>>> from around the world, which still don't like being marshalled into
>>>> neat little units.
>>>
>>> The West?  Are you forgetting the Europe is also part of "the West"?
>>
>> Much of it isn't.
>>
>> Some of Spain, most of France, and all of Belgium, the Netherlands,
>> Germany, Italy, and so on, are in the East.
>>
> Depends if you regard Greenwich or Jerusalem as the centre of the
> world.

Neither. The centre of the /world/ is around 4000 miles straight down. 
And of course I (perfectly correctly) regard my current location as the 
centre of the observable universe.

As for East and West, I am observing the convention that the 0 degree 
longitude line divides the Eastern hemisphere from the Western hemisphere.

> The latter is more traditional, but the former makes
> more sense if you want a line that doesn't hit any land round the
> back.

I'm just abiding by existing conventions. I do that a lot, even when I 
don't necessarily agree with them 100%.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


#77925

From	James Kuyper <jameskuyper@verizon.net>
Date	2015-12-05 17:27 -0500
Message-ID	<566364D9.3030907@verizon.net>
In reply to	#77924

On 12/05/2015 05:15 PM, Richard Heathfield wrote:
> On 05/12/15 21:50, Malcolm McLean wrote:
>> On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
>>> On 04/12/15 19:17, Steve Thompson wrote:
>>>> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
>>>
>>> <snip>
>>>
>>>>> So that is something about Unicode I'm not comfortable with. Our
>>>>> nice tidy little alphabet (perhaps one of the reasons the West has
>>>>> been ahead technologically) is swamped by these huge character sets
>>>>> from around the world, which still don't like being marshalled into
>>>>> neat little units.
>>>>
>>>> The West?  Are you forgetting the Europe is also part of "the West"?
>>>
>>> Much of it isn't.
>>>
>>> Some of Spain, most of France, and all of Belgium, the Netherlands,
>>> Germany, Italy, and so on, are in the East.
>>>
>> Depends if you regard Greenwich or Jerusalem as the centre of the
>> world.
> 
> Neither. The centre of the /world/ is around 4000 miles straight down. 
> And of course I (perfectly correctly) regard my current location as the 
> centre of the observable universe.
> 
> As for East and West, I am observing the convention that the 0 degree 
> longitude line divides the Eastern hemisphere from the Western hemisphere.
> 
>> The latter is more traditional, but the former makes
>> more sense if you want a line that doesn't hit any land round the
>> back.
> 
> I'm just abiding by existing conventions. I do that a lot, even when I 
> don't necessarily agree with them 100%.

Existing conventions do NOT equate "The West" with "The western
hemisphere". The closest match is meaning number 6 at
<https://en.wiktionary.org/wiki/West>, where "any region" could, in this
case, be a region centered on yourself - but that's clearly not the
intended meaning. If it had been, "your West" rather than "the West"
would have been a more appropriate way of expressing that meaning.


#77926

From	Richard Heathfield <rjh@cpax.org.uk>
Date	2015-12-05 23:06 +0000
Message-ID	<n3vqh7$4c6$1@dont-email.me>
In reply to	#77925

On 05/12/15 22:27, James Kuyper wrote:
> On 12/05/2015 05:15 PM, Richard Heathfield wrote:

<snip>

>> I'm just abiding by existing conventions. I do that a lot, even when I
>> don't necessarily agree with them 100%.
>
> Existing conventions do NOT equate "The West" with "The western
> hemisphere".

I don't abide by existing conventions *ALL* the time. :-)

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


#77927

From	James Kuyper <jameskuyper@verizon.net>
Date	2015-12-05 18:29 -0500
Message-ID	<5663736C.2020406@verizon.net>
In reply to	#77926

On 12/05/2015 06:06 PM, Richard Heathfield wrote:
> On 05/12/15 22:27, James Kuyper wrote:
>> On 12/05/2015 05:15 PM, Richard Heathfield wrote:
> 
> <snip>
> 
>>> I'm just abiding by existing conventions. I do that a lot, even when I
>>> don't necessarily agree with them 100%.
>>
>> Existing conventions do NOT equate "The West" with "The western
>> hemisphere".
> 
> I don't abide by existing conventions *ALL* the time. :-)

And in this case you are not "just abiding by existing conventions." as
claimed above.


#77928

From	Richard Heathfield <rjh@cpax.org.uk>
Date	2015-12-05 23:50 +0000
Message-ID	<n3vt2r$cuh$1@dont-email.me>
In reply to	#77927

On 05/12/15 23:29, James Kuyper wrote:
> On 12/05/2015 06:06 PM, Richard Heathfield wrote:
>> On 05/12/15 22:27, James Kuyper wrote:
>>> On 12/05/2015 05:15 PM, Richard Heathfield wrote:
>>
>> <snip>
>>
>>>> I'm just abiding by existing conventions. I do that a lot, even when I
>>>> don't necessarily agree with them 100%.
>>>
>>> Existing conventions do NOT equate "The West" with "The western
>>> hemisphere".
>>
>> I don't abide by existing conventions *ALL* the time. :-)
>
> And in this case you are not "just abiding by existing conventions." as
> claimed above.

I'd prefer to argue that I'm just choosing which conventions to observe. 
But yes, I'm pushing a joke too hard, and it isn't even remotely about 
C, so I'll drop it.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


#77943

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-06 06:38 +0000
Message-ID	<U1VE4L.uuj.uPXu7@gmail.com>
In reply to	#77924

On Sat, Dec 05, 2015 at 10:15:31PM +0000, Richard Heathfield wrote:
> On 05/12/15 21:50, Malcolm McLean wrote:
> >On Saturday, December 5, 2015 at 9:32:40 PM UTC, Richard Heathfield wrote:
> >>On 04/12/15 19:17, Steve Thompson wrote:
> >>>On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
> >>
> >><snip>
> >>
> >>>>So that is something about Unicode I'm not comfortable with. Our
> >>>>nice tidy little alphabet (perhaps one of the reasons the West has
> >>>>been ahead technologically) is swamped by these huge character sets
> >>>>from around the world, which still don't like being marshalled into
> >>>>neat little units.
> >>>
> >>>The West?  Are you forgetting the Europe is also part of "the West"?
> >>
> >>Much of it isn't.
> >>
> >>Some of Spain, most of France, and all of Belgium, the Netherlands,
> >>Germany, Italy, and so on, are in the East.
> >>
> >Depends if you regard Greenwich or Jerusalem as the centre of the
> >world.
> 
> Neither. The centre of the /world/ is around 4000 miles straight down. 
> And of course I (perfectly correctly) regard my current location as the 
> centre of the observable universe.

Oh good.  Now we can have a holy-war over who truly occupies the
center of the universe.  Once I thought it was Toronto, Canada, but as
I became enlightened through meditation over C I realized that the One
True Center of the Universe is in fact two feet below my chair.
Prepare for jihad, infidel.
 
> As for East and West, I am observing the convention that the 0 degree 
> longitude line divides the Eastern hemisphere from the Western hemisphere.
> 
> >The latter is more traditional, but the former makes
> >more sense if you want a line that doesn't hit any land round the
> >back.
> 
> I'm just abiding by existing conventions. I do that a lot, even when I 
> don't necessarily agree with them 100%.

Belgium will be so unhappy to learn they can never join the West.



Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.
