From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: unicode is a fail
Date: Sat, 05 Dec 2015 16:32:22 -0800
Organization: None to speak of

BartC writes:
> On 05/12/2015 01:04, Steve Thompson wrote:
>> On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
[...]
>>> (And then you have vast, sprawling 'alphabets' like Chinese which are
>>> words rather than the letters used to build the words.)
>>
>> So go tell the Chinese (and Japanese, and Thais, and ...) that they
>> should man-up and use a Western alphabet. Such schemes exist, after
>> all.
>
> No, they can use the same alphabets, but they don't put them all into
> one giant melting pot with every other.

So you want users of Asian writing systems to use their own separate
character set encodings, incompatible with the encodings used in
Western countries. Because that way it's more convenient for you.

Sorry, but the decision has already been made. Unicode combines most
of the world's character sets into a single standard, and that's not
going to change. Complain all you like (preferably elsewhere); it's
not going to make any difference.

No doubt you have some ideas for how HTML web pages can include both
ASCII-encoded tag names and Chinese characters. Which means there has
to be a way to combine Latin and Chinese characters in a single
document anyway.
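
To make the HTML point concrete, here is a rough sketch in C. It assumes
the page is encoded as UTF-8, which the post itself does not specify, and
the function names utf8_code_points and count_ascii_byte are made up for
illustration. In UTF-8 every byte of a non-ASCII character has its high
bit set, so a byte-oriented scan for ASCII markup such as '<' cannot be
confused by Chinese text sitting in the same string.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated string that is assumed to be
   valid UTF-8. Continuation bytes have the form 10xxxxxx; every other
   byte starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Count occurrences of a single ASCII byte (for example '<').
   This is safe on UTF-8 text because no byte of a multi-byte
   sequence is below 0x80. */
size_t count_ascii_byte(const char *s, char c)
{
    size_t n = 0;
    assert(s != NULL);
    assert((unsigned char)c < 0x80);
    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
    }
    return n;
}
```

Called on the 13-byte string "<p>" "\xE4\xB8\xAD" "\xE6\x96\x87" "</p>"
(a paragraph tag around two Chinese characters, U+4E2D and U+6587),
utf8_code_points returns 9 and count_ascii_byte with '<' returns 2.
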

> Now, I can no longer write what had been trivial string handling
> routines such as capitalise, toupper, reverse, compare, left, leftn,
> etc etc. All are very well defined in ASCII, but would no longer be
> guaranteed to work with Unicode because most of the alphabets are so weird.

Too bad. The "giant melting pot" you worry about already exists, and
is used for most text transmitted over the Internet.

If you want to write software that only deals with ASCII, you're
absolutely free to do so, and you can do as much trivial string
handling as you like.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"