Groups > comp.lang.c > #77629 > unrolled thread

unicode is a fail

Started by	fir <profesor.fir@gmail.com>
First post	2015-12-02 08:01 -0800
Last post	2015-12-06 13:45 +0000
Articles	20 on this page of 158 — 25 participants

  unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
    Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
                Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
                  Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
              Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
            Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
              Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
                Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
                  Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
                    Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
              Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
                Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
              Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
        Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
          Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
      Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
          Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
        Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
      Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
              Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
                  Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
        Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
        Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
          Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
            Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
          Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
          Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
            Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
            Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
              Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
                  Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
                    Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
              Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
                  Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
                    Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
                        Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
                          Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
                            Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
                      Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
                        Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
                    Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
                      Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
                        Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
                          Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
                              Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
                                Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
                                    Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
                                    Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
                                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
                                      Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
                                        Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
                                          Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
                                            Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
                                            Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
                                              Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
                                                    Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
                                                      Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
                                                        Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
                                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
                                              Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
                                                Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
                                              Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
                                            Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
                                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
                                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
                                                  Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
                                                  Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
                                                  Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
                                            Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
                              Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
                                Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
                                  Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
                                    Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
                                      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
                              Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
                      Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
                Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
              Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
                Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
                  Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
                    Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
                      Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
                        Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
                          Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
                    Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
                      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
                Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
                Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
                  Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
                Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
      Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
        Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
          Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
      Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
      Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
        Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
          Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
      Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
        OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
          Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
        Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
          OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
            Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000

#77698

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-02 19:13 -0800
Message-ID	<f68cba27-d18e-4334-afaf-45daf0aaad34@googlegroups.com>
In reply to	#77685

On Thursday, December 3, 2015 at 12:45:31 AM UTC, Bart wrote:
> On 02/12/2015 23:24, Steve Thompson wrote:
> > On Wed, Dec 02, 2015 at 08:01:52AM -0800, fir wrote:
> 
> > In the long term, US-ASCII will be restricted to select niches, such
> > as extremely tiny embedded processors.  Or simulators of legacy
> > hardware, etc.  There is no reason to think that a text encoding
> > scheme that cannot represent arbitrary language symbols will survive
> > much into the future.
> 
> An encoding scheme that can only describe English?
> 
> (Or, with a few dozen additions to fill those spare 128 slots, French, 
> Italian, Spanish, German, maybe even pin-yin.)
> 
> You're right, that is no use at all...
> 
You use escapes. Until UTF-8 became popular, and even now, you'll
see Greek characters encoded in html as "&theta;" and so on. It's
easier if the text is essentially English with just one or two 
embedded Greek symbols, although it's not a sensible method for 
encoding flowing Greek text.
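For illustration only, here is a minimal C sketch of the escape approach. It assumes the text is already held as an array of Unicode code points. It is not taken from any particular library, and the name html_escape is hypothetical. It writes the numeric form "&#x3B8;" rather than a named entity such as "&theta;" (both denote U+03B8, GREEK SMALL LETTER THETA), because a numeric reference works for any code point without a lookup table. It also escapes '&' and '<' so that they are not mistaken for markup.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append NUL-terminated s to out; return 0 on success, -1 if it won't fit. */
static int append(char *out, size_t outsize, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len + 1 > outsize)
        return -1;
    memcpy(out + *used, s, len + 1);   /* copy including the NUL */
    *used += len;
    return 0;
}

/* Write cps[0..n-1] into out as pure ASCII, turning every non-ASCII code
   point into an HTML hexadecimal character reference such as "&#x3B8;".
   Returns the number of characters written (excluding the NUL), or -1 if
   out is too small. */
int html_escape(const uint32_t *cps, size_t n, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char tmp[16];   /* large enough for "&#xFFFFFFFF;" plus NUL */

        if (cps[i] == '&') {
            strcpy(tmp, "&amp;");
        } else if (cps[i] == '<') {
            strcpy(tmp, "&lt;");
        } else if (cps[i] < 0x80) {
            tmp[0] = (char)cps[i];   /* plain ASCII passes through */
            tmp[1] = '\0';
        } else {
            snprintf(tmp, sizeof tmp, "&#x%lX;", (unsigned long)cps[i]);
        }
        if (append(out, outsize, &used, tmp) != 0)
            return -1;
    }
    return (int)used;
}
```

With the code points for 'a', theta, '&' and 'b' it produces the 14-character ASCII string "a&#x3B8;&amp;b".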

Then there is a need for special fonts. Bar codes are usually created
using fonts, for example. Another interesting use of computers is 
to try to decode the Voynich manuscript. It's written in a script
which has been discovered nowhere else, but most of it can be divided
into what seem to be pretty clearly characters, with of course a few
difficulties, and the nagging suspicion that maybe a character-based 
analysis has got the wrong end of the stick entirely.

#77811

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-03 07:00 +0000
Message-ID	<2qyvC0.96Q.SQT8q@gmail.com>
In reply to	#77685

On Thu, Dec 03, 2015 at 12:45:10AM +0000, BartC wrote:
> On 02/12/2015 23:24, Steve Thompson wrote:
> >On Wed, Dec 02, 2015 at 08:01:52AM -0800, fir wrote:
> 
> >In the long term, US-ASCII will be restricted to select niches, such
> >as extremely tiny embedded processors.  Or simulators of legacy
> >hardware, etc.  There is no reason to think that a text encoding
> >scheme that cannot represent arbitrary language symbols will survive
> >much into the future.
> 
> An encoding scheme that can only describe English?
> 
> (Or, with a few dozen additions to fill those spare 128 slots, French, 
> Italian, Spanish, German, maybe even pin-yin.)
> 
> You're right, that is no use at all...

But which languages?  With 128 characters you can't support all of
them (but perhaps most).  If you propose to keep ASCII and use code
pages (as I think I understand the scheme) to determine which symbols
occupy those 128 extra positions at any given time, you have merely
punted the problem to another layer.  Now I am not saying you should
not use a translation layer in your code, and then do what you will
with the upper half of char, but I think you will still need to deal
with UTF8 for language strings traversing your software.  I'm not sure
that the extra work is worth it, at least for my purposes, and as it
stands I intend to bite the bullet and just deal with UTF8 wherever I
need to interpret or display language text.
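The translation layer described here can be sketched in C for the one code page where it is trivial. This sketch assumes the input is a NUL-terminated ISO-8859-1 (Latin-1) string. The 256 byte values of Latin-1 coincide with Unicode code points U+0000 to U+00FF, so translating reduces to UTF-8 encoding each byte. Any other code page (ISO-8859-7 for Greek, say) would need a 128-entry lookup table for the upper half, which this sketch omits. The function name latin1_to_utf8 is hypothetical.

```c
#include <stddef.h>

/* Convert a NUL-terminated Latin-1 string to UTF-8.  Returns the number of
   bytes written (excluding the NUL), or (size_t)-1 if out is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return (size_t)-1;
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;   /* UTF-8 bytes for this char */
        if (used + need + 1 > outsize)       /* +1 keeps room for the NUL */
            return (size_t)-1;
        if (*p < 0x80) {
            out[used++] = (char)*p;          /* ASCII is identical in UTF-8 */
        } else {
            out[used++] = (char)(0xC0 | (*p >> 6));    /* lead byte 110000xx */
            out[used++] = (char)(0x80 | (*p & 0x3F));  /* continuation 10xxxxxx */
        }
    }
    out[used] = '\0';
    return used;
}
```

For example, the Latin-1 bytes for "naïve" (with 0xEF for ï) come out as the six UTF-8 bytes n, a, 0xC3, 0xAF, v, e.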

That doesn't mean I will necessarily enjoy it...

Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.

#77814

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-04 04:45 -0800
Message-ID	<b9c423bf-38c2-4db2-bfe4-e07ad212a9d8@googlegroups.com>
In reply to	#77811

On Friday, December 4, 2015 at 12:34:15 PM UTC, Steve Thompson wrote:
>
> But which languages?  With 128 characters you can't support all of
> them (but perhaps most).  If you propose to keep ASCII and use code
> pages (as I think I understand the scheme) to determine which symbols
> occupy those 128 extra positions at any given time, you have merely
> punted the problem to another layer.  Now I am not saying you should
> not use a translation layer in your code, and then do what you will
> with the upper half of char, but I think you will still need to deal
> with UTF8 for language strings traversing your software.  I'm not sure
> that the extra work is worth it, at least for my purposes, and as it
> stands I intend to bite the bullet and just deal with UTF8 wherever I
> need to interpret or display language text.
> 
If you've got a character mapped display with one byte per character, 
then you need a code page scheme.
Moving to 16 or 32 bits per cell isn't usually a good option, because
you still need raster maps for the characters, and you won't be able to
store tens of thousands of them. Also, you need changes to the low level
display driver to support such a move.
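The point about raster maps can be made concrete with a little arithmetic. This C sketch assumes 1-bit-per-pixel glyphs with each row padded to whole bytes. The 8x16 and 16x16 glyph sizes in the example are illustrative assumptions, not figures from the thread, and font_bytes is a hypothetical name. A one-byte cell selects one of 256 glyphs, so an 8x16 font needs 256 x 16 = 4,096 bytes. A 16-bit cell addressing all 65,536 values with 16x16 glyphs (a size commonly used for CJK bitmap fonts) needs 65,536 x 32 = 2,097,152 bytes, or 2 MiB.

```c
#include <stddef.h>

/* Bytes needed to store a 1-bit-per-pixel bitmap font of `glyphs` glyphs,
   each `width` x `height` pixels, with every row padded to whole bytes. */
size_t font_bytes(size_t glyphs, unsigned width, unsigned height)
{
    size_t row_bytes = (width + 7u) / 8u;   /* round the row up to bytes */
    return glyphs * height * row_bytes;
}
```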

#77844

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-04 18:04 +0000
Message-ID	<7vwXAN.WTQ.UnoUE@gmail.com>
In reply to	#77814

On Fri, Dec 04, 2015 at 04:45:55AM -0800, Malcolm McLean wrote:
> On Friday, December 4, 2015 at 12:34:15 PM UTC, Steve Thompson wrote:
> >
> > But which languages?  With 128 characters you can't support all of
> > them (but perhaps most).  If you propose to keep ASCII and use code
> > pages (as I think I understand the scheme) to determine which symbols
> > occupy those 128 extra positions at any given time, you have merely
> > punted the problem to another layer.  Now I am not saying you should
> > not use a translation layer in your code, and then do what you will
> > with the upper half of char, but I think you will still need to deal
> > with UTF8 for language strings traversing your software.  I'm not sure
> > that the extra work is worth it, at least for my purposes, and as it
> > stands I intend to bite the bullet and just deal with UTF8 wherever I
> > need to interpret or display language text.
> > 
> If you've got a character mapped display with one byte per character, 
> then you need a code page scheme.
> Moving to 16 or 32 bits per cell isn't usually a good option, because
> you still need raster maps for the characters, and you won't be able to
> store tens of thousands of them. Also, you need changes to the low level
> display driver to support such a move.

Drivers and such are a specialized problem domain.  No one even
halfway sane will suggest that you should use UTF8, etc. to represent
text in that case.  I thought this discussion was mainly concerned
with the application program domain.  If I were writing software to
display text on a microcontroller LCD I'd do something different
too.



Regards,

Steve Thompson

#77820

From	BartC <bc@freeuk.com>
Date	2015-12-04 13:22 +0000
Message-ID	<n3s3tj$8qe$1@dont-email.me>
In reply to	#77811

On 03/12/2015 07:00, Steve Thompson wrote:
> On Thu, Dec 03, 2015 at 12:45:10AM +0000, BartC wrote:
>> On 02/12/2015 23:24, Steve Thompson wrote:
>>> On Wed, Dec 02, 2015 at 08:01:52AM -0800, fir wrote:
>>
>>> In the long term, US-ASCII will be restricted to select niches, such
>>> as extremely tiny embedded processors.  Or simulators of legacy
>>> hardware, etc.  There is no reason to think that a text encoding
>>> scheme that cannot represent arbitrary language symbols will survive
>>> much into the future.
>>
>> An encoding scheme that can only describe English?
>>
>> (Or, with a few dozen additions to fill those spare 128 slots, French,
>> Italian, Spanish, German, maybe even pin-yin.)
>>
>> You're right, that is no use at all...
>
> But which languages?  With 128 characters you can't support all of
> them (but perhaps most).  If you propose to keep ASCII and use code
> pages (as I think I understand the scheme) to determine which symbols
> occupy those 128 extra positions at any given time, you have merely
> punted the problem to another layer.

What is the problem? That any computer in any country should always be 
able to deal with 99.99% of the World's alphabets that it doesn't care 
about? (I normally need about 100 characters which is 0.01% of the 1 
million in Unicode.)

I suspect that most people are mainly interested in their own language!

So that is something about Unicode I'm not comfortable with. Our nice 
tidy little alphabet (perhaps one of the reasons the West has been ahead 
technologically) is swamped by these huge character sets from around the 
world, which still don't like being marshalled into neat little units.

Fortunately the same ideas were not applied to encodings such as Morse 
code, semaphore or telex, and we don't have Scrabble sets with a million 
different blocks to make it international. While dictionaries and books 
in general still tend to deal with one (or sometimes two) languages at a 
time. Not 6000.

I think some things should be kept local (like the telephone numbers in 
a country, road designations, and car registrations, although no doubt 
people will want to unify all those as well).

As it is, Unicode is unwieldy. I don't think many would disagree.

-- 
Bartc

#77829

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-04 07:35 -0800
Message-ID	<f2fe527f-2fd0-47dc-a9e2-41f832b357f0@googlegroups.com>
In reply to	#77820

On Friday, December 4, 2015 at 1:22:23 PM UTC, Bart wrote:
>
> What is the problem? That any computer in any country should always be 
> able to deal with 99.99% of the World's alphabets that it doesn't care 
> about? (I normally need about 100 characters which is 0.01% of the 1 
> million in Unicode.)
> 
I've just downloaded a new operating system for this computer (an Apple
mini). The image file is 6GB. There's room there for at least one font for
every major language. As it happens, Chinese is no good to me but Hebrew is.
But statistically 20% of us would be Chinese, so the other way round is much
more common.
But it shouldn't matter. Anyone can sit at my computer, open a document,
and if he can read that language, he can read the information.
>
> I suspect that most people are mainly interested in their own language!
>
Mainly, yes. I read far more English than I do Hebrew. But I still want 
Hebrew on occasions - if one of Rick's posts leads to an off-topic question
that depends crucially on a word used in scripture, I want to be able
to check.
> 
> So that is something about Unicode I'm not comfortable with. Our nice 
> tidy little alphabet (perhaps one of the reasons the West has been ahead 
> technologically) is swamped by these huge character sets from around the 
> world, which still don't like being marshalled into neat little units.
> 
> Fortunately the same ideas were not applied to encodings such as Morse 
> code, semaphore or telex, and we don't have Scrabble sets with a million 
> different blocks to make it international. While dictionaries and books 
> in general still tend to deal with one (or sometimes two) languages at a 
> time. Not 6000.
> 
> I think some things should be kept local (like the telephone numbers in 
> a country, road designations, and car registrations, although no doubt 
> people will want to unify all those as well).
> 
> As it is, Unicode is unwieldy. I don't think many would disagree.
> 
The tongues of men were sundered at Babel. It's a curse and not a blessing.
But we have to deal with it.

#77847

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-04 19:17 +0000
Message-ID	<y51yVe.p8Y.TmUhC@gmail.com>
In reply to	#77820

On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
> On 03/12/2015 07:00, Steve Thompson wrote:
> >On Thu, Dec 03, 2015 at 12:45:10AM +0000, BartC wrote:
> >>On 02/12/2015 23:24, Steve Thompson wrote:
> >>>On Wed, Dec 02, 2015 at 08:01:52AM -0800, fir wrote:
> >>
> >>>In the long term, US-ASCII will be restricted to select niches, such
> >>>as extremely tiny embedded processors.  Or simulators of legacy
> >>>hardware, etc.  There is no reason to think that a text encoding
> >>>scheme that cannot represent arbitrary language symbols will survive
> >>>much into the future.
> >>
> >>An encoding scheme that can only describe English?
> >>
> >>(Or, with a few dozen additions to fill those spare 128 slots, French,
> >>Italian, Spanish, German, maybe even pin-yin.)
> >>
> >>You're right, that is no use at all...
> >
> >But which languages?  With 128 characters you can't support all of
> >them (but perhaps most).  If you propose to keep ASCII and use code
> >pages (as I think I understand the scheme) to determine which symbols
> >occupy those 128 extra positions at any given time, you have merely
> >punted the problem to another layer.
> 
> What is the problem? That any computer in any country should always be 
> able to deal with 99.99% of the World's alphabets that it doesn't care 
> about? (I normally need about 100 characters which is 0.01% of the 1 
> million in Unicode.)
> 
> I suspect that most people are mainly interested in their own language!

I don't know about most people, but I am often annoyed that my
keyboard does not show a means of generating accented characters.  I
do not use foreign-language words and phrases often, but when I do it
is more than trivially annoying to use a character picker.  Which is
why I usually write 'naive' instead of 'naïve', etc.
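One reason accented characters complicate C code is that UTF-8 spends more than one byte on them. "naïve" is five characters but six bytes, so strlen returns 6. This sketch assumes valid UTF-8 input and does no validation. It counts code points by skipping continuation bytes, which have the form 10xxxxxx. The name utf8_count is hypothetical. Even a code point count is not always a character count: the same word may be stored as a plain 'i' followed by U+0308 COMBINING DIAERESIS, which is two code points.

```c
#include <stddef.h>

/* Count the code points in a NUL-terminated, valid UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)   /* lead byte or ASCII starts a code point */
            n++;
    return n;
}
```

utf8_count("na\xC3\xAFve") gives 5, while the decomposed spelling "nai\xCC\x88ve" gives 6.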

> So that is something about Unicode I'm not comfortable with. Our nice 
> tidy little alphabet (perhaps one of the reasons the West has been ahead 
> technologically) is swamped by these huge character sets from around the 
> world, which still don't like being marshalled into neat little units.

The West?  Are you forgetting that Europe is also part of "the West"?

The technological lead of the West is another matter, and I am sorry
if you are inconvenienced by the catch-up game underway in other parts
of the world.  Greek, APL, formal logic, mathematics, etc. are all
sufficiently pervasive that their symbols merit inclusion in any
reasonable general-use character set, and on that basis any fixation
on English is bound to be terribly short-sighted.

> Fortunately the same ideas were not applied to encodings such as Morse 
> code, semaphore or telex, and we don't have Scrabble sets with a million 
> different blocks to make it international. While dictionaries and books 
> in general still tend to deal with one (or sometimes two) languages at a 
> time. Not 6000.

Again which languages?  Software I use would be prudent to include the
capacity to render English, French, German, Swedish (Scandinavian
languages generally), Greek, Latin, as well as the characters
needed for mathematical symbols and so on.  The preference of others
is bound to be different.  My requirements might be met with a 16-bit
encoding while an Asian speaker will have a substantially different
preference.  A single encoding scheme for everyone at least avoids the
ghettoization of any one language demographic.
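Whether a 16-bit encoding suffices depends on whether every needed character lies in the Basic Multilingual Plane (U+0000 to U+FFFF). Greek, the Latin scripts and most common mathematical operators do. The Mathematical Alphanumeric Symbols block, which starts at U+1D400, does not. UTF-16 reaches such code points with surrogate pairs, as in this minimal C sketch. The function name utf16_encode is hypothetical.

```c
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16.  Returns the number of 16-bit
   units written to out (1 or 2), or 0 if cp is not a valid scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not scalar values */
        return 0;
    if (cp < 0x10000) {                 /* BMP: a single unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)                  /* beyond the Unicode code space */
        return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

U+03B8 (theta) encodes as the single unit 0x03B8, while U+1D400 (MATHEMATICAL BOLD CAPITAL A) needs the pair 0xD835 0xDC00.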

> I think some things should be kept local (like the telephone numbers in 
> a country, road designations, and car registrations, although no doubt 
> people will want to unify all those as well).

Multi-lingual speakers are not likely to cooperate.

> As it is, Unicode is unwieldy. I don't think many would disagree.

I do not disagree, and if something better arises I am sure many, many
programmers will sigh a little sigh of relief.  Until then...

Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.

#77848

From	supercat@casperkitty.com
Date	2015-12-04 11:49 -0800
Message-ID	<9391fc54-b5f8-42b7-93ab-f170fb2d51eb@googlegroups.com>
In reply to	#77847

On Friday, December 4, 2015 at 1:32:22 PM UTC-6, Steve Thompson wrote:
> I don't know about most people, but I am often annoyed that my
> keyboard does not show a means of generating accented characters.  I
> do not use foreign-language words and phrases often, but when I do it
> is more than trivially annoying to use a character picker.  Which is
> why I usually write 'naive' instead of 'naïve', etc.

What annoys me is that the US-International keyboard layout turns common
ASCII characters into dead keys.  The 1984 Macintosh had a very nice keyboard
layout which used option+grave, option+apostrophe, option+shift+apostrophe,
etc. as dead keys while leaving grave, apostrophe, and quote marks alone; I
have no idea why MS didn't do likewise.

On my own machine I've fixed the problem using a free keyboard-layout
utility published by MS.  For some weird bizarre reason, though, it's not
possible to simply tell Windows "Here's a text file holding my keyboard
layout", nor can one simply use the utility to take the text file and
install it into the system.  Instead, one must use the utility to generate
an application which will in turn install the new layout into the system.
I have no idea why things are so complicated.  Still, programming in C is
a lot nicer when things like quote marks work than when they don't.

#77853

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-04 15:39 -0600
Message-ID	<n3t126$46q$1@dont-email.me>
In reply to	#77848

On 04-Dec-15 13:49, supercat@casperkitty.com wrote:
> Steve Thompson wrote:
>> I don't know about most people, but I am often annoyed that my 
>> keyboard does not show a means of generating accented characters.
>> I do not use foreign-language words and phrases often, but when I
>> do it is more than trivially annoying to use a character picker.
>> Which is why I usually write 'naive' instead of 'naïve', etc.
> 
> What annoys me is that the US-International keyboard layout turns
> common ASCII characters into dead keys.  The 1984 Macintosh had a
> very nice keyboard layout which used option+grave, option+apostrophe,
> option+shift+apostrophe, etc. as dead keys while leaving grave,
> apostrophe, and quote marks alone; I have no idea why MS didn't do
> likewise.

MS had to follow what the hardware vendors did, whereas Apple could
create their own standard.

I've worked around the problem by having a hotkey that toggles between
US and US Intl layouts, but I'd much prefer to "fix" the latter so that
AltGr-" et al were dead keys but plain " et al weren't.  I could leave
such a layout active 24x7; I don't mind the loss of RightAlt to AltGr,
but I do mind "Oh!" turning into "Öh!" when I don't expect it.
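To make the "Oh!" complaint concrete, here is a minimal C sketch of a dead-key state machine. It is not how Windows actually implements keyboard layouts; the `DeadKeyState` type, the `dead_needs_altgr` flag and the function names are hypothetical, and only the diaeresis key is modelled. With the flag false, plain `"` is a dead key (the US-International behaviour); with it true, only AltGr+`"` starts a composition, which is the layout being wished for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dead-key state: only the diaeresis (") key is modelled. */
typedef struct {
    bool pending;          /* a dead key was pressed and awaits a base key */
    bool dead_needs_altgr; /* true: only AltGr+" is dead; plain " is literal */
} DeadKeyState;

/* Map a base letter to its diaeresis form; returns 0 if there is none. */
static uint32_t with_diaeresis(char base)
{
    switch (base) {
    case 'A': return 0x00C4; case 'E': return 0x00CB; case 'I': return 0x00CF;
    case 'O': return 0x00D6; case 'U': return 0x00DC;
    case 'a': return 0x00E4; case 'e': return 0x00EB; case 'i': return 0x00EF;
    case 'o': return 0x00F6; case 'u': return 0x00FC; case 'y': return 0x00FF;
    default:  return 0;
    }
}

/* Feed one keystroke; writes 0-2 code points to out[] and returns how many. */
static int dead_key_feed(DeadKeyState *st, char c, bool altgr, uint32_t out[2])
{
    if (st->pending) {
        st->pending = false;
        uint32_t composed = with_diaeresis(c);
        if (composed) {
            out[0] = composed;
            return 1;
        }
        /* Assumption: space yields the bare accent, any other key yields the
         * accent followed by that key.  Real layouts differ in the details. */
        out[0] = '"';
        if (c == ' ')
            return 1;
        out[1] = (unsigned char)c;
        return 2;
    }
    if (c == '"' && (altgr || !st->dead_needs_altgr)) {
        st->pending = true;    /* swallow the key and wait for the next one */
        return 0;
    }
    out[0] = (unsigned char)c; /* ordinary key: pass it through */
    return 1;
}
```

Feeding `"`, `O`, `h`, `!` with the flag false produces Ö, h, ! (the "Öh!" surprise); with the flag true and no AltGr the same keys produce the literal `"Oh!`.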

> On my own machine I've fixed the problem using a free
> keyboard-layout utility published by MS.

I've never managed to get it to load on my system, and it's completely
unsupported by MS, so no help there.  I've tried freeware alternatives,
but none of them seem to allow me to mess with dead keys in particular.

> For some weird bizarre reason, though, it's not possible to simply
> tell Windows "Here's a text file holding my keyboard layout",

That would go completely against MS's design philosophy.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#77855

From	supercat@casperkitty.com
Date	2015-12-04 14:19 -0800
Message-ID	<ca45dcd6-31e2-4306-8804-aead50e6ffe9@googlegroups.com>
In reply to	#77853

On Friday, December 4, 2015 at 3:39:45 PM UTC-6, Stephen Sprunk wrote:
> MS had to follow what the hardware vendors did, whereas Apple could
> create their own standard.

In what way did hardware vendors force MS to regard ' " ` ~ ^ as dead
keys when not used with AltGr?

> > On my own machine I've fixed the problem using a free
> > keyboard-layout utility published by MS.
> 
> I've never managed to get it to load on my system, and it's completely
> unsupported by MS, so no help there.  I've tried freeware alternatives,
> but none of them seem to allow me to mess with dead keys in particular.

Sorry it doesn't work for you.

> > For some weird bizarre reason, though, it's not possible to simply
> > tell Windows "Here's a text file holding my keyboard layout",
> 
> That would go completely against MS's design philosophy.

MS doesn't require dancing through hoops for fonts or even device drivers.
It's possible to go into the Device Manager and tell Windows "I have this INF
file I want you to install."  INF files aren't the most legible things in
the world, but they can still be better than an opaque executable.

#77989

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-06 12:57 -0600
Message-ID	<n4209o$606$1@dont-email.me>
In reply to	#77855

On 04-Dec-15 16:19, supercat@casperkitty.com wrote:
> Stephen Sprunk wrote:
>> MS had to follow what the hardware vendors did, whereas Apple
>> could create their own standard.
> 
> In what way did hardware vendors force MS to regard ' " ` ~ ^ as
> dead keys when not used with AltGr?

Keycaps, essentially:
https://en.wikipedia.org/wiki/File:KB_US-International.svg

If those keys had been normal without AltGr and dead with AltGr, the
keycaps would have needed to be replaced, and would then have been
incorrect whenever the user exited Windows and ran some other DOS program.
And 30 years later, they're still stuck with that decision.

Since Apple makes their own hardware, they didn't have to worry as much
about such issues, and they care a lot less about backward compatibility
than Microsoft does in the first place.

>>> For some weird bizarre reason, though, it's not possible to
>>> simply tell Windows "Here's a text file holding my keyboard
>>> layout",
>> 
>> That would go completely against MS's design philosophy.
> 
> MS doesn't require dancing through hoops for fonts or even device
> drivers. It's possible to go into the Device Manager and tell Windows
> "I have this INF file I want you to install."  INF files aren't the
> most legible things in the world, but they can still be better than
> an opaque executable.

It's rare that drivers are _just_ an INF file; usually there are lots of
VxDs and/or DLLs, and the INF lists what needs to be installed.

In this specific case, I'm not sure why an INF or similar text format
wasn't used.  It seems MS treats keyboard layouts as drivers (with all
the attendant mess) rather than through a unified driver that is infinitely
configurable, but often the obvious solution has non-obvious defects.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#78013

From	supercat@casperkitty.com
Date	2015-12-06 15:47 -0800
Message-ID	<eb97e662-ea1d-46af-8b53-6da02968b663@googlegroups.com>
In reply to	#77989

On Sunday, December 6, 2015 at 12:57:23 PM UTC-6, Stephen Sprunk wrote:
> On 04-Dec-15 16:19, supercat wrote:
> Keycaps, essentially:
> https://en.wikipedia.org/wiki/File:KB_US-International.svg

What's the problem?  Have one keyboard layout where the dead keys work as
they do now, for people who are used to that behavior, and one where the
red keys are those which, *when typed with AltGr*, act as dead keys.  The
quote/apostrophe key legends would be a little quirky in that regard, but
one could say that the blue legends are simply there as a reminder of what
accents the key will produce.

On the other hand, if Windows had an applet similar to the Macintosh "Key
Caps" desk accessory that shipped in System 1.0, the keyboard legends would
be largely irrelevant anyway.  Even without any legends on the special
keys, learning that AltGr-apostrophe followed by a vowel puts an acute
accent (aigu) over it, etc. would be a lot easier than learning all the
Alt-number codes for all the different accented characters.

#77868

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-05 01:13 +0000
Message-ID	<taz0Kc.lc7.GQBOn@gmail.com>
In reply to	#77848

On Fri, Dec 04, 2015 at 11:49:00AM -0800, supercat@casperkitty.com wrote:
> On Friday, December 4, 2015 at 1:32:22 PM UTC-6, Steve Thompson wrote:
> > I don't know about most people, but I am often annoyed that my
>
> On my own machine I've fixed the problem using a free keyboard-layout
> utility published by MS.  For some weird bizarre reason, though, it's not
> possible to simply tell Windows "Here's a text file holding my keyboard
> layout", nor can one simply use the utility to take the text file and
> install it into the system.  Instead, one must use the utility to generate
> an application which will in turn install the new layout into the system.
> I have no idea why things are so complicated.  Still, programming in C is
> a lot nicer when things like quote marks work than when they don't.

On Linux there is the loadkeys(1) utility which will install a key map
from a file, and I have previously (long ago) used it to set up
function-key macros, but it is a pain in the ass and admittedly not
terribly prominent on my sysadmin to-do list.  I have a hard enough
time remembering all of my customized editor keystrokes.

OTOH, I am currently using gnome-terminal and it happily accepts
unicode numbers (similar to the DOS feature, some CTRL-SHIFT-ALT
numberpad arrangement which escapes me at the moment) and I might be
moderately content with a crib sheet for those occasions where I need
those extra symbols.

Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.

#77871

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2015-12-05 01:59 +0000
Message-ID	<87twnx3bax.fsf@bsb.me.uk>
In reply to	#77868

Steve Thompson <stevet810@gmail.com> writes:

> On Fri, Dec 04, 2015 at 11:49:00AM -0800, supercat@casperkitty.com wrote:
>> On Friday, December 4, 2015 at 1:32:22 PM UTC-6, Steve Thompson wrote:
>> > I don't know about most people, but I am often annoyed that my
>>
>> On my own machine I've fixed the problem using a free keyboard-layout
>> utility published by MS.  For some weird bizarre reason, though, it's not
>> possible to simply tell Windows "Here's a text file holding my keyboard
>> layout", nor can one simply use the utility to take the text file and
>> install it into the system.  Instead, one must use the utility to generate
>> an application which will in turn install the new layout into the system.
>> I have no idea why things are so complicated.  Still, programming in C is
>> a lot nicer when things like quote marks work than when they don't.
>
> On Linux there is the loadkeys(1) utility which will install a key map
> from a file, and I have previously (long ago) used it to set up
> function-key macros, but it is a pain in the ass and admittedly not
> terribly prominent on my sysadmin to-do list.  I have a hard enough
> time remembering all of my customized editor keystrokes.
>
> OTOH, I am currently using gnome-terminal and it happily accepts
> unicode numbers (similar to the DOS feature, some CTRL-SHIFT-ALT
> numberpad arrangement which escapes me at the moment)

It's Shift+Ctrl+u and then hex digits.  It works in almost all programs
on my system (including Emacs, though Emacs has its own, often simpler
methods).
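The conversion behind the Shift+Ctrl+u method is simple enough to sketch. The C function below is a hypothetical illustration, not the actual input-method code: it turns the typed hex digits into a code point and rejects anything that is not a Unicode scalar value (the surrogates D800 to DFFF, or values above 10FFFF).

```c
#include <ctype.h>

/* Hypothetical helper: convert the hex digits typed after Shift+Ctrl+u into
 * a code point.  Returns -1 for empty or non-hex input, more than six digits,
 * surrogates, or values above U+10FFFF. */
static long hex_to_code_point(const char *digits)
{
    long cp = 0;
    int n = 0;

    for (; *digits; digits++, n++) {
        int c = (unsigned char)*digits;
        if (!isxdigit(c) || n >= 6)
            return -1;
        cp = cp * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    if (n == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    return cp;
}
```

For example, `hex_to_code_point("e9")` gives 0xE9 (é), while `hex_to_code_point("d800")` is rejected because a lone surrogate is not a character.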

> and I might be
> moderately content with a crib sheet for those occasions where I need
> those extra symbols.

The only key map I modify is to make Insert into Compose so I can get é,
ç, ¾ and so on with memorable keys (Compose ' e, Compose , c, Compose 3
4 and so on).
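A compose key is essentially a table lookup from a short key sequence to a code point. A minimal C sketch, assuming a tiny hypothetical table holding just the three examples above. Real compose tables, such as the X11 Compose files, are far larger and also have to recognise when a partial sequence is still in progress.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compose table: keys typed after Compose -> code point. */
struct compose_entry {
    const char *keys;
    unsigned long cp;
};

static const struct compose_entry compose_table[] = {
    { "'e", 0x00E9 },  /* é  LATIN SMALL LETTER E WITH ACUTE */
    { ",c", 0x00E7 },  /* ç  LATIN SMALL LETTER C WITH CEDILLA */
    { "34", 0x00BE },  /* ¾  VULGAR FRACTION THREE QUARTERS */
};

/* Return the code point for a complete compose sequence, or 0 if unknown. */
static unsigned long compose_lookup(const char *keys)
{
    for (size_t i = 0; i < sizeof compose_table / sizeof compose_table[0]; i++)
        if (strcmp(compose_table[i].keys, keys) == 0)
            return compose_table[i].cp;
    return 0;
}
```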

-- 
Ben.

#77896

From	David Brown <david.brown@hesbynett.no>
Date	2015-12-05 17:17 +0100
Message-ID	<n3v2ib$196$1@dont-email.me>
In reply to	#77871

On 05/12/15 02:59, Ben Bacarisse wrote:
> Steve Thompson <stevet810@gmail.com> writes:
>
>> On Fri, Dec 04, 2015 at 11:49:00AM -0800, supercat@casperkitty.com wrote:
>>> On Friday, December 4, 2015 at 1:32:22 PM UTC-6, Steve Thompson wrote:
>>>> I don't know about most people, but I am often annoyed that my
>>>
>>> On my own machine I've fixed the problem using a free keyboard-layout
>>> utility published by MS.  For some weird bizarre reason, though, it's not
>>> possible to simply tell Windows "Here's a text file holding my keyboard
>>> layout", nor can one simply use the utility to take the text file and
>>> install it into the system.  Instead, one must use the utility to generate
>>> an application which will in turn install the new layout into the system.
>>> I have no idea why things are so complicated.  Still, programming in C is
>>> a lot nicer when things like quote marks work than when they don't.
>>
>> On Linux there is the loadkeys(1) utility which will install a key map
>> from a file, and I have previously (long ago) used it to set up
>> function-key macros, but it is a pain in the ass and admittedly not
>> terribly prominent on my sysadmin to-do list.  I have a hard enough
>> time remembering all of my customized editor keystrokes.
>>
>> OTOH, I am currently using gnome-terminal and it happily accepts
>> unicode numbers (similar to the DOS feature, some CTRL-SHIFT-ALT
>> numberpad arrangement which escapes me at the moment)
>
> It's Shift+Ctrl+u and then hex digits.  It works in almost all programs
> on my system (including Emacs, though Emacs has its own, often simpler
> methods).

That's completely new to me.  I think I would still be inclined to use 
"character map" for anything that could not be done with the dead keys, 
alt-gr combinations, or compose key on my keyboard.

>
>> and I might be
>> moderately content with a crib sheet for those occasions where I need
>> those extra symbols.
>
> The only key map I modify is to make Insert into Compose so I can get é,
> ç, ¾ and so on with memorable keys (Compose ' e, Compose , c, Compose 3
> 4 and so on).
>

I use "scroll lock" for my compose key.  But most non-ASCII characters 
that I need are accessible more directly on my keyboard - I don't know 
if that's because it is a Norwegian layout rather than an English 
layout.  (Obviously the Norwegian letters åøæ are more easily 
accessible, but I am thinking of things like µ π ² ° § ½ ¼ ×.)

#77941

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-06 06:28 +0000
Message-ID	<3Nz2vb.7Mm.SIGAI@gmail.com>
In reply to	#77871

On Sat, Dec 05, 2015 at 01:59:34AM +0000, Ben Bacarisse wrote:
> Steve Thompson <stevet810@gmail.com> writes:
> 
> > On Fri, Dec 04, 2015 at 11:49:00AM -0800, supercat@casperkitty.com wrote:
> >> On Friday, December 4, 2015 at 1:32:22 PM UTC-6, Steve Thompson wrote:
> >> > I don't know about most people, but I am often annoyed that my
> >>
> >> On my own machine I've fixed the problem using a free keyboard-layout
> >> utility published by MS.  For some weird bizarre reason, though, it's not
> >> possible to simply tell Windows "Here's a text file holding my keyboard
> >> layout", nor can one simply use the utility to take the text file and
> >> install it into the system.  Instead, one must use the utility to generate
> >> an application which will in turn install the new layout into the system.
> >> I have no idea why things are so complicated.  Still, programming in C is
> >> a lot nicer when things like quote marks work than when they don't.
> >
> > On Linux there is the loadkeys(1) utility which will install a key map
> > from a file, and I have previously (long ago) used it to set up
> > function-key macros, but it is a pain in the ass and admittedly not
> > terribly prominent on my sysadmin to-do list.  I have a hard enough
> > time remembering all of my customized editor keystrokes.
> >
> > OTOH, I am currently using gnome-terminal and it happily accepts
> > unicode numbers (similar to the DOS feature, some CTRL-SHIFT-ALT
> > numberpad arrangement which escapes me at the moment)
> 
> It's Shift+Ctrl+u and then hex digits.  It works in almost all programs
> on my system (including Emacs, though Emacs has its own, often simpler
> methods).

That's the one.  I use a much, much smaller clone of Emacs called jove
which lacks a built-in programming language and does not support
Unicode directly.
 
> > and I might be
> > moderately content with a crib sheet for those occasions where I need
> > those extra symbols.
> 
> The only key map I modify is to make Insert into Compose so I can get é,
> ç, ¾ and so on with memorable keys (Compose ' e, Compose , c, Compose 3
> 4 and so on).

My laptop has a "Windows" key which I might repurpose but it is not
much of a priority at the moment.



Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.

#77857

From	BartC <bc@freeuk.com>
Date	2015-12-04 23:46 +0000
Message-ID	<n3t8h6$3ip$1@dont-email.me>
In reply to	#77847

On 04/12/2015 19:17, Steve Thompson wrote:
> On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:

>> So that is something about Unicode I'm not comfortable with. Our nice
>> tidy little alphabet (perhaps one of the reasons the West has been ahead
>> technologically) is swamped by these huge character sets from around the
>> world, which still don't like being marshalled into neat little units.
>
> The West?  Are you forgetting that Europe is also part of "the West"?

No. But western Europe at least still uses small alphabets, and mostly 
they are based around A-Z.

> The technological lead of the West is another matter, and I am sorry
> if you are inconvenienced by the catch-up game underway in other parts
> of the world.  Greek, APL, formal logic, mathematics, etc. are all
> sufficiently pervasive that their symbols merit inclusion in any
> reasonable general-use character set, and on that basis any fixation
> on English is bound to be terribly short-sighted.

Fine, then we move to 16 bits, which had long been anticipated anyway, 
and gives us plenty of room for special symbols. But not if we have to 
throw in every single alphabet and writing system that anybody has ever 
heard of (and apparently plenty that no one has heard of!).
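For reference, Unicode does still use 16-bit units in UTF-16: code points up to U+FFFF (the Basic Multilingual Plane) take one unit, and anything beyond is carried as a pair of surrogates. A minimal C sketch of that encoding; the function name `utf16_encode` is just for illustration.

```c
#include <stdint.h>

/* Encode one code point as UTF-16.  Returns the number of 16-bit units
 * written (1 or 2), or 0 if cp is a surrogate or above U+10FFFF. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                          /* fits in one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+13000, the first code point of the Egyptian Hieroglyphs block, comes out as the pair D80C DC00.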

(And then you have vast, sprawling 'alphabets' like Chinese which are 
words rather than the letters used to build the words.)

It just sounds 'off'. It reminds me of those early 'text-mode' displays 
where, instead of having proper pixel-graphics, some character codes 
were set aside to display a limited range of pre-determined patterns.

To be able to display any arbitrary pattern, you need pixel-addressable 
graphics.

So we really want a more flexible way of specifying any character or symbol 
without just enumerating every single one we can think of.

(Imagine you were in the position of creating a new font, with 
hundreds of thousands of characters to design! I've done that, but for only 100 
characters.)

> Again which languages?  Software I use would be prudent to include the
> capacity to render English, French, German, Swedish (Scandinavian
> language generally), Greek, Latin,

What's special about Latin?

> as well as the characters
> appropriate to mathematics symbols and so on.

And mathematics really requires control over layout. You will probably 
end up representing formulae in some sort of mark-up language anyway, or 
you will be writing them using a special editor that might store content 
in some binary format; whether it uses Unicode is then irrelevant.

(Actually I've tried using the correct mathematical symbols within 
programming language syntax, such as × for multiply and ² for squared (y 
= x²). But it looked too gimmicky, as well as being fiddly to type in.)
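A related practical wrinkle with symbols such as × and ² in C is that in UTF-8 each occupies two bytes, so `strlen` counts bytes rather than characters: `"y = x\xC2\xB2"` (y = x² with the ² spelled as its UTF-8 bytes) has a `strlen` of 7 but holds 6 code points. A minimal sketch of counting code points, assuming the input is valid UTF-8:

```c
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
 * (bit pattern 10xxxxxx).  Assumes the input is valid UTF-8. */
static size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```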

-- 
Bartc

#77867

From	Steve Thompson <stevet810@gmail.com>
Date	2015-12-05 01:04 +0000
Message-ID	<wFe7nL.cjz.nHu02@gmail.com>
In reply to	#77857

On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
> On 04/12/2015 19:17, Steve Thompson wrote:
> >On Fri, Dec 04, 2015 at 01:22:04PM +0000, BartC wrote:
> 
> >>So that is something about Unicode I'm not comfortable with. Our nice
> >>tidy little alphabet (perhaps one of the reasons the West has been ahead
> >>technologically) is swamped by these huge character sets from around the
> >>world, which still don't like being marshalled into neat little units.
> >
> >The West?  Are you forgetting that Europe is also part of "the West"?
> 
> No. But western Europe at least still uses small alphabets, and mostly 
> they are based around A-Z.

Nitpick.  Once the major European languages are included from Spanish
to Finnish and everything in-between, how many code points are left?

> >The technological lead of the West is another matter, and I am sorry
> >if you are inconvenienced by the catch-up game underway in other parts
> >of the world.  Greek, APL, formal logic, mathematics, etc. are all
> >sufficiently pervasive that their symbols merit inclusion in any
> >reasonable general-use character set, and on that basis any fixation
> >on English is bound to be terribly short-sighted.
> 
> Fine, then we move to 16 bits, which had long been anticipated anyway, 
> and gives us plenty of room for special symbols. But not if we have to 
> throw in every single alphabet and writing system that anybody has ever 
> heard of (and apparently plenty that no one has heard of!).

I rather suspect the Anthropologists will scream bloody murder if
Egyptian hieroglyphics, Linear B, and all the rest are excluded.

> (And then you have vast, sprawling 'alphabets' like Chinese which are 
> words rather than the letters used to build the words.)

So go tell the Chinese (and Japanese, and Thais, and ...) that they
should man-up and use a Western alphabet.  Such schemes exist, after
all.

> It just sounds 'off'. It reminds me of those early 'text-mode' displays 
> where, instead of having proper pixel-graphics, some character codes 
> were set aside to display a limited range of pre-determined patterns.
> 
> To be able to display any arbitrary pattern, you need pixel-addressable 
> graphics.
> 
> So we really want a more flexible way of specifying any character or symbol 
> without just enumerating every single one we can think of.
> 
> (Imagine you were in the position of creating a new font, with 
> hundreds of thousands of characters to design! I've done that, but for only 100 
> characters.)

The font weenies will probably figure something out.  This is not my
concern.  Publishers have already invested in the languages they
print.

> >Again which languages?  Software I use would be prudent to include the
> >capacity to render English, French, German, Swedish (Scandinavian
> >language generally), Greek, Latin,
> 
> What's special about Latin?

Bad example.  Perhaps Russian is a better choice; I hear it is a great
language for cursing, comrade.  And ignoring prose for the moment,
should people's very names not be representable in their canonical
form?

> >as well as the characters
> >appropriate to mathematics symbols and so on.
> 
> And mathematics really requires control over layout. You will probably 
> end up representing formulae in some sort of mark-up language anyway, or 
> you will be writing them using a special editor that might store content 
> in some binary format; whether it uses Unicode is then irrelevant.
> 
> (Actually I've tried using the correct mathematical symbols within 
> programming language syntax, such as × for multiply and ² for squared (y 
> = x²). But it looked too gimmicky, as well as being fiddly to type in.)

Mathematics is a good example, nonetheless.  Then we have physics, all
the unit symbols (degrees Centigrade, ohms, Angstroms, and on and on
and on), and, and and and.  Without complete coverage standards bodies
and software houses are put in the position of picking and choosing
the winners and losers.  Formula markup is a problem as well, but
distinct from representing glyphs, and if you're going to start down
that road we can include diagrams and graphs as well as a supplemental
requirement for representing certain classes of idea.

The general problem at hand is the representation of written
communications, which arguably includes non-textual forms like napkin
scribbles and the like.  Unicode doesn't do anything to help represent
freehand drawing or cave paintings, but so what.  The line must be
drawn somewhere, and I think it is unreasonable to exclude NZ Maori
script merely because so few people actually use it.

Regards,

Steve Thompson

-- 
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden."  -- MysteryDog in 24hoursupport.helpdesk.

#77878

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-05 03:21 -0800
Message-ID	<622998c5-112a-4334-a76d-05f563b3fb26@googlegroups.com>
In reply to	#77867

On Saturday, December 5, 2015 at 1:32:42 AM UTC, Steve Thompson wrote:
> 
> I rather suspect the Anthropologists will scream bloody murder if
> Egyptian hieroglyphics, Linear B, and all the rest are excluded.
>  
Not really. You've got marginal, poorly documented scripts, and
if you happen to be working with one you don't really expect to
be able to fire up a word processor and type in the symbols.
However there are plenty of spare code points, and supporting oddballs
is rather fun. 
However you need to be able to support every language that a 
"consumer", which I would define as someone who is neither interested
in programming nor in the language itself except as a way of
expressing something, might want to have.
>
> > (Imagine you were in the position of creating a new font, with 
> > hundreds of thousands of characters to design! I've done that, but for only 100 
> > characters.)
> 
> The font weenies will probably figure something out.  This is not my
> concern.  Publishers have already invested in the languages they
> print.
>
That's a serious issue. The Free Software Foundation is trying to
put together a full Unicode font, but it's a massive undertaking;
when I looked at it for Baby X, it wasn't yet complete.
Then even if you allow the user to load a Unicode-keyed font
(the route Baby X takes), you still haven't supported every language
because of the layout rules. Where does the boundary between
character representation and layout markup lie? E.g., is x squared
"eks, two" with the two marked up as superscript, or is it
"eks, superscript_two"? What about root x + 1? Is that
"root, eks, plus, one" or "openroot, eks, plus, one, closeroot"?
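Unicode itself straddles the boundary described above: a handful of superscript digits are encoded as characters, while superscripting in general is left to markup. Even those digits are not contiguous, because ¹ ² ³ were inherited from Latin-1 and the rest sit in the Superscripts and Subscripts block starting at U+2070. A small C sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Return the Unicode superscript form of a decimal digit, or 0 if d is not
 * a digit.  1, 2 and 3 come from Latin-1; 0 and 4-9 are at U+2070 + d. */
static uint32_t superscript_digit(char d)
{
    switch (d) {
    case '1': return 0x00B9;
    case '2': return 0x00B2;
    case '3': return 0x00B3;
    default:
        if (d == '0' || (d >= '4' && d <= '9'))
            return 0x2070 + (uint32_t)(d - '0');
        return 0;
    }
}
```

So "x²" can be written as plain characters, but an arbitrary superscripted expression such as x to the power n+1 still needs markup.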
  
The question is whether something that is largely functional but a bit
buggy when you start stressing it with adventures into pointed Hebrew
and the like is better or worse than something which works perfectly
but is more limited. Most professionals prefer the latter; business
life is all about presenting an image, not about offering the customer
functionality to meet his needs.

#77907

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-05 13:03 -0600
Message-ID	<n3vc93$939$1@dont-email.me>
In reply to	#77878

On 05-Dec-15 05:21, Malcolm McLean wrote:
> Steve Thompson wrote:
>> I rather suspect the Anthropologists will scream bloody murder if 
>> Egyptian hieroglyphics, Linear B, and all the rest are excluded.
>> 
> Not really. You've got marginal, poorly documented scripts, and if
> you happen to be working with one you don't really expect to be able
> to fire up a word processor and type in the symbols. However there
> are plenty of spare code points, and supporting oddballs is rather
> fun. However you need to be able to support every language that a 
> "consumer", which I would define as someone who is neither
> interested in programming nor in the language itself except as a way
> of expressing something, might want to have.

Indeed, and going from UCS-2 to UCS-4 gave us so much code space that
there's no good reason _not_ to assign a few code points to scripts like
Klingon.  OTOH, despite being a fictional language, there _are_ more
Klingon speakers now than many natural languages have left, so perhaps
that isn't as silly as it seems.
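For scale: the code space Unicode actually settled on runs from U+0000 to U+10FFFF (17 planes of 65,536, i.e. 1,114,112 code points), which is exactly what UTF-16 can reach, and UTF-8 encodes any of them in one to four bytes. A minimal C sketch of a UTF-8 encoder; the name is illustrative.

```c
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.  Returns the number of bytes
 * written to out (1-4), or 0 if cp is a surrogate or above U+10FFFF. */
static int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                      /* 7 bits: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 16 bits: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 21 bits: four bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```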

Of the ~6000 languages today, only _half_ are being taught to children
(probably the greatest ethnocide in history), and it seems like every
day there's a story about how the "last living speaker" of another one has
died--and anthropologists and linguists are scrambling to capture as
much data about the remaining ones as they can while they can.

On the plus side, at this rate we'll be able to rebuild the Tower of
Babel within a few centuries.

>>> (Imagine you were in the position of creating a new font, with
>>>  hundreds of thousands of characters to design! I've done that, but for only
>>> 100 characters.)
>> 
>> The font weenies will probably figure something out.  This is not
>> my concern.  Publishers have already invested in the languages
>> they print.
>> 
> That's a serious issue. The Free Software Foundation is trying to put
> together a full Unicode font, but it's a massive undertaking; when I
> looked at it for Baby X, it wasn't yet complete.

Bah.  Just do what other modern GUIs do: if the selected font doesn't
have the glyph you need, check all the other installed fonts.  You need
several dozen script-specific fonts to cover every assigned code point,
but it's a lot easier than trying to build one enormous font.
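The fallback strategy just described can be sketched in a few lines of C. Everything here is hypothetical: the `font` and `range` structures stand in for real font coverage data (real systems read it from each font's cmap table) and typically consult a configured preference order rather than scanning fonts in installation order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical font description: a name plus the code point ranges it covers. */
struct range { uint32_t first, last; };
struct font {
    const char *name;
    const struct range *ranges;
    size_t nranges;
};

static int font_has_glyph(const struct font *f, uint32_t cp)
{
    for (size_t i = 0; i < f->nranges; i++)
        if (cp >= f->ranges[i].first && cp <= f->ranges[i].last)
            return 1;
    return 0;
}

/* Try the selected font first, then every installed font in order.
 * Returns NULL if nothing covers cp (the caller would draw a "tofu" box). */
static const struct font *pick_font(const struct font *selected,
                                    const struct font *installed, size_t n,
                                    uint32_t cp)
{
    if (selected && font_has_glyph(selected, cp))
        return selected;
    for (size_t i = 0; i < n; i++)
        if (font_has_glyph(&installed[i], cp))
            return &installed[i];
    return NULL;
}
```

With a Latin-only font selected and a Greek font installed, asking for π (U+03C0) falls through to the Greek font, while a code point neither covers yields NULL.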

> The question is whether something that is largely functional but a
> bit buggy when you start stressing it with adventures into pointed
> Hebrew and the like is better or worse than something which works
> perfectly but is more limited. Most professionals prefer the latter;
> business life is all about presenting an image, not about offering
> the customer functionality to meet his needs.

In some problem domains, no answer is better than a wrong answer, and in
others, it's the opposite.  And sometimes, getting a wrong answer
quickly is better than getting a correct one slowly.

Text layout, though, seems to be one of the areas where folks (all of
them, not just businesses) do demand perfection, and I'd agree that's at
least partly about image, but it's also about how it's such a simple
problem for humans that anyone who gets it wrong looks like an idiot,
even though it's actually a difficult problem for computers.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
