Groups > comp.lang.c > #77357 > unrolled thread

Working efficiently with 32-bit Unicode output streams, locale etc.

Started by	"Morten W. Petersen" <morphex@gmail.com>
First post	2015-11-29 01:06 +0100
Last post	2015-12-02 09:58 -0800
Articles	20 on this page of 210 — 25 participants

  Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <nothing@nowhere.nohow> - 2015-12-01 14:07 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <nothing@nowhere.nohow> - 2015-12-01 17:13 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
                                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800

#77424

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 02:34 +0100
Message-ID	<n3g921$vlq$1@speranza.aioe.org>
In reply to	#77423

On 30.11.2015 02:15, Ian Collins wrote:
> Morten W. Petersen wrote:
>> On 30.11.2015 01:54, Ian Collins wrote:
>>> Morten W. Petersen wrote:
>>>>
>>>> I really don't want to do more than I have to, what I want to do is
>>>> create a library/program that will enable reading, writing and
>>>> manipulating XML files.  And do it securely, correctly, and fast.
>>>
>>> So stick to the encoding pretty much every XML file uses (and every
>>> parser must support): UTF-8!
>>>
>>>> Everyone is on about UTF-8 it seems, and that's the world as it is
>>>> today.. UTF-16 is sort of the middle way which requires some tricks
>>>> to represent all characters while UTF-32 is what it is, but requires
>>>> more storage.
>>>
>>> UTF-8 is where the world will probably stay.
>>
>> I think that's a bold claim.. :D
>
> Have you ever encountered a UTF-32 (or UTF-16) encoded XML document?  I
> can't imagine why anyone would want to create one given the lack of
> applications that can read, let alone parse, it.  UTF-8 is universally
> popular (off Windows) because being a super-set of ASCII, just about
> anything can display it.

Well, let's say you have some organization that wants to create an
archive of lots of non-latin history, in XML.

For them, choosing XML is right, and UTF-8 uses 3 bytes on characters 
U+0800 through U+FFFF, but only 2 bytes in UTF-16.

However, UTF-16 is vulnerable to the entire string being corrupted
after invalid data has been encountered.

So this organization chooses to use UTF-32, because the unnecessary byte
there also acts as a delimiter.

This is plausible.

As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
discussion on comp.lang.c.

-Morten

#77425 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-11-30 14:42 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc1nseF2cb8U3@mid.individual.net>
In reply to	#77424

Morten W. Petersen wrote:
> On 30.11.2015 02:15, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>> Morten W. Petersen wrote:
>>>>>
>>>>> I really don't want to do more than I have to, what I want to do is
>>>>> create a library/program that will enable reading, writing and
>>>>> manipulating XML files.  And do it securely, correctly, and fast.
>>>>
>>>> So stick to the encoding pretty much every XML file uses (and every
>>>> parser must support): UTF-8!
>>>>
>>>>> Everyone is on about UTF-8 it seems, and that's the world as it is
>>>>> today.. UTF-16 is sort of the middle way which requires some tricks
>>>>> to represent all characters while UTF-32 is what it is, but requires
>>>>> more storage.
>>>>
>>>> UTF-8 is where the world will probably stay.
>>>
>>> I think that's a bold claim.. :D
>>
>> Have you ever encountered a UTF-32 (or UTF-16) encoded XML document?  I
>> can't imagine why anyone would want to create one given the lack of
>> applications that can read, let alone parse, it.  UTF-8 is universally
>> popular (off Windows) because being a super-set of ASCII, just about
>> anything can display it.
>
> Well, let's say you have some organization that wants to create an
> archive of lots of non-latin history, in XML.
>
> For them, choosing XML is right, and UTF-8 uses 3 bytes on characters
> U+0800 through U+FFFF, but only 2 bytes in UTF-16.

If they had the good sense to use a compressed filesystem, that wouldn't 
matter!

> However, UTF-16 is vulnerable to the entire string being corrupted
> after invalid data has been encountered.
>
> So this organization chooses to use UTF-32, because the unnecessary byte
> there also acts as a delimiter.
>
> This is plausible.

It may be, but I'll ask again: Have you ever encountered a UTF-32 
encoded XML document?

Ian Collins

#77428

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 04:16 +0100
Message-ID	<n3gf11$9a7$1@speranza.aioe.org>
In reply to	#77425

On 30.11.2015 02:42, Ian Collins wrote:
> Morten W. Petersen wrote:
[...]
>> This is plausible.
>
> It may be, but I'll ask again: Have you ever encountered a UTF-32
> encoded XML document?

No, can't say I have.

-Morten

#77427

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-11-29 20:20 -0600
Message-ID	<n3gble$sbh$1@dont-email.me>
In reply to	#77424

On 29-Nov-15 19:34, Morten W. Petersen wrote:
> On 30.11.2015 02:15, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>> UTF-8 is where the world will probably stay.
>>> 
>>> I think that's a bold claim.. :D

It's not bold at all; stats clearly show UTF-8 is now blowing away all
other encodings.  The trend in that direction has been solid for over
two decades, and there is no logical reason for it to ever reverse.

>> Have you ever encountered a UTF-32 (or UTF-16) encoded XML
>> document?  I can't imagine why anyone would want to create one
>> given the lack of applications that can read, let alone parse, it.
>> UTF-8 is universally popular (off Windows) because being a
>> super-set of ASCII, just about anything can display it.
> 
> Well, let's say you have some organization that wants to create an 
> archive of lots of non-latin history, in XML.
> 
> For them, choosing XML is right,

That's part of the stated requirements above, not a choice.

> and UTF-8 uses 3 bytes on characters U+0800 through U+FFFF, but
> only 2 bytes in UTF-16.

OTOH, UTF-8 uses only 1 byte for U+000000 through U+00007F whereas
UTF-16 uses 2 bytes; this is important for XML (or HTML) because that
range includes XML (or HTML) markup characters, whitespace, etc., with
the result that total document size is usually smaller even for scripts
in the U+000800 to U+00FFFF range.
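
The size argument can be checked with a few lines of C. What follows is
only a minimal sketch, not code from this thread: the function names
(utf8_len, utf16_len, utf32_len, encoded_size) are hypothetical, and the
input is assumed to be an array of already-decoded code points.

```c
#include <stddef.h>
#include <stdint.h>

/* A code point is a Unicode scalar value if it is at most U+10FFFF
   and is not a surrogate (U+D800..U+DFFF). */
static int is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Bytes needed for one code point in UTF-8; 0 if cp is not encodable. */
size_t utf8_len(uint32_t cp)
{
    if (!is_scalar(cp)) return 0;
    if (cp <= 0x7F)     return 1;
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    return 4;
}

/* Bytes needed in UTF-16: one 16-bit unit inside the BMP, two outside. */
size_t utf16_len(uint32_t cp)
{
    if (!is_scalar(cp)) return 0;
    return cp <= 0xFFFF ? 2 : 4;
}

/* Bytes needed in UTF-32: always one 32-bit unit. */
size_t utf32_len(uint32_t cp)
{
    return is_scalar(cp) ? 4 : 0;
}

/* Sum the per-code-point sizes of a whole text using one of the above. */
size_t encoded_size(const uint32_t *cps, size_t n, size_t (*len)(uint32_t))
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += len(cps[i]);
    return total;
}
```

For the nine code points of <p>日本</p> (seven ASCII markup characters
plus U+65E5 and U+672C) this gives 13 bytes in UTF-8, 18 in UTF-16 and
36 in UTF-32, BOMs not counted.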

For a detailed comparison:
https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Eight-bit_environments

Even for dense text, a general-purpose compressor results in about the
same size anyway, so if that's expected, you can forget about the few
cases where UTF-8 loses on size and focus on portability and
robustness--where UTF-8 even more clearly wins.

> However, UTF-16 is vulnerable to the entire string being corrupted 
> after invalid data has been encountered.

No; UTF-16 is self-synchronizing, just like UTF-8.  But it's a lot
easier for UTF-16 to _get_ corrupted, because more people forget that
it's a variable-length encoding than forget that UTF-8 is one.
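
To make the "variable-length" point concrete, here is a minimal sketch
of decoding one code point from UTF-16 code units.  The function name
utf16_decode and its return convention are assumptions made for this
illustration, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the first units of u[0..n-1].
   Returns the number of 16-bit units consumed (1 or 2),
   or 0 if the input is empty or not well-formed at this position. */
size_t utf16_decode(const uint16_t *u, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    uint16_t hi = u[0];
    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP code unit */
        *cp = hi;
        return 1;
    }
    if (hi >= 0xDC00)                   /* low surrogate with no lead */
        return 0;
    if (n < 2)                          /* high surrogate cut off at the end */
        return 0;

    uint16_t lo = u[1];
    if (lo < 0xDC00 || lo > 0xDFFF)     /* high surrogate not followed by low */
        return 0;

    /* Combine the two 10-bit halves and add the 0x10000 offset. */
    *cp = 0x10000 + (((uint32_t)hi - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
    return 2;
}
```

Code that indexes or truncates UTF-16 by code unit, as if every
character were one unit, is the UCS-2 habit that splits pairs such as
D83D DE00 (U+1F600).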

> So this organization chooses to use UTF-32, because the unnecessary
> byte there also acts as a delimiter.

No, it doesn't.

> This is plausible.

No, it isn't.

There is a reason that _nobody_ uses UTF-32 for file/wire formats.

> As for the rest of the UTF-8 vs 16 and 32 debate, look at the
> earlier discussion on comp.lang.c.

Do you have a Message-ID or Subject to reference?

All such discussions I can recall have strongly favored UTF-8, unless
you're targeting _only_ Windows or Java, where UTF-16 (or UCS-2 falsely
labeled as UTF-16, hence endless bugs) is inescapable.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#77429

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 04:34 +0100
Message-ID	<n3gg2j$anu$1@speranza.aioe.org>
In reply to	#77427

On 30.11.2015 03:20, Stephen Sprunk wrote:
> On 29-Nov-15 19:34, Morten W. Petersen wrote:
>> On 30.11.2015 02:15, Ian Collins wrote:
>>> Morten W. Petersen wrote:
>>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>>> UTF-8 is where the world will probably stay.
>>>>
>>>> I think that's a bold claim.. :D
>
> It's not bold at all; stats clearly show UTF-8 is now blowing away all
> other encodings.  The trend in that direction has been solid for over
> two decades, and there is no logical reason for it to ever reverse.

Mm, yes.  And UTF-8 superseded ASCII and ISO-8859-1.  Are you saying
that UTF-16/32 will never be more popular than UTF-8?

>> and UTF-8 uses 3 bytes on characters U+0800 through U+FFFF, but
>> only 2 bytes in UTF-16.
>
> OTOH, UTF-8 uses only 1 byte for U+000000 through U+00007F whereas
> UTF-16 uses 2 bytes; this is important for XML (or HTML) because that
> range includes XML (or HTML) markup characters, whitespace, etc., with
> the result that total document size is usually smaller even for scripts
> in the U+000800 to U+00FFFF range.
>
> For a detailed comparison:
> https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Eight-bit_environments
>
> Even for dense text, a general-purpose compressor results in about the
> same size anyway, so if that's expected, you can forget about the few
> cases where UTF-8 loses on size and focus on portability and
> robustness--where UTF-8 even more clearly wins.

Yes, it's true that XML markup is in the ASCII/UTF-8 range.

But UTF-32 does not require encoding or decoding for any character.

>> However, UTF-16 is vulnerable to the entire string being corrupted
>> after invalid data has been encountered.
>
> No; UTF-16 is self-synchronizing, just like UTF-8.  But it's a lot
> easier for UTF-16 to _get_ corrupted, because more people forget that
> it's a variable-length encoding than forget that UTF-8 is one.

What do you think about the odd number statement here?

https://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16

>> So this organization chooses to use UTF-32, because the unnecessary
>> byte there also acts as a delimiter.
>
> No, it doesn't.

It is a delimiter, but maybe it requires some looking back and
forth to decide what is the padding byte and what's part of
the actual Unicode character.

>> This is plausible.
>
> No, it isn't.

Why not?

>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the
>> earlier discussion on comp.lang.c.
>
> Do you have a Message-ID or Subject to reference?
>
> All such discussions I can recall have strongly favored UTF-8, unless
> you're targeting _only_ Windows or Java, where UTF-16 (or UCS-2 falsely
> labeled as UTF-16, hence endless bugs) is inescapable.

I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good starting
point.

-Morten

#77430 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-11-30 17:09 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc20g8F2cb8U4@mid.individual.net>
In reply to	#77429

Morten W. Petersen wrote:
> On 30.11.2015 03:20, Stephen Sprunk wrote:
>> On 29-Nov-15 19:34, Morten W. Petersen wrote:
>>> On 30.11.2015 02:15, Ian Collins wrote:
>>>> Morten W. Petersen wrote:
>>>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>>>> UTF-8 is where the world will probably stay.
>>>>>
>>>>> I think that's a bold claim.. :D
>>
>> It's not bold at all; stats clearly show UTF-8 is now blowing away all
>> other encodings.  The trend in that direction has been solid for over
>> two decades, and there is no logical reason for it to ever reverse.
>
> Mm, yes.  And UTF-8 superseded ASCII and ISO-8859-1.  Are you saying
> that UTF-16/32 will never be more popular than UTF-8?

Given there's no good reason not to prefer UTF-8, the answer there is a 
pretty solid yes.

-- 
Ian Collins

#77432

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 06:17 +0100
Message-ID	<n3gm46$k8h$1@speranza.aioe.org>
In reply to	#77430

On 30.11.2015 05:09, Ian Collins wrote:
> Morten W. Petersen wrote:
>> On 30.11.2015 03:20, Stephen Sprunk wrote:
>>> On 29-Nov-15 19:34, Morten W. Petersen wrote:
>>>> On 30.11.2015 02:15, Ian Collins wrote:
>>>>> Morten W. Petersen wrote:
>>>>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>>>>> UTF-8 is where the world will probably stay.
>>>>>>
>>>>>> I think that's a bold claim.. :D
>>>
>>> It's not bold at all; stats clearly show UTF-8 is now blowing away all
>>> other encodings.  The trend in that direction has been solid for over
>>> two decades, and there is no logical reason for it to ever reverse.
>>
>> Mm, yes.  And UTF-8 superseded ASCII and ISO-8859-1.  Are you saying
>> that UTF-16/32 will never be more popular than UTF-8?
>
> Given there's no good reason not to prefer UTF-8, the answer there is a
> pretty solid yes.

Well as you see in the link in the post above, it says that

"As a result, text in (for example) Chinese, Japanese or Hindi will
take more space in UTF-8 if there are more of these characters than
there are ASCII characters."

Now I had a look at an .odt export from my Google Drive, and the
content.xml there is a horrible mess; the text itself is about
3 KB, if I wrote that to a proper XHTML+CSS file, it would maybe
be 5 KB.  Unzipped the .odt file is 250KB.

We could argue back and forth about this, but I think the only real
way to settle a discussion is with real data and hard numbers, and I
don't think any of us has the time or energy to do that.

Internally, Smash XML will use 32 bits.  It might add an option to
use only 21 bits later, if someone has a real need for it.  To make
the output 32 bits by default seems like an OK solution to me.  I
like the idea of having for example bzip as a compression method
if saving space or reducing read/write times from storage is important.

It's fine that a conforming parser must accept both UTF-8 and
UTF-16.

-Morten

#77437 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-11-30 19:44 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc29hvFi96jU1@mid.individual.net>
In reply to	#77432

Morten W. Petersen wrote:
> On 30.11.2015 05:09, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>>
>>> Mm, yes.  And UTF-8 superseded ASCII and ISO-8859-1.  Are you saying
>>> that UTF-16/32 will never be more popular than UTF-8?
>>
>> Given there's no good reason not to prefer UTF-8, the answer there is a
>> pretty solid yes.
>
> Well as you see in the link in the post above, it says that
>
> "As a result, text in (for example) Chinese, Japanese or Hindi will
> take more space in UTF-8 if there are more of these characters than
> there are ASCII characters."

With file compression, that is irrelevant.

> Now I had a look at an .odt export from my Google Drive, and the
> content.xml there is a horrible mess; the text itself is about
> 3 KB, if I wrote that to a proper XHTML+CSS file, it would maybe
> be 5 KB.  Unzipped the .odt file is 250KB.

That's XML for you.  That's why XML office formats use zip files.

> We could argue back and forth about this, but I think the only real
> way to settle a discussion is with real data and hard numbers, and I
> don't think any of us has the time or energy to do that.

The best "real data and hard numbers" is the ratio between UTF-8 encoded 
XML documents and the rest!

-- 
Ian Collins

#77435

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-11-29 23:36 -0600
Message-ID	<n3gn5a$jju$1@dont-email.me>
In reply to	#77429

On 29-Nov-15 21:34, Morten W. Petersen wrote:
> On 30.11.2015 03:20, Stephen Sprunk wrote:
>> It's not bold at all; stats clearly show UTF-8 is now blowing away
>> all other encodings.  The trend in that direction has been solid
>> for over two decades, and there is no logical reason for it to ever
>> reverse.
> 
> Mm, yes.  And UTF-8 superseded ASCII and ISO-8859-1.  Are you saying 
> that UTF-16/32 will never be more popular than UTF-8?

For file/wire formats, yes, I'm saying that.   We have tried that
experiment, and UTF-8 has clearly won.

UTF-32 (and, for a time, UTF-16) may persist a bit longer for internal
uses, particularly for string manipulation (though the vast majority of
string handling treats them as opaque blobs and is just fine with
UTF-8), but that's it.

>> Even for dense text, a general-purpose compressor results in about
>> the same size anyway, so if that's expected, you can forget about
>> the few cases where UTF-8 loses on size and focus on portability
>> and robustness--where UTF-8 even more clearly wins.
> 
> Yes, it's true that XML markup is in the ASCII/UTF-8 range.

It's in the ASCII range; _all_ code points are in the "UTF-8 range",
making the latter a meaningless term.

> But UTF-32 does not require encoding or decoding for any character.

UTF-32 still requires dealing with BOMs and LE vs BE.  ITYM that UTF-32
means a code unit equals a code point, but that is of dubious value in
the vast majority of circumstances since neither matches what _users_
consider a "character", i.e. a grapheme cluster.
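
A minimal sketch of the BOM and byte-order handling that UTF-32 input
still needs.  The names (utf32_order, utf32_detect_bom, utf32_read) are
hypothetical, and treating a missing BOM as big-endian is an assumption
of this sketch; a real reader would also honour an external encoding
label.

```c
#include <stddef.h>
#include <stdint.h>

enum utf32_order { UTF32_UNKNOWN, UTF32_BE, UTF32_LE };

/* Look for a UTF-32 byte order mark (U+FEFF) in the first four bytes. */
enum utf32_order utf32_detect_bom(const unsigned char *buf, size_t n)
{
    if (n < 4)
        return UTF32_UNKNOWN;
    if (buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF)
        return UTF32_BE;
    if (buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00)
        return UTF32_LE;
    return UTF32_UNKNOWN;
}

/* Assemble one 32-bit code unit from four bytes in the given order.
   UTF32_UNKNOWN is read as big-endian (assumption, see above). */
uint32_t utf32_read(const unsigned char *p, enum utf32_order order)
{
    if (order == UTF32_LE)
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Reading byte by byte keeps the code independent of the host's own
endianness, so "no encoding or decoding" holds only when the file's
byte order happens to match the machine's.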

>>> However, UTF-16 is vulnerable to the entire string being
>>> corrupted after invalid data has been encountered.
>> 
>> No; UTF-16 is self-synchronizing, just like UTF-8.  But it's a lot
>> easier for UTF-16 to _get_ corrupted, because more people forget
>> that it's a variable-length encoding than forget that UTF-8 is one.
> 
> What do you think about the odd number statement here?
> 
> https://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16

Ah, true; if you _add or remove_ an odd number of bytes, that is a
serious problem for UTF-16, but for a reasonably long text, you will
eventually encounter an invalid surrogate pair, which would allow you to
resynchronize--though most decoders probably don't bother.

UTF-8 doesn't have that problem; it will always resynchronize on the
very next character since leading and trailing bytes are distinct.
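
The resynchronization property takes only a few lines to show.  This is
a sketch with a hypothetical function name (utf8_resync); it relies
only on the fact that UTF-8 continuation bytes always have the bit
pattern 10xxxxxx, which no leading byte ever has.

```c
#include <stddef.h>

/* Starting at an arbitrary byte offset pos in buf[0..n-1], skip any
   continuation bytes (10xxxxxx) and return the offset of the next byte
   that can start a character, or n if none is left. */
size_t utf8_resync(const unsigned char *buf, size_t n, size_t pos)
{
    while (pos < n && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

A decoder dropped into the middle of a stream therefore loses at most
the one character it landed in, whatever number of bytes was added or
removed before that point.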

>>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the 
>>> earlier discussion on comp.lang.c.
>> 
>> Do you have a Message-ID or Subject to reference?
>> 
>> All such discussions I can recall have strongly favored UTF-8,
>> unless you're targeting _only_ Windows or Java, where UTF-16 (or
>> UCS-2 falsely labeled as UTF-16, hence endless bugs) is
>> inescapable.
> 
> I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good
> starting point.

It seems like that particular debate comes down to people from certain
countries unhappy that their script requires 3 bytes per code point in
UTF-8 but only 2 bytes per code point in UTF-16, and your answer was to
make _all_ scripts require 4 bytes per code point.

Sometimes politics force us to do dumb things, and if that's the case
then so be it, but that doesn't make it not a dumb thing.

Note that, even in those countries, UTF-8 has clearly eclipsed all other
encodings in actual use, politics be damned.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#77436

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 07:39 +0100
Message-ID	<n3gqu2$rvn$1@speranza.aioe.org>
In reply to	#77435

On 30.11.2015 06:36, Stephen Sprunk wrote:
> On 29-Nov-15 21:34, Morten W. Petersen wrote:
[...]
>> I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good
>> starting point.
>
> It seems like that particular debate comes down to people from certain
> countries unhappy that their script requires 3 bytes per code point in
> UTF-8 but only 2 bytes per code point in UTF-16, and your answer was to
> make _all_ scripts require 4 bytes per code point.
>
> Sometimes politics force us to do dumb things, and if that's the case
> then so be it, but that doesn't make it not a dumb thing.
>
> Note that, even in those countries, UTF-8 has clearly eclipsed all other
> encodings in actual use, politics be damned.

Mhm.  Well UTF-8 favors certain characters over others as you say.  If
you compress that, or a UTF-32 document, the size should be about the
same.

I think it's more fair that any given character takes the same amount
of space uncompressed, and then compression can be applied if it is
necessary to save space.

I think it's a good, clean design to accept UTF 8, 16 and 32, and then
output in 32.  Internally things are 32 bits, and no code internally
has to "work around" strings in UTF-8.

Interestingly enough, this page

http://w3techs.com/technologies/overview/site_element/all

says that only 2/3rds of websites use compression.

-Morten

#77477

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-11-30 13:56 -0600
Message-ID	<n3i9h3$ff3$1@dont-email.me>
In reply to	#77436

On 30-Nov-15 00:39, Morten W. Petersen wrote:
> On 30.11.2015 06:36, Stephen Sprunk wrote:
>> It seems like that particular debate comes down to people from
>> certain countries unhappy that their script requires 3 bytes per
>> code point in UTF-8 but only 2 bytes per code point in UTF-16, and
>> your answer was to make _all_ scripts require 4 bytes per code
>> point.
>> 
>> Sometimes politics force us to do dumb things, and if that's the
>> case then so be it, but that doesn't make it not a dumb thing.
>> 
>> Note that, even in those countries, UTF-8 has clearly eclipsed all
>> other encodings in actual use, politics be damned.
> 
> Mhm.  Well UTF-8 favors certain characters over others as you say. If
> you compress that, or a UTF-32 document, the size should be about the
> same.

The compressed size is a rough measure of entropy, which won't be
affected (much) by the encoding scheme of the uncompressed data.

> I think it's more fair that any given character takes the same
> amount of space uncompressed, and then compression can be applied if
> it is necessary to save space.

Yes, some people think the only "fair" solution to poverty is to make
_everyone_ poor.  Most people disagree, particularly the non-poor.

> I think it's a good, clean design to accept UTF 8, 16 and 32, and
> then output in 32. 

Why, when _every_ tool that might consume your output expects UTF-8, and
no other tool is going to produce anything but UTF-8 as your input?

> Internally things are 32 bits, and no code internally has to "work
> around" strings in UTF-8.

The internal representation you choose is another matter entirely;
UTF-32 is a reasonable choice in many circumstances, with a conversion
from/to UTF-8 on input/output.  In years past, I'd have suggested also
supporting other encodings for input/output, but now that's pointless.
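
The output half of that conversion is small.  A minimal sketch,
assuming the internal form is one uint32_t per code point; the name
utf8_encode and its signature are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3].
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF and so cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A writer would call this once per code point and append the returned
number of bytes to its output buffer; the input side is the mirror
image plus validation of malformed sequences.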

> Interestingly enough, this page
> 
> http://w3techs.com/technologies/overview/site_element/all
> 
> says that only 2/3rds of websites use compression.

It's not clear exactly what that's measuring, and their "technologies
overview" is surprisingly unhelpful.  In particular, there are several
possible types and methods of compression, and it's not clear they're
measuring all of them.

Also, by volume, the vast majority of Web content is static image and
video files that are already highly compressed; some web sites compress
static text content, but many don't bother because it's not worth the
effort, and compressing dynamic text content is a pain.

OTOH, one click away on that web site is this:
http://w3techs.com/technologies/overview/character_encoding/all

UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo), while
UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at all_.

In reality, this part of the debate was over 10+ years ago when Google
announced that UTF-8 had reached majority status, i.e. more popular than
all other encodings _combined_.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#77496

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-12-01 09:17 +0100
Message-ID	<n3jl18$1em$1@speranza.aioe.org>
In reply to	#77477

On 30.11.2015 20:56, Stephen Sprunk wrote:
> On 30-Nov-15 00:39, Morten W. Petersen wrote:
>> On 30.11.2015 06:36, Stephen Sprunk wrote:
>>> It seems like that particular debate comes down to people from
>>> certain countries unhappy that their script requires 3 bytes per
>>> code point in UTF-8 but only 2 bytes per code point in UTF-16, and
>>> your answer was to make _all_ scripts require 4 bytes per code
>>> point.
>>>
>>> Sometimes politics force us to do dumb things, and if that's the
>>> case then so be it, but that doesn't make it not a dumb thing.
>>>
>>> Note that, even in those countries, UTF-8 has clearly eclipsed all
>>> other encodings in actual use, politics be damned.
>>
>> Mhm.  Well UTF-8 favors certain characters over others as you say. If
>> you compress that, or a UTF-32 document, the size should be about the
>> same.
>
> The compressed size is a rough measure of entropy, which won't be
> affected (much) by the encoding scheme of the uncompressed data.

Yeah, something like that.

>> I think it's more fair that any given character takes the same
>> amount of space uncompressed, and then compression can be applied if
>> it is necessary to save space.
>
> Yes, some people think the only "fair" solution to poverty is to make
> _everyone_ poor.  Most people disagree, particularly the non-poor.

I think that's a very bad analogy. :)

>> Interestingly enough, this page
>>
>> http://w3techs.com/technologies/overview/site_element/all
>>
>> says that only 2/3rds of websites use compression.
>
> It's not clear exactly what that's measuring, and their "technologies
> overview" is surprisingly unhelpful.  In particular, there are several
> possible types and methods of compression, and it's not clear they're
> measuring all of them.

Well one of the arguments against UTF-32 is that it uses more space,
with compression that's no longer a valid argument.

> Also, by volume, the vast majority of Web content is static image and
> video files that are already highly compressed; some web sites compress
> static text content, but many don't bother because it's not worth the
> effort, and compressing dynamic text content is a pain.

I've worked with dynamic web content for a long time, and there
compression is also an option.  Any decent setup should be able
to compress dynamic content that is served.

> OTOH, one click away on that web site is this:
> http://w3techs.com/technologies/overview/character_encoding/all
>
> UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo), while
> UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at all_.
>
> In reality, this part of the debate was over 10+ years ago when Google
> announced that UTF-8 had reached majority status, i.e. more popular than
> all other encodings _combined_.

I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32 in
the time to come.

With my parser/writer/DOM library, I'm chipping in towards that.

-Morten

[toc] | [prev] | [next] | [standalone]

#77651

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-02 13:40 -0600
Message-ID	<n3nhat$mho$1@dont-email.me>
In reply to	#77496

On 01-Dec-15 02:17, Morten W. Petersen wrote:
> On 30.11.2015 20:56, Stephen Sprunk wrote:
>> On 30-Nov-15 00:39, Morten W. Petersen wrote:
>>> I think it's more fair that any given character takes the same 
>>> amount of space uncompressed, and then compression can be applied
>>> if it is necessary to save space.
>> 
>> Yes, some people think the only "fair" solution to poverty is to
>> make _everyone_ poor.  Most people disagree, particularly the
>> non-poor.
> 
> I think that's a very bad analogy. :)

Actually, I think it's perfect.  You see that some scripts use more
bytes than others, so your solution is to move _all_ scripts into the
worst-case scenario.

>>> Interestingly enough, this page
>>> 
>>> http://w3techs.com/technologies/overview/site_element/all
>>> 
>>> says that only 2/3rds of websites use compression.
>> 
>> It's not clear exactly what that's measuring, and their
>> "technologies overview" is surprisingly unhelpful.  In particular,
>> there are several possible types and methods of compression, and
>> it's not clear they're measuring all of them.
> 
> Well one of the arguments against UTF-32 is that it uses more space, 
> with compression that's no longer a valid argument.

According to that site, 1/3 of web sites don't use compression, so it's
still a valid argument for them.

But I would agree that the size issue is mostly a side show; it's the
simplest argument, so that's usually the one that comes out first, but
there are other problems with UTF-32, and Unicode as a whole has many
other problems--far more serious ones--that are encoding-independent.

Anyone who treats Unicode as just a wider version of ASCII is in for a
rude awakening.
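
One illustration of an encoding-independent issue (an example chosen
here, not one named in the post): the same visible character can be
spelled as different code point sequences, so even in UTF-32 one
fixed-width unit is not necessarily one character.  A minimal sketch,
where naive_equal and the two arrays are hypothetical names used only
for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two UTF-32 spellings of the same visible text, "e with acute". */
const uint32_t precomposed[] = { 0x00E9 };         /* U+00E9 */
const uint32_t decomposed[]  = { 0x0065, 0x0301 }; /* U+0065 + U+0301 (combining acute) */

/* Naive comparison: equal only if the lengths and every code point match.
   No Unicode normalization is attempted here. */
int naive_equal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    return na == nb && memcmp(a, b, na * sizeof *a) == 0;
}
```

Calling naive_equal(precomposed, 1, decomposed, 2) reports the two as
different, although a reader sees the same text.  Treating them as
equal would need a normalization step, which this sketch leaves out.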

>> Also, by volume, the vast majority of Web content is static image
>> and video files that are already highly compressed; some web sites
>> compress static text content, but many don't bother because it's
>> not worth the effort, and compressing dynamic text content is a
>> pain.
> 
> I've worked with dynamic web content for a long time, and there 
> compression is also an option.  Any decent setup should be able to
> compress dynamic content that is served.

Compressing text takes a lot more CPU power than just dumping it into a
socket.  Doing it once for a static page is no big deal, but doing it
for every page view is another story entirely, and the extra CPU power
required can easily translate into bigger hardware bills--and unlike
compressing static image/video content, it won't reduce your bandwidth
bills enough to compensate.

>> OTOH, one click away on that web site is this: 
>> http://w3techs.com/technologies/overview/character_encoding/all
>> 
>> UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo),
>> while UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at
>> all_.
> 
> I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32
> in the time to come.

Since it's sitting at 0% today (and has been since invented 20+ years
ago), we'll certainly not be seeing _less_ of it in the future, but I
don't see any reason to expect _more_ of it either.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

[toc] | [prev] | [next] | [standalone]

#77783

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-12-04 00:34 +0100
Message-ID	<n3qjhn$v9u$1@speranza.aioe.org>
In reply to	#77651

On 02.12.2015 20:40, Stephen Sprunk wrote:
> On 01-Dec-15 02:17, Morten W. Petersen wrote:
>> On 30.11.2015 20:56, Stephen Sprunk wrote:
>>> On 30-Nov-15 00:39, Morten W. Petersen wrote:
>>>> I think it's more fair that any given character takes the same
>>>> amount of space uncompressed, and then compression can be applied
>>>> if it is necessary to save space.
>>>
>>> Yes, some people think the only "fair" solution to poverty is to
>>> make _everyone_ poor.  Most people disagree, particularly the
>>> non-poor.
>>
>> I think that's a very bad analogy. :)
>
> Actually, I think it's perfect.  You see that some scripts use more
> bytes than others, so your solution is to move _all_ scripts into the
> worst-case scenario.

I don't see any point in arguing about this any further; economics is
a very complex subject, while programming and data are fairly simple.

>> Well one of the arguments against UTF-32 is that it uses more space,
>> with compression that's no longer a valid argument.
>
> According to that site, 1/3 of web sites don't use compression, so it's
> still a valid argument for them.
>
> But I would agree that the size issue is mostly a side show; it's the
> simplest argument, so that's usually the one that comes out first, but
> there are other problems with UTF-32, and Unicode as a whole has many
> other problems--far more serious ones--that are encoding-independent.
>
> Anyone who treats Unicode as just a wider version of ASCII is in for a
> rude awakening.

What are these problems with UTF-32?

>> I've worked with dynamic web content for a long time, and there
>> compression is also an option.  Any decent setup should be able to
>> compress dynamic content that is served.
>
> Compressing text takes a lot more CPU power than just dumping it into a
> socket.  Doing it once for a static page is no big deal, but doing it
> for every page view is another story entirely, and the extra CPU power
> required can easily translate into bigger hardware bills--and unlike
> compressing static image/video content, it won't reduce your bandwidth
> bills enough to compensate.

Well, for a site with dynamic content, there is already some
CPU-utilization, while for example "resources" such as CSS files,
JavaScript files or data files can be compressed & cached, saved as
compressed files etc.

I think it's a fair guess that most sites with a lot of dynamic
elements will spend at least ten times more CPU time generating
dynamic content than compressing it.

>>> OTOH, one click away on that web site is this:
>>> http://w3techs.com/technologies/overview/character_encoding/all
>>>
>>> UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo),
>>> while UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at
>>> all_.
>>
>> I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32
>> in the time to come.
>
> Since it's sitting at 0% today (and has been since invented 20+ years
> ago), we'll certainly not be seeing _less_ of it in the future, but I
> don't see any reason to expect _more_ of it either.

Well, for my project UTF-32 is right; whether or not UTF-32 will be
pushed by (powerful) groups or become something that's important to
individual people remains to be seen.

-Morten

[toc] | [prev] | [next] | [standalone]

#77788

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-03 16:03 -0800
Message-ID	<5e187f54-668f-4a9b-8980-a2bed846456d@googlegroups.com>
In reply to	#77783

On Thursday, December 3, 2015 at 11:34:25 PM UTC, Morten W. Petersen wrote:
>
> Well, for a site with dynamic content, there is already some
> CPU-utilization, while for example "resources" such as CSS files,
> JavaScript files or data files can be compressed & cached, saved as
> compressed files etc.
> 
> I think it's a fair guess that most sites with a lot of dynamic
> elements will spend at least ten times more CPU time generating
> dynamic content than compressing it.
> 
These days, an HTML page is effectively an application, the web server
effectively the disk drive, although it's a bit more intelligent
than a PC disk drive, more a database back end (which it often
literally is).
But virtually all the action occurs on the client side. The user
types characters, such as I'm doing here, and the machine looks up
a font and maintains a raster. It also scrolls and even does
auto-correction (annoying). Then I press post and a couple of
kilobytes of data go to the server and are put on Usenet. But
that's one call.

[toc] | [prev] | [next] | [standalone]

#77439

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-11-29 23:07 -0800
Message-ID	<e037cc57-2024-491d-a992-8e821cd8014b@googlegroups.com>
In reply to	#77424

On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
> On 30.11.2015 02:15, Ian Collins wrote:
> 
> Well, let's say you have some organization that wants to create an
> archive of lots of non-latin history, in XML.
> 
> For them, choosing XML is right, and UTF-8 uses 3 bytes on characters 
> U+0800 through U+FFFF, but only 2 bytes in UTF-16.
> 
> However, UTF-16 is vulnerable to the entire string being corrupted
> after invalid data has been encountered.
> 
> So this organization chooses to use UTF-32, because the unnecessary byte
> there also acts as a delimiter.
> 
> This is plausible.
> 
> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
> discussion on comp.lang.c.
> 
The debate isn't entirely over.
Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
are represented by longer sequences, which they see as giving second
status to their culture. And of course UTF-8 arrays don't easily support
random access. And Microsoft has gone the UTF-16 route, as has Java.

But the consensus is moving to UTF-8. Certainly it's my own view that
the other encodings should be treated as a nuisance, and only converted
to at the last moment to interface with systems that insist on them.
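
The byte counts under discussion can be checked with a short sketch.
The function names below (utf8_len, utf16_len, utf32_len, utf8_total)
are made up for this example; the ranges follow the encoding
definitions, and surrogate values are rejected in all three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8 (0 if not encodable). */
size_t utf8_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
    if (cp < 0x80)      return 1;
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair. */
size_t utf16_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000)   return 2;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Bytes needed in UTF-32: always 4 for a valid code point. */
size_t utf32_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    return cp <= 0x10FFFF ? 4 : 0;
}

/* Total UTF-8 size of an array of code points (invalid ones count as 0). */
size_t utf8_total(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_len(cps[i]);
    return total;
}
```

For a Devanagari letter such as U+0915 this gives 3, 2 and 4 bytes
respectively, while plain ASCII gives 1, 2 and 4.  Finding the n-th
code point in a UTF-8 buffer means stepping over these variable
lengths, which is the random-access cost mentioned above.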

[toc] | [prev] | [next] | [standalone]

#77440

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 08:20 +0100
Message-ID	<n3gtas$9k$1@speranza.aioe.org>
In reply to	#77439

On 30.11.2015 08:07, Malcolm McLean wrote:
> On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
[...]
>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
>> discussion on comp.lang.c.
>>
> The debate isn't entirely over.
> Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
> are represented by longer sequences, which they see as giving second
> status to their culture. And of course UTF-8 arrays don't easily support
> random access. And Microsoft has gone the UTF-16 route, as has Java.
>
> But the consensus is moving to UTF-8. Certainly it's my own view that
> the other encodings should be treated as a nuisance, and only converted
> to at the last moment to interface with systems that insist on them.

Yes, I think UTF-8 is better than all the different encodings that have
been out there.  But UTF-16 and UTF-32 have their place, and I see
it as natural that they will become mainstream.

"Cultural imperialism" is a term some people use.  I have a good command
of English and can even think about things in English naturally; then
again, I don't want to see all aspects of American culture survive
(and influence others), for example executing criminals, or "Hollywood
justice" as I like to call it.

Many examples can be drawn where technical choices which are
logical and simple also have negative effects.

-Morten

[toc] | [prev] | [next] | [standalone]

#77441

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-11-29 23:40 -0800
Message-ID	<75850e42-6305-4846-8dcd-0e6c5975dfde@googlegroups.com>
In reply to	#77440

On Monday, November 30, 2015 at 7:20:07 AM UTC, Morten W. Petersen wrote:
> On 30.11.2015 08:07, Malcolm McLean wrote:
> > On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
> [...]
> >> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
> >> discussion on comp.lang.c.
> >>
> > The debate isn't entirely over.
> > Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
> > are represented by longer sequences, which they see as giving second
> > status to their culture. And of course UTF-8 arrays don't easily support
> > random access. And Microsoft has gone the UTF-16 route, as has Java.
> >
> > But the consensus is moving to UTF-8. Certainly it's my own view that
> > the other encodings should be treated as a nuisance, and only converted
> > to at the last moment to interface with systems that insist on them.
> 
> Yes, I think UTF-8 is better than all the different encodings that have
> been out there.  But UTF-16 and UTF-32 have their place, and I see
> it as natural that they will become mainstream.
> 
> "Cultural imperialism" is a term some people use.  I have a good command
> of English and can even think about things in English naturally; then
> again, I don't want to see all aspects of American culture survive
> (and influence others), for example executing criminals, or "Hollywood
> justice" as I like to call it.
> 
> Many examples can be drawn where technical choices which are
> logical and simple also have negative effects.
> 
But in fact no poet or novelist is going to write in English rather
than Hindi because he can save a bit of money on computer memory.
If we were still in the days when 64K was the limit for a big machine
and "computer power is measured in kilobytes, the ZX81 comes with
one kilobyte" it might be different. But storage and memory are so
cheap these days that it's not a consideration.

I don't think UTF-16 or UTF-32 have a real place, except that at some
point you need to convert from UTF-8 to code points. Two encodings
for string data are not twice as good as one.

[toc] | [prev] | [next] | [standalone]

#77442

From	"Morten W. Petersen" <morphex@gmail.com>
Date	2015-11-30 08:48 +0100
Message-ID	<n3guvo$370$1@speranza.aioe.org>
In reply to	#77441

On 30.11.2015 08:40, Malcolm McLean wrote:
> On Monday, November 30, 2015 at 7:20:07 AM UTC, Morten W. Petersen wrote:
[...]
>> Yes, I think UTF-8 is better than all the different encodings that have
>> been out there.  But UTF-16 and UTF-32 have their place, and I see
>> it as natural that they will become mainstream.
>>
>> "Cultural imperialism" is a term some people use.  I have a good command
>> of English and can even think about things in English naturally; then
>> again, I don't want to see all aspects of American culture survive
>> (and influence others), for example executing criminals, or "Hollywood
>> justice" as I like to call it.
>>
>> Many examples can be drawn where technical choices which are
>> logical and simple also have negative effects.
>>
> But in fact no poet or novelist is going to write in English rather
> than Hindi because he can save a bit of money on computer memory.
> If we were still in the days when 64K was the limit for a big machine
> and "computer power is measured in kilobytes, the ZX81 comes with
> one kilobyte" it might be different. But storage and memory are so
> cheap these days that it's not a consideration.
>
> I don't think UTF-16 or UTF-32 have a real place, except that at some
> point you need to convert from UTF-8 to code points. Two encodings
> for string data are not twice as good as one.

Hm yes.  Then again, to get one Unicode character from a UTF-8 stream,
you first have to read it, check it, and expand it if necessary.

To get one Unicode character from a UTF-32 stream, you read 4 bytes
and combine them into one value.

One is a lot simpler than the other.  I like simple.  And given
that UTF-8 and UTF-32 streams are roughly the same size compressed,
and compression is cheap and available, doesn't that make UTF-32
a little bit simpler and more politically correct?
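
To make the comparison concrete, here is a rough sketch of both
decoding steps.  The function names are invented for this example,
and the UTF-32 reader assumes big-endian byte order; a real stream
would also have to deal with little-endian data and a possible byte
order mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one code point from 4 bytes of big-endian UTF-32 (assumed order).
   Returns 0 on success, -1 if the value is not a valid code point. */
int utf32be_decode(const unsigned char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t cp = ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
                  ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    *out = cp;
    return 0;
}

/* Read one code point from UTF-8.  Returns the number of bytes consumed
   (1 to 4), or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t cp, min;
    size_t len;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { cp = s[0];        len = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (n < len) return 0;          /* sequence cut short */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = cp;
    return len;
}
```

The UTF-32 path is indeed shorter, but it is shifts rather than a
sum, it depends on byte order, and it still needs a range check.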

-Morten

[toc] | [prev] | [next] | [standalone]

#77443 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-11-30 20:52 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc2dh2Fi96mU1@mid.individual.net>
In reply to	#77442

Morten W. Petersen wrote:
>
> One is a lot simpler than the other.  I like simple.  And given
> that UTF-8 and UTF-32 streams are roughly the same size compressed,
> and compression is cheap and available, doesn't that make UTF-32
> a little bit simpler and more politically correct?

Does it matter if no one uses it?

-- 
Ian Collins

[toc] | [prev] | [next] | [standalone]
