Groups > comp.lang.c > #77357 > unrolled thread

Working efficiently with 32-bit Unicode output streams, locale etc.

Started by	"Morten W. Petersen" <morphex@gmail.com>
First post	2015-11-29 01:06 +0100
Last post	2015-12-02 09:58 -0800
Articles	20 on this page of 210 — 25 participants

  Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <nothing@nowhere.nohow> - 2015-12-01 14:07 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <nothing@nowhere.nohow> - 2015-12-01 17:13 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
                                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800

#77575

From	BartC <bc@freeuk.com>
Date	2015-12-01 21:32 +0000
Message-ID	<n3l3hs$bc1$1@dont-email.me>
In reply to	#77569

On 01/12/2015 20:53, Keith Thompson wrote:
> BartC <bc@freeuk.com> writes:
> [...]
>> If I run this code, where it prints the first 4 'somethings' of the string:
>>
>>       printf("%.4s","£100pw");
>>
>> Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
>
> The pound sign in your article is printed in my newsreader (actually in
> GNU Emacs) as \243.  Your article headers include:
>
>      Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Apparently my system (I'm using Linux) isn't configured to understand
> windows-1252, so it falls back to displaying the character in octal.
>
> I see you're using Thunderbird on Windows.  Is there any way you can
> configure it to post using UTF-8?

I don't know how to do that. But if I switch 'Character encoding' from 
Western to Unicode, then all my £ signs above turn into little black 
diamonds!

But I'm sure I've never had trouble sending £ symbols and people being 
able to read them at the other end. I've viewed my post above via 
googlegroups on both Windows and Ubuntu, using Thunderbird in each case, 
and the £s are visible.

(Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal 
in both Unicode and Windows-1252.)

-- 
Bartc

#77576

From	Keith Thompson <kst-u@mib.org>
Date	2015-12-01 13:55 -0800
Message-ID	<lnd1upu93u.fsf@kst-u.example.com>
In reply to	#77575

BartC <bc@freeuk.com> writes:
> On 01/12/2015 20:53, Keith Thompson wrote:
>> BartC <bc@freeuk.com> writes:
>> [...]
>>> If I run this code, where it prints the first 4 'somethings' of the string:
>>>
>>>       printf("%.4s","£100pw");
>>>
>>> Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
>>
>> The pound sign in your article is printed in my newsreader (actually in
>> GNU Emacs) as \243.  Your article headers include:
>>
>>      Content-Type: text/plain; charset=windows-1252; format=flowed
>>
>> Apparently my system (I'm using Linux) isn't configured to understand
>> windows-1252, so it falls back to displaying the character in octal.
>>
>> I see you're using Thunderbird on Windows.  Is there any way you can
>> configure it to post using UTF-8?
>
> I don't know how to do that. But if I switch 'Character encoding' from
> Western to Unicode, then all my £ signs above turn into little black
> diamonds!
>
> But I'm sure I've never had trouble sending £ symbols and people being
> able to read them at the other end. I've viewed my post above via
> googlegroups on both Windows and Ubuntu, using Thunderbird in each
> case, and the £s are visible.
>
> (Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal
> in both Unicode and Windows-1252.)

And now the pound signs are showing up correctly for me.  Odd.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

#77802

From	raltbos@xs4all.nl (Richard Bos)
Date	2015-12-04 10:30 +0000
Message-ID	<56616a63.975250@news.xs4all.nl>
In reply to	#77576

Keith Thompson <kst-u@mib.org> wrote:

> BartC <bc@freeuk.com> writes:

> > (Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal
> > in both Unicode and Windows-1252.)
> 
> And now the pound signs are showing up correctly for me.  Odd.

Not odd at all. Usenet is _still_, in its essentials, a 7-bit medium,
despite what blinkered advocates on either side of the Windows-Linux
divide want it to default to handling.
Well, boo-hoo to them: by default, and when things go wrong - which they
do when you try to force the matter in two contradictory ways - Usenet
defaults to ASCII. Let them learn to deal with that, and leave the rest
of us to view it in peace (and ASCII).

Richard

#77587

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-01 18:46 -0600
Message-ID	<n3letg$pjd$1@dont-email.me>
In reply to	#77575

On 01-Dec-15 15:32, BartC wrote:
> On 01/12/2015 20:53, Keith Thompson wrote:
>> I see you're using Thunderbird on Windows.  Is there any way you
>> can configure it to post using UTF-8?
> 
> I don't know how to do that. But if I switch 'Character encoding'
> from Western to Unicode, then all my £ signs above turn into little
> black diamonds!

There are two very different "Character encoding" menu settings; the one
for reading windows/panes forces Thunderbird to reinterpret the existing
bytes as different characters, which often results in mojibake, while
the one for writing windows tells it to use different bytes to represent
the same characters.

It sounds like you used the former, not the latter.

> But I'm sure I've never had trouble sending £ symbols and people
> being able to read them at the other end. I've viewed my post above
> via googlegroups on both Windows and Ubuntu, using Thunderbird in
> each case, and the £s are visible.

That works as long as the reader understands the encoding specified in
the headers--or can guess the correct one if unspecified.  Guessing has
obvious practical limitations, but UTF-8 is virtually unmistakeable,
unlike other encodings, which is yet another plus.

> (Apart from which, the code for £ is 243 octal, A3 hex and 163
> decimal in both Unicode and Windows-1252.)

Unicode is not an encoding.

"£" (U+00A3) is encoded as 0xC2 0xA3 in UTF-8, 0xA3 0x00 in UTF-16LE,
0x00 0xA3 in UTF-16BE, etc.

I don't have a Windows-1252 table handy, but it's almost the same as
ISO-8859-1, so it's not surprising that "£" is 0xA3 in both, and
Unicode's first 256 code points match ISO-8859-1 by design.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
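
A minimal sketch, not part of the original post, of how the UTF-8 bytes
quoted above fall out of a code point. The name utf8_encode is
hypothetical, and rejecting surrogates and values above U+10FFFF is an
assumption about what a caller would want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (utf8_encode is a hypothetical
 * helper).  Writes 1 to 4 bytes to out and returns the count, or 0 if cp
 * is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx plus three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* outside the Unicode range */
}
```

Called with 0xA3 it stores C2 A3 and returns 2; called with 0x20AC (the
euro sign) it stores E2 82 AC and returns 3.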

#77577

From	Say, what? <nothing@nowhere.nohow>
Date	2015-12-01 14:07 -0800
Message-ID	<2015120114070915576-nothing@nowhere.nohow>
In reply to	#77569

I think this is posted in UTF-8.

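#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "£100 = €140";   /* UTF-8 source: £ is 2 bytes, € is 3 */
    size_t len = strlen(s);     /* length in bytes, not characters */
    unsigned char c;
    size_t i;
    int r;

    r = printf("%s\n", s);      /* returns the number of bytes written */
    printf("printf returned %i\n", r);
    printf("String length: %zu\n", len);

    /* Dump each byte as decimal, hex, and raw character. */
    for (i = 0; i < len; ++i) {
        c = (unsigned char)s[i];
        printf("%2zu: %03d %02X <%c>\n", i, c, c, c);
    }
    return 0;
}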

Output:

£100 = €140
printf returned 15
String length: 14
 0: 194 C2 <\302>
 1: 163 A3 <\243>
 2: 049 31 <1>
 3: 048 30 <0>
 4: 048 30 <0>
 5: 032 20 < >
 6: 061 3D <=>
 7: 032 20 < >
 8: 226 E2 <\342>
 9: 130 82 <\202>
10: 172 AC <\254>
11: 049 31 <1>
12: 052 34 <4>
13: 048 30 <0>

#77579

From	BartC <bc@freeuk.com>
Date	2015-12-01 23:54 +0000
Message-ID	<n3lbr1$f8j$1@dont-email.me>
In reply to	#77577

On 01/12/2015 22:07, Say wrote:
> I think this is posted in UTF-8.

>     char s[]="£100 = €140";
>     unsigned char c;
>     int i,r;
>
>     r = printf("%s\n",s);
>     printf("printf returned %i\n",r);
>     printf("String length: %lu\n", strlen(s));

> Output:
>
> £100 = €140
> printf returned 15
> String length: 14

Don't forget the \n in the first printf. Without \n, it will return 14, 
the same as the strlen.

-- 
Bartc

#77591

From	Say, what? <nothing@nowhere.nohow>
Date	2015-12-01 17:13 -0800
Message-ID	<2015120117131851364-nothing@nowhere.nohow>
In reply to	#77579

On 2015-12-01 23:54:20 +0000, BartC said:

> On 01/12/2015 22:07, Say wrote:
>> I think this is posted in UTF-8.
> 
>> char s[]="£100 = €140";
>> unsigned char c;
>> int i,r;
>> 
>> r = printf("%s\n",s);
>> printf("printf returned %i\n",r);
>> printf("String length: %lu\n", strlen(s));
> 
>> Output:
>> 
>> £100 = €140
>> printf returned 15
>> String length: 14
> 
> Don't forget the \n in the first printf. Without \n, it will return 14, 
> the same as the strlen.

The 15 vs. 14 might be startling until you remember that printf counts 
everything it sends up to the '\0', including anything produced by 
the format string itself.

The output was produced on OS X 10.10.5 using UTF-8 source and native C 
build by Xcode 7.1.1 (clang). Absolutely effortless to produce in this 
context. No messy code pages.

I inspected the source with 0xED to be sure it was encoding the source 
code text as UTF-8. But notice that printf returned 15 -- 14 octets plus 
the '\n', not the 11 chars you stated you expected from inspection of the s 
array.
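
The gap between 14 octets and 11 characters can be checked with a short
sketch (not from the thread; utf8_count_code_points is a hypothetical
helper). It assumes the input is valid UTF-8 and counts every byte that
is not a continuation byte (bit pattern 10xxxxxx). That yields code
points, which are still not the same thing as grapheme clusters.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a NUL-terminated string, assuming valid UTF-8.
 * Each code point has exactly one byte that is not a continuation byte,
 * so counting those bytes counts code points. */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the 14-byte string above this returns 11, while strlen still
reports 14.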

#77517

From	Martin Shobe <martin.shobe@yahoo.com>
Date	2015-12-01 09:08 -0600
Message-ID	<n3kd1c$cnt$1@dont-email.me>
In reply to	#77502

On 12/1/2015 6:38 AM, BartC wrote:
> On 01/12/2015 03:05, Stephen Sprunk wrote:
>> On 30-Nov-15 16:32, BartC wrote:
>
>>> I understand that /fully/ supporting Unicode is full of problems
>>> even using UTF32.
>>
>> Indeed; encoding is honestly the least of your problems, so just use
>> UTF-8 like everyone else and move on to the _hard_ stuff.
>
>>> Meanwhile I still occasionally come across problems with the
>>> representation of £ or €; maybe they should fix those first before
>>> we worry about ancient scripts or rare Chinese ideograms.
>>
>> If you're getting mojibake or replacement characters, that is usually
>> due to folks using some ancient encoding rather than something modern
>> and sensible, e.g. UTF-8.
>
> This is a typical problem I would get (source code was UTF8):
>
> #include <stdio.h>
> #include <string.h>
>
> int main(void) {
> char s[]="£100 = €140";
> unsigned char c;
> int i;
>
>      printf("%s\n",s);
>
>      for (i=0; i<strlen(s); ++i){
>          c = s[i];
>          printf("%2d: %03d %02X <%c>\n",i,c,c,c);
>      }
> }
>
> I want to print the individual characters in the string. Compiled with
> gcc, I get (using Windows console set to code page 65001):
>
> £100 = €140
>   0: 194 C2 <�>
>   1: 163 A3 <�>
>   2: 049 31 <1>
>   3: 048 30 <0>
>   4: 048 30 <0>
>   5: 032 20 < >
>   6: 061 3D <=>
>   7: 032 20 < >
>   8: 226 E2 <�>
>   9: 130 82 <�>
> 10: 172 AC <�>
> 11: 049 31 <1>
> 12: 052 34 <4>
> 13: 048 30 <0>
>
> I get 13 'characters' output instead of the 11 I expect. The £ and €
> characters are replaced by sequences of those funny black diamonds (you
> might see some other error character).

Why did you expect 11? In your loop, you aren't printing code points, 
but octets and there are 13 of those.

Martin Shobe

#77556

From	BartC <bc@freeuk.com>
Date	2015-12-01 20:02 +0000
Message-ID	<n3ku7i$kjd$1@dont-email.me>
In reply to	#77517

On 01/12/2015 15:08, Martin Shobe wrote:
> On 12/1/2015 6:38 AM, BartC wrote:

>> char s[]="£100 = €140";
>>      for (i=0; i<strlen(s); ++i){
>>          c = s[i];

>> I want to print the individual characters in the string. Compiled with
>> gcc, I get (using Windows console set to code page 65001):
>>
>> £100 = €140
>>   0: 194 C2 <�>
>>   1: 163 A3 <�>
>>   2: 049 31 <1>
>>   3: 048 30 <0>
>>   4: 048 30 <0>
>>   5: 032 20 < >
>>   6: 061 3D <=>
>>   7: 032 20 < >
>>   8: 226 E2 <�>
>>   9: 130 82 <�>
>> 10: 172 AC <�>
>> 11: 049 31 <1>
>> 12: 052 34 <4>
>> 13: 048 30 <0>
>>
>> I get 13 'characters' output instead of the 11 I expect. The £ and €
>> characters are replaced by sequences of those funny black diamonds (you
>> might see some other error character).
>
> Why did you expect 11?

Because there are 11 characters in "£100 = €140", not 13 (or 14 actually).

> In your loop, you aren't printing code points,
> but octets and there are 13 of those.

This is the problem I have with people saying that UTF8 can be used 
transparently.

With an 8-bit coding (eg. ASCII plus 128 selected characters), the bytes 
in the data and the characters or code-points they represent have a 1:1 
correspondence. (The difference between character and code-point is 
something I would have to go and look up.)

Any code that makes that assumption can risk programs not working as 
expected.

With 16-bit or 32-bit strings, I would expect output more like the 
following:

    0:  163 00A3 <£>
    1:   49 0031 <1>
    2:   48 0030 <0>
    3:   48 0030 <0>
    4:   32 0020 < >
    5:   61 003D <=>
    6:   32 0020 < >
    7: 8364 20AC <€>
    8:   49 0031 <1>
    9:   52 0034 <4>
   10:   48 0030 <0>

So, how would you, given the same "£100 = €140" UTF8 string, write the C 
code to enumerate all the characters or code-points rather than the bytes?

-- 
Bartc

#77578

From	Martin Shobe <martin.shobe@yahoo.com>
Date	2015-12-01 17:03 -0600
Message-ID	<n3l8s5$4v5$1@dont-email.me>
In reply to	#77556

On 12/1/2015 2:02 PM, BartC wrote:
> On 01/12/2015 15:08, Martin Shobe wrote:
>> On 12/1/2015 6:38 AM, BartC wrote:
>
>>> char s[]="£100 = €140";
>>>      for (i=0; i<strlen(s); ++i){
>>>          c = s[i];
>
>>> I want to print the individual characters in the string. Compiled with
>>> gcc, I get (using Windows console set to code page 65001):
>>>
>>> £100 = €140
>>>   0: 194 C2 <�>
>>>   1: 163 A3 <�>
>>>   2: 049 31 <1>
>>>   3: 048 30 <0>
>>>   4: 048 30 <0>
>>>   5: 032 20 < >
>>>   6: 061 3D <=>
>>>   7: 032 20 < >
>>>   8: 226 E2 <�>
>>>   9: 130 82 <�>
>>> 10: 172 AC <�>
>>> 11: 049 31 <1>
>>> 12: 052 34 <4>
>>> 13: 048 30 <0>
>>>
>>> I get 13 'characters' output instead of the 11 I expect. The £ and €
>>> characters are replaced by sequences of those funny black diamonds (you
>>> might see some other error character).
>>
>> Why did you expect 11?
>
> Because there are 11 characters in "£100 = €140", not 13 (or 14 actually).

But you told C to print an octet, not a character.

>   In your loop, you aren't printing code points,
>> but octets and there are 13 of those.
>
> This is the problem I have with people saying that UTF8 can be used
> transparently.
>
> With an 8-bit coding (eg. ASCII plus 128 selected characters), the bytes
> in the data and the characters or code-points they represent have a 1:1
> correspondence. (The difference between character and code-point is
> something I would have to go and look up.)
>
> Any code that makes that assumption can risk programs not working as
> expected.
>
> With 16-bit or 32-bit strings, I would expect output more like the
> following:
>
>     0:  163 00A3 <£>
>     1:   49 0031 <1>
>     2:   48 0030 <0>
>     3:   48 0030 <0>
>     4:   32 0020 < >
>     5:   61 003D <=>
>     6:   32 0020 < >
>     7: 8364 20AC <€>
>     8:   49 0031 <1>
>     9:   52 0034 <4>
>    10:   48 0030 <0>

I wouldn't. Not all characters are encoded using a single code-point. 
While every character in your example is, you can't, in general, rely on that.

> So, how would you, given the same "£100 = €140" UTF8 string, write the C
> code to enumerate all the characters or code-points rather than the bytes?

Code points aren't characters as you mean it either. If you want that, 
you will have to make your code aware of the differences between octets, 
code-points, and "characters". C I/O is too low level to understand such.

Martin Shobe

#77582

From	BartC <bc@freeuk.com>
Date	2015-12-02 00:17 +0000
Message-ID	<n3ld6f$k49$1@dont-email.me>
In reply to	#77578

On 01/12/2015 23:03, Martin Shobe wrote:
> On 12/1/2015 2:02 PM, BartC wrote:
>> On 01/12/2015 15:08, Martin Shobe wrote:

>>> Why did you expect 11?
>>
>> Because there are 11 characters in "£100 = €140", not 13 (or 14
>> actually).
>
> But you told C to print an octet, not a character.

I told it to print a value in %c format. In other words, a character.

>
>> So, how would you, given the same "£100 = €140" UTF8 string, write the C
>> code to enumerate all the characters or code-points rather than the
>> bytes?
>
> Code points aren't characters as you mean it either. If you want that,
> you will have to make your code aware of the differences between octets,
> code-points, and "characters". C I/O is too low level to understand such.

So considerable amounts of code that was happily mixing up bytes, 
octets, characters and code-points for decades, no longer works with the 
advent of UTF8.

This is my point. Too many people are saying it will just work 
transparently. It might do if you are just inputting a bunch of bytes 
ending with 0 or EOF, and outputting the same data without doing 
anything to it, not even counting how many 'characters' have been processed.

But sometimes you need to do a bit more with it. Then I'm saying that 
using wide-character strings is easier than trying to grapple with UTF8.

You can even use the same algorithms with 16- or 32-bit character 
strings as with 8-bit. (Of course, many are still going to delight in 
pin-pointing all sorts of complications with Unicode where the simplest 
operations are fraught with problems. But then, many others don't care.)

-- 
Bartc

#77588

From	Malcolm McLean <malcolm.mclean5@btinternet.com>
Date	2015-12-01 16:53 -0800
Message-ID	<e6fdb173-64b3-497f-996d-c944404a3764@googlegroups.com>
In reply to	#77582

On Wednesday, December 2, 2015 at 12:17:55 AM UTC, Bart wrote:
> 
> This is my point. Too many people are saying it will just work 
> transparently. It might do if you are just inputting a bunch of bytes 
> ending with 0 or EOF, and outputting the same data without doing 
> anything to it, not even counting how many 'characters' have been 
> processed.
> 
> But sometimes you need to do a bit more with it. Then I'm saying that 
> using wide-character strings is easier than trying to grapple with UTF8.
> 
> You can even use the same algorithms with 16- or 32-bit character 
> strings as with 8-bit. (Of course, many are still going to delight in 
> pin-pointing all sorts of complications with Unicode where the simplest 
> operations are fraught with problems. But then, many others don't care.)
> 
UTF-8 is designed to be backwards compatible with ASCII both on the
binary level and at the code level, but you can't achieve both.
Almost any algorithm that works on ASCII strings will still work
if you increase the width of a "char" and add extra symbols, with
the exception of algorithms that take a histogram of 128 / 256
possible values (even if theoretically they work they often
become unviable when character space gets too large), but then
you don't have binary compatibility.

The main place UTF-8 falls down is at the final output stage,
when characters have to be converted to glyphs, but there are 
other cases. A wildcard matcher will match * correctly but
not ?. Parameterised calls to strchr() will fail if the passed
character is not in the ASCII subset. And programming scripts
have special problems, especially if the programming language
is doing string handling.

Then the grapheme cluster problem means that UTF-32 is only a
partial solution. Not getting Hebrew pointing correct isn't
such a disaster as unpointed Hebrew is still readable, often
acceptable, and Hebrew as a whole is usually a very small market.
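
The strchr() limitation has a common workaround, sketched here under
the assumption that both strings are valid UTF-8: pass the character as
a short encoded string and search with strstr(). The wrapper name
utf8_find_char is hypothetical and is not from the post.

```c
#include <assert.h>
#include <string.h>

/* Find the first occurrence of one UTF-8 encoded character in s.
 * ch is the character as a NUL-terminated byte sequence, for example
 * "\xE2\x82\xAC" for the euro sign.  Returns NULL if it is absent.
 * In valid UTF-8 a lead byte never looks like a continuation byte, so a
 * match cannot begin in the middle of another character. */
const char *utf8_find_char(const char *s, const char *ch)
{
    assert(s != NULL && ch != NULL);
    return strstr(s, ch);
}
```

The return value is a byte position, so in "£100 = €140" the euro sign
is found at offset 8, not at character index 7.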

#77594

From	Martin Shobe <martin.shobe@yahoo.com>
Date	2015-12-01 21:17 -0600
Message-ID	<n3lno6$eh0$1@dont-email.me>
In reply to	#77582

On 12/1/2015 6:17 PM, BartC wrote:
> On 01/12/2015 23:03, Martin Shobe wrote:
>> On 12/1/2015 2:02 PM, BartC wrote:
>>> On 01/12/2015 15:08, Martin Shobe wrote:
>
>>>> Why did you expect 11?
>>>
>>> Because there are 11 characters in "£100 = €140", not 13 (or 14
>>> actually).
>>
>> But you told C to print an octet, not a character.
>
> I told it to print a value in %c format. In other words, a character.

Then this is where the misunderstanding is. Some of what you gave it 
were not characters as you call them.

>>> So, how would you, given the same "£100 = €140" UTF8 string, write the C
>>> code to enumerate all the characters or code-points rather than the
>>> bytes?
>>
>> Code points aren't characters as you mean it either. If you want that,
>> you will have to make your code aware of the differences between octets,
>> code-points, and "characters". C I/O is too low level to understand such.
>
> So considerable amounts of code that was happily mixing up bytes,
> octets, characters and code-points for decades, no longer works with the
> advent of UTF8.

I don't know how much code that is. I'm pretty sure that code that assumes 
octets are characters will break, like what you did above. Code that 
treats strings as strings and doesn't need to know which octets are part 
of which characters will still work. Most of the code I've written falls 
into the latter category, but others may have different experiences.

> This is my point. Too many people are saying it will just work
> transparently. It might do if you are just inputting a bunch of bytes
> ending with 0 or EOF, and outputting the same data without doing
> anything to it, not even counting how many 'characters' have been
> processed.

> But sometimes you need to do a bit more with it. Then I'm saying that
> using wide-character strings is easier than trying to grapple with UTF8.

Not really, you still have to deal with the fact that code-points aren't 
characters either.

> You can even use the same algorithms with 16- or 32-bit character
> strings as with 8-bit. (Of course, many are still going to delight in
> pin-pointing all sorts of complications with Unicode where the simplest
> operations are fraught with problems. But then, many others don't care.)
>

Martin Shobe

#77625

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-02 09:37 -0600
Message-ID	<n3n33g$qs7$1@dont-email.me>
In reply to	#77582

On 01-Dec-15 18:17, BartC wrote:
> On 01/12/2015 23:03, Martin Shobe wrote:
>> On 12/1/2015 2:02 PM, BartC wrote:
>>> On 01/12/2015 15:08, Martin Shobe wrote:
>>>> Why did you expect 11?
>>> 
>>> Because there are 11 characters in "£100 = €140", not 13 (or 14 
>>> actually).
>> 
>> But you told C to print an octet, not a character.
> 
> I told it to print a value in %c format. In other words, a
> character.

For historical reasons, C conflates "bytes" and "characters".

In various contexts, "character" is used to mean any of "byte", "code
unit", "code point", "glyph", "grapheme" or "grapheme cluster"--and
sometimes more than one of them in the _same_ context.  And if that
wasn't confusing enough, sometimes it even means "non-character"!

>>> So, how would you, given the same "£100 = €140" UTF8 string,
>>> write the C code to enumerate all the characters or code-points
>>> rather than the bytes?
>> 
>> Code points aren't characters as you mean it either. If you want
>> that, you will have to make your code aware of the differences
>> between octets, code-points, and "characters". C I/O is too low
>> level to understand such.
> 
> So considerable amounts of code that was happily mixing up bytes, 
> octets, characters and code-points for decades, no longer works with
> the advent of UTF8.

As long as you don't _split_ strings, which includes extracting
individual bytes from them, UTF-8 is completely transparent.

> This is my point. Too many people are saying it will just work 
> transparently. It might do if you are just inputting a bunch of
> bytes ending with 0 or EOF, and outputting the same data without
> doing anything to it,

Most code does not split strings; it treats them as opaque blobs or, at
most, concatenates them (which is safe with UTF-8).

> not even counting how many 'characters' have been processed.

Most code gets this part wrong anyway, at least for most meanings of
"character".

> But sometimes you need to do a bit more with it. Then I'm saying
> that using wide-character strings is easier than trying to grapple
> with UTF8.

Yes, which is why UTF-32 is an acceptable _memory_ representation for
strings.  OTOH, UTF-16 has all the problems of UTF-8 and more yet none
of the benefits.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
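
The "%.4s" example quoted earlier in the thread is a case of splitting a
string at a byte count. A hedged sketch of doing the split on a code
point boundary instead, assuming valid UTF-8 input (utf8_prefix_bytes is
a hypothetical helper, not something from the thread):

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes of s hold its first max_cp code points, so that
 * a cut at that offset never lands inside a multi-byte sequence.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_prefix_bytes(const char *s, size_t max_cp)
{
    size_t i = 0;    /* byte offset */
    size_t cp = 0;   /* code points accepted so far */

    assert(s != NULL);
    while (s[i] != '\0') {
        /* Any byte that is not 10xxxxxx starts a new code point. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (cp == max_cp)
                break;
            ++cp;
        }
        ++i;
    }
    return i;
}
```

With this, printf("%.*s", (int)utf8_prefix_bytes(s, 4), s) prints "£100"
for s = "£100pw", because the precision passed to %s is a byte count and
the helper returns 5 for the first four code points.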

#77627

From	James Kuyper <jameskuyper@verizon.net>
Date	2015-12-02 10:59 -0500
Message-ID	<n3n4dd$vn7$1@dont-email.me>
In reply to	#77625

On 12/02/2015 10:37 AM, Stephen Sprunk wrote:
...
> For historical reasons, C conflates "bytes" and "characters".
> 
> In various contexts, "character" is used to mean any of "byte", "code
> unit", "code point", "glyph", "grapheme" or "grapheme cluster"--and
> sometimes more than one of them in the _same_ context.  And if that
> wasn't confusing enough, sometimes it even means "non-character"!

"byte
addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment" (3.6p1)

So a byte is an amount of data storage. It can hold a character, but is
not the same thing as a character - which is good, because if they were
the same thing, that definition would be saying that a byte must be big
enough to store a byte, which is a pretty meaningless definition.

"character
〈abstract〉 member of a set of elements used for the organization,
control, or representation of data." (3.7p1)

I'm afraid that's a little too abstract for my taste.

"character
single-byte character
(C) bit representation that fits in a byte." (3.7.1p1)

There are separate definitions for "wide character" and "multi-byte
character".
So a C character is something that will fit in a byte - but is not the
same thing as a byte - which is good, because otherwise it would be
saying that a character must fit in a character, a pretty meaningless
definition.

The distinction made by the standard between a byte and a C character
seems pretty clear to me. Can you provide any citations of places where
"byte" is used when "character" was meant, or vice versa?
I'm not suggesting that there are no such citations - it's a big
document, and both terms are used quite frequently in it; I'd be more
surprised if there were no errors. However, if there are any, they
should be brought to the attention of the editor of the standard,
Larry Jones, so they can be fixed.

I'm not aware of any location where C talks about "code point", "glyph",
"grapheme" or "grapheme cluster" - nor any location where it uses either
"byte" or "character" when it should instead have used one of those
terms. Could you identify some locations where such a mistake was made?
-- 
James Kuyper

#77644

From	BartC <bc@freeuk.com>
Date	2015-12-02 17:43 +0000
Message-ID	<n3nafp$pd0$1@dont-email.me>
In reply to	#77625

On 02/12/2015 15:37, Stephen Sprunk wrote:
> On 01-Dec-15 18:17, BartC wrote:

>> This is my point. Too many people are saying it will just work
>> transparently. It might do if you are just inputting a bunch of
>> bytes ending with 0 or EOF, and outputting the same data without
>> doing anything to it,
>
> Most code does not split strings; it treats them as opaque blobs or, at
> most, concatenates them (which is safe with UTF-8).

I don't believe that. Perhaps in a scripting language that might be the 
case: it might be too slow for that, or relies on built-in functionality 
to do all that stuff that needs to be done with strings. But that 
functionality might well be implemented in C.

>> not even counting how many 'characters' have been processed.
>
> Most code gets this part wrong anyway, at least for most meanings of
> "character".

I don't agree with this either! Of course most of us don't have friends 
who like to sign their emails with obscure non-BMP names (perhaps chosen 
precisely /because/ they are non-BMP and therefore different).

Most of the time a character count is going to be right, unless you're 
going to suggest it's wrong because we don't understand what a 
'character' is. I think most people have a pretty good idea!

-- 
Bartc

#77649

From	Stephen Sprunk <stephen@sprunk.org>
Date	2015-12-02 13:22 -0600
Message-ID	<n3ng9m$i2e$1@dont-email.me>
In reply to	#77644

On 02-Dec-15 11:43, BartC wrote:
> On 02/12/2015 15:37, Stephen Sprunk wrote:
>> On 01-Dec-15 18:17, BartC wrote:
>>> This is my point. Too many people are saying it will just work 
>>> transparently. It might do if you are just inputting a bunch of 
>>> bytes ending with 0 or EOF, and outputting the same data without 
>>> doing anything to it,
>> 
>> Most code does not split strings; it treats them as opaque blobs
>> or, at most, concatenates them (which is safe with UTF-8).
> 
> I don't believe that. Perhaps in a scripting language that might be
> the case: it might be too slow for that, or relies on built-in
> functionality to do all that stuff that needs to be done with
> strings. But that functionality might well be implemented in C.

... and it is straightforward to do so when needed, but if you're going to
do a lot of it, or deal with encodings other than UTF-8 (which are
becoming less relevant every day), then it's probably worth going to
UTF-32.  But even that doesn't solve _all_ of your problems because you
still have to deal with combining characters, non-characters, etc.
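To illustrate the combining-character point, here is a minimal sketch, not a definitive implementation. It assumes strings are zero-terminated arrays of `uint32_t` holding UTF-32 code points, and the helper names are made up for this example. It only recognises the Combining Diacritical Marks block (U+0300 to U+036F); real grapheme segmentation covers many more cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true only for U+0300..U+036F (Combining
   Diacritical Marks). This is a deliberate simplification. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Number of code points in a zero-terminated UTF-32 string. */
size_t utf32_length(const uint32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Rough count of user-perceived characters: every code point that is
   not one of the combining marks above starts a new one. */
size_t approx_graphemes(const uint32_t *s)
{
    size_t n = 0;
    for (; *s != 0; s++) {
        if (!is_combining_mark(*s))
            n++;
    }
    return n;
}
```

For the sequence U+0065 U+0301 ("e" followed by a combining acute accent), `utf32_length` reports 2 while `approx_graphemes` reports 1, so even a fixed-width encoding does not make "one element" equal "one character on screen".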

>>> not even counting how many 'characters' have been processed.
>> 
>> Most code gets this part wrong anyway, at least for most meanings
>> of "character".
> 
> I don't agree with this either! Of course most of us don't have
> friends who like to sign their emails with obscure non-BMP names
> (perhaps chosen precisely /because/ they are non-BMP and therefore
> different).

I seriously doubt their ancestors chose their surname based on a
prediction that, thousands of years in the future, the Unicode
Consortium would assign it a code point outside the BMP.

> Most of the time a character count is going to be right, unless
> you're going to suggest it's wrong because we don't understand what
> a 'character' is. I think most people have a pretty good idea!

That's the problem, actually: _nobody_ knows exactly what "character"
means.  It means something different to everyone and even different
things to the same person depending on context.
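As a concrete case of those differing meanings, the sketch below counts code points in a UTF-8 string, which can then be compared with the byte count from `strlen`. It assumes the input is valid, NUL-terminated UTF-8; the function name is an invention for this example.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated string that is assumed
   to be valid UTF-8. Continuation bytes have the form 10xxxxxx, so
   every byte that is not a continuation byte starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Given `"e\xCC\x81"` ("e" plus U+0301, displayed as a single accented letter), `strlen` returns 3 and `utf8_code_points` returns 2, while a reader would count 1. All three numbers are a "character count" under some definition.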

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking

#77661 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-12-03 09:32 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc92reFi96mU6@mid.individual.net>
In reply to	#77644

BartC wrote:
> On 02/12/2015 15:37, Stephen Sprunk wrote:
>> On 01-Dec-15 18:17, BartC wrote:
>
>>> This is my point. Too many people are saying it will just work
>>> transparently. It might do if you are just inputting a bunch of
>>> bytes ending with 0 or EOF, and outputting the same data without
>>> doing anything to it,
>>
>> Most code does not split strings; it treats them as opaque blobs or, at
>> most, concatenates them (which is safe with UTF-8).
>
> I don't believe that. Perhaps in a scripting language that might be the
> case: it might be too slow for that, or relies on built-in functionality
> to do all that stuff that needs to be done with strings. But that
> functionality might well be implemented in C.

Maybe "most code does not split strings in a naïve manner" would be 
better?  Looking at my own code, where a string gets split, it gets 
splat after a search for a delimiter.
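A minimal sketch of that kind of split, assuming the delimiter is a non-NUL ASCII character and the string is UTF-8 (the function name is hypothetical). It relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so a byte-wise search for an ASCII delimiter cannot land in the middle of a character.

```c
#include <string.h>

/* Split s in place at the first occurrence of the ASCII delimiter
   delim. The delimiter is overwritten with NUL, so s becomes the part
   before it. Returns a pointer to the part after the delimiter, or
   NULL if delim is NUL or does not occur in s. */
char *split_at(char *s, char delim)
{
    char *p;

    if (delim == '\0')
        return NULL;
    p = strchr(s, delim);
    if (p == NULL)
        return NULL;
    *p = '\0';
    return p + 1;
}
```

Calling `split_at` with `','` on a buffer holding UTF-8 text with accented letters on both sides of the comma yields two intact strings, with no decoding step needed.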

-- 
Ian Collins

#77665

From	BartC <bc@freeuk.com>
Date	2015-12-02 21:12 +0000
Message-ID	<n3nmo8$e5j$1@dont-email.me>
In reply to	#77661

On 02/12/2015 20:32, Ian Collins wrote:
> BartC wrote:

>> I don't believe that. Perhaps in a scripting language that might be the
>> case: it might be too slow for that, or relies on built-in functionality
>> to do all that stuff that needs to be done with strings. But that
>> functionality might well be implemented in C.
>
> Maybe "most code does not split strings in a naïve manner" would be
> better?  Looking at my own code, where a string gets split, it gets
> splat after a search for a delimiter.

What's wrong in wanting to do things in a naive manner? Ie. in a simple 
and obvious way.

When character sets such as ASCII came along, they provided a limited and 
stylised way of representing English in a computer (compared with what 
was available in type-setting for example, or with handwriting).

Yet an enormous amount was possible. Systems tended to be written around 
the limitations (so using *, / and . for multiply, divide and decimal 
point), which seems to have worked very well. (As has the typewriter 
actually.)

It was possible to easily write programs to manipulate characters, words 
and lines because everything was so obvious.

Now, we need a single, unified character set so we need to go beyond 8 
bits to 16 and 21 (or 32). Fine. But we're also no longer allowed to 
treat things so simply. Because some alphabets don't share the same 
concepts of upper and lower case; characters might be represented in 
multiple code points; the same glyph is associated with many code 
points; etc etc.

In short, it's become impossible. Many of the complications of 
type-setting, as well as too many real-world considerations, have been 
introduced, when they really belong at a different level.

And, to top it all, we are expected to cope with a variable length 
encoding too!

 > Maybe "most code does not split strings in a naïve manner" would be
 > better?  Looking at my own code, where a string gets split, it gets
 > splat after a search for a delimiter.

That's too complicated. In a language you want to keep it as simple as 
possible.

Suppose I have a string S and want to copy the initial character into 
the last, so that "Bart" => "BarB". Even in C, I want to be able to 
write (after ensuring S isn't empty):

   S[strlen(S)-1] = S[0];

when S uses 8-bit elements and the string can be represented as such, 
and to also be able to write:

   S[strlen(S)-1] = S[0];

when S uses 16-bit or 32-bit elements.
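In standard C the wide-element version needs `wcslen` rather than `strlen`, since `strlen` only accepts `char` strings. A minimal sketch, assuming `wchar_t` strings and a made-up function name:

```c
#include <stddef.h>
#include <wchar.h>

/* Copy the first element of a wide string into its last element.
   Does nothing for an empty string. */
void copy_first_to_last_wide(wchar_t *s)
{
    size_t n = wcslen(s);
    if (n > 0)
        s[n - 1] = s[0];
}
```

This is only "one element per code point" where `wchar_t` is 32 bits wide. Where it is 16 bits and holds UTF-16, a code point outside the BMP occupies two elements and the simple assignment would split it.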

With a variable-length encoding however, it's completely different. The 
resulting string might be longer, meaning additional problems of memory 
management. Or the string exists in a field of a struct which might not 
be big enough.
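For comparison, here is a sketch of the same operation on UTF-8, assuming valid NUL-terminated input and a caller-supplied buffer capacity; both function names are hypothetical. It shows the extra bookkeeping described above: the result can change length, so the capacity has to be checked.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence whose lead byte is b.
   Assumes b really is a valid lead byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Replace the last code point of s with a copy of the first one.
   cap is the size in bytes of the buffer holding s, including room
   for the terminating NUL. Returns 0 on success, or -1 if s is empty
   or the result would not fit; s is left unchanged on failure. */
int utf8_copy_first_to_last(char *s, size_t cap)
{
    size_t len = strlen(s);
    size_t first_len, last, new_len;
    char first[4];

    if (len == 0)
        return -1;
    first_len = utf8_seq_len((unsigned char)s[0]);

    /* Step back over continuation bytes (10xxxxxx) to find where the
       last code point starts. */
    last = len - 1;
    while (last > 0 && ((unsigned char)s[last] & 0xC0) == 0x80)
        last--;

    new_len = last + first_len;
    if (new_len + 1 > cap)
        return -1;

    memcpy(first, s, first_len);        /* save: source and target may overlap */
    memcpy(s + last, first, first_len);
    s[new_len] = '\0';
    return 0;
}
```

With "Bart" the result is "BarB" and the length is unchanged. With a string whose first character takes two bytes and whose last takes one, the result is one byte longer, so a buffer sized exactly for the original string makes the call fail with -1.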

It's like taking existing code which works perfectly well with 
characters, and changing it to work with variable-length words.

Who wants to code like that? I think UTF8 is a fine compression scheme 
for storing text on disk, otherwise...

-- 
Bartc

#77668 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From	Ian Collins <ian-news@hotmail.com>
Date	2015-12-03 10:36 +1300
Subject	Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID	<dc96i7Fi96mU8@mid.individual.net>
In reply to	#77665

BartC wrote:
> On 02/12/2015 20:32, Ian Collins wrote:
>> BartC wrote:
>
>>> I don't believe that. Perhaps in a scripting language that might be the
>>> case: it might be too slow for that, or relies on built-in functionality
>>> to do all that stuff that needs to be done with strings. But that
>>> functionality might well be implemented in C.
>>
>> Maybe "most code does not split strings in a naïve manner" would be
>> better?  Looking at my own code, where a string gets split, it gets
>> splat after a search for a delimiter.
>
> What's wrong in wanting to do things in a naive manner? Ie. in a simple
> and obvious way.

Did you parse what I wrote?  Is looking for a delimiter and splitting 
the string around it anything other than simple?  If you want to go one 
step further, splitting the string at a fixed point, you have to know 
where.  If you know where, you know the format of the data.

<snip>

> That's too complicated. In a language you want to keep it as simple as
> possible.

So you advocate random splitting of strings?

> Suppose I have a string S and want to copy the initial character into
> the last, so that "Bart" => "BarB". Even in C, I want to be able to
> write (after ensuring S isn't empty):
>
>     S[strlen(S)-1] = S[0];

How often in real code would you want to do that?

-- 
Ian Collins
