Groups > comp.lang.c > #77357 > unrolled thread

Working efficiently with 32-bit Unicode output streams, locale etc.

Started by "Morten W. Petersen" <morphex@gmail.com>
First post 2015-11-29 01:06 +0100
Last post 2015-12-02 09:58 -0800
Articles 20 on this page of 210 — 25 participants

Contents

  Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 14:07 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 17:13 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
                                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
                                                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
                                                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
                                                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
                                                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
                                                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
                                                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
                                                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
                                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
                                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
                                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
                                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
                                                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
                                                        Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
                                                          Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
                                                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
                                                Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
                                              Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
                                                Re: Working efficiently with 32-bit Unicode output streams, locale   etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
                                            Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
                                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
                      Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
              Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
                Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
                  Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
                    Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
        Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
    Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800



#77357 — Working efficiently with 32-bit Unicode output streams, locale etc.

From: "Morten W. Petersen" <morphex@gmail.com>
Date: 2015-11-29 01:06 +0100
Subject: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID: <n3dfgs$a24$1@speranza.aioe.org>
Hi there.

By now I should be a little bit known for working on my XML library
Smash XML :), here

https://github.com/morphex/smash_xml

Now I'm at the point where I'm writing some output (write XML files)
functionality..  I want to output in an UTF32-LE encoding, which means
each "character" takes up 32 bits or 4 regular chars.

I see that it can be expected that fputwc on modern *NIX-platforms
will work with 32-bit characters, and that the standard encoding on
Windows is 16-bit LE Unicode.

Internally the project uses integers at least 32 bits in size for
each Unicode symbol.

So, how do I go about writing the data?  Another thing I've been
thinking of is that fputc for example is a function call for each
character that gets put in a stream..  Is it a lot more efficient to
for example gather up 1024 characters and use fwrite?

I also need to get feedback on my terminal when running the test code,
so that 32-bit Unicode is written to the terminal and readable, what's
a cross-platform way to handle that?

-Morten



#77363

From: Nobody <nobody@nowhere.invalid>
Date: 2015-11-29 02:01 +0000
Message-ID: <pan.2015.11.29.02.01.16.669000@nowhere.invalid>
In reply to: #77357
On Sun, 29 Nov 2015 01:06:19 +0100, Morten W. Petersen wrote:

> Now I'm at the point where I'm writing some output (write XML files)
> functionality..  I want to output in an UTF32-LE encoding, which means
> each "character" takes up 32 bits or 4 regular chars.
> 
> I see that it can be expected that fputwc on modern *NIX-platforms will
> work with 32-bit characters, and that the standard encoding on Windows is
> 16-bit LE Unicode.

Hang on a minute. fputwc() accepts wide characters as input, but what gets
written depends upon the locale's encoding (based upon the LC_CTYPE
category of the current locale). Unless the system has a locale which uses
UTF-32-LE as its encoding (which is unlikely), or you're using
extensions (such as glibc accepting "ccs=<encoding>" in the mode parameter
to fopen()), you aren't going to be able to write UTF-32-LE using fputwc().
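
A minimal sketch of the locale-independent route, assuming the code points are already held as 32-bit integers: serialize each one to four little-endian bytes by hand instead of going through fputwc(). The name put_utf32le is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write one code point as UTF-32LE, least
 * significant byte first, regardless of the current locale.
 * Returns 0 on success, -1 on a write error. */
static int put_utf32le(uint32_t cp, FILE *fp)
{
    assert(fp != NULL);
    for (int shift = 0; shift < 32; shift += 8) {
        if (putc((int)((cp >> shift) & 0xFFu), fp) == EOF)
            return -1;
    }
    return 0;
}
```

The stream should be opened in binary mode ("wb") so that no newline translation is applied to the bytes on platforms that distinguish text and binary streams.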

Apart from that, writing XML using UTF-32 is likely to severely limit its
usefulness. There's a reason why almost everyone uses UTF-8 for Unicode
(the main exception being Microsoft, who like to use UTF-16-LE so that
they can dump the in-memory representation directly to disk).

> Internally the project uses integers at least 32 bits in size for each
> Unicode symbol.
> 
> So, how do I go about writing the data?  Another thing I've been thinking
> of is that fputc for example is a function call for each character that
> gets put in a stream..  Is it a lot more efficient to for example gather
> up 1024 characters and use fwrite?

If you want to avoid the overhead of a function call per byte, on Linux
you can use fputc_unlocked(), which is an inline function wrapper around
the _IO_putc_unlocked() macro (which just manipulates the FILE structure
directly). The main overhead of fputc() is that it has to lock the stream
in order to avoid race conditions when a stream is accessed from multiple
threads concurrently; fputc_unlocked() avoids that.
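
A minimal sketch of the same idea using the POSIX-standardized putc_unlocked() rather than the glibc-specific fputc_unlocked(): take the stream lock once with flockfile(), write the bytes, and release it with funlockfile(). The helper name write_bytes_locked_once is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes while holding the stream lock
 * once, instead of locking and unlocking for every byte.
 * Returns 0 on success, -1 on a write error. */
static int write_bytes_locked_once(const unsigned char *buf, size_t len,
                                   FILE *fp)
{
    int rc = 0;

    assert(buf != NULL && fp != NULL);
    flockfile(fp);                      /* take the lock once */
    for (size_t i = 0; i < len; i++) {
        if (putc_unlocked(buf[i], fp) == EOF) {
            rc = -1;
            break;
        }
    }
    funlockfile(fp);                    /* release it once */
    return rc;
}
```

Holding the lock across the loop keeps the output of this call contiguous even if another thread writes to the same stream.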

> I also need to get feedback on my terminal when running the test code, so
> that 32-bit Unicode is written to the terminal and readable, what's a
> cross-platform way to handle that?

First, you need to use UTF-8. Neither Windows nor Linux support UTF-16 or
UTF-32 on a console. Setting up the console to use UTF-8 requires
platform-specific code. E.g. Windows has the pseudo-codepage 65001 for
UTF-8 ("pseudo" because 65001 isn't valid in most contexts where a
codepage is used). The Linux console (VT) can be put into UTF-8 mode with
the sequence "ESC % G"; X11-based terminal emulators all have their own
ways of dealing with it.



#77364

From: "Morten W. Petersen" <morphex@gmail.com>
Date: 2015-11-29 03:31 +0100
Message-ID: <n3do1g$qpm$1@speranza.aioe.org>
In reply to: #77363
On 29.11.2015 03:01, Nobody wrote:
> On Sun, 29 Nov 2015 01:06:19 +0100, Morten W. Petersen wrote:
>
>> Now I'm at the point where I'm writing some output (write XML files)
>> functionality..  I want to output in an UTF32-LE encoding, which means
>> each "character" takes up 32 bits or 4 regular chars.
>>
>> I see that it can be expected that fputwc on modern *NIX-platforms will
>> work with 32-bit characters, and that the standard encoding on Windows is
>> 16-bit LE Unicode.
>
> Hang on a minute. fputwc() accepts wide characters as input, but what gets
> written depends upon the locale's encoding (based upon the LC_CTYPE
> category of the current locale). Unless the system has a locale which uses
> UTF-32-LE as its encoding (which is unlikely), or you're using
> extensions (such as glibc accepting "ccs=<encoding>" in the mode parameter
> to fopen()), you aren't going to be able to write UTF-32-LE using fputwc().
>
> Apart from that, writing XML using UTF-32 is likely to severely limit its
> usefulness. There's a reason why almost everyone uses UTF-8 for Unicode
> (the main exception being Microsoft, who like to use UTF-16-LE so that
> they can dump the in-memory representation directly to disk).

OK.  Yes I was wondering about that, as I didn't see anything relating
to UTF-32 in the Linux locales.

Isn't a big deal to convert the locale text contents to UTF-32, but
there might be other things there as well that need handling.

>> Internally the project uses integers at least 32 bits in size for each
>> Unicode symbol.
>>
>> So, how do I go about writing the data?  Another thing I've been thinking
>> of is that fputc for example is a function call for each character that
>> gets put in a stream..  Is it a lot more efficient to for example gather
>> up 1024 characters and use fwrite?
>
> If you want to avoid the overhead of a function call per byte, on Linux
> you can use fputc_unlocked(), which is an inline function wrapper around
> the _IO_putc_unlocked() macro (which just manipulates the FILE structure
> directly). The main overhead of fputc() is that it has to lock the stream
> in order to avoid race conditions when a stream is accessed from multiple
> threads concurrently; fputc_unlocked() avoids that.

Aha, OK.  Still it would seem that working on the data in memory and
then writing the finished sequence to file using fwrite would have
less call overhead.  But maybe it is a bit more overhead to program it.
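
A minimal sketch of the gather-then-fwrite approach, assuming the code points are held as uint32_t values: encode them as UTF-32LE into a local buffer and pass each full batch to fwrite(). The name write_utf32le_buffered and the batch size of 1024 are illustrative choices only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK_CODE_POINTS 1024   /* arbitrary batch size, as in the question */

/* Hypothetical helper: encode n code points as UTF-32LE into a local
 * buffer and hand them to fwrite() in batches.
 * Returns 0 on success, -1 on a write error. */
static int write_utf32le_buffered(const uint32_t *cps, size_t n, FILE *fp)
{
    unsigned char buf[CHUNK_CODE_POINTS * 4];
    size_t used = 0;

    assert(fp != NULL && (cps != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = cps[i];

        buf[used++] = (unsigned char)(cp & 0xFFu);
        buf[used++] = (unsigned char)((cp >> 8) & 0xFFu);
        buf[used++] = (unsigned char)((cp >> 16) & 0xFFu);
        buf[used++] = (unsigned char)((cp >> 24) & 0xFFu);
        if (used == sizeof buf) {       /* buffer full: flush it */
            if (fwrite(buf, 1, used, fp) != used)
                return -1;
            used = 0;
        }
    }
    if (used > 0 && fwrite(buf, 1, used, fp) != used)
        return -1;                      /* final partial batch failed */
    return 0;
}
```

Since stdio streams are normally buffered already, the gain over per-byte calls may be modest; it is worth measuring before committing to the extra code.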

>> I also need to get feedback on my terminal when running the test code, so
>> that 32-bit Unicode is written to the terminal and readable, what's a
>> cross-platform way to handle that?
>
> First, you need to use UTF-8. Neither Windows nor Linux support UTF-16 or
> UTF-32 on a console. Setting up the console to use UTF-8 requires
> platform-specific code. E.g. Windows has the pseudo-codepage 65001 for
> UTF-8 ("pseudo" because 65001 isn't valid in most contexts where a
> codepage is used). The Linux console (VT) can be put into UTF-8 mode with
> the sequence "ESC % G"; X11-based terminal emulators all have their own
> ways of dealing with it.

OK. It's a bit of a chicken-and-egg problem this encoding thing, but
I guess I could talk to some Linux distro people and see if they are
interested in supporting UTF-32 as an option.

-Morten



#77367

From: Stephen Sprunk <stephen@sprunk.org>
Date: 2015-11-29 00:09 -0600
Message-ID: <n3e4m1$6k9$1@dont-email.me>
In reply to: #77364
On 28-Nov-15 20:31, Morten W. Petersen wrote:
> On 29.11.2015 03:01, Nobody wrote:
>> On Sun, 29 Nov 2015 01:06:19 +0100, Morten W. Petersen wrote:
>>> I also need to get feedback on my terminal when running the test
>>> code, so that 32-bit Unicode is written to the terminal and 
>>> readable, what's a cross-platform way to handle that?
>> 
>> First, you need to use UTF-8. Neither Windows nor Linux support 
>> UTF-16 or UTF-32 on a console. Setting up the console to use UTF-8 
>> requires platform-specific code. ...
> 
> OK. It's a bit of a chicken-and-egg problem this encoding thing, but
> I guess I could talk to some Linux distro people and see if they are
> interested in supporting UTF-32 as an option.

Why use UTF-32 at all when UTF-8 is already there and works better
anyway?  This seems like a solution in search of a problem.

Everyone knows how to deal with UTF-8.  It takes a bit of jumping
through hoops on Windows or Java, which thinks "Unicode" means UTF-16
(or maybe UCS-2, hence endless bugs), but using UTF-32 there is even
more painful than UTF-8, so why put yourself through that?

Also, your users are going to expect UTF-8 files or at worst UTF-16 (or
maybe UCS-2) files; _none_ of them will expect UTF-32 files.  You're
going to have to support UTF-8 anyway if you want anyone to use your
library, so why not do it now and avoid the hassles of UTF-32?

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking



#77369

From: Robert Wessel <robertwessel2@yahoo.com>
Date: 2015-11-29 00:22 -0600
Message-ID: <lv5l5bt2or2642fbvm0svc889fs0v3h5qp@4ax.com>
In reply to: #77364
On Sun, 29 Nov 2015 03:31:44 +0100, "Morten W. Petersen"
<morphex@gmail.com> wrote:

>On 29.11.2015 03:01, Nobody wrote:
>> On Sun, 29 Nov 2015 01:06:19 +0100, Morten W. Petersen wrote:
>>
>>> Now I'm at the point where I'm writing some output (write XML files)
>>> functionality..  I want to output in an UTF32-LE encoding, which means
>>> each "character" takes up 32 bits or 4 regular chars.
>>>
>>> I see that it can be expected that fputwc on modern *NIX-platforms will
>>> work with 32-bit characters, and that the standard encoding on Windows is
>>> 16-bit LE Unicode.
>>
>> Hang on a minute. fputwc() accepts wide characters as input, but what gets
>> written depends upon the locale's encoding (based upon the LC_CTYPE
>> category of the current locale). Unless the system has a locale which uses
>> UTF-32-LE as its encoding (which is unlikely), or you're using
>> extensions (such as glibc accepting "ccs=<encoding>" in the mode parameter
>> to fopen()), you aren't going to be able to write UTF-32-LE using fputwc().
>>
>> Apart from that, writing XML using UTF-32 is likely to severely limit its
>> usefulness. There's a reason why almost everyone uses UTF-8 for Unicode
>> (the main exception being Microsoft, who like to use UTF-16-LE so that
>> they can dump the in-memory representation directly to disk).
>
>OK.  Yes I was wondering about that, as I didn't see anything relating
>to UTF-32 in the Linux locales.
>
>Isn't a big deal to convert the locale text contents to UTF-32, but
>there might be other things there as well that need handling.
>
>>> Internally the project uses integers at least 32 bits in size for each
>>> Unicode symbol.
>>>
>>> So, how do I go about writing the data?  Another thing I've been thinking
>>> of is that fputc for example is a function call for each character that
>>> gets put in a stream..  Is it a lot more efficient to for example gather
>>> up 1024 characters and use fwrite?
>>
>> If you want to avoid the overhead of a function call per byte, on Linux
>> you can use fputc_unlocked(), which is an inline function wrapper around
>> the _IO_putc_unlocked() macro (which just manipulates the FILE structure
>> directly). The main overhead of fputc() is that it has to lock the stream
>> in order to avoid race conditions when a stream is accessed from multiple
>> threads concurrently; fputc_unlocked() avoids that.
>
>Aha, OK.  Still it would seem that working on the data in memory and
>then writing the finished sequence to file using fwrite would have
>less call overhead.  But maybe it is a bit more overhead to program it.
>
>>> I also need to get feedback on my terminal when running the test code, so
>>> that 32-bit Unicode is written to the terminal and readable, what's a
>>> cross-platform way to handle that?
>>
>> First, you need to use UTF-8. Neither Windows nor Linux support UTF-16 or
>> UTF-32 on a console. Setting up the console to use UTF-8 requires
>> platform-specific code. E.g. Windows has the pseudo-codepage 65001 for
>> UTF-8 ("pseudo" because 65001 isn't valid in most contexts where a
>> codepage is used). The Linux console (VT) can be put into UTF-8 mode with
>> the sequence "ESC % G"; X11-based terminal emulators all have their own
>> ways of dealing with it.
>
>OK. It's a bit of a chicken-and-egg problem this encoding thing, but
>I guess I could talk to some Linux distro people and see if they are
>interested in supporting UTF-32 as an option.


I think you can safely assume the answer will be "no".  Use UTF-8.  If
you have UTF-32 internally, generating UTF-8 will be trivial.
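
A minimal sketch of that conversion, assuming each internal value is a single Unicode code point: the function below encodes one code point as 1 to 4 UTF-8 bytes. The name utf8_encode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out (which must have room for 4) and returns
 * the byte count, or 0 if cp is a surrogate or above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000u) {
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

The resulting bytes can be appended to an ordinary char buffer and written with fwrite(), since UTF-8 never produces a zero byte for a non-zero code point.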



#77404

From: Richard Damon <Richard@Damon-Family.org>
Date: 2015-11-29 14:31 -0500
Message-ID: <YnI6y.140896$2K.125720@fx09.iad>
In reply to: #77364
On 11/28/15 9:31 PM, Morten W. Petersen wrote:

> OK.  Yes I was wondering about that, as I didn't see anything relating
> to UTF-32 in the Linux locales.
>

One issue may be that UTF-32 is normally called UCS-4. The UTF's are 
generally thought of as the multi-unit encoding methods, and the UCS's 
are the fixed length encoding. (UCS-2 is the early version of Unicode 
with only the BMP, UCS-4 allowed the number of characters to be expanded).



#77416

From: Nobody <nobody@nowhere.invalid>
Date: 2015-11-29 23:51 +0000
Message-ID: <pan.2015.11.29.23.51.27.24000@nowhere.invalid>
In reply to: #77364
On Sun, 29 Nov 2015 03:31:44 +0100, Morten W. Petersen wrote:

>> Unless the system has a locale which uses UTF-32-LE as its encoding
>> (which is unlikely)

I probably should have made it more clear that my use of "unlikely" was
sarcasm. It's actually impossible to have a UTF-32 (or UTF-16) locale
on Unix.

The locale encoding has to be minimally compatible with US-ASCII
(ISO-646-US) so that the byte sequence used to represent e.g. "/bin/sh"
doesn't vary with the locale.

Aside from that, UTF-32 has the annoying property that all valid Unicode
characters will have null ('\0') bytes in their encoding (the space of
Unicode code points only goes up to U+10FFFF, so the high byte is always
null), which means that a string encoded in UTF-32 can't be used for
filenames, command-line arguments, environment variables, etc, or passed
to or returned from any function which uses null-terminated byte strings.
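
A small demonstration of the null-byte problem, as a sketch only: serialize a few code points as UTF-32LE and ask strlen() how long the result is. The helper name strlen_of_utf32le and its 64-code-point limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: serialize up to 64 code points as UTF-32LE bytes,
 * then measure them with strlen(). strlen() stops at the first zero
 * byte, which for any non-NUL ASCII character is its second byte. */
static size_t strlen_of_utf32le(const uint32_t *cps, size_t n)
{
    unsigned char bytes[64 * 4 + 1];
    size_t used = 0;

    assert(cps != NULL && n <= 64);
    for (size_t i = 0; i < n; i++) {
        for (int shift = 0; shift < 32; shift += 8)
            bytes[used++] = (unsigned char)((cps[i] >> shift) & 0xFFu);
    }
    bytes[used] = '\0';                 /* byte terminator, as C strings expect */
    return strlen((const char *)bytes);
}
```

Passing the seven code points of "/bin/sh" yields a length of 1, because the byte after '/' is already zero; any function that takes a null-terminated byte string would see only that first byte.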



#77417

From: "Morten W. Petersen" <morphex@gmail.com>
Date: 2015-11-30 01:21 +0100
Message-ID: <n3g4q5$ou3$1@speranza.aioe.org>
In reply to: #77416
On 30.11.2015 00:51, Nobody wrote:
> On Sun, 29 Nov 2015 03:31:44 +0100, Morten W. Petersen wrote:
>
>>> Unless the system has a locale which uses UTF-32-LE as its encoding
>>> (which is unlikely)
>
> I probably should have made it more clear that my use of "unlikely" was
> sarcasm. It's actually impossible to have a UTF-32 (or UTF-16) locale
> on Unix.
>
> The locale encoding has to be minimally compatible with US-ASCII
> (ISO-646-US) so that the byte sequence used to represent e.g. "/bin/sh"
> doesn't vary with the locale.

Well this is interesting..  I know the #!/bin/sh deal with files that
have been marked as executable.

Read a bit about the hash/shebang now, and I see what the deal is, it
looks like a BOM even for UTF-8 isn't supported in those cases.

I guess it's a matter of someone getting the job done, enabling support
for UTF-16 or UTF-32 in executable script files on Unix.

> Aside from that, UTF-32 has the annoying property that all valid Unicode
> characters will have null ('\0') bytes in their encoding (the space of
> Unicode code points only goes up to U+10FFFF, so the high byte is always
> null), which means that a string encoded in UTF-32 can't be used for
> filenames, command-line arguments, environment variables, etc, or passed
> to or returned from any function which uses null-terminated byte strings.

Yes, that's true.  But represented in 4-byte chunks, Unicode strings
can also be terminated by NULL.

-Morten



#77449

From: Keith Thompson <kst-u@mib.org>
Date: 2015-11-30 00:41 -0800
Message-ID: <lntwo3x4jq.fsf@kst-u.example.com>
In reply to: #77417
"Morten W. Petersen" <morphex@gmail.com> writes:
> On 30.11.2015 00:51, Nobody wrote:
[...]
>> The locale encoding has to be minimally compatible with US-ASCII
>> (ISO-646-US) so that the byte sequence used to represent e.g. "/bin/sh"
>> doesn't vary with the locale.
>
> Well this is interesting..  I know the #!/bin/sh deal with files that
> have been marked as executable.
>
> Read a bit about the hash/shebang now, and I see what the deal is, it
> looks like a BOM even for UTF-8 isn't supported in those cases.
>
> I guess it's a matter of someone getting the job done, enabling support
> for UTF-16 or UTF-32 in executable script files on Unix.
  
That would fill a much needed gap.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"



#77450

From: Stephen Sprunk <stephen@sprunk.org>
Date: 2015-11-30 03:16 -0600
Message-ID: <n3h419$o8s$1@dont-email.me>
In reply to: #77417
On 29-Nov-15 18:21, Morten W. Petersen wrote:
> On 30.11.2015 00:51, Nobody wrote:
>> On Sun, 29 Nov 2015 03:31:44 +0100, Morten W. Petersen wrote:
>> 
>>>> Unless the system has a locale which uses UTF-32-LE as its
>>>> encoding (which is unlikely)
>> 
>> I probably should have made it more clear that my use of "unlikely"
>> was sarcasm. It's actually impossible to have a UTF-32 (or UTF-16)
>> locale on Unix.
>> 
>> The locale encoding has to be minimally compatible with US-ASCII 
>> (ISO-646-US) so that the byte sequence used to represent e.g.
>> "/bin/sh" doesn't vary with the locale.

There are hundreds of similar problems; that's one of the simplest.

> Well this is interesting..  I know the #!/bin/sh deal with files
> that have been marked as executable.
> 
> Read a bit about the hash/shebang now, and I see what the deal is,
> it looks like a BOM even for UTF-8 isn't supported in those cases.

UTF-8 doesn't _need_ a BOM because it's a byte-oriented, rather than
word-oriented, encoding; there is no endianness question.

The only reason the so-called "UTF-8 BOM" exists is that Microsoft's
tools assumed BOM-less text files were UCS-2LE, but that was corrected
long ago and is now unnecessary.  Today, one can simply assume that
_all_ text files are UTF-8 unless there is a specific reason to think
otherwise--and that's best for everyone.
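
A minimal sketch of how a reader might sniff a byte order mark at the start of a file, using the standard BOM byte sequences. The enum and the function name detect_bom are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum bom_kind {
    BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE
};

/* Hypothetical helper: classify the first bytes of a buffer by BOM.
 * UTF-32LE is tested before UTF-16LE because FF FE 00 00 also starts
 * with the UTF-16LE mark FF FE. */
static enum bom_kind detect_bom(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    if (len >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return BOM_UTF32LE;
    if (len >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return BOM_UTF32BE;
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16LE;
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;                    /* e.g. a script starting with "#!" */
}
```

The check is a heuristic: FF FE 00 00 could in principle be a UTF-16LE mark followed by U+0000, and a file without any mark still needs some other way of determining its encoding.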

> I guess it's a matter of someone getting the job done, enabling
> support for UTF-16 or UTF-32 in executable script files on Unix.

That would require changing POSIX and then getting every vendor of a
POSIX-compatible OS to update their software, which seems unlikely.

There is simply no reason to do that; UTF-8 is already better, and it
didn't require altering _any_ standards or software, in large part
because it had that as a deliberate design goal, unlike UTF-16/32.

>> Aside from that, UTF-32 has the annoying property that all valid
>> Unicode characters will have null ('\0') bytes in their encoding
>> (the space of Unicode code points only goes up to U+10FFFF, so the
>> high byte is always null), which means that a string encoded in
>> UTF-32 can't be used for filenames, command-line arguments,
>> environment variables, etc, or passed to or returned from any
>> function which uses null-terminated byte strings.
> 
> Yes, that's true.  But represented in 4-byte chunks, Unicode strings 
> can also be terminated by NULL.

It means that _all_ string functions (and there are millions of them in
the wild) have to be completely rewritten to deal with all those
embedded NUL (not NULL) characters.  Microsoft, for instance, had to
duplicate every function in the entire Windows API to support UCS-2,
whereas POSIX simply allowed UTF-8 in its existing API and moved on.
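
To illustrate the duplication, a minimal sketch of what a strlen() counterpart looks like once strings are stored as 32-bit units terminated by a zero unit rather than a zero byte. The name u32_strlen is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart to strlen() for strings stored as 32-bit
 * code units and terminated by a zero code unit. Returns the number of
 * code units before the terminator. */
static size_t u32_strlen(const uint32_t *s)
{
    const uint32_t *p = s;

    assert(s != NULL);
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

Every other byte-string routine (copying, searching, comparing, formatting) would need a similar parallel version, which is the cost being described.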

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking



#77376

From: Jorgen Grahn <grahn+nntp@snipabacken.se>
Date: 2015-11-29 08:28 +0000
Message-ID: <slrnn5ldq6.5q5.grahn+nntp@frailea.sa.invalid>
In reply to: #77363
On Sun, 2015-11-29, Nobody wrote:
> On Sun, 29 Nov 2015 01:06:19 +0100, Morten W. Petersen wrote:
...
>> I also need to get feedback on my terminal when running the test code, so
>> that 32-bit Unicode is written to the terminal and readable, what's a
>> cross-platform way to handle that?
>
> First, you need to use UTF-8. Neither Windows nor Linux support UTF-16 or
> UTF-32 on a console. Setting up the console to use UTF-8 requires
> platform-specific code. E.g. Windows has the pseudo-codepage 65001 for
> UTF-8 ("pseudo" because 65001 isn't valid in most contexts where a
> codepage is used). The Linux console (VT) can be put into UTF-8 mode with
> the sequence "ESC % G"; X11-based terminal emulators all have their own
> ways of dealing with it.

The Linux part of that sounds fishy.  I haven't fully understood it, but
it seems there are two ways programs can reason about this:

- "If the user told me (via $LC_CTYPE) to output UTF-8, then I can assume
  the terminal can deal with it."
- "It's a UTF-8 world now, so this system supports UTF-8."

I don't think programs are supposed to reconfigure the terminal.
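
A minimal sketch of the first approach on a POSIX system: adopt the locale from the environment and ask which character encoding it uses, without touching the terminal. The helper name locale_is_utf8 is hypothetical, and the "UTF-8" spelling is an assumption that matches glibc; other systems may report the codeset differently.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the user's locale from the environment and
 * report whether its character encoding is UTF-8. Returns 1 or 0.
 * Assumption: the codeset is spelled "UTF-8", as glibc does; other
 * systems may use a different spelling such as "utf8". */
static int locale_is_utf8(void)
{
    const char *codeset;

    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;                       /* environment names an unknown locale */
    codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);            /* POSIX: never NULL, possibly "" */
    return strcmp(codeset, "UTF-8") == 0;
}
```

Note that the call changes the program's LC_CTYPE locale as a side effect, so it belongs near the start of main() rather than in library code.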

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .



#77378

From: Stephen Sprunk <stephen@sprunk.org>
Date: 2015-11-29 02:54 -0600
Message-ID: <n3eebd$vbp$1@dont-email.me>
In reply to: #77363
On 28-Nov-15 20:01, Nobody wrote:
> Windows has the pseudo-codepage 65001 for UTF-8 ("pseudo" because
> 65001 isn't valid in most contexts where a codepage is used).

AFAIK, CP_UTF7 (65000) and CP_UTF8 (65001) are valid nearly anywhere
that Windows expects an "ANSI" code page.  The main exception is that
you can't set them as the system default; the GUI doesn't list them as
options, and if you hack the registry, the OS crashes during boot.  But
you can happily use them in your own apps--and let Windows do all the
work of converting to/from the UTF-16 (or maybe UCS-2) widechars that it
forces on you.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking



#77365

From: Ian Collins <ian-news@hotmail.com>
Date: 2015-11-29 16:30 +1300
Message-ID: <dbv9quFea62U7@mid.individual.net>
In reply to: #77357
Morten W. Petersen wrote:
> Hi there.
>
> By now I should be a little bit known for working on my XML library
> Smash XML :), here
>
> https://github.com/morphex/smash_xml
>
> Now I'm at the point where I'm writing some output (write XML files)
> functionality..  I want to output in an UTF32-LE encoding, which means
> each "character" takes up 32 bits or 4 regular chars.

Why do that when just about everyone else uses UTF-8?  I don't think 
I've seen an XML document that uses anything else.  Even MS Office XML 
uses UTF-8.

<snip>

> So, how do I go about writing the data?  Another thing I've been
> thinking of is that fputc for example is a function call for each
> character that gets put in a stream..  Is it a lot more efficient to
> for example gather up 1024 characters and use fwrite?

With UTF-8, it's easy just to build up a "string" and write it with fwrite.

> I also need to get feedback on my terminal when running the test code,
> so that 32-bit Unicode is written to the terminal and readable, what's
> a cross-platform way to handle that?

I don't know about Windows, but everywhere else, UTF-8 :)

-- 
Ian Collins



#77371

From: Malcolm McLean <malcolm.mclean5@btinternet.com>
Date: 2015-11-28 23:53 -0800
Message-ID: <0407abc1-4ce3-4213-91f2-987a3620bbc8@googlegroups.com>
In reply to: #77365
On Sunday, November 29, 2015 at 3:30:55 AM UTC, Ian Collins wrote:
> Morten W. Petersen wrote:
> > Hi there.
> >
> > By now I should be a little bit known for working on my XML library
> > Smash XML :), here
> >
> > https://github.com/morphex/smash_xml
> >
> > Now I'm at the point where I'm writing some output (write XML files)
> > functionality..  I want to output in an UTF32-LE encoding, which means
> > each "character" takes up 32 bits or 4 regular chars.
> 
> Why do that when just about everyone else uses UTF-8?  I don't think 
> I've seen an XML document that uses anything else.  Even MS Office XML 
> uses UTF-8.
> 
Since UTF-32 is allowed, a general purpose reader must support it.
Since it is never used, such a reader can only be tested if you first 
produce a UTF-32 writer.

(I think the only easy way is to dump the UTF-32 as binary then open it
in a sophisticated word-processor).
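
For producing such a test file, here is a rough sketch of a UTF-32LE writer. The function names are hypothetical. It assembles each code point byte by byte, so the output does not depend on the host's endianness, and it assumes fp was opened in binary mode ("wb") so that no newline translation corrupts the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Store cp as four bytes, least significant byte first (UTF-32LE). */
void utf32le_put(uint32_t cp, unsigned char out[4])
{
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
}

/* Write n code points to fp as UTF-32LE.
   To emit a byte order mark, pass U+FEFF as the first code point.
   Returns 0 on success, -1 on an invalid code point or write error. */
int write_utf32le(FILE *fp, const uint32_t *cps, size_t n)
{
    unsigned char buf[4];

    for (size_t i = 0; i < n; i++) {
        /* Surrogates and values above U+10FFFF are not valid here. */
        if (cps[i] > 0x10FFFF || (cps[i] >= 0xD800 && cps[i] <= 0xDFFF))
            return -1;
        utf32le_put(cps[i], buf);
        if (fwrite(buf, 1, sizeof buf, fp) != sizeof buf)
            return -1;
    }
    return 0;
}
```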



#77375

From: Stephen Sprunk <stephen@sprunk.org>
Date: 2015-11-29 02:23 -0600
Message-ID: <n3echd$qaj$1@dont-email.me>
In reply to: #77371
On 29-Nov-15 01:53, Malcolm McLean wrote:
> On Sunday, November 29, 2015 at 3:30:55 AM UTC, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>> Now I'm at the point where I'm writing some output (write XML 
>>> files) functionality..  I want to output in an UTF32-LE
>>> encoding, which means each "character" takes up 32 bits or 4
>>> regular chars.
>> 
>> Why do that when just about everyone else uses UTF-8?  I don't 
>> think I've seen an XML document that uses anything else.  Even MS 
>> Office XML uses UTF-8.
> 
> Since UTF-32 is allowed, a general purpose reader must support it. 
> Since it is never used, such a reader can only be tested if you
> first produce a UTF-32 writer.

Do you really want to be the one testing the quality of every other XML
reader on the planet's UTF-32 implementation?  Or should you just use
UTF-8 like everyone else and _know_ it will work in _every_ reader?

There is some (IMHO not enough) justification for using UTF-16LE if
you're primarily targeting Windows/Java readers, but not UTF-32.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking



#77377

From: Malcolm McLean <malcolm.mclean5@btinternet.com>
Date: 2015-11-29 00:30 -0800
Message-ID: <834d72b5-230d-4ff6-a558-5885932e6b6b@googlegroups.com>
In reply to: #77375
On Sunday, November 29, 2015 at 8:23:31 AM UTC, Stephen Sprunk wrote:
> On 29-Nov-15 01:53, Malcolm McLean wrote:
> > On Sunday, November 29, 2015 at 3:30:55 AM UTC, Ian Collins wrote:
> >> Morten W. Petersen wrote:
> >>> Now I'm at the point where I'm writing some output (write XML 
> >>> files) functionality..  I want to output in an UTF32-LE
> >>> encoding, which means each "character" takes up 32 bits or 4
> >>> regular chars.
> >> 
> >> Why do that when just about everyone else uses UTF-8?  I don't 
> >> think I've seen an XML document that uses anything else.  Even MS 
> >> Office XML uses UTF-8.
> > 
> > Since UTF-32 is allowed, a general purpose reader must support it. 
> > Since it is never used, such a reader can only be tested if you
> > first produce a UTF-32 writer.
> 
> Do you really want to be the one testing the quality of every other XML
> reader on the planet's UTF-32 implementation?  Or should you just use
> UTF-8 like everyone else and _know_ it will work in _every_ reader?
> 
> There is some (IMHO not enough) justification for using UTF-16LE if
> you're primarily targeting Windows/Java readers, but not UTF-32.
> 
You haven't been following the ng over the last few weeks.
Morten is developing an XML parser. This is the first we've heard
about the XML writer (as is typical, a writer is a lot easier
to develop than a reader).



#77419

From: "Morten W. Petersen" <morphex@gmail.com>
Date: 2015-11-30 01:33 +0100
Message-ID: <n3g5eq$pvg$1@speranza.aioe.org>
In reply to: #77377
On 29.11.2015 09:30, Malcolm McLean wrote:
> On Sunday, November 29, 2015 at 8:23:31 AM UTC, Stephen Sprunk wrote:
>> On 29-Nov-15 01:53, Malcolm McLean wrote:
[...]
>> Do you really want to be the one testing the quality of every other XML
>> reader on the planet's UTF-32 implementation?  Or should you just use
>> UTF-8 like everyone else and _know_ it will work in _every_ reader?
>>
>> There is some (IMHO not enough) justification for using UTF-16LE if
>> you're primarily targeting Windows/Java readers, but not UTF-32.
>>
> You haven't been following the ng over the last few weeks.
> Morten is developing an XML parser. This is the first we've heard
> about the XML writer (as is typical, a writer is a lot easier
> to develop than a reader).

I really don't want to do more than I have to, what I want to do is
create a library/program that will enable reading, writing and
manipulating XML files.  And do it securely, correctly, and fast.

Everyone is on about UTF-8 it seems, and that's the world as it is
today.. UTF-16 is sort of the middle way which requires some tricks
to represent all characters while UTF-32 is what it is, but requires
more storage.
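
The "tricks" in UTF-16 are surrogate pairs: code points above U+FFFF are split across two 16-bit units. A small sketch of the encoding step follows, with utf16_encode as a made-up name for illustration.

```c
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
   Returns the number of units stored in out (1 or 2), or 0 if cp is
   a surrogate or lies above U+10FFFF. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one unit, value unchanged. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;

    /* Supplementary planes: subtract 0x10000, then split the
       remaining 20 bits into a high and a low surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 2;
}
```

UTF-32 needs no such step, which is the simplicity being traded against storage here.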

There has been a long discussion about UTF-x on this newsgroup earlier,
and that discussion shows the reasoning behind everything.

There might be UTF-8 and UTF-16 reading/writing support tacked on later
in the development process, but for example bzipping an XML file I
assume would produce files of roughly equal size if it is in UTF-8,
16 or 32, and bzip is old and looks to be safe from IP claims.

-Morten



#77421 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From: Ian Collins <ian-news@hotmail.com>
Date: 2015-11-30 13:54 +1300
Subject: Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID: <dc1l2pF2cb8U1@mid.individual.net>
In reply to: #77419
Morten W. Petersen wrote:
>
> I really don't want to do more than I have to, what I want to do is
> create a library/program that will enable reading, writing and
> manipulating XML files.  And do it securely, correctly, and fast.

So stick to the encoding pretty much every XML file uses (and every 
parser must support): UTF-8!

> Everyone is on about UTF-8 it seems, and that's the world as it is
> today.. UTF-16 is sort of the middle way which requires some tricks
> to represent all characters while UTF-32 is what it is, but requires
> more storage.

UTF-8 is where the world will probably stay.

> There has been a long discussion about UTF-x on this newsgroup earlier,
> and that discussion shows the reasoning behind everything.
>
> There might be UTF-8 and UTF-16 reading/writing support tacked on later
> in the development process, but for example bzipping an XML file I
> assume would produce files of roughly equal size if it is in UTF-8,
> 16 or 32, and bzip is old and looks to be safe from IP claims.

If you are writing a conforming parser, it must support UTF-8 (and 
UTF-16, but I've yet to encounter a UTF-16 encoded document).
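
One way a reader might start telling these encodings apart is to look for a byte order mark at the start of the input. The sketch below is only that first step; the enum and function names are invented for illustration. When no BOM is present, a real parser would go on to inspect the first bytes of the XML declaration, which is not shown here.

```c
#include <stddef.h>

enum xml_bom {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16LE,
    BOM_UTF16BE,
    BOM_UTF32LE,
    BOM_UTF32BE
};

/* Inspect the first n bytes at p for a byte order mark.
   The UTF-32 checks come first because the UTF-32LE mark
   (FF FE 00 00) begins with the UTF-16LE mark (FF FE). */
enum xml_bom detect_bom(const unsigned char *p, size_t n)
{
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return BOM_UTF32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return BOM_UTF32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```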

-- 
Ian Collins



#77422

From: "Morten W. Petersen" <morphex@gmail.com>
Date: 2015-11-30 02:03 +0100
Message-ID: <n3g77s$sor$1@speranza.aioe.org>
In reply to: #77421
On 30.11.2015 01:54, Ian Collins wrote:
> Morten W. Petersen wrote:
>>
>> I really don't want to do more than I have to, what I want to do is
>> create a library/program that will enable reading, writing and
>> manipulating XML files.  And do it securely, correctly, and fast.
>
> So stick to the encoding pretty much every XML file uses (and every
> parser must support): UTF-8!
>
>> Everyone is on about UTF-8 it seems, and that's the world as it is
>> today.. UTF-16 is sort of the middle way which requires some tricks
>> to represent all characters while UTF-32 is what it is, but requires
>> more storage.
>
> UTF-8 is where the world will probably stay.

I think that's a bold claim.. :D

>> There has been a long discussion about UTF-x on this newsgroup earlier,
>> and that discussion shows the reasoning behind everything.
>>
>> There might be UTF-8 and UTF-16 reading/writing support tacked on later
>> in the development process, but for example bzipping an XML file I
>> assume would produce files of roughly equal size if it is in UTF-8,
>> 16 or 32, and bzip is old and looks to be safe from IP claims.
>
> If you are writing a conforming parser, it must support UTF-8 (and
> UTF-16, but I've yet to encounter a UTF-16 encoded document).

Yes, conformance is important, so there will be some sort of solution
there if the library is to be used as a conforming parser.

-Morten



#77423 — Re: Working efficiently with 32-bit Unicode output streams, locale etc.

From: Ian Collins <ian-news@hotmail.com>
Date: 2015-11-30 14:15 +1300
Subject: Re: Working efficiently with 32-bit Unicode output streams, locale etc.
Message-ID: <dc1m91F2cb8U2@mid.individual.net>
In reply to: #77422
Morten W. Petersen wrote:
> On 30.11.2015 01:54, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>>
>>> I really don't want to do more than I have to, what I want to do is
>>> create a library/program that will enable reading, writing and
>>> manipulating XML files.  And do it securely, correctly, and fast.
>>
>> So stick to the encoding pretty much every XML file uses (and every
>> parser must support): UTF-8!
>>
>>> Everyone is on about UTF-8 it seems, and that's the world as it is
>>> today.. UTF-16 is sort of the middle way which requires some tricks
>>> to represent all characters while UTF-32 is what it is, but requires
>>> more storage.
>>
>> UTF-8 is where the world will probably stay.
>
> I think that's a bold claim.. :D

Have you ever encountered a UTF-32 (or UTF-16) encoded XML document?  I 
can't imagine why anyone would want to create one given the lack of 
applications that can read, let alone parse, it.  UTF-8 is universally 
popular (off Windows) because, being a superset of ASCII, it can be 
displayed by just about anything.

>>> There has been a long discussion about UTF-x on this newsgroup earlier,
>>> and that discussion shows the reasoning behind everything.
>>>
>>> There might be UTF-8 and UTF-16 reading/writing support tacked on later
>>> in the development process, but for example bzipping an XML file I
>>> assume would produce files of roughly equal size if it is in UTF-8,
>>> 16 or 32, and bzip is old and looks to be safe from IP claims.
>>
>> If you are writing a conforming parser, it must support UTF-8 (and
>> UTF-16, but I've yet to encounter a UTF-16 encoded document).
>
> Yes, conformance is important, so there will be some sort of solution
> there if the library is to be used as a conforming parser.

I suggest you look there first rather than wasting time with an exotic 
format no one uses...


-- 
Ian Collins


