| Started by | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| First post | 2015-11-29 01:06 +0100 |
| Last post | 2015-12-02 09:58 -0800 |
| Articles | 20 on this page of 210 — 25 participants |
Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 14:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 17:13 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 02:34 +0100 |
| Message-ID | <n3g921$vlq$1@speranza.aioe.org> |
| In reply to | #77423 |
On 30.11.2015 02:15, Ian Collins wrote:
> Morten W. Petersen wrote:
>> On 30.11.2015 01:54, Ian Collins wrote:
>>> Morten W. Petersen wrote:
>>>>
>>>> I really don't want to do more than I have to, what I want to do is
>>>> create a library/program that will enable reading, writing and
>>>> manipulating XML files. And do it securely, correctly, and fast.
>>>
>>> So stick to the encoding petty much every XML file uses (and every
>>> parser must support): UTF-8!
>>>
>>>> Everyone is on about UTF-8 it seems, and that's the world as it is
>>>> today.. UTF-16 is sort of the middle way which requires some tricks
>>>> to represent all characters while UTF-32 is what it is, but requires
>>>> more storage.
>>>
>>> UTF-8 is where the world will probably stay.
>>
>> I think that's a bold claim.. :D
>
> Have you ever encountered a UTF-32 (or UTF-16) encoded XML document? I
> can't imagine why anyone would want to create one given the lack of
> applications that can read, let alone parse, it. UTF-8 is universally
> popular (off Windows) because being a super-set of ASCII, just about
> anything can display it.

Well, let's say you have some organization that wants to create an
archive of lots of non-latin history, in XML.

For them, choosing XML is right, and UTF-8 uses 3 bytes on characters
U+0800 through U+FFFF, but only 2 bytes in UTF-16.

However, UTF-16 is vulnerable to the entire string being corrupted
after invalid data has been encountered.

So this organization chooses to use UTF-32, because the unnecessary byte
there also acts as a delimiter.

This is plausible.

As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
discussion on comp.lang.c.

-Morten
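
The per-character byte counts quoted in this post can be checked with a small C sketch. It is only an illustration: the function names `utf8_len`, `utf16_len` and `utf32_len` are hypothetical and come from no library mentioned in the thread. Each returns the number of bytes one code point occupies in the given encoding, or 0 if the value is not a valid Unicode scalar value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if cp is a Unicode scalar value, i.e. at most
   U+10FFFF and not in the surrogate range U+D800..U+DFFF. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Bytes needed for one code point in UTF-8 (0 if invalid). */
size_t utf8_len(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    if (cp <= 0x7F)   return 1;   /* ASCII, including XML markup */
    if (cp <= 0x7FF)  return 2;
    if (cp <= 0xFFFF) return 3;   /* U+0800..U+FFFF */
    return 4;                     /* U+10000..U+10FFFF */
}

/* Bytes needed for one code point in UTF-16 (0 if invalid). */
size_t utf16_len(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    return cp <= 0xFFFF ? 2 : 4;  /* above U+FFFF: a surrogate pair */
}

/* Bytes needed for one code point in UTF-32 (0 if invalid). */
size_t utf32_len(uint32_t cp)
{
    return is_scalar_value(cp) ? 4 : 0;
}
```

For U+0800 this gives 3 bytes in UTF-8, 2 in UTF-16 and 4 in UTF-32, matching the figures above; for an ASCII character such as `<` (U+003C) it gives 1, 2 and 4.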
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-11-30 14:42 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc1nseF2cb8U3@mid.individual.net> |
| In reply to | #77424 |
Morten W. Petersen wrote:
> On 30.11.2015 02:15, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>> Morten W. Petersen wrote:
>>>>>
>>>>> I really don't want to do more than I have to, what I want to do is
>>>>> create a library/program that will enable reading, writing and
>>>>> manipulating XML files. And do it securely, correctly, and fast.
>>>>
>>>> So stick to the encoding petty much every XML file uses (and every
>>>> parser must support): UTF-8!
>>>>
>>>>> Everyone is on about UTF-8 it seems, and that's the world as it is
>>>>> today.. UTF-16 is sort of the middle way which requires some tricks
>>>>> to represent all characters while UTF-32 is what it is, but requires
>>>>> more storage.
>>>>
>>>> UTF-8 is where the world will probably stay.
>>>
>>> I think that's a bold claim.. :D
>>
>> Have you ever encountered a UTF-32 (or UTF-16) encoded XML document? I
>> can't imagine why anyone would want to create one given the lack of
>> applications that can read, let alone parse, it. UTF-8 is universally
>> popular (off Windows) because being a super-set of ASCII, just about
>> anything can display it.
>
> Well, let's say you have some organization that wants to create an
> archive of lots of non-latin history, in XML.
>
> For them, choosing XML is right, and UTF-8 uses 3 bytes on characters
> U+0800 through U+FFFF, but only 2 bytes in UTF-16.

If they had the good sense to use a compressed filesystem, that wouldn't
matter!

> However, UTF-16 is vulnerable to the entire string being corrupted
> after invalid data has been encountered.
>
> So this organization chooses to use UTF-32, because the unnecessary byte
> there also acts as a delimiter.
>
> This is plausible.

It may be, but I'll ask again: Have you ever encountered a UTF-32
encoded XML document?

Ian Collins
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 04:16 +0100 |
| Message-ID | <n3gf11$9a7$1@speranza.aioe.org> |
| In reply to | #77425 |
On 30.11.2015 02:42, Ian Collins wrote:
> Morten W. Petersen wrote:
[...]
>> This is plausible.
>
> It may be, but I'll ask again: Have you ever encountered a UTF-32
> encoded XML document?

No, can't say I have.

-Morten
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-29 20:20 -0600 |
| Message-ID | <n3gble$sbh$1@dont-email.me> |
| In reply to | #77424 |
On 29-Nov-15 19:34, Morten W. Petersen wrote: > On 30.11.2015 02:15, Ian Collins wrote: >> Morten W. Petersen wrote: >>> On 30.11.2015 01:54, Ian Collins wrote: >>>> UTF-8 is where the world will probably stay. >>> >>> I think that's a bold claim.. :D It's not bold at all; stats clearly show UTF-8 is now blowing away all other encodings. The trend in that direction has been solid for over two decades, and there is no logical reason for it to ever reverse. >> Have you ever encountered a UTF-32 (or UTF-16) encoded XML >> document? I can't imagine why anyone would want to create one >> given the lack of applications that can read, let alone parse, it. >> UTF-8 is universally popular (off Windows) because being a >> super-set of ASCII, just about anything can display it. > > Well, let's say you have some organization that wants to create an > archive of lots of non-latin history, in XML. > > For them, choosing XML is right, That's part of the stated requirements above, not a choice. > and UTF-8 uses 3 bytes on characters U+0800 through U+FFFF, but > only 2 bytes in UTF-16. OTOH, UTF-8 uses only 1 byte for U+000000 through U+00007F whereas UTF-16 uses 2 bytes; this is important for XML (or HTML) because that range includes XML (or HTML) markup characters, whitespace, etc., with the result that total document size is usually smaller even for scripts in the U+000800 to U+00FFFF range. For a detailed comparison: https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Eight-bit_environments Even for dense text, a general-purpose compressor results in about the same size anyway, so if that's expected, you can forget about the few cases where UTF-8 loses on size and focus on portability and robustness--where UTF-8 even more clearly wins. > However, UTF-16 is vulnerable to the entire string being corrupted > after invalid data has been encountered. No; UTF-16 is self-synchronizing, just like UTF-8. 
But it's a lot easier for UTF-16 to _get_ corrupted because most people forget that it's a variable-length encoding than do UTF-8. > So this organization chooses to use UTF-32, because the unnecessary > byte there also acts as a delimiter. No, it doesn't. > This is plausible. No, it isn't. There is a reason that _nobody_ uses UTF-32 for file/wire formats. > As for the rest of the UTF-8 vs 16 and 32 debate, look at the > earlier discussion on comp.lang.c. Do you have a Message-ID or Subject to reference? All such discussions I can recall have strongly favored UTF-8, unless you're targeting _only_ Windows or Java, where UTF-16 (or UCS-2 falsely labeled as UTF-16, hence endless bugs) is inescapable. S -- Stephen Sprunk "God does not play dice." --Albert Einstein CCIE #3723 "God is an inveterate gambler, and He throws the K5SSS dice at every possible opportunity." --Stephen Hawking
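The per-code-point sizes being compared in this post can be checked with a few lines of C. This is only a sketch: the names `utf8_len`, `utf16_len`, `utf32_len` and `total_len` are made up for illustration, and the ranges are those of the standard Unicode encoding forms rather than anything taken from code in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if cp is a Unicode scalar value (in range, not a surrogate). */
static int is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Bytes needed for one code point in UTF-8; 0 if cp is not encodable. */
size_t utf8_len(uint32_t cp)
{
    if (!is_scalar(cp)) return 0;
    if (cp <= 0x7F)     return 1;   /* ASCII, including XML markup */
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;   /* U+0800..U+FFFF */
    return 4;
}

/* Bytes needed for one code point in UTF-16; 0 if cp is not encodable. */
size_t utf16_len(uint32_t cp)
{
    if (!is_scalar(cp)) return 0;
    return cp <= 0xFFFF ? 2 : 4;    /* 4 bytes means a surrogate pair */
}

/* Bytes needed for one code point in UTF-32; 0 if cp is not encodable. */
size_t utf32_len(uint32_t cp)
{
    return is_scalar(cp) ? 4 : 0;
}

/* Total encoded size of n code points under the given length function. */
size_t total_len(const uint32_t *cps, size_t n, size_t (*len)(uint32_t))
{
    size_t total = 0, i;
    for (i = 0; i < n; i++)
        total += len(cps[i]);
    return total;
}
```

For a tiny markup-heavy fragment such as `<p>`, one Devanagari letter (U+0939) and `</p>`, the totals come to 10 bytes in UTF-8, 16 in UTF-16 and 32 in UTF-32, which is the effect of ASCII markup described above.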
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 04:34 +0100 |
| Message-ID | <n3gg2j$anu$1@speranza.aioe.org> |
| In reply to | #77427 |
On 30.11.2015 03:20, Stephen Sprunk wrote: > On 29-Nov-15 19:34, Morten W. Petersen wrote: >> On 30.11.2015 02:15, Ian Collins wrote: >>> Morten W. Petersen wrote: >>>> On 30.11.2015 01:54, Ian Collins wrote: >>>>> UTF-8 is where the world will probably stay. >>>> >>>> I think that's a bold claim.. :D > > It's not bold at all; stats clearly show UTF-8 is now blowing away all > other encodings. The trend in that direction has been solid for over > two decades, and there is no logical reason for it to ever reverse. Mm, yes. And UTF-8 superseded ASCII and ISO-8859-1. Are you saying that UTF-16/32 will never be more popular than UTF-8? >> and UTF-8 uses 3 bytes on characters U+0800 through U+FFFF, but >> only 2 bytes in UTF-16. > > OTOH, UTF-8 uses only 1 byte for U+000000 through U+00007F whereas > UTF-16 uses 2 bytes; this is important for XML (or HTML) because that > range includes XML (or HTML) markup characters, whitespace, etc., with > the result that total document size is usually smaller even for scripts > in the U+000800 to U+00FFFF range. > > For a detailed comparison: > https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Eight-bit_environments > > Even for dense text, a general-purpose compressor results in about the > same size anyway, so if that's expected, you can forget about the few > cases where UTF-8 loses on size and focus on portability and > robustness--where UTF-8 even more clearly wins. Yes that's true that XML markup is in the ASCII/UTF-8 range. But UTF-32 does not require encoding or decoding for any character. >> However, UTF-16 is vulnerable to the entire string being corrupted >> after invalid data has been encountered. > > No; UTF-16 is self-synchronizing, just like UTF-8. But it's a lot > easier for UTF-16 to _get_ corrupted because most people forget that > it's a variable-length encoding than do UTF-8. What do you think about the odd number statement here? 
https://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16 >> So this organization chooses to use UTF-32, because the unnecessary >> byte there also acts as a delimiter. > > No, it doesn't. It is a delimiter, but maybe it requires some looking back and forth to decide what is the padding byte and what's part of the actual Unicode character. >> This is plausible. > > No, it isn't. Why not? >> As for the rest of the UTF-8 vs 16 and 32 debate, look at the >> earlier discussion on comp.lang.c. > > Do you have a Message-ID or Subject to reference? > > All such discussions I can recall have strongly favored UTF-8, unless > you're targeting _only_ Windows or Java, where UTF-16 (or UCS-2 falsely > labeled as UTF-16, hence endless bugs) is inescapable. I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good starting point. -Morten
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-11-30 17:09 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc20g8F2cb8U4@mid.individual.net> |
| In reply to | #77429 |
Morten W. Petersen wrote:
> On 30.11.2015 03:20, Stephen Sprunk wrote:
>> On 29-Nov-15 19:34, Morten W. Petersen wrote:
>>> On 30.11.2015 02:15, Ian Collins wrote:
>>>> Morten W. Petersen wrote:
>>>>> On 30.11.2015 01:54, Ian Collins wrote:
>>>>>> UTF-8 is where the world will probably stay.
>>>>>
>>>>> I think that's a bold claim.. :D
>>
>> It's not bold at all; stats clearly show UTF-8 is now blowing away all
>> other encodings. The trend in that direction has been solid for over
>> two decades, and there is no logical reason for it to ever reverse.
>
> Mm, yes. And UTF-8 superseded ASCII and ISO-8859-1. Are you saying
> that UTF-16/32 will never be more popular than UTF-8?

Given there's no good reason not to prefer UTF-8, the answer there is a
pretty solid yes.

--
Ian Collins
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 06:17 +0100 |
| Message-ID | <n3gm46$k8h$1@speranza.aioe.org> |
| In reply to | #77430 |
On 30.11.2015 05:09, Ian Collins wrote: > Morten W. Petersen wrote: >> On 30.11.2015 03:20, Stephen Sprunk wrote: >>> On 29-Nov-15 19:34, Morten W. Petersen wrote: >>>> On 30.11.2015 02:15, Ian Collins wrote: >>>>> Morten W. Petersen wrote: >>>>>> On 30.11.2015 01:54, Ian Collins wrote: >>>>>>> UTF-8 is where the world will probably stay. >>>>>> >>>>>> I think that's a bold claim.. :D >>> >>> It's not bold at all; stats clearly show UTF-8 is now blowing away all >>> other encodings. The trend in that direction has been solid for over >>> two decades, and there is no logical reason for it to ever reverse. >> >> Mm, yes. And UTF-8 superseded ASCII and ISO-8859-1. Are you saying >> that UTF-16/32 will never be more popular than UTF-8? > > Given there's no good reason not to prefer UTF-8, the answer there is a > pretty solid yes. Well as you see in the link in the post above, it says that "As a result, text in (for example) Chinese, Japanese or Hindi will take more space in UTF-8 if there are more of these characters than there are ASCII characters." Now I had a look at an .odt export from my Google Drive, and the content.xml there is a horrible mess; the text itself is about 3 KB, if I wrote that to a proper XHTML+CSS file, it would maybe be 5 KB. Unzipped the .odt file is 250KB. We could argue back and forth about this, but I think the only real way to settle a discussion is with real data and hard numbers, and I don't think any of us has the time or energy to do that. Internally, Smash XML will use 32 bits. It might add an option to use only 21 bits later, if someone has a real need for it. To make the output 32 bits by default seems like an OK solution to me. I like the idea of having for example bzip as a compression method if saving space or reducing read/write times from storage is important. That the parser must accept UTF-8 and 16 to be a parser that follows the rules is OK. -Morten
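Accepting UTF-8 on input while keeping 32-bit code points internally comes down to a decoding step roughly like the one below. This is a minimal sketch, not Smash XML's actual code (none of which appears in the thread); `utf8_decode` is a hypothetical name, and the function is assumed to reject overlong forms, surrogates and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one code point from the n bytes at s.
 * On success stores the code point in *cp and returns the number of
 * bytes consumed (1..4). Returns 0 for malformed or truncated input.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t need, i;
    uint32_t c, min;

    if (n == 0) return 0;

    if (s[0] < 0x80) {                     /* 0xxxxxxx: ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {    /* 110xxxxx: 2-byte sequence */
        need = 1; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {    /* 1110xxxx: 3-byte sequence */
        need = 2; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {    /* 11110xxx: 4-byte sequence */
        need = 3; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                          /* stray continuation or invalid lead */
    }

    if (n < need + 1) return 0;            /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* must be 10xxxxxx */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, out-of-range values and surrogates. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return need + 1;
}
```

A caller would loop over the input buffer, advancing by the returned count and storing each `uint32_t` in the internal representation; what to do on a return of 0 (abort, or substitute U+FFFD) is a policy decision left out of this sketch.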
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-11-30 19:44 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc29hvFi96jU1@mid.individual.net> |
| In reply to | #77432 |
Morten W. Petersen wrote:
> On 30.11.2015 05:09, Ian Collins wrote:
>> Morten W. Petersen wrote:
>>>
>>> Mm, yes. And UTF-8 superseded ASCII and ISO-8859-1. Are you saying
>>> that UTF-16/32 will never be more popular than UTF-8?
>>
>> Given there's no good reason not to prefer UTF-8, the answer there is a
>> pretty solid yes.
>
> Well as you see in the link in the post above, it says that
>
> "As a result, text in (for example) Chinese, Japanese or Hindi will
> take more space in UTF-8 if there are more of these characters than
> there are ASCII characters."

With file compression, that is irrelevant.

> Now I had a look at an .odt export from my Google Drive, and the
> content.xml there is a horrible mess; the text itself is about
> 3 KB, if I wrote that to a proper XHTML+CSS file, it would maybe
> be 5 KB. Unzipped the .odt file is 250KB.

That's XML for you. That's why XML office formats use zip files.

> We could argue back and forth about this, but I think the only real
> way to settle a discussion is with real data and hard numbers, and I
> don't think any of us has the time or energy to do that.

The best "real data and hard numbers" is the ratio between UTF-8 encoded
XML documents and the rest!

--
Ian Collins
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-29 23:36 -0600 |
| Message-ID | <n3gn5a$jju$1@dont-email.me> |
| In reply to | #77429 |
On 29-Nov-15 21:34, Morten W. Petersen wrote:
> On 30.11.2015 03:20, Stephen Sprunk wrote:
>> It's not bold at all; stats clearly show UTF-8 is now blowing away
>> all other encodings. The trend in that direction has been solid
>> for over two decades, and there is no logical reason for it to ever
>> reverse.
>
> Mm, yes. And UTF-8 superseded ASCII and ISO-8859-1. Are you saying
> that UTF-16/32 will never be more popular than UTF-8?

For file/wire formats, yes, I'm saying that. We have tried that
experiment, and UTF-8 has clearly won.

UTF-32 (and, for a time, UTF-16) may persist a bit longer for internal
uses, particularly for string manipulation (though the vast majority of
string handling treats them as opaque blobs and is just fine with
UTF-8), but that's it.

>> Even for dense text, a general-purpose compressor results in about
>> the same size anyway, so if that's expected, you can forget about
>> the few cases where UTF-8 loses on size and focus on portability
>> and robustness--where UTF-8 even more clearly wins.
>
> Yes that's true that XML markup is in the ASCII/UTF-8 range.

It's in the ASCII range; _all_ code points are in the "UTF-8 range",
making the latter a meaningless term.

> But UTF-32 does not require encoding or decoding for any character.

UTF-32 still requires dealing with BOMs and LE vs BE.

ITYM that UTF-32 means a code unit equals a code point, but that is of
dubious value in the vast majority of circumstances since neither
matches what _users_ consider a "character", i.e. a grapheme cluster.

>>> However, UTF-16 is vulnerable to the entire string being
>>> corrupted after invalid data has been encountered.
>>
>> No; UTF-16 is self-synchronizing, just like UTF-8. But it's a lot
>> easier for UTF-16 to _get_ corrupted because most people forget
>> that it's a variable-length encoding than do UTF-8.
>
> What do you think about the odd number statement here?
>
> https://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16

Ah, true; if you _add or remove_ an odd number of bytes, that is a
serious problem for UTF-16, but for a reasonably long text, you will
eventually encounter an invalid surrogate pair, which would allow you to
resynchronize--though most decoders probably don't bother.

UTF-8 doesn't have that problem; it will always resynchronize on the
very next character since leading and trailing bytes are distinct.

>>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the
>>> earlier discussion on comp.lang.c.
>>
>> Do you have a Message-ID or Subject to reference?
>>
>> All such discussions I can recall have strongly favored UTF-8,
>> unless you're targeting _only_ Windows or Java, where UTF-16 (or
>> UCS-2 falsely labeled as UTF-16, hence endless bugs) is
>> inescapable.
>
> I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good
> starting point.

It seems like that particular debate comes down to people from certain
countries unhappy that their script requires 3 bytes per code point in
UTF-8 but only 2 bytes per code point in UTF-16, and your answer was to
make _all_ scripts require 4 bytes per code point.

Sometimes politics force us to do dumb things, and if that's the case
then so be it, but that doesn't make it not a dumb thing.

Note that, even in those countries, UTF-8 has clearly eclipsed all other
encodings in actual use, politics be damned.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
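The resynchronization property described in this post follows from the bit patterns: UTF-8 continuation bytes always have the form 10xxxxxx, and no lead byte does. A rough sketch of what a decoder could do after landing in the middle of a sequence; `utf8_resync` is a hypothetical helper, not taken from any library mentioned in the thread.

```c
#include <stddef.h>

/* A UTF-8 continuation (trailing) byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Starting at pos, skip continuation bytes and return the index of the
 * first byte that can begin a code point. Returns len if none is found.
 */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len && is_continuation(buf[pos]))
        pos++;
    return pos;
}
```

Because a sequence has at most three continuation bytes, a well-formed stream is back in step after skipping at most three bytes. UTF-16 has no equivalent test at byte granularity, which is the odd-byte-count problem raised above.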
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 07:39 +0100 |
| Message-ID | <n3gqu2$rvn$1@speranza.aioe.org> |
| In reply to | #77435 |
On 30.11.2015 06:36, Stephen Sprunk wrote: > On 29-Nov-15 21:34, Morten W. Petersen wrote: [...] >> I think Message-ID <o5pfx.77838$hH6.62666@fx22.iad> is a good >> starting point. > > It seems like that particular debate comes down to people from certain > countries unhappy that their script requires 3 bytes per code point in > UTF-8 but only 2 bytes per code point in UTF-16, and your answer was to > make _all_ scripts require 4 bytes per code point. > > Sometimes politics force us to do dumb things, and if that's the case > then so be it, but that doesn't make it not a dumb thing. > > Note that, even in those countries, UTF-8 has clearly eclipsed all other > encodings in actual use, politics be damned. Mhm. Well UTF-8 favors certain characters over others as you say. If you compress that, or a UTF-32 document, the size should be about the same. I think it's more fair that any given character takes the same amount of space uncompressed, and then compression can be applied if it is necessary to save space. I think it's a good, clean design to accept UTF 8, 16 and 32, and then output in 32. Internally things are 32 bits, and no code internally has to "work around" strings in UTF-8. Interestingly enough, this page http://w3techs.com/technologies/overview/site_element/all says that only 2/3rds of websites use compression. -Morten
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-30 13:56 -0600 |
| Message-ID | <n3i9h3$ff3$1@dont-email.me> |
| In reply to | #77436 |
On 30-Nov-15 00:39, Morten W. Petersen wrote: > On 30.11.2015 06:36, Stephen Sprunk wrote: >> It seems like that particular debate comes down to people from >> certain countries unhappy that their script requires 3 bytes per >> code point in UTF-8 but only 2 bytes per code point in UTF-16, and >> your answer was to make _all_ scripts require 4 bytes per code >> point. >> >> Sometimes politics force us to do dumb things, and if that's the >> case then so be it, but that doesn't make it not a dumb thing. >> >> Note that, even in those countries, UTF-8 has clearly eclipsed all >> other encodings in actual use, politics be damned. > > Mhm. Well UTF-8 favors certain characters over others as you say. If > you compress that, or a UTF-32 document, the size should be about the > same. The compressed size is a rough measure of entropy, which won't be affected (much) by the encoding scheme of the uncompressed data. > I think it's more fair that any given character takes the same > amount of space uncompressed, and then compression can be applied if > it is necessary to save space. Yes, some people think the only "fair" solution to poverty is to make _everyone_ poor. Most people disagree, particularly the non-poor. > I think it's a good, clean design to accept UTF 8, 16 and 32, and > then output in 32. Why, when _every_ tool that might consume your output expects UTF-8, and no other tool is going to produce anything but UTF-8 as your input? > Internally things are 32 bits, and no code internally has to "work > around" strings in UTF-8. The internal representation you choose is another matter entirely; UTF-32 is a reasonable choice in many circumstances, with a conversion from/to UTF-8 on input/output. In years past, I'd have suggested also supporting other encodings for input/output, but now that's pointless. > Interestingly enough, this page > > http://w3techs.com/technologies/overview/site_element/all > > says that only 2/3rds of websites use compression. 
It's not clear exactly what that's measuring, and their "technologies overview" is surprisingly unhelpful. In particular, there are several possible types and methods of compression, and it's not clear they're measuring all of them. Also, by volume, the vast majority of Web content is static image and video files that are already highly compressed; some web sites compress static text content, but many don't bother because it's not worth the effort, and compressing dynamic text content is a pain. OTOH, one click away on that web site is this: http://w3techs.com/technologies/overview/character_encoding/all UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo), while UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at all_. In reality, this part of the debate was over 10+ years ago when Google announced that UTF-8 had reached majority status, i.e. more popular than all other encodings _combined_. S -- Stephen Sprunk "God does not play dice." --Albert Einstein CCIE #3723 "God is an inveterate gambler, and He throws the K5SSS dice at every possible opportunity." --Stephen Hawking
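The conversion on output suggested in this post, from a 32-bit internal code point to UTF-8, is short. A minimal sketch, assuming code points are held as `uint32_t`; `utf8_encode` is a made-up name for illustration, not an API from any library named in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out, which must have room for
 * 4 bytes. Returns the number of bytes written, or 0 if cp is a
 * surrogate or lies above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Writing a document then amounts to calling this once per internal code point and passing the resulting bytes to `fwrite` or similar, with no BOM or byte-order decision to make.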
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-12-01 09:17 +0100 |
| Message-ID | <n3jl18$1em$1@speranza.aioe.org> |
| In reply to | #77477 |
On 30.11.2015 20:56, Stephen Sprunk wrote: > On 30-Nov-15 00:39, Morten W. Petersen wrote: >> On 30.11.2015 06:36, Stephen Sprunk wrote: >>> It seems like that particular debate comes down to people from >>> certain countries unhappy that their script requires 3 bytes per >>> code point in UTF-8 but only 2 bytes per code point in UTF-16, and >>> your answer was to make _all_ scripts require 4 bytes per code >>> point. >>> >>> Sometimes politics force us to do dumb things, and if that's the >>> case then so be it, but that doesn't make it not a dumb thing. >>> >>> Note that, even in those countries, UTF-8 has clearly eclipsed all >>> other encodings in actual use, politics be damned. >> >> Mhm. Well UTF-8 favors certain characters over others as you say. If >> you compress that, or a UTF-32 document, the size should be about the >> same. > > The compressed size is a rough measure of entropy, which won't be > affected (much) by the encoding scheme of the uncompressed data. Yeah, something like that. >> I think it's more fair that any given character takes the same >> amount of space uncompressed, and then compression can be applied if >> it is necessary to save space. > > Yes, some people think the only "fair" solution to poverty is to make > _everyone_ poor. Most people disagree, particularly the non-poor. I think that's a very bad analogy. :) >> Interestingly enough, this page >> >> http://w3techs.com/technologies/overview/site_element/all >> >> says that only 2/3rds of websites use compression. > > It's not clear exactly what that's measuring, and their "technologies > overview" is surprisingly unhelpful. In particular, there are several > possible types and methods of compression, and it's not clear they're > measuring all of them. Well one of the arguments against UTF-32 is that it uses more space, with compression that's no longer a valid argument. 
> Also, by volume, the vast majority of Web content is static image and > video files that are already highly compressed; some web sites compress > static text content, but many don't bother because it's not worth the > effort, and compressing dynamic text content is a pain. I've worked with dynamic web content for a long time, and there compression is also an option. Any decent setup should be able to compress dynamic content that is served. > OTOH, one click away on that web site is this: > http://w3techs.com/technologies/overview/character_encoding/all > > UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo), while > UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at all_. > > In reality, this part of the debate was over 10+ years ago when Google > announced that UTF-8 had reached majority status, i.e. more popular than > all other encodings _combined_. I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32 in the time to come. With my parser/writer/DOM library, I'm chipping in towards that. -Morten
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 13:40 -0600 |
| Message-ID | <n3nhat$mho$1@dont-email.me> |
| In reply to | #77496 |
On 01-Dec-15 02:17, Morten W. Petersen wrote: > On 30.11.2015 20:56, Stephen Sprunk wrote: >> On 30-Nov-15 00:39, Morten W. Petersen wrote: >>> I think it's more fair that any given character takes the same >>> amount of space uncompressed, and then compression can be applied >>> if it is necessary to save space. >> >> Yes, some people think the only "fair" solution to poverty is to >> make _everyone_ poor. Most people disagree, particularly the >> non-poor. > > I think that's a very bad analogy. :) Actually, I think it's perfect. You see that some scripts use more bytes than others, so your solution is to move _all_ script into the worst case scenario. >>> Interestingly enough, this page >>> >>> http://w3techs.com/technologies/overview/site_element/all >>> >>> says that only 2/3rds of websites use compression. >> >> It's not clear exactly what that's measuring, and their >> "technologies overview" is surprisingly unhelpful. In particular, >> there are several possible types and methods of compression, and >> it's not clear they're measuring all of them. > > Well one of the arguments against UTF-32 is that it uses more space, > with compression that's no longer a valid argument. According to that site, 1/3 of web sites don't use compression, so it's still a valid argument for them. But I would agree that the size issue is mostly a side show; it's the simplest argument, so that's usually the one that comes out first, but there are other problems with UTF-32, and Unicode as a whole has many other problems--far more serious ones--that are encoding-independent. Anyone who treats Unicode as just a wider version of ASCII is in for a rude awakening. >> Also, by volume, the vast majority of Web content is static image >> and video files that are already highly compressed; some web sites >> compress static text content, but many don't bother because it's >> not worth the effort, and compressing dynamic text content is a >> pain. 
> > I've worked with dynamic web content for a long time, and there > compression is also an option. Any decent setup should be able to > compress dynamic content that is served. Compressing text takes a lot more CPU power than just dumping it into a socket. Doing it once for a static page is no big deal, but doing it for every page view is another story entirely, and the extra CPU power required can easily translate into bigger hardware bills--and unlike compressing static image/video content, it won't reduce your bandwidth bills enough to compensate. >> OTOH, one click away on that web site is this: >> http://w3techs.com/technologies/overview/character_encoding/all >> >> UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo), >> while UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at >> all_. > > I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32 > in the time to come. Since it's sitting at 0% today (and has been since invented 20+ years ago), we'll certainly not be seeing _less_ of it in the future, but I don't see any reason to expect _more_ of it either. S -- Stephen Sprunk "God does not play dice." --Albert Einstein CCIE #3723 "God is an inveterate gambler, and He throws the K5SSS dice at every possible opportunity." --Stephen Hawking
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-12-04 00:34 +0100 |
| Message-ID | <n3qjhn$v9u$1@speranza.aioe.org> |
| In reply to | #77651 |
On 02.12.2015 20:40, Stephen Sprunk wrote:
> On 01-Dec-15 02:17, Morten W. Petersen wrote:
>> On 30.11.2015 20:56, Stephen Sprunk wrote:
>>> On 30-Nov-15 00:39, Morten W. Petersen wrote:
>>>> I think it's more fair that any given character takes the same
>>>> amount of space uncompressed, and then compression can be applied
>>>> if it is necessary to save space.
>>>
>>> Yes, some people think the only "fair" solution to poverty is to
>>> make _everyone_ poor. Most people disagree, particularly the
>>> non-poor.
>>
>> I think that's a very bad analogy. :)
>
> Actually, I think it's perfect. You see that some scripts use more
> bytes than others, so your solution is to move _all_ script into the
> worst case scenario.

I don't see any point in arguing on this any further; economics is a
very complex subject, while programming and data are fairly simple.

>> Well one of the arguments against UTF-32 is that it uses more space,
>> with compression that's no longer a valid argument.
>
> According to that site, 1/3 of web sites don't use compression, so it's
> still a valid argument for them.
>
> But I would agree that the size issue is mostly a side show; it's the
> simplest argument, so that's usually the one that comes out first, but
> there are other problems with UTF-32, and Unicode as a whole has many
> other problems--far more serious ones--that are encoding-independent.
>
> Anyone who treats Unicode as just a wider version of ASCII is in for a
> rude awakening.

What are these problems with UTF-32?

>> I've worked with dynamic web content for a long time, and there
>> compression is also an option. Any decent setup should be able to
>> compress dynamic content that is served.
>
> Compressing text takes a lot more CPU power than just dumping it into a
> socket. Doing it once for a static page is no big deal, but doing it
> for every page view is another story entirely, and the extra CPU power
> required can easily translate into bigger hardware bills--and unlike
> compressing static image/video content, it won't reduce your bandwidth
> bills enough to compensate.

Well, for a site with dynamic content, there is already some
CPU utilization, while for example "resources" such as CSS files,
JavaScript files or data files can be compressed & cached, saved as
compressed files etc.

I think it's a fair guess that most sites with a lot of dynamic
elements will spend at least ten times more CPU time generating
dynamic content than compressing it.

>>> OTOH, one click away on that web site is this:
>>> http://w3techs.com/technologies/overview/character_encoding/all
>>>
>>> UTF-8 clearly dominates at 85.7% (and steadily rising ~0.3%/mo),
>>> while UTF-16 is at "less than 0.1%" and UTF-32 doesn't show up _at
>>> all_.
>>
>> I'm not saying UTF-8 was wrong, but I think we'll see more of UTF-32
>> in the time to come.
>
> Since it's sitting at 0% today (and has been since invented 20+ years
> ago), we'll certainly not be seeing _less_ of it in the future, but I
> don't see any reason to expect _more_ of it either.

Well, for my project UTF-32 is right; whether or not UTF-32 will be
pushed by (powerful) groups or become something that's important to
individual people remains to be seen.

-Morten
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-03 16:03 -0800 |
| Message-ID | <5e187f54-668f-4a9b-8980-a2bed846456d@googlegroups.com> |
| In reply to | #77783 |
On Thursday, December 3, 2015 at 11:34:25 PM UTC, Morten W. Petersen wrote:
>
> Well, for a site with dynamic content, there is already some
> CPU-utilization, while for example "resources" such as CSS files,
> JavaScript files or data files can be compressed & cached, saved as
> compressed files etc.
>
> I think it's a fair guess that most sites with a lot of dynamic
> elements, will spend at least tenfolds more CPU-time generating
> dynamic content than compressing it.
>
These days, an HTML page is effectively an application, the web server
effectively the disk drive, although it's a bit more intelligent than
a PC disk drive, more a database back end (which it often literally is).
But virtually all the action occurs on the client side. The user types
characters, such as I'm doing here, and the machine looks up a font and
maintains a raster. It also scrolls and even does auto-correction
(annoying). Then I press post and a couple of kilobytes of data go
to the server and are put on Usenet. But that's one call.
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-11-29 23:07 -0800 |
| Message-ID | <e037cc57-2024-491d-a992-8e821cd8014b@googlegroups.com> |
| In reply to | #77424 |
On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
> On 30.11.2015 02:15, Ian Collins wrote:
>
> Well, let's say you have some organization that wants to create an
> archive of lots of non-latin history, in XML.
>
> For them, choosing XML is right, and UTF-8 uses 3 bytes on characters
> U+0800 through U+FFFF, but only 2 bytes in UTF-16.
>
> However, UTF-16 is vulnerable to the entire string being corrupted
> after invalid data has been encountered.
>
> So this organization chooses to use UTF-32, because the unnecessary byte
> there also acts as a delimiter.
>
> This is plausible.
>
> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
> discussion on comp.lang.c.
>
The debate isn't entirely over.
Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
are represented by longer sequences, which they see as giving second
status to their culture. And of course UTF-8 arrays don't easily support
random access. And Microsoft has gone the UTF-16 route, as has Java.

But the consensus is moving to UTF-8. Certainly it's my own view that
the other encodings should be treated as a nuisance, and only converted
to at the last moment to interface with systems that insist on them.
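The random-access point can be made concrete: finding the Nth code point in a UTF-8 buffer means scanning from the start, whereas in UTF-32 it is plain array indexing. A sketch that assumes well-formed input; `utf8_offset_of` is a hypothetical helper, not an existing library function.

```c
#include <stddef.h>

/*
 * Return the byte offset at which code point number `index` (counting
 * from 0) begins in the well-formed UTF-8 buffer s of length len.
 * Returns len if the buffer holds fewer than index + 1 code points.
 * The cost is linear in the offset, unlike indexing a UTF-32 array.
 */
size_t utf8_offset_of(const unsigned char *s, size_t len, size_t index)
{
    size_t pos, seen = 0;

    for (pos = 0; pos < len; pos++) {
        if ((s[pos] & 0xC0) != 0x80) {   /* not 10xxxxxx: a code point starts here */
            if (seen == index)
                return pos;
            seen++;
        }
    }
    return len;
}
```

In practice most string handling walks text sequentially or works with byte offsets found by searching, so the linear scan matters less often than it might first appear; and even in UTF-32, the Nth code point is not necessarily the Nth user-visible character.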
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 08:20 +0100 |
| Message-ID | <n3gtas$9k$1@speranza.aioe.org> |
| In reply to | #77439 |
On 30.11.2015 08:07, Malcolm McLean wrote:
> On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
[...]
>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
>> discussion on comp.lang.c.
>>
> The debate isn't entirely over.
> Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
> are represented by longer sequences, which they see as giving second-class
> status to their culture. And of course UTF-8 arrays don't easily support
> random access. And Microsoft has gone the UTF-16 route, as has Java.
>
> But the consensus is moving to UTF-8. Certainly it's my own view that
> the other encodings should be treated as a nuisance, and only converted
> to at the last moment to interface with systems that insist on them.

Yes, I think UTF-8 is better than all the different encodings that have been out there. But UTF-16 and UTF-32 have their place, and I see it as natural that they will become mainstream.

"Cultural imperialism" is a term some use; I have a good command of English and can even think about things in English naturally, then again, I don't want to see all aspects of American culture survive (and influence others), for example executing criminals, or "Hollywood justice" as I like to call it.

There can be drawn many examples where technical choices which are logical and simple can also have negative effects.

-Morten
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-11-29 23:40 -0800 |
| Message-ID | <75850e42-6305-4846-8dcd-0e6c5975dfde@googlegroups.com> |
| In reply to | #77440 |
On Monday, November 30, 2015 at 7:20:07 AM UTC, Morten W. Petersen wrote:
> On 30.11.2015 08:07, Malcolm McLean wrote:
> > On Monday, November 30, 2015 at 1:34:09 AM UTC, Morten W. Petersen wrote:
> [...]
> >> As for the rest of the UTF-8 vs 16 and 32 debate, look at the earlier
> >> discussion on comp.lang.c.
> >>
> > The debate isn't entirely over.
> > Some Indians (Hindu, not red) don't like UTF-8 because Indian characters
> > are represented by longer sequences, which they see as giving second-class
> > status to their culture. And of course UTF-8 arrays don't easily support
> > random access. And Microsoft has gone the UTF-16 route, as has Java.
> >
> > But the consensus is moving to UTF-8. Certainly it's my own view that
> > the other encodings should be treated as a nuisance, and only converted
> > to at the last moment to interface with systems that insist on them.
>
> Yes, I think UTF-8 is better than all the different encodings that have
> been out there. But UTF-16 and UTF-32 have their place, and I see
> it as natural that they will become mainstream.
>
> "Cultural imperialism" is a term some use; I have a good command of
> English and can even think about things in English naturally, then
> again, I don't want to see all aspects of American culture survive
> (and influence others), for example executing criminals, or "Hollywood
> justice" as I like to call it.
>
> There can be drawn many examples where technical choices which are
> logical and simple can also have negative effects.
>

But in fact no poet or novelist is going to write in English rather than Hindi because he can save a bit of money on computer memory. If we were still in the days when 64K was the limit for a big machine and "computer power is measured in kilobytes, the ZX81 comes with one kilobyte" it might be different. But storage and memory are so cheap these days that it's not a consideration.
I don't think UTF-16 or UTF-32 have a real place, except that at some point you need to convert from UTF-8 to code points. Two encodings for string data are not twice as good as one.
| From | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| Date | 2015-11-30 08:48 +0100 |
| Message-ID | <n3guvo$370$1@speranza.aioe.org> |
| In reply to | #77441 |
On 30.11.2015 08:40, Malcolm McLean wrote:
> On Monday, November 30, 2015 at 7:20:07 AM UTC, Morten W. Petersen wrote:
[...]
>> Yes, I think UTF-8 is better than all the different encodings that have
>> been out there. But UTF-16 and UTF-32 have their place, and I see
>> it as natural that they will become mainstream.
>>
>> "Cultural imperialism" is a term some use; I have a good command of
>> English and can even think about things in English naturally, then
>> again, I don't want to see all aspects of American culture survive
>> (and influence others), for example executing criminals, or "Hollywood
>> justice" as I like to call it.
>>
>> There can be drawn many examples where technical choices which are
>> logical and simple can also have negative effects.
>>
> But in fact no poet or novelist is going to write in English rather
> than Hindi because he can save a bit of money on computer memory.
> If we were still in the days when 64K was the limit for a big machine
> and "computer power is measured in kilobytes, the ZX81 comes with
> one kilobyte" it might be different. But storage and memory are so
> cheap these days that it's not a consideration.
>
> I don't think UTF-16 or UTF-32 have a real place, except that at some
> point you need to convert from UTF-8 to code points. Two encodings
> for string data are not twice as good as one.

Hm yes. Then again, to get one Unicode character from a UTF-8 stream, you first have to read it, check it, and expand it if necessary.

To get one Unicode character from a UTF-32 stream, you read 4 bytes and add them up.

One is a lot simpler than the other.. I like simple. And given that UTF-8 and UTF-32 streams are roughly the same size compressed, and compression is cheap and available - doesn't that make UTF-32 a little bit simpler and more politically correct?

-Morten
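To make the comparison in this post concrete, here is an editorial sketch in C (not code from the thread) of decoding one code point from each kind of stream. The names `utf8_decode` and `utf32be_decode` are hypothetical. The "add them up" step for UTF-32 is written as shifts and ORs, and it assumes big-endian byte order; a real stream would need its byte order known in advance or signalled by a byte order mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if cp is in range and not a UTF-16 surrogate. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Decodes one code point from the n bytes at s (UTF-8).
 * Returns the number of bytes consumed, or 0 on malformed input. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    size_t len, i;

    assert(s != NULL && out != NULL);
    if (n == 0) return 0;

    /* "Read it": the lead byte gives the sequence length. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (n < len) return 0;          /* truncated sequence */

    /* "Check it" and "expand it": each continuation byte adds 6 bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (cp < min || !is_scalar_value(cp)) return 0; /* overlong or out of range */

    *out = cp;
    return len;
}

/* Decodes one code point from the n bytes at s (UTF-32, big-endian assumed).
 * Returns 4 on success, or 0 on truncated or out-of-range input. */
size_t utf32be_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (n < 4) return 0;
    cp = ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16)
       | ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
    if (!is_scalar_value(cp)) return 0;

    *out = cp;
    return 4;
}
```

The UTF-32 decoder is indeed shorter, but it is not check-free: it still has to reject truncated input, surrogate values and anything above U+10FFFF, and it depends on agreeing on a byte order, which UTF-8 never needs.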
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-11-30 20:52 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc2dh2Fi96mU1@mid.individual.net> |
| In reply to | #77442 |
Morten W. Petersen wrote:
>
> One is a lot simpler than the other.. I like simple. And given
> that UTF-8 and UTF-32 streams are roughly the same size compressed,
> and compression is cheap and available - doesn't that make UTF-32
> a little bit simpler and more politically correct?

Does it matter if no one uses it?

--
Ian Collins