| Started by | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| First post | 2015-11-29 01:06 +0100 |
| Last post | 2015-12-02 09:58 -0800 |
| Articles | 20 on this page of 210 — 25 participants |
Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 14:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 17:13 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-11-30 21:04 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc2e82Fi96mU2@mid.individual.net> |
| In reply to | #77443 |
Ian Collins wrote:
> Morten W. Petersen wrote:
>>
>> One is a lot simpler than the other.. I like simple. And given
>> that UTF8 and UTF32 streams are roughly the same size compressed,
>> and compression is cheap and available - doesn't that make UTF-32
>> a little bit simpler and more politically correct?
>
> Does it matter is no one uses it?

s/is/if/

--
Ian Collins
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-11-30 00:34 -0800 |
| Message-ID | <37a35f56-fb1a-4501-bec3-8266b4e0a41b@googlegroups.com> |
| In reply to | #77442 |
On Monday, November 30, 2015 at 7:48:19 AM UTC, Morten W. Petersen wrote:
> On 30.11.2015 08:40, Malcolm McLean wrote:
>
> Hm yes. Then again, to get one Unicode character from a UTF-8 stream,
> you first have to read it, check it, and expand it if necessary.
>
> To get one Unicode character from a UTF-32 stream, you read 4 bytes
> and add them up.
>
Yes, but it's one small routine. Once it's written and debugged, that's it.
Problem solved. Speed is unlikely to be an issue.
>
> One is a lot simpler than the other.. I like simple. And given
> that UTF8 and UTF32 streams are roughly the same size compressed,
> and compression is cheap and available - doesn't that make UTF-32
> a little bit simpler and more politically correct?
>
The advantage of UTF-8 is that it's backwards compatible, and that on a
system where memory for strings is an issue, it's likely that most of those
strings are a mixture of English and programming symbols, so the coding
takes up one byte per character whilst still offering facilities for
occasional extended characters.

The other advantage is that, in an effort to be all things to all men,
UTF-16 and UTF-32 allowed either byte ordering, which is just a nuisance
to readers.

The political objections are really without merit. Hindu culture is served,
not hindered, by having one standard coding that works, and supports Indian
scripts. If internationalisation fails because of too many incompatible
Unicode standards, then you'll find that English-only systems persist for
longer.
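To make the "one small routine" point concrete, here is a minimal sketch in C of what the two decoders being compared might look like. The function names (`utf8_decode`, `utf32be_decode`, `utf32le_decode`) are hypothetical and do not come from the thread or from any library. The UTF-8 routine assumes the whole sequence is already in memory and rejects malformed, overlong, and surrogate encodings. The UTF-32 routines show that the four bytes are combined by shifting rather than literally added, and that the reader has to know the byte order first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the UTF-8 bytes
   s[0..n-1]. Returns the number of bytes consumed (1 to 4), or 0 if the
   input is empty, truncated, or not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each
       length; anything below it is an overlong encoding. */
    static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte sequence */
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte sequence */
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte sequence */
        len = 4; c = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or invalid lead byte */
    }

    if (n < len)
        return 0;                           /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (c < min_value[len])
        return 0;                           /* overlong encoding */
    if (c > 0x10FFFF)
        return 0;                           /* beyond the Unicode range */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                           /* surrogates are not valid in UTF-8 */

    *cp = c;
    return len;
}

/* Hypothetical helpers: combine four bytes of UTF-32 into one code point.
   No range checking is done here; the caller must know the byte order. */
uint32_t utf32be_decode(const unsigned char *s)
{
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
           ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
}

uint32_t utf32le_decode(const unsigned char *s)
{
    return ((uint32_t)s[3] << 24) | ((uint32_t)s[2] << 16) |
           ((uint32_t)s[1] << 8)  |  (uint32_t)s[0];
}
```

For example, the bytes E2 82 AC decode to U+20AC with three bytes consumed, while the same code point in UTF-32 is 00 00 20 AC in big-endian order and AC 20 00 00 in little-endian order.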
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-30 03:50 -0600 |
| Message-ID | <n3h61a$v9h$1@dont-email.me> |
| In reply to | #77439 |
On 30-Nov-15 01:07, Malcolm McLean wrote:
> Morten W. Petersen wrote:
>> On 30.11.2015 02:15, Ian Collins wrote:
>> Well, let's say you have some organization that wants to create an
>> archive of lots of non-latin history, in XML.
>>
>> For them, choosing XML is right, and UTF-8 uses 3 bytes on
>> characters U+0800 through U+FFFF, but only 2 bytes in UTF-16.
>> ...
>> As for the rest of the UTF-8 vs 16 and 32 debate, look at the
>> earlier discussion on comp.lang.c.
>
> The debate isn't entirely over.

Yes, it is. Even for scripts where UTF-8 results in more bytes than
UTF-16, UTF-8 has become the dominant choice by users, and that shift
is accelerating, not declining--much less reversing.

So, the politicians can complain all they want, but the people they
claim to represent clearly disagree with them.

> Some Indians (Hindu, not red) don't like UTF-8 because Indian
> characters are represented by longer sequences, which they see as
> giving second status to their culture.

Let's see what all is in that range:

U+0800..U+083F Samaritan
U+0840..U+085F Mandaic
U+08A0..U+08FF Arabic Extended-A
U+0900..U+097F Devanagari
U+0980..U+09FF Bengali
U+0A00..U+0A7F Gurmukhi
U+0A80..U+0AFF Gujarati
U+0B00..U+0B7F Oriya
U+0B80..U+0BFF Tamil
U+0C00..U+0C7F Telugu
U+0C80..U+0CFF Kannada
U+0D00..U+0D7F Malayalam
U+0D80..U+0DFF Sinhala
U+0E00..U+0E7F Thai
U+0E80..U+0EFF Lao
U+0F00..U+0FFF Tibetan
U+1000..U+109F Myanmar
U+10A0..U+10FF Georgian
U+1100..U+11FF Hangul Jamo
U+1200..U+137F Ethiopic
U+1380..U+139F Ethiopic Supplement
U+13A0..U+13FF Cherokee
U+1400..U+167F Unified Canadian Aboriginal Syllabics
U+1680..U+169F Ogham
U+16A0..U+16FF Runic
U+1700..U+171F Tagalog
U+1720..U+173F Hanunoo
U+1740..U+175F Buhid
U+1760..U+177F Tagbanwa
U+1780..U+17FF Khmer
U+1800..U+18AF Mongolian
U+18B0..U+18FF Unified Canadian Aboriginal Syllabics Extended
U+1900..U+194F Limbu
U+1950..U+197F Tai Le
U+1980..U+19DF New Tai Lue
U+19E0..U+19FF Khmer Symbols
U+1A00..U+1A1F Buginese
U+1A20..U+1AAF Tai Tham
U+1AB0..U+1AFF Combining Diacritical Marks Extended
U+1B00..U+1B7F Balinese
U+1B80..U+1BBF Sundanese
U+1BC0..U+1BFF Batak
U+1C00..U+1C4F Lepcha
U+1C50..U+1C7F Ol Chiki
U+1CC0..U+1CCF Sundanese Supplement
U+1CD0..U+1CFF Vedic Extensions
U+1D00..U+1D7F Phonetic Extensions
U+1D80..U+1DBF Phonetic Extensions Supplement
U+1DC0..U+1DFF Combining Diacritical Marks Supplement
U+1E00..U+1EFF Latin Extended Additional
U+1F00..U+1FFF Greek Extended
U+2000..U+206F General Punctuation
U+2070..U+209F Superscripts and Subscripts
U+20A0..U+20CF Currency Symbols
U+20D0..U+20FF Combining Diacritical Marks for Symbols
U+2100..U+214F Letterlike Symbols
U+2150..U+218F Number Forms
U+2190..U+21FF Arrows
U+2200..U+22FF Mathematical Operators
U+2300..U+23FF Miscellaneous Technical
U+2400..U+243F Control Pictures
U+2440..U+245F Optical Character Recognition
U+2460..U+24FF Enclosed Alphanumerics
U+2500..U+257F Box Drawing
U+2580..U+259F Block Elements
U+25A0..U+25FF Geometric Shapes
U+2600..U+26FF Miscellaneous Symbols
U+2700..U+27BF Dingbats
U+27C0..U+27EF Miscellaneous Mathematical Symbols-A
U+27F0..U+27FF Supplemental Arrows-A
U+2800..U+28FF Braille Patterns
U+2900..U+297F Supplemental Arrows-B
U+2980..U+29FF Miscellaneous Mathematical Symbols-B
U+2A00..U+2AFF Supplemental Mathematical Operators
U+2B00..U+2BFF Miscellaneous Symbols and Arrows
U+2C00..U+2C5F Glagolitic
U+2C60..U+2C7F Latin Extended-C
U+2C80..U+2CFF Coptic
U+2D00..U+2D2F Georgian Supplement
U+2D30..U+2D7F Tifinagh
U+2D80..U+2DDF Ethiopic Extended
U+2DE0..U+2DFF Cyrillic Extended-A
U+2E00..U+2E7F Supplemental Punctuation
U+2E80..U+2EFF CJK Radicals Supplement
U+2F00..U+2FDF Kangxi Radicals
U+2FF0..U+2FFF Ideographic Description Characters
U+3000..U+303F CJK Symbols and Punctuation
U+3040..U+309F Hiragana
U+30A0..U+30FF Katakana
U+3100..U+312F Bopomofo
U+3130..U+318F Hangul Compatibility Jamo
U+3190..U+319F Kanbun
U+31A0..U+31BF Bopomofo Extended
U+31C0..U+31EF CJK Strokes
U+31F0..U+31FF Katakana Phonetic Extensions
U+3200..U+32FF Enclosed CJK Letters and Months
U+3300..U+33FF CJK Compatibility
U+3400..U+4DBF CJK Unified Ideographs Extension A
U+4DC0..U+4DFF Yijing Hexagram Symbols
U+4E00..U+9FFF CJK Unified Ideographs
U+A000..U+A48F Yi Syllables
U+A490..U+A4CF Yi Radicals
U+A4D0..U+A4FF Lisu
U+A500..U+A63F Vai
U+A640..U+A69F Cyrillic Extended-B
U+A6A0..U+A6FF Bamum
U+A700..U+A71F Modifier Tone Letters
U+A720..U+A7FF Latin Extended-D
U+A800..U+A82F Syloti Nagri
U+A830..U+A83F Common Indic Number Forms
U+A840..U+A87F Phags-pa
U+A880..U+A8DF Saurashtra
U+A8E0..U+A8FF Devanagari Extended
U+A900..U+A92F Kayah Li
U+A930..U+A95F Rejang
U+A960..U+A97F Hangul Jamo Extended-A
U+A980..U+A9DF Javanese
U+A9E0..U+A9FF Myanmar Extended-B
U+AA00..U+AA5F Cham
U+AA60..U+AA7F Myanmar Extended-A
U+AA80..U+AADF Tai Viet
U+AAE0..U+AAFF Meetei Mayek Extensions
U+AB00..U+AB2F Ethiopic Extended-A
U+AB30..U+AB6F Latin Extended-E
U+AB70..U+ABBF Cherokee Supplement
U+ABC0..U+ABFF Meetei Mayek
U+AC00..U+D7AF Hangul Syllables
U+D7B0..U+D7FF Hangul Jamo Extended-B
U+D800..U+DB7F High Surrogates
U+DB80..U+DBFF High Private Use Surrogates
U+DC00..U+DFFF Low Surrogates
U+E000..U+F8FF Private Use Area
U+F900..U+FAFF CJK Compatibility Ideographs
U+FB00..U+FB4F Alphabetic Presentation Forms
U+FB50..U+FDFF Arabic Presentation Forms-A
U+FE00..U+FE0F Variation Selectors
U+FE10..U+FE1F Vertical Forms
U+FE20..U+FE2F Combining Half Marks
U+FE30..U+FE4F CJK Compatibility Forms
U+FE50..U+FE6F Small Form Variants
U+FE70..U+FEFF Arabic Presentation Forms-B
U+FF00..U+FFEF Halfwidth and Fullwidth Forms
U+FFF0..U+FFFF Specials

Looks like there is a _lot_ more in there than just India.

Also, there was never a decision to make the above second-class; the
blocks were assigned back in the UCS-2 days (prior to the invention of
UTF-8) in the order that encoding for each script was standardized,
and nobody complained at the time.

If the Unicode Consortium had been aware of the UTF-8 length issue,
I'm sure they would have put more common scripts like Devanagari below
U+0800 and less common ones like IPA, Armenian, Syriac, Thaana and NKo
above U+0800, but _nobody_ knew.

> And of course UTF-8 arrays don't easily support random access.

The vast majority of code either treats strings as opaque blobs or
traverses them sequentially. True random access is extremely rare.

Keep in mind that UTF-16 doesn't allow random access either since it's
also a variable-length encoding, which many people forget about--and
that is a very common cause of UTF-16 bugs.

> And Microsoft has gone the UTF-16 route, as has Java.

Both went the UCS-2 route and found themselves painted into a corner
when additional planes were added. Relabeling their UCS-2 support as
UTF-16 support was seen as less painful than switching to either UTF-8
or UTF-32/UCS-4, but that hasn't worked out so well in practice.

> But the consensus is moving to UTF-8. Certainly it's my own view
> that the other encoding should be treated as a nuisance, and only
> converted to at the last moment to interface with systems that
> insist on them.

Agreed.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
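The size difference argued about in this post can be sketched in a few lines of C. The function names utf8_len and utf16_len are made up for this illustration and come neither from the thread nor from any library; the ranges are the ones fixed by the UTF-8 and UTF-16 encoding rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8.
   Returns 0 for values that are not encodable (surrogates, > U+10FFFF). */
size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)                    return 1;  /* ASCII */
    if (cp < 0x800)                   return 2;  /* Latin, Greek, Cyrillic, Arabic... */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
    if (cp < 0x10000)                 return 3;  /* rest of the BMP */
    if (cp <= 0x10FFFF)               return 4;  /* supplementary planes */
    return 0;
}

/* Bytes needed to encode one code point in UTF-16 (same error convention). */
size_t utf16_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000)                 return 2;  /* one 16-bit code unit */
    if (cp <= 0x10FFFF)               return 4;  /* surrogate pair */
    return 0;
}
```

For a Devanagari letter such as U+0915 this gives 3 bytes in UTF-8 against 2 in UTF-16; for ASCII it gives 1 against 2; anything outside the BMP costs 4 bytes in both.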
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-11-30 12:16 +0000 |
| Message-ID | <n3hejb$ujn$1@dont-email.me> |
| In reply to | #77452 |
On 30/11/2015 09:50, Stephen Sprunk wrote:
> On 30-Nov-15 01:07, Malcolm McLean wrote:
>> And of course UTF-8 arrays don't easily support random access.
>
> The vast majority of code either treats strings as opaque blobs or
> traverses them sequentially. True random access is extremely rare.

Not in my code it isn't. (Suppose you implement a language or even a
library which allows access to the nth character of a string. You don't
have any say in whether the user will always access sequentially or at
random. Which is the best string representation?)

Even with serial access, it's not so easy to iterate over a string to
access each character (or codepoint etc) in turn if using UTF8. Code
needs to be UTF8-aware.

Some serial access is also from the end of a string.

Wide-character strings make sense in a program, resorting to UTF8 for
reading, writing, or dealing with UTF8 APIs like the POSIX ones you
mentioned.

It's also possible to have separate concepts of normal 'string' and a
'serial-string', with the latter being an opaque type that you can only
operate on with functions or by treating it as a byte-array. With of
course conversions between the two.

> Keep in mind that UTF-16 doesn't allow random access either since it's
> also a variable-length encoding, which many people forget about--and
> that is a very common cause of UTF-16 bugs.

You'd choose between 8-bit and 32-bit strings. Except that MS uses
16-bit (not a big deal, just another conversion. I already have to deal
with that because most of my strings aren't zero-terminated, but
Windows and C API string parameters usually are.)

>> And Microsoft has gone the UTF-16 route, as has Java.

I suspect that 16-bit strings would be fine in most cases. (A discussion
elsewhere was about how it was impossible for a (programming) language
to be case-insensitive because there is the odd character in one or two
languages which has mismatched lower and upper case versions.)

--
Bartc
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-11-30 06:11 -0800 |
| Message-ID | <b3311303-0209-4624-b292-e16f470f1cfb@googlegroups.com> |
| In reply to | #77456 |
On Monday, November 30, 2015 at 12:17:23 PM UTC, Bart wrote:
> On 30/11/2015 09:50, Stephen Sprunk wrote:
> > On 30-Nov-15 01:07, Malcolm McLean wrote:
>
> >> And of course UTF-8 arrays don't easily support random access.
> >
> > The vast majority of code either treats strings as opaque blobs or
> > traverses them sequentially. True random access is extremely rare.
>
> Not in my code it isn't. (Suppose you implement a language or even a
> library which allows access to the nth character of a string. You don't
> have any say in whether the user will always access sequentially or at
> random. Which is the best string representation?)
>
> Even with serial access, it's not so easy to iterate over a string to
> access each character (or codepoint etc) in turn if using UTF8. Code
> needs to be UTF8-aware.
>
> Some serial access is also from the end of a string.
>
> Wide-character strings make sense in a program, resorting to UTF8 for
> reading, writing, or dealing with UTF8 APIs like the POSIX ones you
> mentioned.
>
> It's also possible to have separate concepts of normal 'string' and a
> 'serial-string', with the latter being an opaque type that you can only
> operate on with functions or by treating it as a byte-array. With of
> course conversions between the two.
>
Starting with Java, most languages have kept strings immutable, which
is viable as long as strings are short or read-only. UTF-8 is a good
match for that. You can't have random read access, but you don't need
to support random write access.

Functions like wildcard matchers need rewriting for UTF-8. It's
actually a bit of a dangerous situation as they work on the English
test cases programmers can actually read.

Simple strings don't stand up to heavy use, however. Text editors can't
store text in a simple buffer; you need a linked list of lines, even
today.
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-30 13:23 -0600 |
| Message-ID | <n3i7ik$797$1@dont-email.me> |
| In reply to | #77460 |
On 30-Nov-15 08:11, Malcolm McLean wrote:
> Functions like wildcard matchers need rewriting for UTF-8. It's
> actually a bit of a dangerous situation as they work on the English
> test cases programmers can actually read.

Actually, searching for a UTF-8 string within another UTF-8 string is
perfectly safe (as long as neither is overlong), even with naïve code
designed for ASCII. That was one of the design requirements.

That is also true of UTF-16 (and UTF-32) but is _not_ true of certain
other (pre-Unicode) encodings.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
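The search claim can be tried with nothing but strstr(). The sketch below assumes both strings are valid, complete UTF-8; utf8_find is a hypothetical wrapper written for this illustration, not something from the thread. Because lead bytes and continuation bytes use disjoint value ranges, a byte-wise match for a whole UTF-8 needle can only start on a character boundary.

```c
#include <string.h>

/* Byte offset of the first occurrence of needle in haystack, or -1.
   Plain byte comparison; assumes both arguments are valid UTF-8. */
long utf8_find(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```

With the text "£100 = €140" encoded as UTF-8, searching for "€" (E2 82 AC) reports byte offset 8, while searching for "¬" (C2 AC) reports -1 even though both of its bytes occur, separately, inside the £ and € sequences.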
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-30 13:18 -0600 |
| Message-ID | <n3i793$626$1@dont-email.me> |
| In reply to | #77456 |
On 30-Nov-15 06:16, BartC wrote:
> On 30/11/2015 09:50, Stephen Sprunk wrote:
>> On 30-Nov-15 01:07, Malcolm McLean wrote:
>>> And of course UTF-8 arrays don't easily support random access.
>>
>> The vast majority of code either treats strings as opaque blobs or
>> traverses them sequentially. True random access is extremely
>> rare.
>
> Not in my code it isn't. (Suppose you implement a language or even a
> library which allows access to the nth character of a string. You
> don't have any say in whether the user will always access
> sequentially or at random. Which is the best string representation?)

Which is "best" depends on how often true random access occurs.

Even with UTF-8, finding the Nth char (from either end) isn't
difficult, and as long as the strings are of reasonable length, nobody
will notice the slight loss of efficiency from an occasional traverse.

If fast random access _is_ a requirement, then use UTF-32 in memory;
that doesn't mean it's appropriate for a wire/file format, though.

> Even with serial access, it's not so easy to iterate over a string
> to access each character (or codepoint etc) in turn if using UTF8.
>
> Some serial access is also from the end of a string.

Traversing in either direction is trivial thanks to different encoding
of leading vs trailing bytes. You're far more likely to screw up
traversal (or indexing) of UTF-16 strings by forgetting surrogates.

> Code needs to be UTF8-aware.

Only if it actually needs to care about code points, which are _not_
necessarily the same things as "characters". UTF-8 was _designed_ to
be transparent to the vast majority of string-handling code.

> Wide-character strings make sense in a program, resorting to UTF8
> for reading, writing, or dealing with UTF8 APIs like the POSIX ones
> you mentioned.

That is a popular option for programs that do unusual types of
operations on strings--or need to accommodate other encodings.

>> Keep in mind that UTF-16 doesn't allow random access either since
>> it's also a variable-length encoding, which many people forget
>> about--and that is a very common cause of UTF-16 bugs.
>
> You'd choose between 8-bit and 32-bit strings. Except that MS uses
> 16-bit (not a big deal, just another conversion. I already have to
> deal with that because most of my strings aren't zero-terminated, but
> Windows and C API string parameters usually are.)

If you're on Windows (or Java), then UTF-16 isn't really a choice.

>>> And Microsoft has gone the UTF-16 route, as has Java.
>
> I suspect that 16-bit strings would be fine in most cases. (A
> discussion elsewhere was about how it was impossible for a
> (programming) language to be case-insensitive because there is the
> odd character in one or two languages which has mismatched lower and
> upper case versions.)

Unicode has all sorts of nasty corners that a program with serious
string-handling will have to deal with. Some of the more obvious ones
are cases like "Σ", whose lower case is either "σ" or "ς" depending on
position, and "ß", whose upper case is "ẞ", "SS" or "SZ" depending on
context--and is sometimes equal to "ss" (or "ſs") or "sz" (or "ſz")
and sometimes not.

And then there's the precomposed vs combining characters mess, scripts
where letters are used as numerals, languages where the same code
points have different collating orders, and endless other insanities.

Encoding is the _least_ of your problems.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
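As a rough illustration of the traversal point: continuation (trailing) bytes always look like 10xxxxxx, so stepping to the next or previous code point is a short loop. The helper names below are hypothetical, and the sketch assumes a valid, NUL-terminated UTF-8 string; it deals in code points only, not in user-perceived characters.

```c
#include <stddef.h>

/* A continuation (trailing) byte has the bit pattern 10xxxxxx. */
static int is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Start of the code point after the one at p (p itself at end of string). */
const char *utf8_next(const char *p)
{
    if (*p == '\0')
        return p;
    do {
        ++p;
    } while (is_cont((unsigned char)*p));
    return p;
}

/* Start of the code point before p; never moves before start. */
const char *utf8_prev(const char *start, const char *p)
{
    if (p == start)
        return p;
    do {
        --p;
    } while (p > start && is_cont((unsigned char)*p));
    return p;
}

/* Pointer to the n-th code point (0-based), or NULL if s is too short.
   This is the linear scan that replaces s[n]. */
const char *utf8_nth(const char *s, size_t n)
{
    while (n > 0 && *s != '\0') {
        s = utf8_next(s);
        --n;
    }
    return (*s != '\0') ? s : NULL;
}
```

Indexing costs a scan proportional to n instead of constant time, which is the trade-off being argued over; a program that indexes heavily could cache offsets or convert to UTF-32 in memory.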
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-11-30 13:23 -0800 |
| Message-ID | <lnh9k3w59i.fsf@kst-u.example.com> |
| In reply to | #77456 |
BartC <bc@freeuk.com> writes:
[...]
>> On 30-Nov-15 01:07, Malcolm McLean wrote:
[...]
>>> And Microsoft has gone the UTF-16 route, as has Java.
>
> I suspect that 16-bit strings would be fine in most cases. (A discussion
> elsewhere was about how it was impossible for a (programming) language
> to be case-insensitive because there is the odd character in one or two
> languages which has mismatched lower and upper case versions.)
The cases in which "16-bit strings would be fine" are those that only
use characters within the BMP (Basic Multilingual Plane). Characters
outside the BMP are exactly why UTF-16 (as opposed to UCS-2) exists.
Why support "most cases" when there are already perfectly good ways to
support all cases?
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
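What separates UTF-16 from UCS-2 is the surrogate pair used for code points above U+FFFF. A minimal sketch of the encoding step follows; utf16_encode is a name invented for this illustration, and the arithmetic is the standard UTF-16 rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
   Returns the number of units written to out (1 or 2), or 0 if cp is
   a surrogate or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: identical to UCS-2 */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Code that assumes one 16-bit unit per character handles the first branch correctly and silently mishandles the second, which is how UCS-2-era assumptions turn into UTF-16 bugs.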
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-11-30 22:32 +0000 |
| Message-ID | <n3iil7$ld7$1@dont-email.me> |
| In reply to | #77482 |
On 30/11/2015 21:23, Keith Thompson wrote:
> BartC <bc@freeuk.com> writes:
>> I suspect that 16-bit strings would be fine in most cases. (A discussion
>> elsewhere was about how it was impossible for a (programming) language
>> to be case-insensitive because there is the odd character in one or two
>> languages which has mismatched lower and upper case versions.)
>
> The cases in which "16-bit strings would be fine" are those that only
> use characters within the BMP (Basic Multilingual Plane).

> Characters
> outside the BMP are exactly why UTF-16 (as opposed to UCS-2) exists.

But how common are they, exactly? I understand that /fully/ supporting
Unicode is full of problems even using UTF32.

I think not being able to randomly index strings containing ancient
Etruscan text, would be one of the more minor ones.

Meanwhile I still occasionally come across problems with the
representation of £ or €; maybe they should fix those first before we
worry about ancient scripts or rare Chinese ideograms.

> Why support "most cases" when there are already perfectly good ways to
> support all cases?

"most" is likely to be 100% of the examples I'm going to come across in
my lifetime.

Anyway I didn't say use 16-bits; I recommended 32-bits. I'm just saying
that if someone does use 16-bits for some throwaway, or personal or
informal software, I'd be surprised if they came across any of these
rare characters.

--
Bartc
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-11-30 15:10 -0800 |
| Message-ID | <lnzixvulqc.fsf@kst-u.example.com> |
| In reply to | #77486 |
BartC <bc@freeuk.com> writes:
> On 30/11/2015 21:23, Keith Thompson wrote:
>> BartC <bc@freeuk.com> writes:
>
>>> I suspect that 16-bit strings would be fine in most cases. (A discussion
>>> elsewhere was about how it was impossible for a (programming) language
>>> to be case-insensitive because there is the odd character in one or two
>>> languages which has mismatched lower and upper case versions.)
>>
>> The cases in which "16-bit strings would be fine" are those that only
>> use characters within the BMP (Basic Multilingual Plane).
>
>> Characters
>> outside the BMP are exactly why UTF-16 (as opposed to UCS-2) exists.
>
> But how common are they, exactly? I understand that /fully/ supporting
> Unicode is full of problems even using UTF32.
>
> I think not being able to randomly index strings containing ancient
> Etruscan text, would be one of the more minor ones.
If ancient Etruscan were the only language affected, you'd have a valid
point.
Supporting only characters within the BMP presents the same problems as
supporting only 7-bit ASCII, or 8-bit Latin-1 (or Latin-N for any of
several values of N). It just delays those problems a bit longer.
> Meanwhile I still occasionally come across problems with the
> representation of £ or €; maybe they should fix those first before we
> worry about ancient scripts or rare Chinese ideograms.
I agree that such problems should be fixed. I'm guessing that \243 and
\200 are the Windows-1252 representations of the UK pound sign and the
Euro sign, respectively. Avoiding Windows-1252 would be at least a
partial solution to that.
>> Why support "most cases" when there are already perfectly good ways to
>> support all cases?
>
> "most" is likely to be 100% of the examples I'm going to come across in
> my lifetime.
>
> Anyway I didn't say use 16-bits; I recommended 32-bits. I'm just saying
> that if someone does use 16-bits for some throwaway, or personal or
> informal software, I'd be surprised if they came across any of these
> rare characters.
Supporting characters that don't fit in 8 bits is hard. Restricting
support to characters that fit in 16 bits doesn't make it that much
easier.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-11-30 21:05 -0600 |
| Message-ID | <n3j2ke$3do$1@dont-email.me> |
| In reply to | #77486 |
On 30-Nov-15 16:32, BartC wrote:
> On 30/11/2015 21:23, Keith Thompson wrote:
>> BartC <bc@freeuk.com> writes:
>>> I suspect that 16-bit strings would be fine in most cases. (A
>>> discussion elsewhere was about how it was impossible for a
>>> (programming) language to be case-insensitive because there is
>>> the odd character in one or two languages which has mismatched
>>> lower and upper case versions.)
>>
>> The cases in which "16-bit strings would be fine" are those that
>> only use characters within the BMP (Basic Multilingual Plane).
>>
>> Characters outside the BMP are exactly why UTF-16 (as opposed to
>> UCS-2) exists.
>
> But how common are they, exactly?

That depends on who you are and what you're doing. Some people deal
with non-BMP characters constantly, others only occasionally, but
almost nobody _never_ encounters them anymore. That's part of what
makes broken UTF-16 code so increasingly painful to deal with.

> I understand that /fully/ supporting Unicode is full of problems
> even using UTF32.

Indeed; encoding is honestly the least of your problems, so just use
UTF-8 like everyone else and move on to the _hard_ stuff.

> Meanwhile I still occasionally come across problems with the
> representation of £ or €; maybe they should fix those first before
> we worry about ancient scripts or rare Chinese ideograms.

If you're getting mojibake or replacement characters, that is usually
due to folks using some ancient encoding rather than something modern
and sensible, e.g. UTF-8.

>> Why support "most cases" when there are already perfectly good ways
>> to support all cases?
>
> "most" is likely to be 100% of the examples I'm going to come across
> in my lifetime.

Take a look again at the non-BMP Unicode blocks before you convince
yourself that you'll never see _any_ of them. I get at least a dozen a
day just counting emojis in text messages, and I correspond daily with
coworkers in China whose very _names_ use non-BMP characters.

> Anyway I didn't say use 16-bits; I recommended 32-bits. I'm just
> saying that if someone does use 16-bits for some throwaway, or
> personal or informal software, I'd be surprised if they came across
> any of these rare characters.

For personal throwaway code, sure.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-01 12:38 +0000 |
| Message-ID | <n3k48m$9j6$1@dont-email.me> |
| In reply to | #77490 |
On 01/12/2015 03:05, Stephen Sprunk wrote:
> On 30-Nov-15 16:32, BartC wrote:
>> I understand that /fully/ supporting Unicode is full of problems
>> even using UTF32.
>
> Indeed; encoding is honestly the least of your problems, so just use
> UTF-8 like everyone else and move on to the _hard_ stuff.
>> Meanwhile I still occasionally come across problems with the
>> representation of £ or €; maybe they should fix those first before
>> we worry about ancient scripts or rare Chinese ideograms.
>
> If you're getting mojibake or replacement characters, that is usually
> due to folks using some ancient encoding rather than something modern
> and sensible, e.g. UTF-8.
This is a typical problem I would get (source code was UTF8):
#include <stdio.h>
#include <string.h>

int main(void) {
    char s[]="£100 = €140";
    unsigned char c;
    int i;

    printf("%s\n",s);

    for (i=0; i<strlen(s); ++i){
        c = s[i];
        printf("%2d: %03d %02X <%c>\n",i,c,c,c);
    }
}
I want to print the individual characters in the string. Compiled with
gcc, I get (using Windows console set to code page 65001):
£100 = €140
0: 194 C2 <�>
1: 163 A3 <�>
2: 049 31 <1>
3: 048 30 <0>
4: 048 30 <0>
5: 032 20 < >
6: 061 3D <=>
7: 032 20 < >
8: 226 E2 <�>
9: 130 82 <�>
10: 172 AC <�>
11: 049 31 <1>
12: 052 34 <4>
13: 048 30 <0>
I get 14 'characters' output instead of the 11 I expect. The £ and €
characters are replaced by sequences of those funny black diamonds (you
might see some other error character).
Two C compilers even print the first line as:
��100 = ���140
(One or two had a problem with the UTF8 BOM, which I had to remove.)
This is basic stuff. And I'm doing serial access on the string, not random.
So much for the majority of programs being able to work unchanged with
UTF8! I'd need to start going into the multi-byte and wide char stuff.
On Windows, that means UCS2 or UTF16 or whatever it is now, which
apparently isn't good enough either.
--
Bartc
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2015-12-01 14:43 +0000 |
| Message-ID | <87a8pudybv.fsf@bsb.me.uk> |
| In reply to | #77502 |
BartC <bc@freeuk.com> writes:
> On 01/12/2015 03:05, Stephen Sprunk wrote:
>> On 30-Nov-15 16:32, BartC wrote:
>
>>> I understand that /fully/ supporting Unicode is full of problems
>>> even using UTF32.
>>
>> Indeed; encoding is honestly the least of your problems, so just use
>> UTF-8 like everyone else and move on to the _hard_ stuff.
>
>>> Meanwhile I still occasionally come across problems with the
>>> representation of £ or €; maybe they should fix those first before
>>> we worry about ancient scripts or rare Chinese ideograms.
>>
>> If you're getting mojibake or replacement characters, that is usually
>> due to folks using some ancient encoding rather than something modern
>> and sensible, e.g. UTF-8.
>
> This is a typical problem I would get (source code was UTF8):
>
> #include <stdio.h>
> #include <string.h>
>
> int main(void) {
> char s[]="£100 = €140";
> unsigned char c;
> int i;
>
> printf("%s\n",s);
>
> for (i=0; i<strlen(s); ++i){
> c = s[i];
> printf("%2d: %03d %02X <%c>\n",i,c,c,c);
> }
> }
>
> I want to print the individual characters in the string. Compiled with
> gcc, I get (using Windows console set to code page 65001):
>
> £100 = €140
> 0: 194 C2 <�>
> 1: 163 A3 <�>
> 2: 049 31 <1>
> 3: 048 30 <0>
> 4: 048 30 <0>
> 5: 032 20 < >
> 6: 061 3D <=>
> 7: 032 20 < >
> 8: 226 E2 <�>
> 9: 130 82 <�>
> 10: 172 AC <�>
> 11: 049 31 <1>
> 12: 052 34 <4>
> 13: 048 30 <0>
>
> I get 14 'characters' output instead of the 11 I expect. The £ and €
> characters are replaced by sequences of those funny black diamonds
> (you might see some other error character).
Nothing wrong there. Your expectation is a little bit off, but
everything seems to be working. It's a bit of a coincidence that it
works, but UTF-8 often does "just work" like this. To make it work by
design you need to tell the compiler that the string is UTF-8 (u8"£100 =
€140") and you might need to set a suitable locale for the output.
> Two C compilers even print the first line as:
>
> ��100 = ���140
They probably don't understand UTF-8 source. If you have the luxury of
a C11 compiler you can use u8"...". If you have C99 you can use
universal character names (e.g. \u00a3 for the pound sign).
> (One or two had a problem with the UTF8 BOM, which I had to remove.)
There is no such thing. UTF-8 has no byte order issues, so that
character is taken to be what it really is: a zero width no-break space.
C is permitted to reject a file with a zero width no-break space in it and
it's not even obliged to take UTF-8 source.
> This is basic stuff. And I'm doing serial access on the string, not
> random.
You are in danger of complicating this for other people. Everything is
working correctly in your code (with gcc at least) but it does not match
your expectation. If you wanted your non-ASCII characters to be single
bytes you can pick any of the dozens of alternative encodings and keep
your fingers crossed that everyone else will know which one you chose.
The growing popularity of Unicode and UTF-8 is making that old "guess
the character I was thinking of" a thing of the past. Don't keep it
going!
> So much for the majority of programs being able to work unchanged with
> UTF8! I'd need to start going into the multi-byte and wide char
> stuff. On Windows, that means UCS2 or UTF16 or whatever it is now,
> which apparently isn't good enough either.
You've shown one program that appears to be working. But even if it's not,
that tells me nothing about the majority of programs.
--
Ben.
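The gap between the 14 bytes in the dump and the 11 characters expected comes down to decoding. A minimal decoder sketch follows, assuming the input is a NUL-terminated string; utf8_decode and utf8_codepoints are hypothetical helpers, and for brevity the decoder checks structure only, so it does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at s into *cp.
   Returns the number of bytes consumed (1..4), or 0 if the lead byte
   is invalid or a continuation byte is missing. */
size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t c;
    size_t len, i;

    if (p[0] < 0x80)                { c = p[0];        len = 1; }
    else if ((p[0] & 0xE0) == 0xC0) { c = p[0] & 0x1F; len = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { c = p[0] & 0x0F; len = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { c = p[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or invalid byte */

    for (i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80)  /* also catches an early NUL */
            return 0;
        c = (c << 6) | (p[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Number of code points in s, or -1 if s is malformed. */
long utf8_codepoints(const char *s)
{
    long n = 0;
    uint32_t cp;

    while (*s != '\0') {
        size_t len = utf8_decode(s, &cp);
        if (len == 0)
            return -1;
        s += len;
        ++n;
    }
    return n;
}
```

Applied to the bytes from the dump (C2 A3 for £, E2 82 AC for €), the loop visits 11 code points across 14 bytes. Spelling the literal with byte escapes, or in C11 as u8"\u00a3100 = \u20ac140", keeps the result independent of how the compiler reads the source file.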
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-01 12:09 -0800 |
| Message-ID | <21078681-254f-4af6-8f17-9e967a409f28@googlegroups.com> |
| In reply to | #77513 |
On Tuesday, December 1, 2015 at 2:43:14 PM UTC, Ben Bacarisse wrote:
> BartC <bc@freeuk.com> writes:
>
> You've shown one program that appears to be working. But even if it's
> not, that tells me nothing about the majority of programs.
>
If you write C source in UTF-8, with some of the string literals
and identifiers containing extended characters, does it still work?

As far as the identifiers go, it's hit and miss. It depends on the
exact code used for determining a valid identifier. As far as the
string literals go, it depends on the low-level interface to
printf(). If UTF-8 is accepted, the program will work. But most
likely it isn't, and printf will produce odd characters.

So it's not quite true that a UTF-8-naive program should work
correctly unless it works directly with glyphs. The average C
compiler is an exception.
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-12-02 09:14 +1300 |
| Message-ID | <dc6ddnFi96mU3@mid.individual.net> |
| In reply to | #77557 |
Malcolm McLean wrote:
> On Tuesday, December 1, 2015 at 2:43:14 PM UTC, Ben Bacarisse wrote:
>> BartC <bc@freeuk.com> writes:
>>
>> You've shown one program that appears to be working. But even if it's
>> not, that tells me nothing about the majority of programs.
>>
> If you write C source in UTF-8, with some of the string literals
> and identifiers containing extended characters, does it still work?
>
> As far as the identifiers go, it's hit and miss. It depends on the
> exact code used for determining a valid identifier. As far as the
> string literals go, it depends on the low-level interface to
> printf(). If UTF-8 is accepted, the program will work. But most
> likely it isn't, and printf will produce odd characters.

Does it? All printf is doing is sending a bunch of bytes to the
console. The interpretation of those bytes is handled by the console.

--
Ian Collins
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-01 12:27 -0800 |
| Message-ID | <f990da1c-ab87-488e-b0d3-9ac21750c20a@googlegroups.com> |
| In reply to | #77558 |
On Tuesday, December 1, 2015 at 8:15:05 PM UTC, Ian Collins wrote:
> Malcolm McLean wrote:
> > As far as the string literals go, it depends on the low-level
> > interface to printf(). If UTF-8 is accepted, the program will
> > work. But most likely it isn't, and printf will produce odd
> > characters.
>
> Does it? All printf is doing is sending a bunch of bytes to the
> console. The interpretation of those bytes is handled by the console.
>
I'd guess that the Windows DOS box (= console) accepts UTF-16 characters
but not UTF-8, and that somewhere in the printf implementation there's
a little routine that pads an ascii character to 16 bits. So it goes
wrong if printf is fed UTF-8, but a change would be trivial, as long
as you stick to the subset of unicode that can be encoded in single
16 bit code points.
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-12-02 10:14 +1300 |
| Message-ID | <dc6guiFi96mU4@mid.individual.net> |
| In reply to | #77561 |
Malcolm McLean wrote:
> On Tuesday, December 1, 2015 at 8:15:05 PM UTC, Ian Collins wrote:
>> Malcolm McLean wrote:
>>> As far as the string literals go, it depends on the low-level
>>> interface to printf(). If UTF-8 is accepted, the program will
>>> work. But most likely it isn't, and printf will produce odd
>>> characters.
>>
>> Does it? All printf is doing is sending a bunch of bytes to the
>> console. The interpretation of those bytes is handled by the console.
>>
> I'd guess that the Windows DOS box (= console) accepts UTF-16 characters
> but not UTF-8, and that somewhere in the printf implementation there's
> a little routine that pads an ascii character to 16 bits. So it goes
> wrong if printf is fed UTF-8, but a change would be trivial, as long
> as you stick to the subset of unicode that can be encoded in single
> 16 bit code points.

Any conversion is probably somewhere other than in the printf
implementation, somewhere in the output driver most likely.

Consider what happens when fprintf is substituted for printf. What
would be the output in a DOS box from

    char s[]="£100 = €140";

    printf( "%d\n", fprintf( stdout,"%s\n",s ) );

--
Ian Collins
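One part of that question can be answered without a console at all: the value fprintf returns counts bytes written, not characters displayed. The sketch below uses snprintf with a NULL buffer and size 0, which C99 defines as a pure length calculation, to obtain the same count; line_bytes is a hypothetical helper, and the string is assumed to be stored as UTF-8.

```c
#include <stdio.h>

/* Number of bytes that the format "%s\n" produces for s, which is what
   fprintf(stdout, "%s\n", s) returns when the write succeeds.
   snprintf(NULL, 0, ...) formats nothing and only reports the length. */
int line_bytes(const char *s)
{
    return snprintf(NULL, 0, "%s\n", s);
}
```

For the UTF-8 bytes of "£100 = €140" the count is 15 (14 bytes of text plus the newline), whatever glyphs the console chooses to draw for them.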
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-01 18:01 -0600 |
| Message-ID | <n3lc7p$gmd$1@dont-email.me> |
| In reply to | #77561 |
On 01-Dec-15 14:27, Malcolm McLean wrote:
> On Tuesday, December 1, 2015 at 8:15:05 PM UTC, Ian Collins wrote:
>> Malcolm McLean wrote: As far as the
>>> string literals go, it depends on the low-level interface to
>>> printf(). If UTF-8 is accepted, the program will work. But most
>>> likely it isn't, and printf will produce odd characters.
>>
>> Does it? All printf is doing is sending a bunch of bytes to the
>> console. The interpretation of those bytes is handled by the
>> console.
>
> I'd guess that the Windows DOS box (= console) accepts UTF-16
> characters but not UTF-8, and that somewhere in the printf
> implementation there's a little routine that pads an ascii character
> to 16 bits. So it goes wrong if printf is fed UTF-8, but a change
> would be trivial, as long as you stick to the subset of unicode that
> can be encoded in single 16 bit code points.
Windows programs can write to the console with either WriteConsoleW(),
which takes a UTF-16LE string, or WriteConsoleA(), which translates the
bytes to a UTF-16LE string according to the current code page and then
(in effect, if not in fact) passes it to WriteConsoleW(). Note that
printf() et al (eventually) call WriteConsoleA().
Console programs can use SetConsoleInputCP() and SetConsoleOutputCP() to
select any supported code page, including UTF-8 (65001), and users can
do the same with the "chcp" command.
Unfortunately, Windows does _not_ allow setting UTF-8 as the default
code page, so you have to do this every time you open a new console window.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-01 20:41 +0000 |
| Message-ID | <n3l0hk$u87$1@dont-email.me> |
| In reply to | #77558 |
On 01/12/2015 20:14, Ian Collins wrote:
> Malcolm McLean wrote:
>> On Tuesday, December 1, 2015 at 2:43:14 PM UTC, Ben Bacarisse wrote:
>>> BartC <bc@freeuk.com> writes:
>>>
>>> You've shown one program that appears to be working. But even it's not,
>>> that tells me nothing about the majority of programs.
>>>
>> If you write C source in UTF-8, with some of the string literals
>> and identifiers containing extended characters, does it still work?
>>
>> As far as the identifiers go, it's hit and miss. It depends on the
>> exact code used for determining a valid identifier. As far as the
>> string literals go, it depends on the low-level interface to
>> printf(). If UTF-8 is accepted, the program will work. But most
>> likely it isn't, and printf will produce odd characters.
>
> Does it? All printf is doing is sending a bunch of bytes to the
> console. The interpretation of those bytes is handled by the console.
If I run this code, where it prints the first 4 'somethings' of the string:
printf("%.4s","£100pw");
Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
So does that 4 represent bytes or characters?
The specs for printf on MSDN say printf returns the number of characters
printed, while the C standard says it's the number of characters
transmitted.
But here it returns 4 for an output of "£10", clearly not 4 characters.
So it's all a bit of a mess.
--
Bartc
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-01 12:53 -0800 |
| Message-ID | <lnpoypubyy.fsf@kst-u.example.com> |
| In reply to | #77567 |
BartC <bc@freeuk.com> writes:
[...]
> If I run this code, where it prints the first 4 'somethings' of the string:
>
> printf("%.4s","£100pw");
>
> Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
The pound sign in your article is printed in my newsreader (actually in
GNU Emacs) as \243. Your article headers include:
Content-Type: text/plain; charset=windows-1252; format=flowed
Apparently my system (I'm using Linux) isn't configured to understand
windows-1252, so it falls back to displaying the character in octal.
I see you're using Thunderbird on Windows. Is there any way you can
configure it to post using UTF-8?
Anyway ...
> So does that 4 represent bytes or characters?
>
> The specs for printf on MSDN say printf returns the number of characters
> printed, while the C standard says it's the number of characters
> transmitted.
>
> But here it returns 4 for an output of "£10", clearly not 4 characters.
> So it's all a bit of a mess.
The C standard says that printf returns "the number of characters
transmitted, or a negative value if an output or encoding error
occurred".
It appears to be using the word "character" in the sense defined in
3.7.1:
character
single-byte character
<C> bit representation that fits in a byte
as opposed to 3.7:
character
<abstract> member of a set of elements used for the organization,
control, or representation of data
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"