Groups > comp.lang.c > #77357 > unrolled thread
| Started by | "Morten W. Petersen" <morphex@gmail.com> |
|---|---|
| First post | 2015-11-29 01:06 +0100 |
| Last post | 2015-12-02 09:58 -0800 |
| Articles | 20 on this page of 210 — 25 participants |
Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 01:06 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 02:01 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 03:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 00:09 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-11-29 00:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-11-29 14:31 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-11-29 23:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:41 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-29 08:28 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-29 16:30 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-28 23:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 02:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 00:30 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 01:33 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 13:54 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:03 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:15 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 02:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 14:42 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:16 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 20:20 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 04:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 17:09 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 19:44 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:36 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 07:39 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-01 09:17 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:40 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:34 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:03 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:20 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-29 23:40 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 08:48 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 20:52 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-11-30 21:04 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 00:34 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 03:50 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 12:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 06:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:23 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 13:18 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 13:23 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-11-30 22:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 15:10 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-30 21:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-01 14:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:09 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 09:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 12:27 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-02 10:14 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:01 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 12:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-01 13:55 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:30 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-01 18:46 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 14:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 23:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Say, what? <<nothing@nowhere.nohow>> - 2015-12-01 17:13 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 09:08 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-01 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 17:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 00:17 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-01 16:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Martin Shobe <martin.shobe@yahoo.com> - 2015-12-01 21:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 09:37 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. James Kuyper <jameskuyper@verizon.net> - 2015-12-02 10:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 17:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:22 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:32 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 21:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 10:36 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 17:55 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-02 17:04 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 01:11 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:19 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 23:16 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 00:54 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 04:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:31 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Eric Sosman <esosman@comcast-dot-net.invalid> - 2015-12-03 13:59 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 19:45 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 14:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 22:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 12:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 12:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 13:19 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:54 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:50 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:26 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:19 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:25 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 15:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 16:54 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 18:53 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-03 19:00 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 14:07 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:41 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-05 16:09 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Steve Thompson <stevet810@gmail.com> - 2015-12-05 21:15 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-06 12:35 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-03 09:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 19:12 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 16:58 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 15:47 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 14:51 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-03 16:50 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:55 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:56 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:24 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-04 08:49 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 07:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:27 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-03 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 10:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:21 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 16:42 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-04 11:15 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-08 01:57 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. David Brown <david.brown@hesbynett.no> - 2015-12-08 09:08 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:58 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 11:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-04 11:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:24 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 09:30 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 15:52 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 09:07 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 09:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 10:56 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:04 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-04 21:32 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 13:38 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 16:13 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 16:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Geoff <geoff@invalid.invalid> - 2015-12-04 19:16 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-04 21:19 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:44 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 09:01 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:34 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-06 18:32 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 10:43 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 10:02 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:53 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-05 09:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-05 18:36 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 12:26 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 11:36 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:42 +0530
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 03:59 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:17 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. supercat@casperkitty.com - 2015-12-07 07:33 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-03 03:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 00:58 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:34 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-03 11:38 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 14:09 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:10 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 08:28 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 21:33 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Heathfield <rjh@cpax.org.uk> - 2015-12-02 21:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 16:05 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 14:12 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 22:47 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 14:00 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 01:38 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:20 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. raltbos@xs4all.nl (Richard Bos) - 2015-12-04 10:40 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Nobody <nobody@nowhere.invalid> - 2015-12-03 02:42 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-01 20:48 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 12:08 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 04:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:05 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-12-04 01:31 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 14:23 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:00 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 16:49 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:50 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-02 20:02 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 12:31 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 01:43 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-12-02 09:21 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Richard Damon <Richard@Damon-Family.org> - 2015-12-02 07:29 -0500
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 05:47 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. BartC <bc@freeuk.com> - 2015-12-02 14:16 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:56 +1300
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:49 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Philip Lantz <prl@canterey.us> - 2015-12-02 22:11 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:06 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-11-30 22:14 +0000
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Stephen Sprunk <stephen@sprunk.org> - 2015-11-29 23:03 -0600
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-30 06:26 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Keith Thompson <kst-u@mib.org> - 2015-11-30 00:39 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-11-30 01:57 -0800
Re: Working efficiently with 32-bit Unicode output streams, locale etc. "Morten W. Petersen" <morphex@gmail.com> - 2015-11-29 15:32 +0100
Re: Working efficiently with 32-bit Unicode output streams, locale etc. fir <profesor.fir@gmail.com> - 2015-12-02 09:58 -0800
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-01 21:32 +0000 |
| Message-ID | <n3l3hs$bc1$1@dont-email.me> |
| In reply to | #77569 |
On 01/12/2015 20:53, Keith Thompson wrote:
> BartC <bc@freeuk.com> writes:
> [...]
>> If I run this code, where it prints the first 4 'somethings' of the string:
>>
>> printf("%.4s","£100pw");
>>
>> Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
>
> The pound sign in your article is printed in my newsreader (actually in
> GNU Emacs) as \243. Your article headers include:
>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Apparently my system (I'm using Linux) isn't configured to understand
> windows-1252, so it falls back to displaying the character in octal.
>
> I see you're using Thunderbird on Windows. Is there any way you can
> configure it to post using UTF-8?
I don't know how to do that. But if I switch 'Character encoding' from
Western to Unicode, then all my £ signs above turn into little black
diamonds!
But I'm sure I've never had trouble sending £ symbols and people being
able to read them at the other end. I've viewed my post above via
googlegroups on both Windows and Ubuntu, using Thunderbird in each case,
and the £s are visible.
(Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal
in both Unicode and Windows-1252.)
--
Bartc
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-01 13:55 -0800 |
| Message-ID | <lnd1upu93u.fsf@kst-u.example.com> |
| In reply to | #77575 |
BartC <bc@freeuk.com> writes:
> On 01/12/2015 20:53, Keith Thompson wrote:
>> BartC <bc@freeuk.com> writes:
>> [...]
>>> If I run this code, where it prints the first 4 'somethings' of the string:
>>>
>>> printf("%.4s","£100pw");
>>>
>>> Then it outputs "£10" in UTF8, not "£100". £90 is a big difference!
>>
>> The pound sign in your article is printed in my newsreader (actually in
>> GNU Emacs) as \243. Your article headers include:
>>
>> Content-Type: text/plain; charset=windows-1252; format=flowed
>>
>> Apparently my system (I'm using Linux) isn't configured to understand
>> windows-1252, so it falls back to displaying the character in octal.
>>
>> I see you're using Thunderbird on Windows. Is there any way you can
>> configure it to post using UTF-8?
>
> I don't know how to do that. But if I switch 'Character encoding' from
> Western to Unicode, then all my £ signs above turn into little black
> diamonds!
>
> But I'm sure I've never had trouble sending £ symbols and people being
> able to read them at the other end. I've viewed my post above via
> googlegroups on both Windows and Ubuntu, using Thunderbird in each
> case, and the £s are visible.
>
> (Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal
> in both Unicode and Windows-1252.)
And now the pound signs are showing up correctly for me. Odd.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | raltbos@xs4all.nl (Richard Bos) |
|---|---|
| Date | 2015-12-04 10:30 +0000 |
| Message-ID | <56616a63.975250@news.xs4all.nl> |
| In reply to | #77576 |
Keith Thompson <kst-u@mib.org> wrote:
> BartC <bc@freeuk.com> writes:
> > (Apart from which, the code for £ is 243 octal, A3 hex and 163 decimal
> > in both Unicode and Windows-1252.)
>
> And now the pound signs are showing up correctly for me. Odd.
Not odd at all. Usenet is _still_, in its essentials, a 7-bit medium,
despite what blinkered advocates on either side of the Windows-Linux
divide want it to default to handling. Well, boo-hoo to them: by
default, and when things go wrong - which they do when you try to force
the matter in two contradictory ways - Usenet defaults to ASCII. Let
them learn to deal with that, and leave the rest of us to view it in
peace (and ASCII).
Richard
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-01 18:46 -0600 |
| Message-ID | <n3letg$pjd$1@dont-email.me> |
| In reply to | #77575 |
On 01-Dec-15 15:32, BartC wrote:
> On 01/12/2015 20:53, Keith Thompson wrote:
>> I see you're using Thunderbird on Windows. Is there any way you
>> can configure it to post using UTF-8?
>
> I don't know how to do that. But if I switch 'Character encoding'
> from Western to Unicode, then all my £ signs above turn into little
> black diamonds!
There are two very different "Character encoding" menu settings; the one
for reading windows/panes forces Thunderbird to reinterpret the existing
bytes as different characters, which often results in mojibake, while
the one for writing windows tells it to use different bytes to represent
the same characters. It sounds like you used the former, not the latter.
> But I'm sure I've never had trouble sending £ symbols and people
> being able to read them at the other end. I've viewed my post above
> via googlegroups on both Windows and Ubuntu, using Thunderbird in
> each case, and the £s are visible.
That works as long as the reader understands the encoding specified in
the headers--or can guess the correct one if unspecified. Guessing has
obvious practical limitations, but UTF-8 is virtually unmistakeable,
unlike other encodings, which is yet another plus.
> (Apart from which. the code for £ is 243 octal, A3 hex and 163
> decimal in both Unicode and Windows-1252.)
Unicode is not an encoding. "£" (U+00A3) is encoded as 0xC2 0xA3 in
UTF-8, 0xA3 0x00 in UTF-16LE, 0x00 0xA3 in UTF-16BE, etc.
I don't have a Windows-1252 table handy, but it's almost the same as
ISO-8859-1, so it's not surprising that "£" is 0xA3 in both, and
Unicode's first 256 code points match ISO-8859-1 by design.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | Say, what? <<nothing@nowhere.nohow>> |
|---|---|
| Date | 2015-12-01 14:07 -0800 |
| Message-ID | <2015120114070915576-nothing@nowhere.nohow> |
| In reply to | #77569 |
I think this is posted in UTF-8.
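#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "£100 = €140";      /* source file saved as UTF-8 */
    unsigned char c;
    size_t i, len = strlen(s);     /* length in bytes, not characters */
    int r;

    r = printf("%s\n", s);         /* returns the number of bytes written */
    printf("printf returned %i\n", r);
    printf("String length: %zu\n", len);
    for (i = 0; i < len; ++i) {
        c = s[i];
        printf("%2zu: %03d %02X <%c>\n", i, c, c, c);
    }
    return 0;
}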
Output:
£100 = €140
printf returned 15
String length: 14
0: 194 C2 <\302>
1: 163 A3 <\243>
2: 049 31 <1>
3: 048 30 <0>
4: 048 30 <0>
5: 032 20 < >
6: 061 3D <=>
7: 032 20 < >
8: 226 E2 <\342>
9: 130 82 <\202>
10: 172 AC <\254>
11: 049 31 <1>
12: 052 34 <4>
13: 048 30 <0>
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-01 23:54 +0000 |
| Message-ID | <n3lbr1$f8j$1@dont-email.me> |
| In reply to | #77577 |
On 01/12/2015 22:07, Say wrote:
> I think this is posted in UTF-8.
> char s[]="£100 = €140";
> unsigned char c;
> int i,r;
>
> r = printf("%s\n",s);
> printf("printf returned %i\n",r);
> printf("String length: %zu\n", strlen(s));
> Output:
>
> £100 = €140
> printf returned 15
> String length: 14
Don't forget the \n in the first printf. Without \n, it will return 14,
the same as the strlen.
--
Bartc
| From | Say, what? <<nothing@nowhere.nohow>> |
|---|---|
| Date | 2015-12-01 17:13 -0800 |
| Message-ID | <2015120117131851364-nothing@nowhere.nohow> |
| In reply to | #77579 |
On 2015-12-01 23:54:20 +0000, BartC said:
> On 01/12/2015 22:07, Say wrote:
>> I think this is posted in UTF-8.
>
>> char s[]="£100 = €140";
>> unsigned char c;
>> int i,r;
>>
>> r = printf("%s\n",s);
>> printf("printf returned %i\n",r);
>> printf("String length: %zu\n", strlen(s));
>
>> Output:
>>
>> £100 = €140
>> printf returned 15
>> String length: 14
>
> Don't forget the \n in the first printf. Without \n, it will return 14,
> the same as the strlen.
The 15 vs. 14 might be startling until you remember that printf returns
the number of bytes it actually wrote: everything produced by the format
string, including the '\n', but not the terminating '\0'.
The output was produced on OS X 10.10.5 using UTF-8 source and native C
build by Xcode 7.1.1 (clang). Absolutely effortless to produce in this
context. No messy code pages.
I inspected the source with 0xED to be sure it was encoding the source
code text as UTF-8. But notice the printf returned 15 -- 14 octets plus
the '\n', not the 11 chars you stated you expected from inspection of
the s array.
| From | Martin Shobe <martin.shobe@yahoo.com> |
|---|---|
| Date | 2015-12-01 09:08 -0600 |
| Message-ID | <n3kd1c$cnt$1@dont-email.me> |
| In reply to | #77502 |
On 12/1/2015 6:38 AM, BartC wrote:
> On 01/12/2015 03:05, Stephen Sprunk wrote:
>> On 30-Nov-15 16:32, BartC wrote:
>
>>> I understand that /fully/ supporting Unicode is full of problems
>>> even using UTF32.
>>
>> Indeed; encoding is honestly the least of your problems, so just use
>> UTF-8 like everyone else and move on to the _hard_ stuff.
>
>>> Meanwhile I still occasionally come across problems with the
>>> representation of £ or €; maybe they should fix those first before
>>> we worry about ancient scripts or rare Chinese ideograms.
>>
>> If you're getting mojibake or replacement characters, that is usually
>> due to folks using some ancient encoding rather than something modern
>> and sensible, e.g. UTF-8.
>
> This is a typical problem I would get (source code was UTF8):
>
> #include <stdio.h>
> #include <string.h>
>
> int main(void) {
> char s[]="£100 = €140";
> unsigned char c;
> int i;
>
> printf("%s\n",s);
>
> for (i=0; i<strlen(s); ++i){
> c = s[i];
> printf("%2d: %03d %02X <%c>\n",i,c,c,c);
> }
> }
>
> I want to print the individual characters in the string. Compiled with
> gcc, I get (using Windows console set to code page 65001):
>
> £100 = €140
> 0: 194 C2 <�>
> 1: 163 A3 <�>
> 2: 049 31 <1>
> 3: 048 30 <0>
> 4: 048 30 <0>
> 5: 032 20 < >
> 6: 061 3D <=>
> 7: 032 20 < >
> 8: 226 E2 <�>
> 9: 130 82 <�>
> 10: 172 AC <�>
> 11: 049 31 <1>
> 12: 052 34 <4>
> 13: 048 30 <0>
>
> I get 13 'characters' output instead of the 11 I expect. The £ and €
> characters are replaced by sequences of those funny black diamonds (you
> might see some other error character).
Why did you expect 11? In your loop, you aren't printing code points,
but octets and there are 13 of those.
Martin Shobe
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-01 20:02 +0000 |
| Message-ID | <n3ku7i$kjd$1@dont-email.me> |
| In reply to | #77517 |
On 01/12/2015 15:08, Martin Shobe wrote:
> On 12/1/2015 6:38 AM, BartC wrote:
>> char s[]="£100 = €140";
>> for (i=0; i<strlen(s); ++i){
>> c = s[i];
>> I want to print the individual characters in the string. Compiled with
>> gcc, I get (using Windows console set to code page 65001):
>>
>> £100 = €140
>> 0: 194 C2 <�>
>> 1: 163 A3 <�>
>> 2: 049 31 <1>
>> 3: 048 30 <0>
>> 4: 048 30 <0>
>> 5: 032 20 < >
>> 6: 061 3D <=>
>> 7: 032 20 < >
>> 8: 226 E2 <�>
>> 9: 130 82 <�>
>> 10: 172 AC <�>
>> 11: 049 31 <1>
>> 12: 052 34 <4>
>> 13: 048 30 <0>
>>
>> I get 13 'characters' output instead of the 11 I expect. The £ and €
>> characters are replaced by sequences of those funny black diamonds (you
>> might see some other error character).
>
> Why did you expect 11?
Because there are 11 characters in "£100 = €140", not 13 (or 14 actually).
> In your loop, you aren't printing code points,
> but octets and there are 13 of those.
This is the problem I have with people saying that UTF8 can be used
transparently.
With an 8-bit coding (eg. ASCII plus 128 selected characters), the bytes
in the data and the characters or code-points they represent have a 1:1
correspondence. (The difference between character and code-point is
something I would have to go and look up.)
Any code that makes that assumption can risk programs not working as
expected.
With 16-bit or 32-bit strings, I would expect output more like the
following:
0: 163 00A3 <£>
1: 49 0031 <1>
2: 48 0030 <0>
3: 48 0030 <0>
4: 32 0020 < >
5: 61 003D <=>
6: 32 0020 < >
7: 8364 20AC <€>
8: 49 0031 <1>
9: 52 0034 <4>
10: 48 0030 <0>
So, how would you, given the same "£100 = €140" UTF8 string, write the C
code to enumerate all the characters or code-points rather than the bytes?
--
Bartc
| From | Martin Shobe <martin.shobe@yahoo.com> |
|---|---|
| Date | 2015-12-01 17:03 -0600 |
| Message-ID | <n3l8s5$4v5$1@dont-email.me> |
| In reply to | #77556 |
On 12/1/2015 2:02 PM, BartC wrote:
> On 01/12/2015 15:08, Martin Shobe wrote:
>> On 12/1/2015 6:38 AM, BartC wrote:
>
>>> char s[]="£100 = €140";
>>> for (i=0; i<strlen(s); ++i){
>>> c = s[i];
>
>>> I want to print the individual characters in the string. Compiled with
>>> gcc, I get (using Windows console set to code page 65001):
>>>
>>> £100 = €140
>>> 0: 194 C2 <�>
>>> 1: 163 A3 <�>
>>> 2: 049 31 <1>
>>> 3: 048 30 <0>
>>> 4: 048 30 <0>
>>> 5: 032 20 < >
>>> 6: 061 3D <=>
>>> 7: 032 20 < >
>>> 8: 226 E2 <�>
>>> 9: 130 82 <�>
>>> 10: 172 AC <�>
>>> 11: 049 31 <1>
>>> 12: 052 34 <4>
>>> 13: 048 30 <0>
>>>
>>> I get 13 'characters' output instead of the 11 I expect. The £ and €
>>> characters are replaced by sequences of those funny black diamonds (you
>>> might see some other error character).
>>
>> Why did you expect 11?
>
> Because there are 11 characters in "£100 = €140", not 13 (or 14 actually).
But you told C to print an octet, not a character.
>> In your loop, you aren't printing code points,
>> but octets and there are 13 of those.
>
> This is the problem I have with people saying that UTF8 can be used
> transparently.
>
> With an 8-bit coding (eg. ASCII plus 128 selected characters), the bytes
> in the data and the characters or code-points they represent have a 1:1
> correspondence. (The difference between character and code-point is
> something I would have to go and look up.)
>
> Any code that makes that assumption can risk programs not working as
> expected.
>
> With 16-bit or 32-bit strings, I would expect output more like the
> following:
>
> 0: 163 00A3 <£>
> 1: 49 0031 <1>
> 2: 48 0030 <0>
> 3: 48 0030 <0>
> 4: 32 0020 < >
> 5: 61 003D <=>
> 6: 32 0020 < >
> 7: 8364 20AC <€>
> 8: 49 0031 <1>
> 9: 52 0034 <4>
> 10: 48 0030 <0>
I wouldn't. Not all characters are encoded using a single code-point.
While every character in your example is, you can't, in general, rely
on that.
> So, how would you, given the same "£100 = €140" UTF8 string, write the C
> code to enumerate all the characters or code-points rather than the bytes?
Code points aren't characters as you mean it either. If you want that,
you will have to make your code aware of the differences between octets,
code-points, and "characters". C I/O is too low level to understand such.
Martin Shobe
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-02 00:17 +0000 |
| Message-ID | <n3ld6f$k49$1@dont-email.me> |
| In reply to | #77578 |
On 01/12/2015 23:03, Martin Shobe wrote:
> On 12/1/2015 2:02 PM, BartC wrote:
>> On 01/12/2015 15:08, Martin Shobe wrote:
>>> Why did you expect 11?
>>
>> Because there are 11 characters in "£100 = €140", not 13 (or 14
>> actually).
>
> But you told C to print an octet, not a character.
I told it to print a value in %c format. In other words, a character.
>
>> So, how would you, given the same "£100 = €140" UTF8 string, write the C
>> code to enumerate all the characters or code-points rather than the
>> bytes?
>
> Code points aren't characters as you mean it either. If you want that,
> you will have to make your code aware of the differences between octets,
> code-points, and "characters". C I/O is too low level to understand such.
So considerable amounts of code that was happily mixing up bytes,
octets, characters and code-points for decades, no longer works with the
advent of UTF8.
This is my point. Too many people are saying it will just work
transparently. It might do if you are just inputting a bunch of bytes
ending with 0 or EOF, and outputting the same data without doing
anything to it, not even counting how many 'characters' have been
processed.
But sometimes you need to do a bit more with it. Then I'm saying that
using wide-character strings is easier than trying to grapple with UTF8.
You can even use the same algorithms with 16- or 32-bit character
strings as with 8-bit. (Of course, many are still going to delight in
pin-pointing all sorts of complications with Unicode where the simplest
operations are fraught with problems. But then, many others don't care.)
--
Bartc
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-01 16:53 -0800 |
| Message-ID | <e6fdb173-64b3-497f-996d-c944404a3764@googlegroups.com> |
| In reply to | #77582 |
On Wednesday, December 2, 2015 at 12:17:55 AM UTC, Bart wrote:
>
> This is my point. Too many people are saying it will just work
> transparently. It might do if you are just inputting a bunch of bytes
> ending with 0 or EOF, and outputting the same data without doing
> anything to it, not even counting how many 'characters' have been
> processed.
>
> But sometimes you need to do a bit more with it. Then I'm saying that
> using wide-character strings is easier than trying to grapple with UTF8.
>
> You can even use the same algorithms with 16- or 32-bit character
> strings as with 8-bit. (Of course, many are still going to delight in
> pin-pointing all sorts of complications with Unicode where the simplest
> operations are fraught with problems. But then, many others don't care.)
>
UTF-8 is designed to be backwards compatible with ascii both on the
binary level and at the code level, but you can't achieve both. Almost
any algorithm that works on ascii strings will still work if you increase
the width of a "char" and add extra symbols, with the exception of
algorithms that take a histogram of 128 / 256 possible values (even if
theoretically they work they often become unviable when character space
gets too large), but then you don't have binary compatibility.
The main place UTF-8 falls down is at the final output stage, when
characters have to be converted to glyphs, but there are other cases. A
wildcard matcher will match * correctly but not ?. Parameterised calls
to strchr() will fail if the passed character is not in the ascii subset.
And programming scripts have special problems, especially if the
programming language is doing string handling.
Then the grapheme cluster problem means that UTF-32 is only a partial
solution. Not getting Hebrew pointing correct isn't such a disaster as
unpointed Hebrew is still readable, often acceptable, and Hebrew as a
whole is usually a very small market.
| From | Martin Shobe <martin.shobe@yahoo.com> |
|---|---|
| Date | 2015-12-01 21:17 -0600 |
| Message-ID | <n3lno6$eh0$1@dont-email.me> |
| In reply to | #77582 |
On 12/1/2015 6:17 PM, BartC wrote:
> On 01/12/2015 23:03, Martin Shobe wrote:
>> On 12/1/2015 2:02 PM, BartC wrote:
>>> On 01/12/2015 15:08, Martin Shobe wrote:
>
>>>> Why did you expect 11?
>>>
>>> Because there are 11 characters in "£100 = €140", not 13 (or 14
>>> actually).
>>
>> But you told C to print an octet, not a character.
>
> I told it to print a value in %c format. In other words, a character.
Then this is where the misunderstanding is. Some of what you gave it
were not characters as you call them.
>>> So, how would you, given the same "£100 = €140" UTF8 string, write the C
>>> code to enumerate all the characters or code-points rather than the
>>> bytes?
>>
>> Code points aren't characters as you mean it either. If you want that,
>> you will have to make your code aware of the differences between octets,
>> code-points, and "characters". C I/O is too low level to understand such.
>
> So considerable amounts of code that was happily mixing up bytes,
> octets, characters and code-points for decades, no longer works with the
> advent of UTF8.
I don't know how much code is. I'm pretty sure that code that assumes
octets are characters will break, like what you did above. Code that
treats strings as strings and doesn't need to know which octets are part
of which characters will still work. Most of the code I've written falls
into the latter category, but others may have different experiences.
> This is my point. Too many people are saying it will just work
> transparently. It might do if you are just inputting a bunch of bytes
> ending with 0 or EOF, and outputting the same data without doing
> anything to it, not even counting how many 'characters' have been
> processed.
> But sometimes you need to do a bit more with it. Then I'm saying that
> using wide-character strings is easier than trying to grapple with UTF8.
Not really, you still have to deal with the fact that code-points aren't
characters either.
> You can even use the same algorithms with 16- or 32-bit character
> strings as with 8-bit. (Of course, many are still going to delight in
> pin-pointing all sorts of complications with Unicode where the simplest
> operations are fraught with problems. But then, many others don't care.)
>
Martin Shobe
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 09:37 -0600 |
| Message-ID | <n3n33g$qs7$1@dont-email.me> |
| In reply to | #77582 |
On 01-Dec-15 18:17, BartC wrote:
> On 01/12/2015 23:03, Martin Shobe wrote:
>> On 12/1/2015 2:02 PM, BartC wrote:
>>> On 01/12/2015 15:08, Martin Shobe wrote:
>>>> Why did you expect 11?
>>>
>>> Because there are 11 characters in "£100 = €140", not 13 (or 14
>>> actually).
>>
>> But you told C to print an octet, not a character.
>
> I told it to print a value in %c format. In other words, a
> character.
For historical reasons, C conflates "bytes" and "characters".
In various contexts, "character" is used to mean any of "byte", "code
unit", "code point", "glyph", "grapheme" or "grapheme cluster"--and
sometimes more than one of them in the _same_ context. And if that
wasn't confusing enough, sometimes it even means "non-character"!
>>> So, how would you, given the same "£100 = €140" UTF8 string,
>>> write the C code to enumerate all the characters or code-points
>>> rather than the bytes?
>>
>> Code points aren't characters as you mean it either. If you want
>> that, you will have to make your code aware of the differences
>> between octets, code-points, and "characters". C I/O is too low
>> level to understand such.
>
> So considerable amounts of code that was happily mixing up bytes,
> octets, characters and code-points for decades, no longer works with
> the advent of UTF8.
As long as you don't _split_ strings, which includes extracting
individual bytes from them, UTF-8 is completely transparent.
> This is my point. Too many people are saying it will just work
> transparently. It might do if you are just inputting a bunch of
> bytes ending with 0 or EOF, and outputting the same data without
> doing anything to it,
Most code does not split strings; it treats them as opaque blobs or, at
most, concatenates them (which is safe with UTF-8).
> not even counting how many 'characters' have been processed.
Most code gets this part wrong anyway, at least for most meanings of
"character".
> But sometimes you need to do a bit more with it. Then I'm saying
> that using wide-character strings is easier than trying to grapple
> with UTF8.
Yes, which is why UTF-32 is an acceptable _memory_ representation for
strings. OTOH, UTF-16 has all the problems of UTF-8 and more yet none of
the benefits.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | James Kuyper <jameskuyper@verizon.net> |
|---|---|
| Date | 2015-12-02 10:59 -0500 |
| Message-ID | <n3n4dd$vn7$1@dont-email.me> |
| In reply to | #77625 |
On 12/02/2015 10:37 AM, Stephen Sprunk wrote:
...
> For historical reasons, C conflates "bytes" and "characters".
>
> In various contexts, "character" is used to mean any of "byte", "code
> unit", "code point", "glyph", "grapheme" or "grapheme cluster"--and
> sometimes more than one of them in the _same_ context. And if that
> wasn't confusing enough, sometimes it even means "non-character"!
"byte
addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment" (3.6p1)
So a byte is an amount of data storage. It can hold a character, but is
not the same thing as a character - which is good, because if they were
the same thing, that definition would be saying that a byte must be big
enough to store a byte, which is a pretty meaningless definition.
"character
〈abstract〉 member of a set of elements used for the organization,
control, or representation of data." (3.7p1)
I'm afraid that's a little too abstract for my taste.
"character
single-byte character
(C) bit representation that fits in a byte." (3.7.1p1)
There are separate definitions for "wide character" and "multi-byte
character".
So a C character is something that will fit in a byte - but is not the
same thing as a byte - which is good, because otherwise it would be
saying that a character must fit in a character, a pretty meaningless
definition.
The distinction made by the standard between a byte and a C character
seems pretty clear to me. Can you provide any citations of places where
"byte" is used when "character" was meant, or vice versa? I'm not
suggesting that there are no such citations - it's a big document, and
both terms are used quite frequently in that document, I'd be more
surprised if there were no errors. However, if there are any, they
should be brought to the attention of the editor of the standard, Larry
Jones, so they can be fixed.
I'm not aware of any location where C talks about "code point", "glyph",
"grapheme" or "grapheme cluster" - nor any location where it uses either
"byte" or "character" when it should instead have used one of those
terms. Could you identify some locations where such a mistake was made?
--
James Kuyper
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-02 17:43 +0000 |
| Message-ID | <n3nafp$pd0$1@dont-email.me> |
| In reply to | #77625 |
On 02/12/2015 15:37, Stephen Sprunk wrote:
> On 01-Dec-15 18:17, BartC wrote:
>> This is my point. Too many people are saying it will just work
>> transparently. It might do if you are just inputting a bunch of
>> bytes ending with 0 or EOF, and outputting the same data without
>> doing anything to it,
>
> Most code does not split strings; it treats them as opaque blobs or, at
> most, concatenates them (which is safe with UTF-8).
I don't believe that. Perhaps in a scripting language that might be the
case: it might be too slow for that, or relies on built-in functionality
to do all that stuff that needs to be done with strings. But that
functionality might well be implemented in C.
>> not even counting how many 'characters' have been processed.
>
> Most code gets this part wrong anyway, at least for most meanings of
> "character".
I don't agree with this either! Of course most of us don't have friends
who like to sign their emails with obscure non-BMP names (perhaps chosen
precisely /because/ they are non-BMP and therefore different).
Most of the time a character count is going to be right, unless you're
going to suggest it's wrong because we don't understand what a
'character' is. I think most people have a pretty good idea!
--
Bartc
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 13:22 -0600 |
| Message-ID | <n3ng9m$i2e$1@dont-email.me> |
| In reply to | #77644 |
On 02-Dec-15 11:43, BartC wrote:
> On 02/12/2015 15:37, Stephen Sprunk wrote:
>> On 01-Dec-15 18:17, BartC wrote:
>>> This is my point. Too many people are saying it will just work
>>> transparently. It might do if you are just inputting a bunch of
>>> bytes ending with 0 or EOF, and outputting the same data without
>>> doing anything to it,
>>
>> Most code does not split strings; it treats them as opaque blobs
>> or, at most, concatenates them (which is safe with UTF-8).
>
> I don't believe that. Perhaps in a scripting language that might be
> the case: it might be too slow for that, or relies on built-in
> functionality to do all that stuff that needs to be done with
> strings. But that functionality might well be implemented in C.
... and it is straightforward to do so when needed, but if you're going
to do a lot of it, or deal with encodings other than UTF-8 (which are
becoming less relevant every day), then it's probably worth going to
UTF-32. But even that doesn't solve _all_ of your problems because you
still have to deal with combining characters, non-characters, etc.
>>> not even counting how many 'characters' have been processed.
>>
>> Most code gets this part wrong anyway, at least for most meanings
>> of "character".
>
> I don't agree with this either! Of course most of us don't have
> friends who like to sign their emails with obscure non-BMP names
> (perhaps chosen precisely /because/ they are non-BMP and therefore
> different).
I seriously doubt their ancestors chose their surname based on a
prediction that, thousands of years in the future, the Unicode
Consortium would assign it a code point outside the BMP.
> Most of the time a character count is going to be right, unless
> you're going to suggest it's wrong because we don't understand what
> a 'character' is. I think most people have a pretty good idea!
That's the problem, actually: _nobody_ knows exactly what "character"
means. It means something different to everyone and even different
things to the same person depending on context.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-12-03 09:32 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc92reFi96mU6@mid.individual.net> |
| In reply to | #77644 |
BartC wrote:
> On 02/12/2015 15:37, Stephen Sprunk wrote:
>> On 01-Dec-15 18:17, BartC wrote:
>
>>> This is my point. Too many people are saying it will just work
>>> transparently. It might do if you are just inputting a bunch of
>>> bytes ending with 0 or EOF, and outputting the same data without
>>> doing anything to it,
>>
>> Most code does not split strings; it treats them as opaque blobs or, at
>> most, concatenates them (which is safe with UTF-8).
>
> I don't believe that. Perhaps in a scripting language that might be the
> case: it might be too slow for that, or relies on built-in functionality
> to do all that stuff that needs to be done with strings. But that
> functionality might well be implemented in C.
Maybe "most code does not split strings in a naïve manner" would be
better? Looking at my own code, where a string gets split, it gets
splat after a search for a delimiter.
--
Ian Collins
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-02 21:12 +0000 |
| Message-ID | <n3nmo8$e5j$1@dont-email.me> |
| In reply to | #77661 |
On 02/12/2015 20:32, Ian Collins wrote:
> BartC wrote:
>> I don't believe that. Perhaps in a scripting language that might be the
>> case: it might be too slow for that, or relies on built-in functionality
>> to do all that stuff that needs to be done with strings. But that
>> functionality might well be implemented in C.
>
> Maybe "most code does not split strings in a naïve manner" would be
> better? Looking at my own code, where a string gets split, it gets
> splat after a search for a delimiter.
What's wrong in wanting to do things in a naive manner? Ie. in a simple
and obvious way.
When character sets such as ASCII came along, they provided a limited
and stylised way of representing English in a computer (compared with
what was available in type-setting for example, or with handwriting).
Yet an enormous amount was possible. Systems tended to be written around
the limitations (so using *, / and . for multiply, divide and decimal
point), which seems to have worked very well. (As has the typewriter
actually.)
It was possible to easily write programs to manipulate characters, words
and lines because everything was so obvious.
Now, we need a single, unified character set so we need to go beyond 8
bits to 16 and 21 (or 32). Fine. But we're also no longer allowed to
treat things so simply. Because some alphabets don't share the same
concepts of upper and lower case; characters might be represented in
multiple code points; the same glyph is associated with many code
points; etc etc. In short, it's become impossible.
Many of the complications of type-setting, as well as too many
real-world considerations, have been introduced, when they really belong
at a different level.
And, to top it all, we are expected to cope with a variable length
encoding too!
> Maybe "most code does not split strings in a naïve manner" would be
> better? Looking at my own code, where a string gets split, it gets
> splat after a search for a delimiter.
That's too complicated. In a language you want to keep it as simple as
possible.
Suppose I have a string S and want to copy the initial character into
the last, so that "Bart" => "BarB". Even in C, I want to be able to
write (after ensuring S isn't empty):
S[strlen(S)-1] = S[0];
when S uses 8-bit elements and the string can be represented as such,
and to also be able to write:
S[strlen(S)-1] = S[0];
when S uses 16-bit or 32-bit elements.
With a variable-length encoding however, it's completely different. The
resulting string might be longer, meaning additional problems of memory
management. Or the string exists in a field of a struct which might not
be big enough.
It's like taking existing code which works perfectly well with
characters, and changing it to work with variable-length words. Who
wants to code like that?
I think UTF8 is a fine compression scheme for storing text on disk,
otherwise...
--
Bartc
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2015-12-03 10:36 +1300 |
| Subject | Re: Working efficiently with 32-bit Unicode output streams, locale etc. |
| Message-ID | <dc96i7Fi96mU8@mid.individual.net> |
| In reply to | #77665 |
BartC wrote:
> On 02/12/2015 20:32, Ian Collins wrote:
>> BartC wrote:
>
>>> I don't believe that. Perhaps in a scripting language that might be the
>>> case: it might be too slow for that, or relies on built-in functionality
>>> to do all that stuff that needs to be done with strings. But that
>>> functionality might well be implemented in C.
>>
>> Maybe "most code does not split strings in a naïve manner" would be
>> better? Looking at my own code, where a string gets split, it gets
>> splat after a search for a delimiter.
>
> What's wrong in wanting to do things in a naive manner? Ie. in a simple
> and obvious way.
Did you parse what I wrote? Is looking for a delimiter and splitting
the string around it anything other than simple? If you want to go one
step further, splitting the string at a fixed point, you have to know
where. If you know where, you know the format of the data.
<snip>
> That's too complicated. In a language you want to keep it as simple as
> possible.
So you advocate random splitting of strings?
> Suppose I have a string S and want to copy the initial character into
> the last, so that "Bart" => "BarB". Even in C, I want to be able to
> write (after ensuring S isn't empty):
>
> S[strlen(S)-1] = S[0];
How often in real code would you want to do that?
--
Ian Collins