Groups > comp.lang.c > #77629 > unrolled thread
| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2015-12-02 08:01 -0800 |
| Last post | 2015-12-06 13:45 +0000 |
| Articles | 20 on this page of 158 — 25 participants |
unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2015-12-02 08:01 -0800 |
| Subject | unicode is a fail |
| Message-ID | <fbcae10f-7fc6-4a1e-90d7-ea4925016e47@googlegroups.com> |
I'm personally still using ASCII in all my private apps, and I shiver (a bit) at the idea of using Unicode, as I read from time to time that Unicode is a pain (at least in some situations).
This leads me to think that Unicode is, in general, a fail. Unicode could have become something maybe even simpler than ASCII, but it went a bit the wrong way and created a lot of additional mess.
I think, then, that maybe one possible recovery scenario is to use damn UTF-32 only, everywhere you can, and try to forget and deprecate the rest of the mess.
What do ya think?
| From | me <self@example.org> |
|---|---|
| Date | 2015-12-02 16:12 +0000 |
| Message-ID | <n3n5ab$6g3$1@speranza.aioe.org> |
| In reply to | #77629 |
On 2015-12-02, fir <profesor.fir@gmail.com> wrote:
> Im personally still using asci in all my private apps and i shiver (a
> bit) to use unicode as i read from time to time text that says unicode
> is a pain (at least in some situations)
…good for you, pal.
> what do ya think?
Nice trolling attempt.
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2015-12-02 09:09 -0800 |
| Message-ID | <bdc0f137-e299-4eb6-96e7-4c9e5c8b9051@googlegroups.com> |
| In reply to | #77630 |
On Wednesday, 2 December 2015 at 17:13:18 UTC+1, me wrote:
> On 2015-12-02, fir <profesor.fir@gmail.com> wrote:
> > Im personally still using asci in all my private apps and i shiver (a
> > bit) to use unicode as i read from time to time text that says unicode
> > is a pain (at least in some situations)
>
> …good for you, pal.
>
> > what do ya think?
>
> Nice trolling attempt.
Get wiser, fella. (Don't tell me you want to reach the level of intelligence of the well-known non-trolls, which is how the troll-insulters, I assume, try to present themselves in the general aura of their troll-insulting stupidity here.)
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-02 08:18 -0800 |
| Message-ID | <22fccd29-addc-4070-8d1d-c3f876f5f12e@googlegroups.com> |
| In reply to | #77629 |
On Wednesday, December 2, 2015 at 4:02:14 PM UTC, fir wrote:
> Im personally still using asci in all my private apps and i shiver (a bit) to use unicode as i read
> from time to time text that says unicode is a pain (at least in some situations)
>
> This directs me to think that unicode is in general a fail.. Unicode could go the way and
> become something maybe even simpler than ascii but gone a bit in a wrong way of making
> a lot additional mess
>
> I thing then that maybe one posible recovery scenerio is to use damn utf-32 only, everywhere
> you coud and try to forget and deprecate the other part of the mess
>
> what do ya think?
>
If ASCII had never achieved any traction outside of North America, then I think there would be a strong case for UTF-32. The reality is that there are masses and masses of ASCII interfaces around, and it would be a nightmare job to track them all down and either rip them out or write little adapter functions to make them talk to the rest of the world in UTF-32.
UTF-8 is the best compromise. But there are some problems that are very hard to avoid, like supporting archaic ash and thorn in English (mediaeval, ye olde coffee shoppe), when half the population think the latter is a y as in yellow.
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2015-12-02 09:07 -0800 |
| Message-ID | <9d6e662f-e8eb-4f76-bc92-6d04d7b3eba0@googlegroups.com> |
| In reply to | #77631 |
On Wednesday, 2 December 2015 at 17:18:54 UTC+1, Malcolm McLean wrote:
> On Wednesday, December 2, 2015 at 4:02:14 PM UTC, fir wrote:
> > Im personally still using asci in all my private apps and i shiver (a bit) to use unicode as i read
> > from time to time text that says unicode is a pain (at least in some situations)
> >
> > This directs me to think that unicode is in general a fail.. Unicode could go the way and
> > become something maybe even simpler than ascii but gone a bit in a wrong way of making
> > a lot additional mess
> >
> > I thing then that maybe one posible recovery scenerio is to use damn utf-32 only, everywhere
> > you coud and try to forget and deprecate the other part of the mess
> >
> > what do ya think?
> >
> If ascii had never achieved any traction outside of North America, then I think there would
> be a strong case for UTF-32. Reality is that there are masses and masses of ascii interfaces
> around, and it would be a nightmare job to track them all down and either rip them out
> or write little adapter functions to make them talk to the rest of the world in UTF-32.
>
> UTF-8 is the best compromise. But there are some problem that are very hard to avoid.,
> like supporting archaic ash and thorn in English (mediaeval, ye olde coffee shoppe),
> when half the population think the latter is a y as in yellow.
I'm not sure that UTF-8 is, overall, the good compromise. ASCII is simple and UTF-32 is simple (I hope; I don't know the deep details), so maybe that interfacing wouldn't be so hard. It should be trivial at the binary level, and that's a big value: the old-school value that is lost when you use UTF-8 and need to rely on external libraries rather than writing your own routines by hand if needed.
(Still, I'm not sure; it depends on whether UTF-32 has no weird glitches and really is an easy binary format.)
(There is still the question of LE or BE. I tend to say it should be native in RAM, with probably both formats allowed in files, though with some tendency to favor big endian as the international standard.)
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 11:21 -0600 |
| Message-ID | <n3n969$k70$1@dont-email.me> |
| In reply to | #77636 |
On 02-Dec-15 11:07, fir wrote:
> Malcolm McLean wrote:
>> If ascii had never achieved any traction outside of North America,
>> then I think there would be a strong case for UTF-32. Reality is
>> that there are masses and masses of ascii interfaces around, and it
>> would be a nightmare job to track them all down and either rip them
>> out or write little adapter functions to make them talk to the rest
>> of the world in UTF-32.
>>
>> UTF-8 is the best compromise. ...
>
> Im not sure of overal utf-8 is the good compromise, ascii is simple
> utf32 is simple (i hope, dont know deep details) so maybe those
> interfacing wouldnt be so hard (should be binary trivial and thats a
> big value, those oldschool value that is lost when you use utf-8 (and
> need to rely on external libraries rather than writing own routines
> in own hand if need)) (still im not sure depends if utf-32 has no
> weird glitches and if it is really binary easy format) (there is
> still a wuestion if LE of BE, i tend to say that it should be native
> in ram and probably both format allowed in files, though with some
> tendency to favorize big endian as international standard)
UTF-32's simplicity comes at the cost of embedded NUL characters, so
it's inherently incompatible with all existing C string-handling code.
UTF-8 isn't perfect, but at least it is _usually_ compatible, and it has
the side benefits of being endian-neutral and generally smaller. UTF-16
takes the worst of both and the best of neither.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
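The embedded-NUL point can be made concrete with a minimal sketch. It is not from the thread: the helper names `ascii_to_utf32le` and `octets_seen_by_strlen` are hypothetical, and the sketch assumes the text is serialized as little-endian octets without a BOM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: serialize an ASCII string as UTF-32LE octets
   (no BOM), followed by a 4-octet zero terminator. Returns the number
   of octets written before the terminator. */
size_t ascii_to_utf32le(const char *s, unsigned char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    assert(cap >= 4 * (n + 1));            /* caller must supply enough room */
    for (size_t i = 0; i <= n; i++) {      /* i == n writes the terminator */
        out[4 * i + 0] = (unsigned char)s[i];
        out[4 * i + 1] = 0;                /* the three high octets are zero */
        out[4 * i + 2] = 0;                /* for every ASCII code point     */
        out[4 * i + 3] = 0;
    }
    return 4 * n;
}

/* What byte-oriented code such as strlen() sees in that buffer:
   it stops at the first zero octet, i.e. inside the first code unit. */
size_t octets_seen_by_strlen(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

With these helpers, "hello" occupies 20 octets as UTF-32LE, yet `strlen()` reports 1. A UTF-8 string such as `"h\xC3\xA9llo"` contains no zero octets, so `strlen()` walks all of it and returns 6, which is the octet count rather than the number of characters.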
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2015-12-02 09:40 -0800 |
| Message-ID | <ae1212b2-dd3f-46e3-8745-7ba23971f641@googlegroups.com> |
| In reply to | #77639 |
On Wednesday, 2 December 2015 at 18:21:46 UTC+1, Stephen Sprunk wrote:
> On 02-Dec-15 11:07, fir wrote:
> > Malcolm McLean wrote:
> >> If ascii had never achieved any traction outside of North America,
> >> then I think there would be a strong case for UTF-32. Reality is
> >> that there are masses and masses of ascii interfaces around, and it
> >> would be a nightmare job to track them all down and either rip them
> >> out or write little adapter functions to make them talk to the rest
> >> of the world in UTF-32.
> >>
> >> UTF-8 is the best compromise. ...
> >
> > Im not sure of overal utf-8 is the good compromise, ascii is simple
> > utf32 is simple (i hope, dont know deep details) so maybe those
> > interfacing wouldnt be so hard (should be binary trivial and thats a
> > big value, those oldschool value that is lost when you use utf-8 (and
> > need to rely on external libraries rather than writing own routines
> > in own hand if need)) (still im not sure depends if utf-32 has no
> > weird glitches and if it is really binary easy format) (there is
> > still a wuestion if LE of BE, i tend to say that it should be native
> > in ram and probably both format allowed in files, though with some
> > tendency to favorize big endian as international standard)
>
> UTF-32's simplicity comes at the cost of embedded NUL characters, so
> it's inherently incompatible with all existing C string-handling code.
> UTF-8 isn't perfect, but at least it is _usually_ compatible, and it has
> the side benefits of being endian-neutral and generally smaller. UTF-16
> takes the worst of both and the best of neither.
>
On UTF-16 I wouldn't like to speak at all. On UTF-8: I know, but those advantages and disadvantages have to be weighed against UTF-32's advantages and disadvantages, and my point is that you get UTF-8's advantages at the cost of a general (and thus very heavy) mess. Now that UTF-8 and UTF-16 are common, the world is really polluted by this Unicode mess (a mess sort of like the various HTML versions with their varying support); it is probably not worth it. People will get used to it, as people get used to anything, but that doesn't mean a UTF-32 world would not really be far better.
Still, I'm not sure whether on Windows I have an easy way to just unify all my Unicode on UTF-32 (probably not, as they enforce a variant of UTF-16, AFAIK). But I just mean that UTF-32 is the most logical option to me. (blah)
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-02 11:22 -0800 |
| Message-ID | <lnoae8sljm.fsf@kst-u.example.com> |
| In reply to | #77639 |
Stephen Sprunk <stephen@sprunk.org> writes:
[...]
> UTF-32's simplicity comes at the cost of embedded NUL characters, so
> it's inherently incompatible with all existing C string-handling code.
> UTF-8 isn't perfect, but at least it is _usually_ compatible, and it has
> the side benefits of being endian-neutral and generally smaller. UTF-16
> takes the worst of both and the best of neither.
UTF-32 has that cost if it's encoded as a sequence of 4 octets per
character.
If wchar_t is 32 bits, then the standard library functions that handle
arrays of wchar_t (wcslen() et al) don't have that problem; only a
32-bit zero is treated as a (wide) null character.
On the other hand, MS Windows has 16-bit wchar_t.
On the other other hand, C11 adds char16_t and char32_t.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
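A minimal sketch of the wide-character point, using only `<wchar.h>`. The wrapper names `wide_length` and `wide_storage_bytes` are hypothetical, and the storage figure depends on how wide `wchar_t` is on the platform (commonly 32 bits on Linux, 16 bits on MS Windows).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Length in wide characters. wcslen() compares whole wchar_t elements,
   so zero octets inside an element do not end the string; only an
   all-zero wchar_t does. */
size_t wide_length(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws);
}

/* Storage in bytes, excluding the terminator. The result depends on
   the platform's sizeof(wchar_t). */
size_t wide_storage_bytes(const wchar_t *ws)
{
    return wide_length(ws) * sizeof(wchar_t);
}
```

For example, `wide_length(L"na\u00EFve")` is 5 regardless of whether `wchar_t` is 16 or 32 bits wide, because U+00EF fits in a single element either way. The same would not hold for a character outside the Basic Multilingual Plane on a platform with a 16-bit `wchar_t`.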
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 15:59 -0600 |
| Message-ID | <n3npg5$os5$1@dont-email.me> |
| In reply to | #77648 |
On 02-Dec-15 13:22, Keith Thompson wrote:
> Stephen Sprunk <stephen@sprunk.org> writes: [...]
>> UTF-32's simplicity comes at the cost of embedded NUL characters,
>> so it's inherently incompatible with all existing C string-handling
>> code. UTF-8 isn't perfect, but at least it is _usually_ compatible,
>> and it has the side benefits of being endian-neutral and generally
>> smaller. UTF-16 takes the worst of both and the best of neither.
>
> UTF-32 has that cost if it's encoded as a sequence of 4 octets per
> character.
UTF-32 is, by definition, an encoding of exactly one 32-bit code unit
per code point.
Depending on what you mean by "character", though, it may require a
variable number of code points, and there are also code points for
non-characters. That's the _real_ problem, so UTF-32's alleged fixed
width is misleading.
> If wchar_t is 32 bits, then the standard library functions that
> handle arrays of wchar_t (wcslen() et al) don't have that problem;
> only a 32-bit zero is treated as a (wide) null character.
That solves the NUL-terminated string issue, but it also means every
function with a string argument or return must be replaced or, worse,
duplicated. Ouch.
Worse, all that pain doesn't even solve the _real_ problems!
> On the other hand, MS Windows has 16-bit wchar_t.
That violates the C Standard's requirements, but it's also a popular
enough platform that to ignore its problems may be unwise.
> On the other other hand, C11 adds char16_t and char32_t.
That seems like a sop to Microsoft.
S
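The "variable number of code points per character" remark can be illustrated with a rough sketch. The function names are hypothetical, and the combining-mark test is a deliberate simplification: it only recognizes the Combining Diacritical Marks block (U+0300 to U+036F), whereas real segmentation into user-perceived characters needs the Unicode grapheme cluster rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of code points in a zero-terminated UTF-32 sequence. */
size_t count_code_points(const uint_least32_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Simplifying assumption: only U+0300..U+036F counts as "combining". */
static int is_basic_combining_mark(uint_least32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Approximate count of user-perceived characters: every code point
   that is not one of the basic combining marks starts a new one. */
size_t count_base_characters(const uint_least32_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (size_t i = 0; s[i] != 0; i++)
        if (!is_basic_combining_mark(s[i]))
            n++;
    return n;
}
```

Given the sequence `'c', 'a', 'f', 'e', 0x0301` (an "e" followed by a combining acute accent), the first function reports 5 code points while the second reports 4 characters, so indexing by code unit does not index by character even in UTF-32.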
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-02 16:25 -0800 |
| Message-ID | <lny4dcqsxu.fsf@kst-u.example.com> |
| In reply to | #77671 |
Stephen Sprunk <stephen@sprunk.org> writes:
> On 02-Dec-15 13:22, Keith Thompson wrote:
>> Stephen Sprunk <stephen@sprunk.org> writes: [...]
>>> UTF-32's simplicity comes at the cost of embedded NUL characters,
>>> so it's inherently incompatible with all existing C string-handling
>>> code. UTF-8 isn't perfect, but at least it is _usually_ compatible,
>>> and it has the side benefits of being endian-neutral and generally
>>> smaller. UTF-16 takes the worst of both and the best of neither.
>>
>> UTF-32 has that cost if it's encoded as a sequence of 4 octets per
>> character.
>
> UTF-32 is, by definition, an encoding of exactly one 32-bit code unit
> per code point.
Surely that 32-bit code unit can be represented by a sequence of 4
octets. For example, if I type
echo hello | iconv -f utf-8 -t utf-32 > hello.utf32
I get a file that represents each character as 4 bytes (and that starts
with a 4-byte BOM).
> Depending on what you mean by "character", though, it may require a
> variable number of code points, and there are also code points for
> non-characters. That's the _real_ problem, so UTF-32's alleged fixed
> width is misleading.
Yes, I was glossing over that issue.
>> If wchar_t is 32 bits, then the standard library functions that
>> handle arrays of wchar_t (wcslen() et al) don't have that problem;
>> only a 32-bit zero is treated as a (wide) null character.
>
> That solves the NUL-terminated string issue, but it also means every
> function with a string argument or return must be replaced or, worse,
> duplicated. Ouch.
But that's pretty much already done; wcslen() et al are part of the
standard library.
> Worse, all that pain doesn't even solve the _real_ problems!
>
>> On the other hand, MS Windows has 16-bit wchar_t.
>
> That violates the C Standard's requirements, but it's also a popular
> enough platform that to ignore its problems may be unwise.
Does it? wchar_t is supposed to be "an integer type whose range of
values can represent distinct codes for all members of the largest
extended character set specified among the supported locales". A
conforming implementation whose largest extended character set has no
more than 65536 characters could legally use 16-bit wchar_t. I don't
know whether that applies to Microsoft's implementation.
>> On the other other hand, C11 adds char16_t and char32_t.
>
> That seems like a sop to Microsoft.
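A sketch of what "a sequence of 4 octets per code unit, preceded by a BOM" looks like when written out by hand. This is not how `iconv` is implemented; `put_u32` and `utf32_serialize` are hypothetical names, and the byte order is passed in explicitly rather than taken from the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit code unit as 4 octets in the requested byte order. */
static void put_u32(unsigned char *out, uint_least32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)((v >> shift) & 0xFF);
    }
}

/* Write a BOM (U+FEFF) followed by n code points. Returns the number
   of octets written, which is always 4 * (n + 1). */
size_t utf32_serialize(const uint_least32_t *cps, size_t n, int big_endian,
                       unsigned char *out, size_t cap)
{
    assert(cps != NULL && out != NULL);
    assert(cap >= 4 * (n + 1));            /* caller must supply enough room */
    put_u32(out, 0xFEFF, big_endian);      /* byte order mark */
    for (size_t i = 0; i < n; i++)
        put_u32(out + 4 * (i + 1), cps[i], big_endian);
    return 4 * (n + 1);
}
```

For the six code points of "hello" plus a newline this produces 28 octets. The little-endian form starts with FF FE 00 00 and the big-endian form with 00 00 FE FF, so the same code units yield two different octet sequences.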
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-02 19:47 -0600 |
| Message-ID | <n3o6rq$9mt$1@dont-email.me> |
| In reply to | #77683 |
On 02-Dec-15 18:25, Keith Thompson wrote:
> Stephen Sprunk <stephen@sprunk.org> writes:
>> On 02-Dec-15 13:22, Keith Thompson wrote:
>>> Stephen Sprunk <stephen@sprunk.org> writes: [...]
>>>> UTF-32's simplicity comes at the cost of embedded NUL characters,
>>>> so it's inherently incompatible with all existing C string-handling
>>>> code. UTF-8 isn't perfect, but at least it is _usually_ compatible,
>>>> and it has the side benefits of being endian-neutral and generally
>>>> smaller. UTF-16 takes the worst of both and the best of neither.
>>>
>>> UTF-32 has that cost if it's encoded as a sequence of 4 octets per
>>> character.
>>
>> UTF-32 is, by definition, an encoding of exactly one 32-bit code unit
>> per code point.
>
> Surely that 32-bit code unit can be represented by a sequence of 4
> octets. For example, if I type
>
> echo hello | iconv -f utf-8 -t utf-32 > hello.utf32
>
> I get a file that represents each character as 4 bytes (and that starts
> with a 4-byte BOM).
Of course; with UTF-32LE, the 32-bit code unit is represented with one
set of 4 bytes, and with UTF-32BE, the 32-bit code unit is represented
with a _different_ set of 4 bytes, which is why you need the BOM to
distinguish them.
OTOH, a UTF-32LE BOM looks exactly like a UTF-16LE BOM followed by a
NUL, so there's no way to reliably determine which was used to encode
some files. Oops. And neither can be reliably distinguished from a
non-UTF-16/32 file that just happens to start with 0xFE 0xFF, a valid
byte sequence in many other encodings (but notably _not_ UTF-8).
>>> If wchar_t is 32 bits, then the standard library functions that
>>> handle arrays of wchar_t (wcslen() et al) don't have that problem;
>>> only a 32-bit zero is treated as a (wide) null character.
>>
>> That solves the NUL-terminated string issue, but it also means every
>> function with a string argument or return must be replaced or, worse,
>> duplicated. Ouch.
>
> But that's pretty much already done; wcslen() et al are part of the
> standard library.
It's not just the Standard Library; it's every function in every
program or library ever written that takes or returns a string. For a
real-world example, look at the Windows API.
>> Worse, all that pain doesn't even solve the _real_ problems!
>>
>>> On the other hand, MS Windows has 16-bit wchar_t.
>>
>> That violates the C Standard's requirements, but it's also a
>> popular enough platform that to ignore its problems may be unwise.
>
> Does it? wchar_t is supposed to be "an integer type whose range of
> values can represent distinct codes for all members of the largest
> extended character set specified among the supported locales". A
> conforming implementation whose largest extended character set has
> no more than 65536 characters could legally use 16-bit wchar_t. I
> don't know whether that applies to Microsoft's implementation.
Supporting non-BMP characters (i.e. >65536 total) was what drove their
switch from UCS-2 (which complied) to UTF-16 (which doesn't).
S
| From | supercat@casperkitty.com |
|---|---|
| Date | 2015-12-02 14:38 -0800 |
| Message-ID | <20019f4f-2d82-4b0c-9144-ce1513139b52@googlegroups.com> |
| In reply to | #77648 |
On Wednesday, December 2, 2015 at 1:22:34 PM UTC-6, Keith Thompson wrote: > On the other other hand, C11 adds char16_t and char32_t. Neither of which, interestingly enough, counts as a "character" type despite the name.
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-02 16:26 -0800 |
| Message-ID | <lntwo0qsvk.fsf@kst-u.example.com> |
| In reply to | #77676 |
supercat@casperkitty.com writes:
> On Wednesday, December 2, 2015 at 1:22:34 PM UTC-6, Keith Thompson wrote:
>> On the other other hand, C11 adds char16_t and char32_t.
>
> Neither of which, interestingly enough, counts as a "character" type
> despite the name.
Neither is wchar_t. char, unsigned char, and signed char are the only
"character types".
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | Tim Rentsch <txr@alumni.caltech.edu> |
|---|---|
| Date | 2015-12-09 11:33 -0800 |
| Message-ID | <kfnsi3bv2lc.fsf@x-alumni2.alumni.caltech.edu> |
| In reply to | #77684 |
Keith Thompson <kst-u@mib.org> writes:
> supercat@casperkitty.com writes:
>> On Wednesday, December 2, 2015 at 1:22:34 PM UTC-6, Keith Thompson wrote:
>>> On the other other hand, C11 adds char16_t and char32_t.
>>
>> Neither of which, interestingly enough, counts as a "character" type
>> despite the name.
>
> Neither is wchar_t. char, unsigned char, and signed char are the only
> "character types".

IIANM, any or all of wchar_t, char16_t, and char32_t can be character
types, depending on the implementation.
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-09 12:21 -0800 |
| Message-ID | <lnk2onl6ei.fsf@kst-u.example.com> |
| In reply to | #78278 |
Tim Rentsch <txr@alumni.caltech.edu> writes:
> Keith Thompson <kst-u@mib.org> writes:
>> supercat@casperkitty.com writes:
>>> On Wednesday, December 2, 2015 at 1:22:34 PM UTC-6, Keith Thompson wrote:
>>>> On the other other hand, C11 adds char16_t and char32_t.
>>>
>>> Neither of which, interestingly enough, counts as a "character" type
>>> despite the name.
>>
>> Neither is wchar_t. char, unsigned char, and signed char are the only
>> "character types".
>
> IIANM, any or all of wchar_t, char16_t, and char32_t can be character
> types, depending on the implementation.
You're right, of course. Any of them is a character type if and
only if it's a typedef for char, unsigned char, or signed char.
(On most implementations, they aren't.)
One might reach the conclusion that the standard's definition of the
phrase "character type" is confusing and misleading. Presumably the
term was defined before wchar_t and friends were added to the language,
and not updated.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
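Whether a given implementation makes wchar_t, char16_t or char32_t a "character type" in the standard's sense can be checked at compile time. The following is a minimal C11 sketch, assuming the implementation provides <uchar.h>; the macro `IS_CHARACTER_TYPE` and the three wrapper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t */
#include <uchar.h>    /* char16_t, char32_t (C11; assumed to be available) */

/* Evaluates to 1 if the type of x is one of the three types the standard
   calls "character types", and to 0 otherwise. A typedef for one of them
   is the same type, so it selects the same association. */
#define IS_CHARACTER_TYPE(x) \
    _Generic((x), char: 1, signed char: 1, unsigned char: 1, default: 0)

/* The answers are implementation-dependent, as noted in the thread. */
int wchar_is_character_type(void)  { return IS_CHARACTER_TYPE((wchar_t)0); }
int char16_is_character_type(void) { return IS_CHARACTER_TYPE((char16_t)0); }
int char32_is_character_type(void) { return IS_CHARACTER_TYPE((char32_t)0); }
```

On an implementation with 8-bit bytes, char16_t and char32_t cannot be character types, because they need at least 16 and 32 bits respectively; wchar_t could still be one if it is a typedef for plain char.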
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2015-12-03 11:28 +0100 |
| Message-ID | <n3p5bv$q3j$1@dont-email.me> |
| In reply to | #77639 |
On 02/12/15 18:21, Stephen Sprunk wrote:
> On 02-Dec-15 11:07, fir wrote:
>> Malcolm McLean wrote:
>>> If ascii had never achieved any traction outside of North America,
>>> then I think there would be a strong case for UTF-32. Reality is
>>> that there are masses and masses of ascii interfaces around, and it
>>> would be a nightmare job to track them all down and either rip them
>>> out or write little adapter functions to make them talk to the rest
>>> of the world in UTF-32.
>>>
>>> UTF-8 is the best compromise. ...
>>
>> Im not sure of overal utf-8 is the good compromise, ascii is simple
>> utf32 is simple (i hope, dont know deep details) so maybe those
>> interfacing wouldnt be so hard (should be binary trivial and thats a
>> big value, those oldschool value that is lost when you use utf-8 (and
>> need to rely on external libraries rather than writing own routines
>> in own hand if need)) (still im not sure depends if utf-32 has no
>> weird glitches and if it is really binary easy format) (there is
>> still a wuestion if LE of BE, i tend to say that it should be native
>> in ram and probably both format allowed in files, though with some
>> tendency to favorize big endian as international standard)
>
> UTF-32's simplicity comes at the cost of embedded NUL characters, so
> it's inherently incompatible with all existing C string-handling code.
> UTF-8 isn't perfect, but at least it is _usually_ compatible, and it has
> the side benefits of being endian-neutral and generally smaller. UTF-16
> takes the worst of both and the best of neither.
>

UTF-32 also has endian issues, and while it has one code unit (i.e.,
32-bit number) per code point (i.e., Unicode character), you still don't
have a one-to-one correspondence between code points and glyphs. So
UTF-32 does not make Unicode as easy as ASCII - nor does it make it
fixed length. For example, é can be made from a single character U+00E9,
or from two characters: e followed by the combining acute accent U+0301,
which combine to look like é. You cannot therefore assume that one
32-bit code unit is one character.

So using UTF-32 simplifies some aspects of Unicode, while keeping some
complications that are inherent in Unicode and introducing some of its
own.

Thus by far the most common choice for data transfer (files, protocols,
etc.) is UTF-8, while UTF-32 is a common choice for an internal format
within a program (where endian issues are not relevant).

UTF-16 is the worst of both worlds, and (especially on Windows) is often
mixed with UCS-2 which has limited range and fails with anything outside
the BMP. (Noting, however, that non-BMP characters are rare except for
CJK - and often these scripts use different encodings anyway.)
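The é example can be made concrete with a small code point counter. This is a minimal sketch that assumes valid UTF-8 input; `utf8_code_points` and the two string constants are hypothetical names, and the byte values are the standard UTF-8 encodings of U+00E9 and of U+0065 followed by U+0301.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (bit pattern 10xxxxxx).
   Assumption: the input is valid UTF-8; no validation is attempted. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* Precomposed form: U+00E9, encoded as C3 A9. */
const char e_acute_precomposed[] = "\xC3\xA9";

/* Decomposed form: U+0065 'e' then U+0301 COMBINING ACUTE ACCENT,
   encoded as 65 CC 81. */
const char e_acute_decomposed[] = "e\xCC\x81";
```

Both strings render as the same single user-visible character, yet they hold 1 and 2 code points (2 and 3 bytes). The same would be true in UTF-32, where they would be 1 and 2 code units, which is the sense in which a fixed-width code unit does not give a fixed width per character.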
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-03 08:50 -0600 |
| Message-ID | <n3pkmu$kll$1@dont-email.me> |
| In reply to | #77721 |
On 03-Dec-15 04:28, David Brown wrote:
> On 02/12/15 18:21, Stephen Sprunk wrote:
>> UTF-32's simplicity comes at the cost of embedded NUL characters,
>> so it's inherently incompatible with all existing C string-handling
>> code. UTF-8 isn't perfect, but at least it is _usually_ compatible,
>> and it has the side benefits of being endian-neutral and generally
>> smaller. UTF-16 takes the worst of both and the best of neither.
>
> UTF-32 also has endian issues, and while it has one code unit (i.e.,
> 32-bit number) per code point (i.e., Unicode character), you still
> don't have a one-to-one correspondence between code points and
> glyphs.

ITYM "grapheme clusters" for the latter.

A "glyph" is the visual rendering of a "grapheme" in a certain font, and
a "grapheme cluster" may require multiple glyphs.

For example, "A" in Times and "A" in Helvetica are the same grapheme
(and code point) but different glyphs. OTOH, Latin "A" and Greek "Α" are
different graphemes (and code points) but typically map to the same set
of glyphs.

> (Noting, however, that non-BMP characters are rare except for CJK

That depends; all of the new emoji are non-BMP, for instance, and many
of us encounter those on a daily basis. I wouldn't call that "rare".

> - and often these scripts use different encodings anyway.)

ShiftJIS still has measurable usage in Japan but is steadily losing
ground to UTF-8. Despite the PRC govt's mandate that everyone use
GB18030/GB2312, UTF-8 clearly dominates there, same as in ROC and ROK.
It's unclear what DPRK uses--or if an answer is even meaningful.

> ...

The rest of your post seems to be a restatement of what I've already
said in other posts. Were you trying to collect it all in one place for
the convenience of other readers?

S

--
Stephen Sprunk    "God does not play dice."  --Albert Einstein
CCIE #3723        "God is an inveterate gambler, and He throws the
K5SSS             dice at every possible opportunity." --Stephen Hawking
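Since non-BMP code points such as the newer emoji come up here, a short sketch of how UTF-16 represents them may help show why code written for UCS-2 mishandles them. `utf16_encode` is a hypothetical helper name; the arithmetic is the standard surrogate-pair mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
   Returns the number of units written to out (1 or 2), or 0 if cp is
   not a Unicode scalar value (above U+10FFFF, or a surrogate). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: one unit, same as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1F600 this yields the pair D83D DE00. Code that treats each 16-bit unit as a complete character, as UCS-2 code does, sees two meaningless values instead of one emoji.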
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2015-12-03 16:38 +0100 |
| Message-ID | <n3pnhe$kf$1@dont-email.me> |
| In reply to | #77743 |
On 03/12/15 15:50, Stephen Sprunk wrote:
> On 03-Dec-15 04:28, David Brown wrote:
>> On 02/12/15 18:21, Stephen Sprunk wrote:
>>> UTF-32's simplicity comes at the cost of embedded NUL characters,
>>> so it's inherently incompatible with all existing C string-handling
>>> code. UTF-8 isn't perfect, but at least it is _usually_ compatible,
>>> and it has the side benefits of being endian-neutral and generally
>>> smaller. UTF-16 takes the worst of both and the best of neither.
>>
>> UTF-32 also has endian issues, and while it has one code unit (i.e.,
>> 32-bit number) per code point (i.e., Unicode character), you still
>> don't have a one-to-one correspondence between code points and
>> glyphs.
>
> ITYM "grapheme clusters" for the latter.
>
> A "glyph" is the visual rendering of a "grapheme" in a certain font, and
> a "grapheme cluster" may require multiple glyphs.
>
> For example, "A" in Times and "A" in Helvetica are the same grapheme
> (and code point) but different glyphs. OTOH, Latin "A" and Greek "Α"
> are different graphemes (and code points) but typically map to the same
> set of glyphs.

I believe you are correct. The terminology is complicated, and easy to
get wrong - thanks for the clear explanation.

>
>> (Noting, however, that non-BMP characters are rare except for CJK
>
> That depends; all of the new emoji are non-BMP, for instance, and many
> of us encounter those on a daily basis. I wouldn't call that "rare".
>

I didn't know these had their own unicode points - I have always thought
of them as being combinations of ASCII characters like colon, hyphen,
parenthesis :-)

That's me learned two things in one post - probably a record!

>> - and often these scripts use different encodings anyway.)
>
> ShiftJIS still has measurable usage in Japan but is steadily losing
> ground to UTF-8. Despite the PRC govt's mandate that everyone use
> GB18030/GB2312, UTF-8 clearly dominates there, same as in ROC and ROK.
> It's unclear what DPRK uses--or if an answer is even meaningful.
>
>> ...
>
> The rest of your post seems to be a restatement of what I've already
> said in other posts. Were you trying to collect it all in one place for
> the convenience of other readers?
>

There have been a great many posts in a couple of threads about unicode
just recently - some repetition is inevitable, and I might make a new
post before having read all the other posts. But I was not directing my
post to you specifically.
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-03 10:01 -0600 |
| Message-ID | <n3poro$5v1$1@dont-email.me> |
| In reply to | #77752 |
On 03-Dec-15 09:38, David Brown wrote:
> On 03/12/15 15:50, Stephen Sprunk wrote:
>> On 03-Dec-15 04:28, David Brown wrote:
>>> UTF-32 also has endian issues, and while it has one code unit (i.e.,
>>> 32-bit number) per code point (i.e., Unicode character), you still
>>> don't have a one-to-one correspondence between code points and
>>> glyphs.
>>
>> ITYM "grapheme clusters" for the latter.
>>
>> A "glyph" is the visual rendering of a "grapheme" in a certain
>> font, and a "grapheme cluster" may require multiple glyphs.
>>
>> For example, "A" in Times and "A" in Helvetica are the same
>> grapheme (and code point) but different glyphs. OTOH, Latin "A"
>> and Greek "Α" are different graphemes (and code points) but
>> typically map to the same set of glyphs.
>
> I believe you are correct. The terminology is complicated, and easy
> to get wrong - thanks for the clear explanation.

IMHO, it's not all that complicated; it's just a field that most of us
haven't encountered before.

You're probably familiar with phonemes due to IPA, and graphemes are the
same idea applied to writing. Sememes are the same idea again applied to
meaning, which is important in Unicode's Han Unification of CJK scripts.

>>> (Noting, however, that non-BMP characters are rare except for
>>> CJK
>>
>> That depends; all of the new emoji are non-BMP, for instance, and
>> many of us encounter those on a daily basis. I wouldn't call that
>> "rare".
>
> I didn't know these had their own unicode points - I have always
> thought of them as being combinations of ASCII characters like colon,
> hyphen, parenthesis :-)

AFAICT, that's the difference between emoticons, e.g. ":)", and emoji,
e.g. "☺️". The latter are mostly in U+26xx, U+27xx and U+1Fxxx, but they
can be found scattered throughout various other blocks too.

S

--
Stephen Sprunk    "God does not play dice."  --Albert Einstein
CCIE #3723        "God is an inveterate gambler, and He throws the
K5SSS             dice at every possible opportunity." --Stephen Hawking
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-03 09:46 -0800 |
| Message-ID | <lna8prqvc3.fsf@kst-u.example.com> |
| In reply to | #77743 |
Stephen Sprunk <stephen@sprunk.org> writes:
[...]
> A "glyph" is the visual rendering of a "grapheme" in a certain font, and
> a "grapheme cluster" may require multiple glyphs.
And if you filter the text through rot13 before printing it, you can
render unto Caesar.
[...]
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"