Groups > comp.lang.c > #77629 > unrolled thread
| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2015-12-02 08:01 -0800 |
| Last post | 2015-12-06 13:45 +0000 |
| Articles | 20 on this page of 158 — 25 participants |
unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 08:01 -0800
Re: unicode is a fail me <self@example.org> - 2015-12-02 16:12 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:09 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 08:18 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:07 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:21 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:22 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:59 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:25 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 19:47 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-02 14:38 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 16:26 -0800
Re: unicode is a fail Tim Rentsch <txr@alumni.caltech.edu> - 2015-12-09 11:33 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:21 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 11:28 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 08:50 -0600
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:38 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 10:01 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-03 09:46 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:39 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 08:26 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-03 18:42 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-03 17:14 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 19:02 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-04 06:35 +0000
Re: unicode is a fail David Thompson <dave.thompson2@verizon.net> - 2015-12-28 05:11 -0500
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 10:24 -0600
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-03 22:37 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 11:32 +0100
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 11:10 -0600
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 09:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 13:10 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-02 19:45 +0000
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-03 09:08 +1300
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 14:10 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 11:27 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 15:21 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 15:18 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:45 +0000
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 09:43 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 11:40 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-02 12:19 -0800
Re: unicode is a fail Nobody <nobody@nowhere.invalid> - 2015-12-02 21:23 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 10:12 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 02:13 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:11 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 05:17 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 15:33 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:05 -0800
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 16:42 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-03 07:58 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-03 10:38 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-03 14:17 +0100
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:54 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-04 14:25 +0100
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-04 13:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-02 23:24 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-03 00:45 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-02 20:59 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-02 19:13 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-03 07:00 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 04:45 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:04 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 13:22 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-04 07:35 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 19:17 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 11:49 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 15:39 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-04 14:19 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 12:57 -0600
Re: unicode is a fail supercat@casperkitty.com - 2015-12-06 15:47 -0800
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:13 +0000
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-05 01:59 +0000
Re: unicode is a fail David Brown <david.brown@hesbynett.no> - 2015-12-05 17:17 +0100
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:28 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-04 23:46 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-05 01:04 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 03:21 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:03 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 11:47 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 04:40 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-05 13:26 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-05 13:35 -0600
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-06 02:23 +0000
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 16:09 +0530
Re: unicode is a fail Xavier <zaz.colmant@free.fr> - 2015-12-05 15:45 +0100
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 07:42 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-05 16:32 -0800
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 18:11 -0800
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 02:19 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-06 13:09 +0000
Re: unicode is a fail Martin Shobe <martin.shobe@yahoo.com> - 2015-12-06 18:38 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 01:55 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 19:14 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 13:53 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 06:31 -0800
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-07 21:22 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 15:34 -0600
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-07 16:36 -0800
Re: unicode is a fail Lowell Gilbert <lgusenet@be-well.ilk.org> - 2015-12-08 11:40 -0500
Re: unicode is a fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2015-12-08 17:18 +0000
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-09 08:36 -0600
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 10:06 -0600
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 09:35 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 10:07 -0800
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:04 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 12:35 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-09 23:46 +0000
Re: unicode is a fail supercat@casperkitty.com - 2015-12-09 16:15 -0800
Re: unicode is a fail glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2015-12-10 03:49 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-09 18:12 -0600
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-09 13:12 -0500
Re: unicode is a fail Keith Thompson <kst-u@mib.org> - 2015-12-09 12:12 -0800
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-10 20:48 +0000
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-09 23:44 +0000
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 01:13 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-10 10:39 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-10 03:33 -0800
Re: unicode is a fail supercat@casperkitty.com - 2015-12-10 06:07 -0800
Re: unicode is a fail "Osmium" <r124c4u102@comcast.net> - 2015-12-10 08:21 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-10 00:59 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 14:33 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-06 22:45 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 12:38 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 13:55 -0600
Re: unicode is a fail BartC <bc@freeuk.com> - 2015-12-07 21:14 +0000
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-07 16:50 -0600
Re: unicode is a fail Robert Wessel <robertwessel2@yahoo.com> - 2015-12-07 02:38 -0600
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 07:34 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-06 00:24 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-04 19:49 -0600
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 21:32 +0000
Re: unicode is a fail Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:50 -0800
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 22:15 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 17:27 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:06 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 18:29 -0500
Re: unicode is a fail Richard Heathfield <rjh@cpax.org.uk> - 2015-12-05 23:50 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:38 +0000
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:33 +0000
Re: unicode is a fail James Kuyper <jameskuyper@verizon.net> - 2015-12-05 16:51 -0500
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 10:59 +1300
Re: unicode is a fail Ian Collins <ian-news@hotmail.com> - 2015-12-06 11:00 +1300
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-06 06:31 +0000
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-02 17:48 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 01:20 -0800
Re: unicode is a fail fir <profesor.fir@gmail.com> - 2015-12-03 02:02 -0800
Re: unicode is a fail Stephen Sprunk <stephen@sprunk.org> - 2015-12-03 09:43 -0600
Re: unicode is a fail raltbos@xs4all.nl (Richard Bos) - 2015-12-04 12:55 +0000
Re: unicode is a fail Steve Thompson <stevet810@gmail.com> - 2015-12-04 18:29 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 16:42 +0000
Re: unicode is a fail Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-05 10:06 +0000
OT: Usenet (Was: unicode is a fail) Steve Thompson <stevet810@gmail.com> - 2015-12-05 20:41 +0000
Re: OT: Usenet (Was: unicode is a fail) Malcolm McLean <malcolm.mclean5@btinternet.com> - 2015-12-05 13:18 -0800
Re: unicode is a fail Udyant Wig <udyantw@gmail.com> - 2015-12-06 10:21 +0530
OT: Facebook (was Re: unicode is a fail) Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-12-06 08:51 +0000
Re: OT: Facebook (was Re: unicode is a fail) raltbos@xs4all.nl (Richard Bos) - 2015-12-06 13:45 +0000
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-05 11:47 +0000 |
| Message-ID | <n3uiop$p98$1@dont-email.me> |
| In reply to | #77867 |
On 05/12/2015 01:04, Steve Thompson wrote:
> On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
>> Fine, then we move to 16 bits, which had long been anticipated anyway,
>> and gives us plenty of room for special symbols. But not if we have to
>> throw in every single alphabet and writing system that anybody has ever
>> heard of (and apparently plenty that no one has heard of!).
>
> I rather suspect the Anthropologists will scream bloody murder if
> Egyptian hieroglyphics, Linear B, and all the rest are excluded.
They probably wouldn't notice. Whatever software they use to enter and
display the characters would still work if a different encoding scheme
was used.
Or many might prefer just using mark-up to describe it:
{snake}{bird}{water}.
>> (And then you have vast, sprawling 'alphabets' like Chinese which are
>> words rather than the letters used to build the words.)
>
> So go tell the Chinese (and Japanese, and Thais, and ...) that they
> should man-up and use a Western alphabet. Such schemes exist, after
> all.
No, they can use the same alphabets, but they don't put them all into
one giant melting pot with every other.
Now, I can no longer write what had been trivial string handling
routines such as capitalise, toupper, reverse, compare, left, leftn,
etc etc. All are very well defined in ASCII, but would no longer be
guaranteed to work with Unicode because most of the alphabets are so weird.
--
Bartc
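As a rough illustration of the contrast drawn above, here is a minimal Python sketch (the choice of Python is an assumption; the post names no language for these routines). `ascii_upper` is a hypothetical helper that only knows the 26 ASCII letters, while the built-in `str.upper` applies Unicode case mappings, which can even change the length of a string.

```python
def ascii_upper(s: str) -> str:
    """Upper-case only the ASCII letters a-z; leave everything else alone."""
    # In ASCII, each lower-case letter sits exactly 32 above its capital.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


print(ascii_upper("hello"))        # ASCII-only input: fully capitalised
print(ascii_upper("touch\u00e9"))  # the accented letter is left untouched
print("touch\u00e9".upper())       # Unicode-aware mapping handles it
print("stra\u00dfe".upper())       # one character becomes two: STRASSE
```

The ASCII routine stays trivial precisely because it ignores everything outside a-z; whether that is acceptable depends on the input the program is meant to handle.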
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-05 04:40 -0800 |
| Message-ID | <b88ee903-17ed-4f0d-8ebc-308a22fd4de8@googlegroups.com> |
| In reply to | #77879 |
On Saturday, December 5, 2015 at 11:48:12 AM UTC, Bart wrote:
>
> Now, I can no longer write what had been trivial string handling
> routines such as capitalise, toupper, reverse, compare, left, leftn,
> etc etc. All are very well defined in ASCII, but would no longer be
> guaranteed to work with Unicode because most of the alphabets are so weird.
>
The concept of capitals is also pretty weird. We accept it as normal
because we grew up with it.
It's just reality. Some operations either won't make sense or
will be problematic if you don't use English or a closely related
language. E.g. a French capital E cannot take an acute accent.
So if we capitalise touché (a favourite French word) we get
TOUCHE, and if we put it back we get touche, touch, which doesn't
mean the same thing - the reversibility rule which holds in
English is broken. But that's a characteristic of French, not
a problem created by Unicode.
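The reversibility point can be checked with a small Python sketch. `strip_accents` is a hypothetical helper that models the accent-dropping capitalisation described in this post; it is an assumption for illustration, not a statement about correct French usage. The standard `unicodedata` module supplies the decomposition.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


word = "touch\u00e9"                           # "touché"
accentless_caps = strip_accents(word).upper()  # accent lost on the way up
unicode_caps = word.upper()                    # accent kept

# Round trip through the accent-dropping convention does not restore the word.
print(accentless_caps, accentless_caps.lower() == word)
# Round trip through the Unicode case mapping does.
print(unicode_caps, unicode_caps.lower() == word)
```

In this sketch the information is lost by the accent-dropping step, not by the case mapping itself.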
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-05 13:26 +0000 |
| Message-ID | <n3uoh3$e71$1@dont-email.me> |
| In reply to | #77881 |
On 05/12/2015 12:40, Malcolm McLean wrote:
> On Saturday, December 5, 2015 at 11:48:12 AM UTC, Bart wrote:
>>
>> Now, I can no longer write what had been trivial string handling
>> routines such as capitalise, toupper, reverse, compare, left, leftn,
>> etc etc. All are very well defined in ASCII, but would no longer be
>> guaranteed to work with Unicode because most of the alphabets are so weird.
>>
> The concept of capitals is also pretty weird. We accept it normal
> because we grew up with it.
> It's just reality. Some operations either won't make sense or
> will be problematic if you don't use English or a closely related
> language. E.g. a French capital E cannot take an acute accent.
> So if we capitalise touché (a favourite French word) we get
> TOUCHE, and we put it back we get touche, touch, which doesn't
> mean the same thing - the reversibility rule which holds in
> English is broken. But that's a characteristic of French, not
> a problem created by Unicode.
>
But an accented E exists (É). Would TOUCHÉ be meaningless to a French
speaker?
I'm mainly familiar with Italian where accents are used to indicate
stress if it deviates from the rules, but it appears to be optional.
Stress can also be significant in English (PROject, proJECT), but we
seem to manage without marking the difference, and the two versions of
'project' will always match.
I just want to be able to write code like this without worrying about
all the murky areas of Unicode (scripting code not C):
    forall w in words do
        if reverse(w)=w then
            println w,"is a palindrome"
        fi
    od
Here the set of words is a list of about 100,000 English language words,
stored in lower case.
How would this be changed to accommodate Unicode? Well, reverse() would
be useless for a start because of the many special cases (characters
changing depending on their position in a word for example).
Then, the dictionary of words would surely have to be upgraded to
include all the words of every language in the world. Why not? We
already have a giant character set of all the world's alphabets. Unicode
makes it harder to impose boundaries.
Further, the concept of a palindrome itself is probably meaningless with
most languages.
So, this little program either has to get unfeasibly complicated to meet
expectations, or it's argued out of existence altogether!
--
Bartc
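One hedged way to see how far the palindrome loop above can be carried beyond ASCII is the following Python sketch. It assumes that comparing code points after NFC normalization and case folding is an acceptable definition of "same letters"; that assumption holds for precomposed Latin letters but not for scripts with positional forms or multi-code-point graphemes, which is the difficulty the post raises. `is_palindrome` is a hypothetical name.

```python
import unicodedata


def is_palindrome(word: str) -> bool:
    """Compare code points after NFC normalization and case folding."""
    w = unicodedata.normalize("NFC", word).casefold()
    return w == w[::-1]


decomposed = "e\u0301te\u0301"  # "été" written with combining acute accents

# A naive reversal moves each accent onto the wrong side of its letter.
print(decomposed[::-1] == decomposed)

words = ["level", "project", decomposed]
for w in words:
    if is_palindrome(w):
        print(w, "is a palindrome")
```

Restricting the word list to one language, as the original program does, keeps the simple version valid for that list.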
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-05 13:35 -0600 |
| Message-ID | <n3ve4o$gi7$1@dont-email.me> |
| In reply to | #77882 |
On 05-Dec-15 07:26, BartC wrote:
> On 05/12/2015 12:40, Malcolm McLean wrote:
>> The concept of capitals is also pretty weird. We accept it normal
>> because we grew up with it. It's just reality. Some operations
>> either won't make sense or will be problematic if you don't use
>> English or a closely related language. E.g. a French capital E
>> cannot take an acute accent. So if we capitalise touché (a
>> favourite French word) we get TOUCHE, and we put it back we get
>> touche, touch, which doesn't mean the same thing - the
>> reversibility rule which holds in English is broken. But that's a
>> characteristic of French, not a problem created by Unicode.
>
> But an accented E exists (É). Would TOUCHÉ be meaningless to a
> French speaker?
Not using accents on upper case letters became _tolerated_ during the
typewriter era because the dead-key trick for lower case letters would
put an accent _inside_ a capital letter, but that was never "correct",
and it's less tolerated now that computers can easily get it right.
> I'm mainly familiar with Italian where accents are used to indicate
> stress if it deviates from the rules, but it appears to be optional.
Spanish does the same, but they're not optional; "esta" and "está" are
two very different words, for instance.
French doesn't have stressed syllables; instead, accents are used to
change the _sound_ of vowels. It's rare for words to differ only in
accents, but when they do, a fluent speaker can easily figure out the
right word from the context, so missing accents aren't fatal.
> Further, the concept of a palindrome itself is probably meaningless
> with most languages.
Well, it probably makes sense for alphabets, abjads and abugidas, but
probably not for syllabaries or logographies.
> So, this little program either has to get unfeasibly complicated to
> meet expectations, or it's argued out of existence altogether!
Or you just restrict the problem domain that you're addressing.
Most calculators only work with Arabic numerals, not Roman ones, and I
don't see anyone complaining about that. The latter is, at most, an
interesting exercise for Programming 101 courses.
S
--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2015-12-06 02:23 +0000 |
| Message-ID | <n4067j$r68$1@speranza.aioe.org> |
| In reply to | #77910 |
Stephen Sprunk <stephen@sprunk.org> wrote:
(snip)
> French doesn't have stressed syllables; instead, accents are used to
> change the _sound_ of vowels. It's rare for words to differ only in
> accents, but when they do, a fluent speaker can easily figure out the
> right word from the context, so missing accents aren't fatal.
Are you sure that there are no fatal mispronunciations?
Some years ago, the New York Times did a correction of mistaking
poisonous snack for poisonous snake (or the other way around). At the
same time, I knew someone who spoke English such that I couldn't tell
which one she was saying. It might not be good to get those wrong.
-- glen
| From | Udyant Wig <udyantw@gmail.com> |
|---|---|
| Date | 2015-12-06 16:09 +0530 |
| Message-ID | <87610bx3na.fsf@rudiments.goosenet.in> |
| In reply to | #77937 |
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> Some years ago, the New York Times did a correction of mistaking
> poisonous snack for poisonous snake (or the other way around). At the
> same time, I knew someone who spoke English such that I couldn't tell
> which one she was saying. It might not be good to get those wrong.
When this happens, most times it is hilarious. At other times, not so
much.
> -- glen
--
Udyant Wig
| From | Xavier <zaz.colmant@free.fr> |
|---|---|
| Date | 2015-12-05 15:45 +0100 |
| Message-ID | <alpine.LNX.2.20.1512051532030.22558@cruxy2.freebox.fr> |
| In reply to | #77881 |
On Sat, 5 Dec 2015, Malcolm McLean wrote:
> The concept of capitals is also pretty weird. We accept it normal
> because we grew up with it.
> It's just reality. Some operations either won't make sense or
> will be problematic if you don't use English or a closely related
> language. E.g. a French capital E cannot take an acute accent.
> So if we capitalise touché (a favourite French word) we get
> TOUCHE, and we put it back we get touche, touch, which doesn't
> mean the same thing - the reversibility rule which holds in
> English is broken. But that's a characteristic of French, not
> a problem created by Unicode.
>
In French, you have to put accents on capital letters, according to
the Académie française, « Le bon usage » by Grevisse, etc.
Many French people are confused about this issue. It's probably
due to the fact that typewriters couldn't do it properly.
À bientôt,
Xavier
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-05 07:42 -0800 |
| Message-ID | <bd9f56b1-a8fc-4faa-9ebe-36d0fb8b2213@googlegroups.com> |
| In reply to | #77885 |
On Saturday, December 5, 2015 at 2:45:49 PM UTC, Xavier wrote:
>
> In French, you have to put accented letters on capital letters,
> dixit l'Académie française, « Le bon usage » from Grevisse, etc.
>
> Many French people are confused about this issue. It's probably
> due to the fact that typewriters couldn't properly do it properly.
>
I was taught that a capital may not take an accent, in French. Of course
that was only a schoolmaster's view. The academy may hold differently,
so I stand corrected on that.
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2015-12-05 16:32 -0800 |
| Message-ID | <lnwpsspgbt.fsf@kst-u.example.com> |
| In reply to | #77879 |
BartC <bc@freeuk.com> writes:
> On 05/12/2015 01:04, Steve Thompson wrote:
>> On Fri, Dec 04, 2015 at 11:46:52PM +0000, BartC wrote:
[...]
>>> (And then you have vast, sprawling 'alphabets' like Chinese which are
>>> words rather than the letters used to build the words.)
>>
>> So go tell the Chinese (and Japanese, and Thais, and ...) that they
>> should man-up and use a Western alphabet. Such schemes exist, after
>> all.
>
> No, they can use the same alphabets, but they don't put them all into
> one giant melting pot with every other.
So you want users of Asian writing systems to use their own separate
character set encodings, incompatible with the encodings used in
Western countries.
Because that way it's more convenient for you.
Sorry, but the decision has already been made. Unicode combines
most of the world's character sets into a single standard, and that's
not going to change. Complain all you like (preferably elsewhere);
it's not going to make any difference.
No doubt you have some ideas for how HTML web pages can include
both ASCII-encoded tag names and Chinese characters. Which means
there has to be a way to combine Latin and Chinese characters in
a single document anyway.
> Now, I can no longer write what had been trivial string handling
> routines such as capitalise, toupper, reverse, compare, left, leftn,
> etc etc. All are very well defined in ASCII, but would no longer be
> guaranteed to work with Unicode because most of the alphabets are so weird.
Too bad. The "giant melting pot" you worry about already exists, and is
used for most text transmitted over the Internet.
If you want to write software that only deals with ASCII, you're
absolutely free to do so, and you can do as much trivial string
handling as you like.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
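The point about HTML mixing ASCII tag names with Chinese text can be sketched in Python using UTF-8, one of the Unicode encodings. UTF-8 is an assumption here, since the post does not name an encoding, and the sample string is made up for illustration. In UTF-8 the ASCII characters keep their single-byte ASCII values, and every byte of a non-ASCII character is 0x80 or above, so tag delimiters cannot be confused with parts of a Chinese character.

```python
html = "<p>\u4e2d\u6587</p>"  # "<p>中文</p>": ASCII tags around two Chinese characters
data = html.encode("utf-8")

# The tag bytes are exactly what an ASCII encoder would have produced.
print(data[:3] == "<p>".encode("ascii"))
print(data[-4:] == "</p>".encode("ascii"))

# Each Chinese character takes three bytes, all outside the ASCII range.
chinese = "\u4e2d\u6587".encode("utf-8")
print(len(chinese), all(b >= 0x80 for b in chinese))

print(len(html), "characters,", len(data), "bytes")
```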
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-05 18:11 -0800 |
| Message-ID | <09a2c396-9ccf-4385-8712-89e80cef6cef@googlegroups.com> |
| In reply to | #77929 |
On Sunday, December 6, 2015 at 12:32:39 AM UTC, Keith Thompson wrote:
>
> No doubt you have some ideas for how HTML web pages can include
> both ASCII-encoded tag names and Chinese characters. Which means
> there has to be a way to combine Latin and Chinese characters in
> a single document anyway.
>
In fact mixed text is only going to get more common. There's a massive
move to learn English in places like China and Eastern Europe, and you
see things like Manchester United football shirts with Chinese adverts
on them.
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-06 02:19 +0000 |
| Message-ID | <n405rh$8nu$1@dont-email.me> |
| In reply to | #77929 |
On 06/12/2015 00:32, Keith Thompson wrote:
> BartC <bc@freeuk.com> writes:
>> No, they can use the same alphabets, but they don't put them all into
>> one giant melting pot with every other.
>
> So you want users of Asian writing systems to use their own separate
> character set encodings,
Well, they'd have the advantage of starting from code-point 0! Imagine
what we'd think about ASCII (an offset version) starting at code-point
0x27F80 in some supplementary plane.
> Because that way it's more convenient for you.
Maybe to others too.
(What's next for the Unicode architects, to combine all the programming
languages of the world into one giant syntax? What could possibly go
wrong?!)
> No doubt you have some ideas for how HTML web pages can include
> both ASCII-encoded tag names and Chinese characters. Which means
> there has to be a way to combine Latin and Chinese characters in
> a single document anyway.
HTML pages can include all sorts of junk, of which the character
encoding scheme, when you can locate any actual text, might be a small part.
>> Now, I can no longer write what had been trivial string handling
>> routines such as capitalise, toupper, reverse, compare, left, leftn,
>> etc etc. All are very well defined in ASCII, but would no longer be
>> guaranteed to work with Unicode because most of the alphabets are so weird.
>
> Too bad. The "giant melting pot" you worry about already exists, and is
> used for most text transmitted over the Internet.
>
> If you want to write software that only deals with ASCII, you're
> absolutely free to do so, and you can do as much trivial string
> handling as you like.
Yes, I can, and I can have my own scheme for dealing with the extra
characters I need. Then it will need conversions for interacting with
anything else. (But most likely I will end up using a 16-bit scheme that
can represent the BMP.)
I'm just saying it might have been a better idea for those large
open-ended alphabets not to simply have been merged into and to have
overwhelmed the set of compact alphabets.
--
Bartc
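For readers unsure what "a 16-bit scheme that can represent the BMP" covers, here is a small Python sketch. It uses UTF-16 as the 16-bit encoding, which is an assumption since the post does not name one, and `utf16_units` is a hypothetical helper. Code points up to U+FFFF (the Basic Multilingual Plane) fit in one 16-bit unit; anything above, such as the U+27F80 mentioned in the post, needs two.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed for one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


for ch in ["A", "\u00e9", "\u4e2d", "\U00027F80"]:
    cp = ord(ch)
    in_bmp = cp <= 0xFFFF
    print(f"U+{cp:04X}  in BMP: {in_bmp}  UTF-16 units: {utf16_units(ch)}")
```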
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-06 13:09 +0000 |
| Message-ID | <n41btc$nbu$1@dont-email.me> |
| In reply to | #77935 |
On 06/12/2015 02:19, BartC wrote:
> On 06/12/2015 00:32, Keith Thompson wrote:
>> If you want to write software that only deals with ASCII, you're
>> absolutely free to do so, and you can do as much trivial string
>> handling as you like.
>
> Yes, I can, and I can have my own scheme for dealing with the extra
> characters I need. Then it will need conversions for interacting with
> anything else. (But most likely I will end up using a 16-bit scheme that
> can represent the BMP.)
I spent 5 minutes thinking about an alternative to Unicode, and 10
minutes writing up a first draft, and 10 more minutes for a second draft
(I won't bore you with the details).
30 minutes to invent a new Unicode; it wasn't hard! (Of course, it might
need tweaking in actual use...)
In 32-bit form, the two schemes (Unicode, and mine), aren't that
different in that each character is allocated a dedicated code-point.
But in mine, the large alphabets are tidily partitioned out of the way.
A similar concept to code-pages, but 32K characters each and that can
co-exist in the same text.
--
Bartc
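This post gives no details of the scheme, so the following Python sketch is only a guess at what code-page-like partitions of 32K characters inside a 32-bit code could look like. The bit layout (partition number in the bits above the low 15) and the names `pack` and `unpack` are assumptions made for illustration, not the design described by the poster.

```python
PAGE_BITS = 15                    # assumed: 2**15 = 32K characters per partition
PAGE_MASK = (1 << PAGE_BITS) - 1  # low 15 bits select the character
MAX_PAGE = (1 << (32 - PAGE_BITS)) - 1


def pack(page: int, offset: int) -> int:
    """Combine a partition number and a local character code into one 32-bit code."""
    if not 0 <= offset <= PAGE_MASK:
        raise ValueError("offset does not fit in one partition")
    if not 0 <= page <= MAX_PAGE:
        raise ValueError("partition number does not fit in 32 bits")
    return (page << PAGE_BITS) | offset


def unpack(code: int) -> tuple[int, int]:
    """Split a 32-bit code back into (partition, offset)."""
    return code >> PAGE_BITS, code & PAGE_MASK


# Partition 0 leaves small codes such as ASCII unchanged in this layout.
print(pack(0, ord("A")))
# A character in another partition still has a unique code, so both
# can appear in the same text.
print(hex(pack(3, 0x10)), unpack(pack(3, 0x10)))
```

Under these assumptions every character still gets a dedicated code, as in Unicode; the difference is only in how the code space is carved up.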
| From | Martin Shobe <martin.shobe@yahoo.com> |
|---|---|
| Date | 2015-12-06 18:38 -0600 |
| Message-ID | <n42k9h$jp2$1@dont-email.me> |
| In reply to | #77977 |
On 12/6/2015 7:09 AM, BartC wrote:
> On 06/12/2015 02:19, BartC wrote:
>> On 06/12/2015 00:32, Keith Thompson wrote:
>
>>> If you want to write software that only deals with ASCII, you're
>>> absolutely free to do so, and you can do as much trivial string
>>> handling as you like.
>>
>> Yes, I can, and I can have my own scheme for dealing with the extra
>> characters I need. Then it will need conversions for interacting with
>> anything else. (But most likely I will end up using a 16-bit scheme that
>> can represent the BMP.)
>
> I spent 5 minutes thinking about an alternative to Unicode, and 10
> minutes writing up a first draft, and 10 more minutes for a second draft
> (I won't bore you with the details).
>
> 30 minutes to invent a new Unicode; it wasn't hard! (Of course, it might
> need tweaking in actual use...)
>
> In 32-bit form, the two schemes (Unicode, and mine), aren't that
> different in that each character is allocated a dedicated code-point.
> But in mine, the large alphabets are tidily partitioned out of the way.
> A similar concept to code-pages, but 32K characters each and that can
> co-exist in the same text.
>
Can you give a link to it?
Martin Shobe
| From | BartC <bc@freeuk.com> |
|---|---|
| Date | 2015-12-07 01:55 +0000 |
| Message-ID | <n42or0$op$1@dont-email.me> |
| In reply to | #78018 |
On 07/12/2015 00:38, Martin Shobe wrote:
> On 12/6/2015 7:09 AM, BartC wrote:

>> I spent 5 minutes thinking about an alternative to Unicode, and 10
>> minutes writing up a first draft, and 10 more minutes for a second draft
>> (I won't bore you with the details).

>> In 32-bit form, the two schemes (Unicode, and mine), aren't that
>> different in that each character is allocated a dedicated code-point.
>> But in mine, the large alphabets are tidily partitioned out of the way.
>> A similar concept to code-pages, but 32K characters each and that can
>> co-exist in the same text.

> Can you give a link to it?

It was only a dozen or so lines of text!

Anyway I thought about it for another ten or twenty minutes and I have a revised scheme (the previous one included non-character escape codes within a string which I didn't like). Here's version 3:

* In-memory representation, 32-bit version

* All large alphabets are organised into sets of 64K characters, each of which is given an alphabet code (similar to a code-page, but bigger)

* ASCII, small alphabets and symbols fit into a single special alphabet of 64K characters, which itself has an alphabet code of zero

* Local character encodings for each alphabet are from 0 to 65535, which form the lsw of the 32-bit code.

* The msw of the 32-bit code is the alphabet code. The complete code forms a unique identifier for the character (ignoring the possibilities of duplicates). The set of all character codes is sparse (not all alphabets will occupy 64K slots)

* Where only one alphabet is known to be in use (alphabet 0 also counts as just one), then a 16-bit in-memory encoding can be used. (With a similar trick for 8-bit encoding when all character codes are 0 to 255.)

* (This can also be done on a per-string basis, with the alphabet in use being an attribute associated with the string.)

* (Possibly, the first 256 codes of alphabet 0, which are really general purpose characters, could be repeated at the start of all alphabets. But this creates the problem of multiple encodings of these characters.)

--
Bartc
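For readers who want to see the layout concretely, here is a minimal C sketch of the version 3 scheme as described above: alphabet code in the most significant 16 bits, local code in the least significant 16 bits. The type and function names (`bchar`, `bchar_make`, and so on) are made up for illustration; the post specifies only the bit layout and the idea of narrowing to 16 bits when a single alphabet is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: one 32-bit character code in the proposed scheme. */
typedef uint32_t bchar;

/* msw = alphabet code, lsw = local code within that alphabet. */
static bchar bchar_make(uint16_t alphabet, uint16_t local)
{
    return ((bchar)alphabet << 16) | local;
}

static uint16_t bchar_alphabet(bchar c) { return (uint16_t)(c >> 16); }
static uint16_t bchar_local(bchar c)    { return (uint16_t)(c & 0xFFFFu); }

/* Returns 1 and stores the shared alphabet code in *alphabet if every
   character in s[0..n) belongs to the same alphabet, so that a 16-bit
   in-memory form could be used; returns 0 otherwise. An empty string
   is treated as alphabet 0 (an assumption, not stated in the post). */
static int bchar_single_alphabet(const bchar *s, size_t n, uint16_t *alphabet)
{
    assert(s != NULL || n == 0);
    uint16_t first = (n > 0) ? bchar_alphabet(s[0]) : 0;
    for (size_t i = 1; i < n; i++) {
        if (bchar_alphabet(s[i]) != first)
            return 0;
    }
    *alphabet = first;
    return 1;
}

/* Narrow a single-alphabet string to its 16-bit local codes. The
   alphabet code has to be kept alongside, e.g. as a string attribute. */
static void bchar_narrow(const bchar *s, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = bchar_local(s[i]);
}
```

Under these assumptions, `bchar_make(3, 0x0041)` yields `0x00030041`, and a string mixing alphabet 0 with any other alphabet fails the `bchar_single_alphabet` check and has to stay in 32-bit form.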
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-06 19:14 -0800 |
| Message-ID | <77d7b808-27fc-48aa-b24f-53f9636a6634@googlegroups.com> |
| In reply to | #78023 |
On Monday, December 7, 2015 at 1:56:13 AM UTC, Bart wrote:
> On 07/12/2015 00:38, Martin Shobe wrote:
> > On 12/6/2015 7:09 AM, BartC wrote:
>
> >> I spent 5 minutes thinking about an alternative to Unicode, and 10
> >> minutes writing up a first draft, and 10 more minutes for a second draft
> >> (I won't bore you with the details).
>
> >> In 32-bit form, the two schemes (Unicode, and mine), aren't that
> >> different in that each character is allocated a dedicated code-point.
> >> But in mine, the large alphabets are tidily partitioned out of the way.
> >> A similar concept to code-pages, but 32K characters each and that can
> >> co-exist in the same text.
>
> > Can you give a link to it?
>
> It was only a dozen or so lines of text!
>
> Anyway I thought about it for another ten or twenty minutes and I have a
> revised scheme (the previous one included non-character escape codes
> within a string which I didn't like). Here's version 3:
>

You've got to consider the users.

For simple English text you need ascii. The rest of Western Europe uses extended Latin, and annoyingly it won't quite fit into 8 bits. Eastern Europe uses Greek characters. Complex English text includes ascii, extended Latin, and Greek, and a few special symbols not included in ascii. At that point, we start to have the issue of what is markup and what is content. Is 1/2 the same content as a half symbol?

You don't usually see Hebrew or Arabic in English texts, unless they are specifically dealing with Hebrew or Arabic as their subject, and it's not expected that the general reader will recognise the symbols. They're also right to left, and Arabic is cursive. However they have small alphabets. They also have markup systems for the vowels.

Then you've got minority scripts with small alphabets, and the Far Eastern languages with massive character sets, and the Indian languages. Again, virtually all of the symbols are meaningless to the average English reader, but it's not usually true the other way round - Far Eastern and Indian readers are likely to know the English characters and embed English text in their documents.

Finally you've got marginal scripts which are very much special purpose, like Linear B or Klingon. The former is serious but not used for communication, only for representing a tiny corpus of archaeologically recovered literature, and the latter is really just for demonstrating the universality of the encoding system.

Those are your levels of support, from an Anglo-centric perspective.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2015-12-07 13:53 +0000 |
| Message-ID | <87d1ui1i2i.fsf@bsb.me.uk> |
| In reply to | #78028 |
Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
<snip>
> You've got to consider the users.
> For simple English text you need ascii. The rest of Western Europe
> uses extended Latin, and annoyingly it won't quite fit into 8 bits.
> Eastern Europe uses Greek characters. Complex English text includes
> ascii, extended Latin, and Greek, and a few special symbols not
> included in ascii.

You say "You've got to consider the users" but you are not considering them. You are classifying texts by language, not by what texts users want to read or write. Users in Western Europe often want to use non-Latin scripts.

<snip>
--
Ben.
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-07 06:31 -0800 |
| Message-ID | <a540c923-a038-4c5a-815d-58004e3e2551@googlegroups.com> |
| In reply to | #78077 |
On Monday, December 7, 2015 at 1:53:23 PM UTC, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
>
> You say "You've got to consider the users" but you are not considering
> them. You are classifying texts by language, not by what texts users
> want to read or write. Users in Western Europe often want to use
> non-Latin scripts.
>

Only Greek, and in the special case where the non-Latin script language or text is itself the subject of the material.

We don't embed Hebrew or Arabic in normal English texts, as the general reader is not considered to be sufficiently familiar with the symbols. The sole exception is the mathematical aleph for infinity. Greek has also died out, except for mathematical use, but in older writing aimed at an educated readership, you do often see Greek words. It was assumed that a gentleman would have been taught Greek at school and it was insulting to translate. (That's actually still the style guide for Oxford examination scripts: candidates are told to keep foreign language quotations in the original.)
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2015-12-07 21:22 +0000 |
| Message-ID | <87a8pm9cnk.fsf@bsb.me.uk> |
| In reply to | #78087 |
Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
> On Monday, December 7, 2015 at 1:53:23 PM UTC, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
>>
>> You say "You've got to consider the users" but you are not considering
>> them. You are classifying texts by language, not by what texts users
>> want to read or write. Users in Western Europe often want to use
>> non-Latin scripts.
>>
> Only Greek, and in the special case where the non-Latin script language
> or text is itself the subject of the material.

hwɒt ˈplanɪt ˈɑːreɪ juː ɒn?

My previous local authority frequently sent me notices where at least one of the scripts was unknown to me, and the university I worked at produced promotional materials in many scripts. At least ¾ of them were marked ♼, and many—but not all—used special punctuation (see pages 22–24).

Users in Western Europe do not always use only Latin scripts.

<snip>
--
Ben.
| From | Stephen Sprunk <stephen@sprunk.org> |
|---|---|
| Date | 2015-12-07 15:34 -0600 |
| Message-ID | <n44trl$lgd$1@dont-email.me> |
| In reply to | #78087 |
On 07-Dec-15 08:31, Malcolm McLean wrote:
> Ben Bacarisse wrote:
>> You say "You've got to consider the users" but you are not
>> considering them. You are classifying texts by language, not by
>> what texts users want to read or write. Users in Western Europe
>> often want to use non-Latin scripts.
>
> Only Greek, and in the special case where the non-Latin script
> language or text is itself the subject of the material.

Western Europeans haven't discovered emojis yet? They don't use mathematical or scientific symbols? There are no translators, immigrants or diplomats who know a non-Latin language? There are no schools that teach non-Latin languages?

Western Europe sounds like a backward place compared to the Americas; maybe the old joke about monolingual people needs to be revised.

> We don't embed Hebrew or Arabic in normal English texts, as the
> general reader is not considered to be sufficiently familiar with the
> symbols.

Are there no people in Western Europe who know Hebrew or Arabic? I know they slaughtered several million Jews and shipped millions more to the US and Israel, so maybe there are none left, but I'm pretty sure I've read complaints about Muslim immigrants in Western Europe, many of whom presumably know Arabic, if only for reading the Quran.

> The sole exception is the mathematical aleph for infinity.

Different glyph, different code point, different bidi behavior.

> Greek has also died out,

Outside of Greece, yes, aside from immigrants and such--and aside from Greek script being on the Euro.

> except for mathematical use,

Same glyphs, different code points, different bidi behavior.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
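To make the "different code point" remark concrete: Unicode assigns the Hebrew letter (HEBREW LETTER ALEF, U+05D0) and the mathematical symbol (ALEF SYMBOL, U+2135) separate code points, so they also differ once encoded. The sketch below is only an illustration; `utf8_encode` is a hand-written helper, not a standard C library function, and the macro names are made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEBREW_LETTER_ALEF 0x05D0u  /* the letter, as used in Hebrew text */
#define ALEF_SYMBOL        0x2135u  /* the mathematical symbol */

/* Encode one code point as UTF-8. Returns the number of bytes written
   (1 to 4), or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp < 0x10000u) {
        if (cp >= 0xD800u && cp <= 0xDFFFu)
            return 0;  /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;
}
```

With this helper the letter encodes as the two bytes D7 90 and the symbol as the three bytes E2 84 B5, so a byte-wise comparison never confuses them. The helper says nothing about bidi behaviour, which is a separate per-character property.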
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2015-12-07 16:36 -0800 |
| Message-ID | <1649f717-fbd3-4043-beb5-9a23e38f1ab3@googlegroups.com> |
| In reply to | #78138 |
On Monday, December 7, 2015 at 9:34:17 PM UTC, Stephen Sprunk wrote:
> On 07-Dec-15 08:31, Malcolm McLean wrote:
> > Ben Bacarisse wrote:
> >> You say "You've got to consider the users" but you are not
> >> considering them. You are classifying texts by language, not by
> >> what texts users want to read or write. Users in Western Europe
> >> often want to use non-Latin scripts.
> >
> > Only Greek, and in the special case where the non-Latin script
> > language or text is itself the subject of the material.
>
> Western Europeans haven't discovered emojis yet? They don't use
> mathematical or scientific symbols? There are no translators,
> immigrants or diplomats who know a non-Latin language? There are no
> schools that teach non-Latin languages?
>

Obviously if you are publishing a primer on Mandarin for English schools, you will need Chinese text. But that's the exceptional case - the script or text itself is the subject of the material. An emoji is a marginal case.

>
> Are there no people in Western Europe who know Hebrew or Arabic? I know
> they slaughtered several million Jews and shipped millions more to the
> US and Israel, so maybe there are none left, but I'm pretty sure I've
> read complaints about Muslim immigrants in Western Europe, many of whom
> presumably know Arabic, if only for reading the Quran.
>

Hebrew words like "kosher" have entered English, together with much older words like "allelujah" or "David". But they never appear in Hebrew script. Similarly Arabic words like "fatwah" or "mujahideen". That's not true of Greek words. Modern authors will tend to transliterate (e.g. hubris), but Victorian authors often kept them in Greek script.

The exception, again, is when a Biblical text is itself the subject of the English work. So a scholarly or Jewish text in English, on the subject of the Hebrew Bible, may contain embedded Hebrew script.