| From | Johnny Billquist <bqt@softjar.se> |
|---|---|
| Newsgroups | comp.os.linux.misc, alt.folklore.computers |
| Subject | Re: Recent history of vi |
| Date | 2025-12-15 10:38 +0100 |
| Organization | MGT Consulting |
| Message-ID | <10hoktv$o38$1@news.misty.com> |
| References | (6 earlier) <slrn10i3vnc.3n8d7.als@mordor.angband.thangorodrim.de> <10ga6r1$7ph$1@news.misty.com> <slrn10ik3r5.2dppt.als@mordor.angband.thangorodrim.de> <10gpatq$jpt$3@news.misty.com> <10gpi2s$3cij2$2@dont-email.me> |
Cross-posted to 2 groups.
On 2025-12-03 15:39, Peter Flass wrote:
> On 12/3/25 05:37, Johnny Billquist wrote:
>> On 2025-11-28 22:08, Alexander Schreiber wrote:
>>> Johnny Billquist <bqt@softjar.se> wrote:
>>>> Just because there was a problem it don't follow that Unicode was a
>>>> good solution.
>>>
>>> I'm not claiming it is a good solution, but it is the solution we
>>> ended up with that reasonably covers a lot of the problem space.
>>> Given that:
>>> - it covers a wide and very irregular problem space
>>> - it
>>> - it is, due to the problem scope, a design by committee
>>> ending with a solution being bit of a mess is hardly avoidable.
>>>
>>> It has the property of "working well enough most of the time", which is
>>> already a big impediment to anyone spending the time, money and brains
>>> in order to:
>>> - come up with a New And Improved Design That Surely Has No Warts
>>> - establish it as the new standard
>>>
>>> Honestly: not happening.
>>
>> I know that Unicode is here to stay. Said as much before. But it has
>> introduced a whole range of problems that people tend to pretend don't
>> exist. The most immediate one coming to my mind are all kind of
>> scammers creating fake domains to phish stuff. Using known, trusted
>> company names, but letters replaced by things that look visually
>> equivalent, but actually are other characters, and then through those
>> domains fool people to give information, such as passwords, account
>> numbers, money, and god knows what else.
>
> I discovered this when I tried to set up spam filters and couldn't
> figure out why they weren't working.

Yeah. It's nasty...

>> A big part of the problem is that Unicode don't even seem to have
>> known what problem is was supposed to solve. Was it about representing
>> different characters that have different meanings? Was it about
>> representing same characters but with different visual effects? Was it
>> supposed to be some kind of generic system to modify characters
>> through some clever system design?
>> As it is, it's sortof all of these, but none of them properly.
>
> It's supposed to be about the meanings of the characters. Capital 'A' in
> any font is the same Unicode character, but two characters that look
> identical but have different meanings are two.

Except it isn't. You have several code points for capital 'A'.
How about U+1D00 - LATIN LETTER SMALL CAPITAL A?
Or U+FF21 - FULLWIDTH LATIN CAPITAL LETTER A?

And then you have the ridiculous mathematical symbol 'A', which Unicode
has defined in normal, italic, bold, italic+bold, script, script bold,
script italic, script italic bold... And it goes on with fraktur
versions, sans-serif versions, monospace versions. (U+1D400 for a start
down that rabbit hole...)

Heck, you even have U+1F110 - PARENTHESIZED LATIN CAPITAL LETTER A.
So a Latin capital A with parentheses around it. That needed its own
code point???

So, you can see that even some font selections crept into Unicode, as
well as rendering of things in bold or italic. And all kinds of other
crazy variants and details.

> The biggest problem I have with any Unicode representation except (I
> think) UTF-32 is that a program has no way of knowing how long a string
> is without encoding/decoding it. Given a string of characters in some
> codepage, how many bytes does it occupy when converted to UTF-8? Given a
> UTF-8 character string, how many character positions does it occupy,
> say, for example, when displayed on a screen?

True. However, that has nothing to do with Unicode as such, but the
UTF-8 encoding of it.

  Johnny
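
For anyone who wants to poke at the code points named above, here is a minimal Python sketch using only the standard `unicodedata` module. It is an illustration, not part of the original post. The helper name `describe`, the choice of NFKC as the normalization form to inspect, and the Cyrillic look-alike string are all assumptions made for the example. It prints each character's official name, what NFKC folds it to (if anything), and how many bytes it takes in UTF-8, then shows that a look-alike letter from another script is not folded by normalization at all.

```python
import unicodedata

# Code points mentioned in the post, plus plain U+0041 for comparison.
CODE_POINTS = [0x0041, 0x1D00, 0xFF21, 0x1D400, 0x1F110]

def describe(cp):
    """Hypothetical helper: collect a few properties of one code point."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        # NFKC applies compatibility mappings, where the character has one.
        "nfkc": unicodedata.normalize("NFKC", ch),
        # One code point can take 1 to 4 bytes in UTF-8.
        "utf8_bytes": len(ch.encode("utf-8")),
    }

rows = [describe(cp) for cp in CODE_POINTS]
for row in rows:
    print(row)

# Look-alike from another script: CYRILLIC SMALL LETTER A (U+0430)
# in place of the Latin 'a'. The string is an invented example.
spoof = "p\u0430ypal"
same_after_nfkc = unicodedata.normalize("NFKC", spoof) == "paypal"
print(len(spoof), len(spoof.encode("utf-8")), same_after_nfkc)
```

Under these assumptions, the fullwidth, mathematical bold and parenthesized forms fold back to a plain 'A' (or "(A)") under NFKC, while U+1D00 and the Cyrillic look-alike do not. That is why normalization alone does not catch spoofed names.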
Re: Recent history of vi Johnny Billquist <bqt@softjar.se> - 2025-12-15 10:38 +0100
Re: Recent history of vi antispam@fricas.org (Waldek Hebisch) - 2025-12-16 02:20 +0000
Re: Recent history of vi Lawrence D’Oliveiro <ldo@nz.invalid> - 2025-12-16 02:52 +0000
Re: Recent history of vi Nuno Silva <nunojsilva@invalid.invalid> - 2025-12-16 11:53 +0000
Re: Recent history of vi Richard Kettlewell <invalid@invalid.invalid> - 2025-12-16 17:42 +0000
Re: Recent history of vi Johnny Billquist <bqt@softjar.se> - 2025-12-17 10:39 +0100
Re: Recent history of vi "Carlos E.R." <robin_listas@es.invalid> - 2025-12-16 23:34 +0100
Re: Recent history of vi Lawrence D’Oliveiro <ldo@nz.invalid> - 2025-12-17 01:49 +0000