
Re: Recent history of vi

From	Johnny Billquist <bqt@softjar.se>
Newsgroups	comp.os.linux.misc, alt.folklore.computers
Subject	Re: Recent history of vi
Date	2025-12-15 10:38 +0100
Organization	MGT Consulting
Message-ID	<10hoktv$o38$1@news.misty.com> (permalink)
References	(6 earlier) <slrn10i3vnc.3n8d7.als@mordor.angband.thangorodrim.de> <10ga6r1$7ph$1@news.misty.com> <slrn10ik3r5.2dppt.als@mordor.angband.thangorodrim.de> <10gpatq$jpt$3@news.misty.com> <10gpi2s$3cij2$2@dont-email.me>

Cross-posted to 2 groups.


On 2025-12-03 15:39, Peter Flass wrote:
> On 12/3/25 05:37, Johnny Billquist wrote:
>> On 2025-11-28 22:08, Alexander Schreiber wrote:
>>> Johnny Billquist <bqt@softjar.se> wrote:
>>>> Just because there was a problem, it doesn't follow that Unicode was a
>>>> good solution.
>>>
>>> I'm not claiming it is a good solution, but it is the solution we 
>>> ended up
>>> with that reasonably covers a lot of the problem space. Given that:
>>>   - it covers a wide and very irregular problem space
>>>   - it
>>>   - it is, due to the problem scope, a design by committee
>>> ending up with a solution that is a bit of a mess is hardly avoidable.
>>>
>>> It has the property of "working well enough most of the time", which is
>>> already a big impediment to anyone spending the time, money and brains
>>> in order to:
>>>   - come up with a New And Improved Design That Surely Has No Warts
>>>   - establish it as the new standard
>>>
>>> Honestly: not happening.
>>
>> I know that Unicode is here to stay. Said as much before. But it has
>> introduced a whole range of problems that people tend to pretend don't
>> exist. The most immediate one that comes to mind is all kinds of
>> scammers creating fake domains to phish stuff: using known, trusted
>> company names, but with letters replaced by things that look visually
>> equivalent but are actually other characters, and then through those
>> domains fooling people into giving up information, such as passwords,
>> account numbers, money, and god knows what else.
> 
> I discovered this when I tried to set up spam filters and couldn't 
> figure out why they weren't working.

Yeah. It's nasty...
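
A minimal sketch of the lookalike problem described above, in Python. The
name "example" and the helper rough_scripts() are made up for illustration;
the helper just takes the first word of each letter's Unicode name as a
crude script label. Real checks are usually built on the Unicode
confusables data rather than a heuristic like this.

```python
import unicodedata

def rough_scripts(label):
    """Hypothetical heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

real = "example"
spoof = "ex\u0430mple"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'

print(real == spoof)         # False, although both usually render alike
print(rough_scripts(real))   # only LATIN
print(rough_scripts(spoof))  # LATIN and CYRILLIC mixed in one label
print(spoof.encode("idna"))  # the ASCII (punycode) form used on the wire
```

The two strings compare unequal and the second one mixes scripts, which is
the kind of signal a spam filter or browser can look for.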

>> A big part of the problem is that Unicode doesn't even seem to have
>> known what problem it was supposed to solve. Was it about representing
>> different characters that have different meanings? Was it about
>> representing same characters but with different visual effects? Was it
>> supposed to be some kind of generic system to modify characters
>> through some clever system design?
>> As it is, it's sort of all of these, but none of them properly.
> 
> It's supposed to be about the meanings of the characters. Capital 'A' in 
> any font is the same Unicode character, but two characters that look 
> identical but have different meanings are two.

Except it isn't. You have several codepoints for capital 'A'.
How about U+1D00 - LATIN LETTER SMALL CAPITAL A?
Or U+FF21 - FULLWIDTH LATIN CAPITAL LETTER A?

And then you have the ridiculous mathematical symbol 'A', which Unicode
has defined in normal, italic, bold, italic+bold, script, script bold,
script italic, script italic bold... And it goes on with fraktur
versions, sans-serif versions, monospace versions. (U+1D400 for a start
down that rabbit hole...)

Heck, you even have U+1F110 - PARENTHESIZED LATIN CAPITAL LETTER A.
So a Latin capital A with parentheses around it. That needed its own
code point???

So, you can see that even some font selections crept into Unicode, as
well as rendering of things in bold or italic. And all kinds of other
crazy variants and details.
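
The code points named above can be inspected with Python's standard
unicodedata module. This is only a sketch: the list name a_variants and
the helper nfkc() are made up here, and NFKC compatibility normalization
is brought in as one existing way to fold such variants, not something
the post itself proposes.

```python
import unicodedata

# Plain 'A' plus the variants mentioned in the post.
a_variants = [0x0041, 0x1D00, 0xFF21, 0x1D400, 0x1F110]

def nfkc(cp):
    """Return the NFKC-normalized form of a single code point."""
    return unicodedata.normalize("NFKC", chr(cp))

for cp in a_variants:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}  NFKC={nfkc(cp)!r}")
```

With this, the fullwidth and mathematical bold forms fold to a plain 'A'
and the parenthesized one to '(A)', while the small capital has no
compatibility mapping and stays as it is.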

> The biggest problem I have with any Unicode representation except (I 
> think) UTF-32 is that a program has no way of knowing how long a string 
> is without encoding/decoding it. Given a string of characters in some 
> codepage, how many bytes does it occupy when converted to UTF-8? Given a 
> UTF-8 character string, how many character positions does it occupy, 
> say, for example, when displayed on a screen?

True. However, that has nothing to do with Unicode as such, but with the
UTF-8 encoding of it.
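
To make the three different "lengths" in that question concrete, here is
a rough Python sketch. The helper names utf8_len() and display_columns()
are invented for this example, and the column count is only an
approximation: it skips combining marks and counts East Asian wide and
fullwidth characters as two columns, which is an assumption about the
terminal, not a guarantee.

```python
import unicodedata

def utf8_len(s):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

def display_columns(s):
    """Approximate terminal columns (assumption: wide/fullwidth take 2)."""
    cols = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

samples = ["A", "e\u0301", "\uff21"]  # 'A', e + combining acute, fullwidth A
for s in samples:
    print(len(s), utf8_len(s), display_columns(s))
```

The first number is the code point count, the second the UTF-8 byte
count, and the third the estimated screen width; the three differ for
the second and third samples.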

   Johnny

