| From | "Mayayana" <mayayana@invalid.nospam> |
|---|---|
| Newsgroups | microsoft.public.scripting.vbscript |
| Subject | Re: file.ReadAll - another quirk |
| Date | 2019-11-22 18:40 -0500 |
| Organization | A noiseless patient Spider |
| Message-ID | <qr9rm7$qmh$1@dont-email.me> |
| References | (4 earlier) <qr6t4k$kht$1@gioia.aioe.org> <1133f73l5pkpq$.1czqy5zm4hvoi.dlg@40tude.net> <qr8t33$1j3e$1@gioia.aioe.org> <qr8tqi$tsl$1@dont-email.me> <qr96d7$v89$1@gioia.aioe.org> |
"R.Wieser" <address@not.available> wrote | I'm still a bit hazy about the names of | the different multi-byte encodings. I'm using the Windows terms, which are not always the same as what other people use. ASCII - bytes 0-127, which are always the same, in any encoding, but are paired with nulls in unicode. ANSI - bytes 0-255, in which 128+ are rendered according to the local codepage while 0-127 match ASCII. So English speakers (and I think most Europeans) get specific symbols for 128+, but Russians and Turks, for example, get characters in their language. The Asian multi-byte languages are the only exception. They can have more than 1 byte per character in their codepage. That's how I ended up using FSO for VBS binary operations. If it's handled carefully, and you're not using an Asian multibyte codepage, then it works. It really doesn't matter whether Windows thinks the byte represents a dollar sign or an Arabic character. Unicode - 2-byte characters as used in Windows, which may not be the same as all unicode 16 and is not the same as unicode-32. As far as I know, in Windows generally, only ANSI and unicode are relevant. Win32 is using unicode under the covers but provides ANSI as default for VB, VBS, older versions of notepad, etc. (As you may know, in VB it's actually not easy to access the unicode version. One must use the string pointer directly because when the variable is referenced there's an automatic conversion to/from unicode.) It gets confusing because "multi-byte" sounds like unicode but instead refers to ANSI encoding which *could* use multiple bytes. (You probably know that, too, but I'm not sure how many others do.) UTF-8 - That one is fairly new to me. I've heard it's now the standard for plain text on Linux. And it's become the standard for plain text in webpages. For obvious reasons: The vast majority of webpages are valid ASCII, anyway. And ASCII matches the 0-127 in UTF-8. 
So there's no upset in switching, except for the people who want to do things like use curly quotes in UTF-8, which render as gibberish in ANSI.

I added UTF-8 support to my own HTML editor since it's now standard. The editor uses a RichEdit window. But interestingly, support for UTF-8 in RichEdit seems to be new and is almost entirely undocumented. I just happened to come across a note somewhere. It wasn't listed in the official docs. But I tried the sample code I found, to load a file as UTF-8, and it worked.

Of course, that's only partially useful. If the chosen font is not unicode it makes no difference! And I think the only unicode font I have is MS Arial. I like Verdana for coding. So I don't get the benefit of my own UTF-8 support. :)

| > But I very much doubt something like 80 CE 32 would be
| > translated to the u-16 equivalent. It just comes through
| > (fortunately) as 3 ANSI characters.
|
| I thought that the above MultiByteToWideChar call would take care of that.
| Though there is a possibility that the "readall" code checks for a UTF-8
| header (EF BB BF) before setting a flag to do so. (I really should
| re-examine the disassembled code some time ...)
|

First parameter is codepage. Surprisingly, UTF-8 is one possibility there. But I'd guess they're using ANSI codepage. That's the way it seems to come through, and if they used UTF-8 it would potentially change the number of characters when rendered as ANSI (or what the help is calling ASCII).
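
The codepage point can be checked outside Windows. What follows is a minimal sketch in Python rather than VBScript, and it says nothing about what ReadAll actually does internally. Python's cp1252, cp1251 and cp932 codecs are assumed here as stand-ins for the Western, Cyrillic and Japanese ANSI codepages, and `decode_as` and `is_valid_utf8` are hypothetical helper names. It decodes the 80 CE 32 bytes from the quoted text under each codec, then shows how one UTF-8 character can turn into several when read as ANSI.

```python
# Sketch only: Python codecs stand in for Windows codepages (an assumption).
SAMPLE = bytes([0x80, 0xCE, 0x32])  # the three bytes quoted in the thread


def decode_as(data: bytes, codec: str) -> str:
    """Decode bytes under the named codec (hypothetical helper)."""
    return data.decode(codec)


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Single-byte codepages: one character per byte, whatever the glyph.
    for codec in ("cp1252", "cp1251"):
        text = decode_as(SAMPLE, codec)
        print(codec, len(text), ascii(text))

    # The same three bytes are not well-formed UTF-8.
    print("valid UTF-8:", is_valid_utf8(SAMPLE))

    # A curly quote stored as UTF-8 becomes three characters read as cp1252.
    mojibake = "\u201c".encode("utf-8").decode("cp1252")
    print(len(mojibake), ascii(mojibake))

    # A double-byte codepage: two bytes, one character.
    print(len(decode_as(b"\x82\xa0", "cp932")))

    # "Windows unicode" (UTF-16LE) pairs each ASCII byte with a null.
    print("A".encode("utf-16-le"))
```

Under either single-byte codec the three bytes come through as three characters, only with different glyphs, which fits the "3 ANSI characters" observation. A strict UTF-8 decode rejects them, so a UTF-8 read would not be a byte-for-byte pass-through.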