Re: file.ReadAll - another quirk

From	"Mayayana" <mayayana@invalid.nospam>
Newsgroups	microsoft.public.scripting.vbscript
Subject	Re: file.ReadAll - another quirk
Date	2019-11-22 18:40 -0500
Organization	A noiseless patient Spider
Message-ID	<qr9rm7$qmh$1@dont-email.me> (permalink)
References	(4 earlier) <qr6t4k$kht$1@gioia.aioe.org> <1133f73l5pkpq$.1czqy5zm4hvoi.dlg@40tude.net> <qr8t33$1j3e$1@gioia.aioe.org> <qr8tqi$tsl$1@dont-email.me> <qr96d7$v89$1@gioia.aioe.org>

"R.Wieser" <address@not.available> wrote

| I'm still a bit hazy about the names of
| the different multi-byte encodings.

  I'm using the Windows terms, which are not always
the same as what other people use.

ASCII - bytes 0-127, which mean the same thing in every
ANSI codepage and in UTF-8, but are each paired with a
null byte in unicode.
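
A quick check in Python. Python is used only because its codecs expose the raw bytes. cp1252, the Western European ANSI codepage, is assumed as the local codepage:

```python
# cp1252 is assumed as the local ANSI codepage for this illustration.
text = "Hi!"

ansi = text.encode("cp1252")      # ANSI: one byte per ASCII character
utf8 = text.encode("utf-8")       # UTF-8: identical bytes for ASCII
utf16 = text.encode("utf-16-le")  # Windows "unicode": each byte followed by a null

print(ansi.hex(" "))   # 48 69 21
print(utf8.hex(" "))   # 48 69 21
print(utf16.hex(" "))  # 48 00 69 00 21 00
```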

ANSI - bytes 0-255, in which 128+ are rendered
according to the local codepage while 0-127 match
ASCII. So English speakers (and I think most
Europeans) get specific symbols for 128+, but
Russians and Turks, for example, get characters in
their language.
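
You can see the codepage dependence by decoding a single byte, 0xFD, under three Windows codepages. This is a minimal Python sketch. cp1252 (Western), cp1251 (Cyrillic) and cp1254 (Turkish) are only examples:

```python
# One byte above 127, read under three different ANSI codepages.
raw = bytes([0xFD])

western = raw.decode("cp1252")   # Western European
cyrillic = raw.decode("cp1251")  # Russian and other Cyrillic languages
turkish = raw.decode("cp1254")   # Turkish

print(western, cyrillic, turkish)  # ý э ı
```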
  The East Asian multi-byte codepages (Chinese, Japanese,
Korean) are the only exception. They can use more than
1 byte per character.

  That's how I ended up using FSO for VBS binary
operations. If it's handled carefully, and you're not
using an Asian multibyte codepage, then it works.
It really doesn't matter whether Windows thinks the
byte represents a dollar sign or an Arabic character.
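
Here is a rough Python sketch of why the byte-to-character trick holds for single-byte codepages and fails for double-byte ones. `round_trips` is a hypothetical helper. latin-1 stands in for the local single-byte codepage because Python's cp1252 codec rejects a few unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D). That is a Python detail and says nothing about how FSO behaves:

```python
def round_trips(data: bytes, codepage: str) -> bool:
    """True if bytes -> text -> bytes gives back the original, one char per byte."""
    try:
        text = data.decode(codepage)
    except UnicodeDecodeError:
        return False  # some byte sequence is not valid text in this codepage
    return len(text) == len(data) and text.encode(codepage) == data

# latin-1 is a stand-in for a single-byte ANSI codepage (assumption).
print(round_trips(bytes(range(256)), "latin-1"))       # True
# cp932 (Japanese): HIRAGANA LETTER A is 2 bytes but 1 character.
print(round_trips("\u3042".encode("cp932"), "cp932"))  # False
# A lone lead byte is not a complete character in cp932.
print(round_trips(b"\x82", "cp932"))                   # False
```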

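Unicode - 2-byte characters as used in Windows (UTF-16,
little-endian; rare characters take two of those 2-byte
units). Other UTF-16 data may differ, e.g. in byte order,
and none of it is the same as UTF-32 (4 bytes per character).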
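
As an illustration only: in Python terms, Windows "unicode" corresponds to the utf-16-le codec. The sketch shows the effect of byte order, and the surrogate pair needed by a character outside the Basic Multilingual Plane:

```python
# Same text, three encodings: byte order and unit size differ.
print("A".encode("utf-16-le").hex(" "))  # 41 00  (Windows order)
print("A".encode("utf-16-be").hex(" "))  # 00 41  (big-endian UTF-16)
print("A".encode("utf-32-le").hex(" "))  # 41 00 00 00

# U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair (D83D DE00).
emoji = "\U0001F600"
print(emoji.encode("utf-16-le").hex(" "))  # 3d d8 00 de
print(len(emoji.encode("utf-32-le")))      # 4
```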

  As far as I know, in Windows generally, only ANSI
and unicode are relevant. Win32 is using unicode
under the covers but provides ANSI as default for
VB, VBS, older versions of notepad, etc. (As you may
know, in VB it's actually not easy to pass the unicode
version to an API. One must use the string pointer (StrPtr)
directly, because whenever a String is handed to an API
there's an automatic conversion between unicode and ANSI.)
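
That conversion loses data whenever a character has no slot in the ANSI codepage. Below is a small Python analogy, assuming cp1252. `to_ansi` is a hypothetical helper. Python's "replace" handler only loosely resembles the default character Windows substitutes:

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical helper: unicode -> ANSI bytes, '?' for unmappable characters."""
    return text.encode(codepage, errors="replace")

print(to_ansi("caf\u00e9"))  # b'caf\xe9'  (é exists in cp1252)
print(to_ansi("\u0439"))     # b'?'        (Cyrillic й does not)
```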

  It gets confusing because
"multi-byte" sounds like unicode but instead refers
to ANSI encoding which *could* use multiple bytes,
as in the API name MultiByteToWideChar.
(You probably know that, too, but I'm not sure how
many others do.)

UTF-8 - That one is fairly new to me. I've heard
it's now the standard for plain text on Linux. And it's
become the standard for plain text in webpages. For
obvious reasons: The vast majority of webpages are
valid ASCII, anyway. And ASCII matches the 0-127
in UTF-8. So there's no upset in switching, except for
people who use things like curly quotes, which are
multi-byte in UTF-8 and render as gibberish when read
as ANSI.
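
The gibberish is easy to reproduce. This Python sketch assumes the reader falls back to cp1252. `as_ansi_reader_sees` is a hypothetical name:

```python
def as_ansi_reader_sees(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as ANSI cp1252."""
    return text.encode("utf-8").decode("cp1252")

# Plain ASCII markup survives unchanged...
print(as_ansi_reader_sees("<p>Hello</p>"))  # <p>Hello</p>
# ...but each curly quote is 3 bytes in UTF-8, shown as 3 ANSI characters.
print(as_ansi_reader_sees("\u201c"))  # â€œ  (left double quote, E2 80 9C)
print(as_ansi_reader_sees("\u2019"))  # â€™  (right single quote, E2 80 99)
```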

  I added UTF-8 support to my own HTML editor since
it's now standard. The editor uses a RichEdit window.
But interestingly, support for UTF-8 in RichEdit seems
to be new and is almost entirely undocumented. I just
happened to come across a note somewhere. It wasn't
listed in the official docs. But I tried the sample code
I found, to load a file as UTF-8, and it worked.

  Of course, that's only partially useful. If the chosen font
is not unicode it makes no difference! And I think the only
unicode font I have is MS Arial. I like Verdana for coding.
So I don't get the benefit of my own UTF-8 support. :)


| > But I very much doubt something like 80 CE 32 would be
| > translated to the u-16 equivalent. It just comes through
| > (fortunately) as 3 ANSI characters.
|
| I thought that the above MultiByteToWideChar call would take care of that.
| Though there is a possibility that the "readall" code checks for a UTF-8
| header (EF BB BF) before setting a flag to do so.   (I really should
| re-examine the disassembled code some time ...)
|

  The first parameter of MultiByteToWideChar is the
codepage. Surprisingly, UTF-8 (CP_UTF8) is one possibility
there. But I'd guess they're using the ANSI codepage
(CP_ACP). That's the way it seems to come through, and if
they used UTF-8 it could change the number of characters
compared with reading the file as ANSI (or what the help
is calling ASCII).
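
To make the character-count point concrete, here is a hypothetical Python decoder that follows the BOM guess quoted above. It uses UTF-8 only when the data starts with EF BB BF, and otherwise falls back to the ANSI codepage (cp1252 assumed). `decode_like_readall` is an invented name. This is not how FSO is known to work:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def decode_like_readall(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Hypothetical: UTF-8 only if a UTF-8 BOM is present, else the ANSI codepage."""
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8")
    return raw.decode(ansi_codepage)

raw = "caf\u00e9".encode("utf-8")                # 63 61 66 c3 a9, no BOM
print(len(decode_like_readall(raw)))             # 5  ('cafÃ©', one char per byte)
print(len(decode_like_readall(UTF8_BOM + raw)))  # 4  ('café')
```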

Thread

file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-20 12:54 +0100
  Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-20 08:39 -0500
  Re: file.ReadAll - another quirk JJ <jj4public@vfemail.net> - 2019-11-22 00:00 +0700
    Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-21 12:50 -0500
    Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-21 20:39 +0100
      Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-21 15:04 -0500
        Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-21 21:46 +0100
          Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-21 16:31 -0500
            Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-22 09:23 +0100
              Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-22 09:14 -0500
                Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-22 15:58 +0100
          Re: file.ReadAll - another quirk JJ <jj4public@vfemail.net> - 2019-11-22 19:13 +0700
            Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-22 15:47 +0100
              Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-22 10:10 -0500
                Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-22 18:37 +0100
                Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-22 18:40 -0500
                Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-23 09:19 +0100
                Re: file.ReadAll - another quirk "Mayayana" <mayayana@invalid.nospam> - 2019-11-23 10:09 -0500
                Re: file.ReadAll - another quirk "R.Wieser" <address@not.available> - 2019-11-23 17:36 +0100
      Re: file.ReadAll - another quirk JJ <jj4public@vfemail.net> - 2019-11-22 19:10 +0700