Re: How to handle LARGE UTF-8 file

From "Farnsworth" <nospam@nospam.com>
Newsgroups comp.lang.basic.visual.misc
Subject Re: How to handle LARGE UTF-8 file
Date 2012-03-08 23:32 -0500
Organization Aioe.org NNTP Server
Message-ID <jjc14v$v0i$1@speranza.aioe.org>
References <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> <jjb6ma$4nq$1@speranza.aioe.org> <jjb6uu$5h2$1@speranza.aioe.org> <17156310.66.1331257903071.JavaMail.geo-discussion-forums@vbkc1>


stevegdula@yahoo.com wrote:
> Farnsworth,
>
> Your first reply, byte order actually seems to match my sample data.
>
> ASCII(254)
> UTF-8 Two Byte Representation: 1100 0011 1011 1110 &HC3BE
>
> I haven't currently digested the detailed UTF-8 Wiki explanation yet
> and I hopefully won't have to unless I end up needing to write my own
> UTF-8 record decoder.
>
> I am hoping to merely strip out the Byte Order Mark (BOM)
> &HEFBBBF, inspect for end of record &H0D0A (one line = one record),
> and pass that to the aforementioned API call.

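If you look at the list in the Wiki article, you will notice that every byte
of a multi-byte character is always >= 128. So you can read a large chunk
(1 MB or more) and check its last byte: if it is < 128, the chunk ends on a
character boundary; if it is >= 128, the chunk may end in the middle of a
character, and you may need to read a few extra bytes (up to three) to
complete it.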
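
A minimal sketch of that end-of-chunk check, assuming a 0-based Byte array
and well-formed UTF-8. IncompleteTailLength is a hypothetical helper name,
not something from the API mentioned earlier in the thread:

```vb
' Returns how many bytes at the end of buf(0 To count - 1) belong to an
' incomplete UTF-8 sequence, or 0 if the chunk ends on a character boundary.
' Hypothetical helper; assumes a 0-based array and well-formed UTF-8 input.
Public Function IncompleteTailLength(buf() As Byte, ByVal count As Long) As Long
    Dim i As Long
    Dim b As Long
    Dim need As Long

    ' Walk backwards over at most 4 bytes, looking for the lead byte.
    i = count - 1
    Do While i >= 0 And count - i <= 4
        b = buf(i)
        If b < &H80 Then Exit Function       ' plain ASCII: clean boundary
        If b >= &HC0 Then                    ' lead byte of a multi-byte character
            If b >= &HF0 Then
                need = 4
            ElseIf b >= &HE0 Then
                need = 3
            Else
                need = 2
            End If
            ' Fewer bytes available than the lead byte announces: incomplete.
            If count - i < need Then IncompleteTailLength = count - i
            Exit Function
        End If
        i = i - 1                            ' continuation byte (&H80..&HBF)
    Loop
End Function
```

After each read, you would process only the first count minus
IncompleteTailLength(buf, count) bytes and carry the remaining tail over to
the front of the next chunk. Since one line = one record in your file, it may
be simpler to cut each chunk at its last CR LF instead: the bytes 0D and 0A
are below 128, so they never occur inside a multi-byte character.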

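As for CR LF, InStrB can be used on byte arrays. Note that vbCrLf is a
Unicode string (bytes 0D 00 0A 00), so build the search pattern from single
bytes with ChrB$ to match the 0D 0A pair in the file. Example:

Debug.Print InStrB(arr, ChrB$(13) & ChrB$(10))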
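
A rough sketch of wrapping that search so it can be called repeatedly to walk
the records in a chunk. FindCrLf is a hypothetical helper name, and the code
assumes a 0-based Byte array passed directly to InStrB as described above:

```vb
' Returns the 0-based index of the first CR LF pair in buf at or after
' startIndex, or -1 if there is none. Hypothetical helper; assumes a
' 0-based Byte array.
Public Function FindCrLf(buf() As Byte, Optional ByVal startIndex As Long = 0) As Long
    Dim pos As Long

    ' ChrB$ builds single-byte characters, so the pattern is exactly the two
    ' bytes &H0D &H0A. vbCrLf would be four bytes (0D 00 0A 00) and would not
    ' match UTF-8 data.
    ' InStrB positions are 1-based; it returns 0 when nothing is found.
    pos = InStrB(startIndex + 1, buf, ChrB$(13) & ChrB$(10), vbBinaryCompare)
    FindCrLf = pos - 1
End Function
```

To step through a chunk, call it again with startIndex set to the previous
result + 2 until it returns -1; whatever follows the last CR LF is a partial
record to keep for the next chunk.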

Finally, check the ParseCSV01 routine on this page to parse the lines:

http://www.xbeat.net/vbspeed/c_ParseCSV.php

Thread

How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 08:05 -0800
  Re: How to handle LARGE UTF-8 file Deanna Earley <dee.earley@icode.co.uk> - 2012-03-08 16:55 +0000
    Re: How to handle LARGE UTF-8 file "Bob Butler" <bob_butler@cox.invalid> - 2012-03-08 10:13 -0800
      Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 10:49 -0800
  Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:00 -0500
    Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:05 -0500
      Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 17:51 -0800
        Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 23:32 -0500
        Re: How to handle LARGE UTF-8 file Schmidt <sss@online.de> - 2012-03-09 07:32 +0100
          Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-09 13:40 -0500
            Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-14 08:54 -0700