


Re: How to handle LARGE UTF-8 file

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!npeer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From stevegdula@yahoo.com
Newsgroups comp.lang.basic.visual.misc
Subject Re: How to handle LARGE UTF-8 file
Date Thu, 8 Mar 2012 17:51:43 -0800 (PST)
Organization http://groups.google.com
Lines 32
Message-ID <17156310.66.1331257903071.JavaMail.geo-discussion-forums@vbkc1> (permalink)
References <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> <jjb6ma$4nq$1@speranza.aioe.org> <jjb6uu$5h2$1@speranza.aioe.org>
NNTP-Posting-Host 67.167.18.95
Mime-Version 1.0
Content-Type text/plain; charset=ISO-8859-1
X-Trace posting.google.com 1331257903 26023 127.0.0.1 (9 Mar 2012 01:51:43 GMT)
X-Complaints-To groups-abuse@google.com
NNTP-Posting-Date Fri, 9 Mar 2012 01:51:43 +0000 (UTC)
In-Reply-To <jjb6uu$5h2$1@speranza.aioe.org>
Complaints-To groups-abuse@google.com
Injection-Info glegroupsg2000goo.googlegroups.com; posting-host=67.167.18.95; posting-account=6DX0cgkAAAAoDsfrvrkw7olQC-OfHI_P
User-Agent G2/1.0
X-Received-Bytes 2091
Xref csiph.com comp.lang.basic.visual.misc:893



Farnsworth,

The byte order in your first reply actually seems to match my sample data.

ASCII(254) (strictly speaking character 254, &HFE, lies outside 7-bit ASCII; it is U+00FE)
UTF-8 Two Byte Representation: 1100 0011 1011 1110 &HC3BE
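
As a quick check of the bit pattern above, here is a minimal sketch of the two-byte UTF-8 rule in Python (chosen only for brevity; the same shifts and masks carry over to VB). The function name utf8_two_byte is made up for this illustration and is not part of any library.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form only covers U+0080..U+07FF")
    lead = 0b11000000 | (code_point >> 6)          # 110xxxxx: upper bits
    trail = 0b10000000 | (code_point & 0b00111111) # 10xxxxxx: low six bits
    return bytes([lead, trail])


encoded = utf8_two_byte(254)
print(encoded.hex().upper())  # C3BE
```

The lead byte carries the upper bits of the value and the trail byte the low six, so 254 (1111 1110) becomes &HC3 followed by &HBE, which agrees with the sample data.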

I haven't digested the detailed UTF-8 Wiki explanation yet, and hopefully I won't have to unless I end up needing to write my own UTF-8 record decoder.

I am hoping to merely strip out the Byte Order Mark (BOM) &HEFBBBF, inspect for end of record &H0D0A (one line = one record), and pass each record to the aforementioned API call.
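
The plan above could be sketched roughly as follows. This is Python rather than VB, purely to keep the illustration short, and it rests on several assumptions: the names iter_records and chunk_size are invented here, the 64 KB chunk size is arbitrary, and bytes.decode merely stands in for whatever conversion call is actually used. The file is read in fixed-size chunks so that a large file never has to fit in memory at once.

```python
import io

BOM = b"\xef\xbb\xbf"   # UTF-8 byte order mark (&HEFBBBF)
EOL = b"\r\n"           # end of record (&H0D0A)


def iter_records(stream, chunk_size=64 * 1024):
    """Yield each record as raw UTF-8 bytes, without BOM or CRLF."""
    pending = stream.read(len(BOM))
    if pending == BOM:
        pending = b""                  # drop the BOM, keep anything else
    while True:
        chunk = stream.read(chunk_size)
        pending += chunk
        # Everything before the last CRLF is complete; the tail is kept
        # until more data arrives (this also covers a CRLF that is split
        # across two chunks).
        *records, pending = pending.split(EOL)
        yield from records
        if not chunk:                  # end of file reached
            break
    if pending:
        yield pending                  # final record with no trailing CRLF


# Usage with an in-memory stand-in for the real file:
sample = io.BytesIO(BOM + "\u00fe first\r\nsecond\r\n".encode("utf-8"))
for raw in iter_records(sample):
    print(raw.decode("utf-8"))         # stand-in for the conversion call
```

Because only whole records are handed on, a multi-byte character can never be cut in half by a chunk boundary before it reaches the decoder.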

Thanks,

~Steve

On Thursday, March 8, 2012 3:05:30 PM UTC-6, Farnsworth wrote:
> Farnsworth wrote:
> > Besides what others suggested, check this link to see how the
> > characters are encoded:
> >
> > http://en.wikipedia.org/wiki/Utf-8#Description
> >
> > So ASCII 254(1111 1110) =
> >
> > Byte 1: 110 00011 = &HC3
> > Byte 2: 10 111110 = &HBE
> 
> I made a mistake in the byte order, so it should be the other way around:
> 
> Byte 1: 110 11110 = &HDE
> Byte 2: 10 000111 = &H87



Thread

How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 08:05 -0800
  Re: How to handle LARGE UTF-8 file Deanna Earley <dee.earley@icode.co.uk> - 2012-03-08 16:55 +0000
    Re: How to handle LARGE UTF-8 file "Bob Butler" <bob_butler@cox.invalid> - 2012-03-08 10:13 -0800
      Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 10:49 -0800
  Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:00 -0500
    Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:05 -0500
      Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 17:51 -0800
        Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 23:32 -0500
        Re: How to handle LARGE UTF-8 file Schmidt <sss@online.de> - 2012-03-09 07:32 +0100
          Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-09 13:40 -0500
            Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-14 08:54 -0700
