Re: How to handle LARGE UTF-8 file

From "Bob Butler" <bob_butler@cox.invalid>
Newsgroups comp.lang.basic.visual.misc
Subject Re: How to handle LARGE UTF-8 file
Date 2012-03-08 10:13 -0800
Organization A noiseless patient Spider
Message-ID <jjat0a$j6n$1@dont-email.me>
References <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> <jjao9g$sis$1@speranza.aioe.org>

"Deanna Earley" <dee.earley@icode.co.uk> wrote in message 
news:jjao9g$sis$1@speranza.aioe.org...
> On 08/03/2012 16:05, stevegdula@yahoo.com wrote:
>> Hi folks,
>>
>> I recently had a large text file approaching 7GB in size dropped on
>> me.  The contents of which are supposed to be delimited text field
>> data from a database.  Its prohibitive size will not let me open it
>> in a robust text editor so I've just sampled the first 32K out of it
>> via opening it as a Binary file with 'Get & Put'.  This at least
>> allowed me to see what I was dealing with.
>>
>> The little 32K subset of text turned out to be Encoded UTF-8 text
>> with the EF BB BF header and is comprised of some 166 fields of
>> delimited data.  At least some subset of this data will eventually
>> need to be loaded into an older legal database which only supports
>> ANSI text.
>
> While the data may be in UTF-8 format, will it actually contain any
> non-ASCII text?
> UTF-8 and ASCII are identical for the first 128 code points.
>
> You can check this by reading chunks (into a byte array) and scanning for 
> values > 127.
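
The chunked scan suggested above could look roughly like the following. This is only a sketch, written in Python rather than VB so it can be run as-is; the function name `first_non_ascii_offset` and the 1 MB chunk size are my own assumptions, not anything from the thread. It skips the EF BB BF header first, since those three bytes are themselves above 127.

```python
def first_non_ascii_offset(path, chunk_size=1 << 20):
    """Return the file offset of the first byte > 127, or None if none is found.

    Hypothetical helper: the name and the default chunk size are illustrative.
    """
    offset = 0
    with open(path, "rb") as f:
        # The UTF-8 BOM (EF BB BF) is itself non-ASCII, so step over it.
        head = f.read(3)
        if head == b"\xef\xbb\xbf":
            offset = 3
        else:
            f.seek(0)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached end of file: everything was ASCII
            if not chunk.isascii():
                # Locate the exact byte only in the chunk that failed.
                index = next(i for i, b in enumerate(chunk) if b > 127)
                return offset + index
            offset += len(chunk)
```

Only one chunk is held in memory at a time, so the 7GB size should not matter beyond the time it takes to read the file. The same loop maps onto a VB Binary-mode `Get` into a Byte array.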

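If it does contain any non-ASCII characters, you should be able to use the 
MultiByteToWideChar API call (with CP_UTF8) to convert from UTF-8 to Unicode 
(UTF-16), and then figure out what to do with any characters that have no ANSI 
equivalent before inserting into the database. WideCharToMultiByte handles the 
opposite direction, Unicode to ANSI.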
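
As a rough illustration of that two-step conversion (UTF-8 to Unicode, then Unicode to ANSI), here is a sketch in Python rather than VB so that it is runnable as written. It assumes "ANSI" means Windows-1252 (`cp1252`), which may not match the target database's code page, and the function name `utf8_to_ansi` is hypothetical. An incremental decoder is used because a chunk boundary can fall in the middle of a multi-byte UTF-8 sequence.

```python
import codecs


def utf8_to_ansi(src_path, dst_path, chunk_size=1 << 20, ansi="cp1252"):
    """Stream-convert a UTF-8 file to an ANSI code page.

    Returns the number of characters that had no ANSI equivalent and were
    written as '?'. The cp1252 default is an assumption about the target.
    """
    # "utf-8-sig" drops a leading EF BB BF header if one is present.
    decoder = codecs.getincrementaldecoder("utf-8-sig")(errors="strict")
    replaced = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            # final=True on the empty last read flushes any buffered bytes
            # and raises if the file ends mid-sequence.
            text = decoder.decode(chunk, final=not chunk)
            out = text.encode(ansi, errors="replace")
            # Each unencodable character becomes exactly one '?'.
            replaced += out.count(b"?") - text.count("?")
            dst.write(out)
            if not chunk:
                break
    return replaced
```

Replacing unmappable characters with `?` is just one policy; logging them or substituting a near equivalent may suit the legal database better. In VB the equivalent steps would be MultiByteToWideChar with CP_UTF8 followed by WideCharToMultiByte with the ANSI code page, with the caller taking care not to split a multi-byte sequence between buffers.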
