| From | "Bob Butler" <bob_butler@cox.invalid> |
|---|---|
| Newsgroups | comp.lang.basic.visual.misc |
| Subject | Re: How to handle LARGE UTF-8 file |
| Date | 2012-03-08 10:13 -0800 |
| Organization | A noiseless patient Spider |
| Message-ID | <jjat0a$j6n$1@dont-email.me> |
| References | <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> <jjao9g$sis$1@speranza.aioe.org> |
"Deanna Earley" <dee.earley@icode.co.uk> wrote in message
news:jjao9g$sis$1@speranza.aioe.org...
> On 08/03/2012 16:05, stevegdula@yahoo.com wrote:
>> Hi folks,
>>
>> I recently had a large text file approaching 7GB in size dropped on
>> me. The contents of which are supposed to be delimited text field
>> data from a database. Its prohibitive size will not let me open it
>> in a robust text editor so I've just sampled the first 32K out of it
>> via opening it as a Binary file with 'Get & Put'. This at least
>> allowed me to see what I was dealing with.
>>
>> The little 32K subset of text turned out to be Encoded UTF-8 text
>> with the EF BB BF header and is comprised of some 166 fields of
>> delimited data. At least some subset of this data will eventually
>> need to be loaded into an older legal database which only supports
>> ANSI text.
>
> While the data may be UTF-8 format, will it actually contain any "non
> ascii" text?
> UTF-8 and ASCII are identical for the first 128 code points.
>
> You can check this by reading chunks (into a byte array) and scanning for
> values > 127.

If it does have any special characters, you should be able to use the
MultiByteToWideChar API call (with the CP_UTF8 code page) to convert from
UTF-8 to Unicode, and then figure out what to do with the special
characters before inserting them into the database.
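
To make the two steps concrete, here is a minimal sketch in Python rather than VB, under a few assumptions that are not from the thread: the function names, the 1 MiB chunk size, and the choice of Windows-1252 as the "ANSI" code page are all illustrative. The first function does the chunked scan for byte values above 127; the second streams the file through Python's incremental UTF-8 decoder (standing in for the Win32 conversion call) and writes ANSI, replacing anything the code page cannot represent with `?`.

```python
import codecs

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary, illustrative choice


def first_non_ascii_offset(path, chunk_size=CHUNK_SIZE):
    """Return the file offset of the first byte > 127, or None if there is none.

    A leading UTF-8 BOM (EF BB BF) is skipped so it does not count as a hit.
    """
    with open(path, "rb") as f:
        offset = 0
        if f.read(3) == codecs.BOM_UTF8:
            offset = 3
        else:
            f.seek(0)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None
            for i, byte in enumerate(chunk):
                if byte > 127:
                    return offset + i
            offset += len(chunk)


def utf8_to_ansi(src, dst, chunk_size=CHUNK_SIZE, ansi="cp1252"):
    """Stream src (UTF-8, optional BOM) into dst as ANSI text.

    Characters with no equivalent in the target code page become '?'.
    Returns how many characters were replaced that way.
    """
    # The incremental decoder holds back a multi-byte sequence that is
    # split across two chunks until the rest of it arrives.
    decoder = codecs.getincrementaldecoder("utf-8-sig")(errors="strict")
    replaced = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            text = decoder.decode(chunk, final=not chunk)
            out = text.encode(ansi, errors="replace")
            # Each unencodable character turns into exactly one '?'.
            replaced += out.count(b"?") - text.count("?")
            fout.write(out)
            if not chunk:
                return replaced
```

If `first_non_ascii_offset` returns `None`, the file is plain ASCII apart from the BOM and can be loaded as ANSI once the three header bytes are dropped. Otherwise the replacement count from `utf8_to_ansi` gives a rough idea of how much data needs a manual decision; substituting `?` is only one possible policy.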
How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 08:05 -0800
Re: How to handle LARGE UTF-8 file Deanna Earley <dee.earley@icode.co.uk> - 2012-03-08 16:55 +0000
Re: How to handle LARGE UTF-8 file "Bob Butler" <bob_butler@cox.invalid> - 2012-03-08 10:13 -0800
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 10:49 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:00 -0500
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:05 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 17:51 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 23:32 -0500
Re: How to handle LARGE UTF-8 file Schmidt <sss@online.de> - 2012-03-09 07:32 +0100
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-09 13:40 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-14 08:54 -0700