| From | Deanna Earley <dee.earley@icode.co.uk> |
|---|---|
| Newsgroups | comp.lang.basic.visual.misc |
| Subject | Re: How to handle LARGE UTF-8 file |
| Date | 2012-03-08 16:55 +0000 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <jjao9g$sis$1@speranza.aioe.org> |
| References | <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> |
On 08/03/2012 16:05, stevegdula@yahoo.com wrote:
> Hi folks,
>
> I recently had a large text file approaching 7GB in size dropped on
> me. The contents of which are supposed to be delimited text field
> data from a database. It's prohibitive size will not let me open it
> in a robust text editor so I've just sampled the first 32K out of it
> via opening it as a Binary file with 'Get& Put'. This at least
> allowed me to see what I was dealing with.
>
> The little 32K subset of text turned out to be Encoded UTF-8 text
> with the EF BB BF header and is comprised of some 166 fields of
> delimited data. At least some subset of this data will eventually
> need to be loaded into an older legal database which only supports
> ANSI text.

While the data may be in UTF-8 format, will it actually contain any
non-ASCII text? UTF-8 and ASCII are identical for the first 128 code
points, so a file that uses only those code points needs no conversion
beyond dropping the EF BB BF header.
You can check this by reading the file in chunks (into a byte array)
and scanning for values > 127.

-- 
Deanna Earley (dee.earley@icode.co.uk)
i-Catcher Development Team
http://www.icode.co.uk/icatcher/

iCode Systems

(Replies direct to my email address will be ignored. Please reply to
the group.)
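
The chunked scan suggested above could look roughly like the sketch below. It is written in Python rather than VB purely for illustration; the function name `first_non_ascii_offset` and the 1 MB chunk size are assumptions, not anything from the thread. It skips the EF BB BF header if present, reads fixed-size chunks so the 7GB file never has to fit in memory, and reports the offset of the first byte above 127.

```python
def first_non_ascii_offset(path, chunk_size=1 << 20):
    """Return the offset of the first byte > 127, or None if there is none.

    The UTF-8 BOM (EF BB BF), if present at the start, is skipped and
    not counted as non-ASCII data. Name and chunk size are illustrative.
    """
    offset = 0
    with open(path, "rb") as f:
        # Skip the 3-byte BOM if the file starts with one.
        if f.read(3) == b"\xef\xbb\xbf":
            offset = 3
        else:
            f.seek(0)

        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached end of file: everything was ASCII
            # bytes.isascii() (Python 3.7+) is a fast whole-chunk check;
            # only scan byte by byte when the chunk fails it.
            if not chunk.isascii():
                for i, b in enumerate(chunk):
                    if b > 127:
                        return offset + i
            offset += len(chunk)
```

If this returns `None`, the data after the header is plain ASCII and can be loaded as ANSI text unchanged; otherwise the returned offset shows where to start looking at what kind of characters are involved.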
How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 08:05 -0800
Re: How to handle LARGE UTF-8 file Deanna Earley <dee.earley@icode.co.uk> - 2012-03-08 16:55 +0000
Re: How to handle LARGE UTF-8 file "Bob Butler" <bob_butler@cox.invalid> - 2012-03-08 10:13 -0800
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 10:49 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:00 -0500
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:05 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 17:51 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 23:32 -0500
Re: How to handle LARGE UTF-8 file Schmidt <sss@online.de> - 2012-03-09 07:32 +0100
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-09 13:40 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-14 08:54 -0700