| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!npeer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail |
|---|---|
| From | stevegdula@yahoo.com |
| Newsgroups | comp.lang.basic.visual.misc |
| Subject | How to handle LARGE UTF-8 file |
| Date | Thu, 8 Mar 2012 08:05:04 -0800 (PST) |
| Organization | http://groups.google.com |
| Lines | 40 |
| Message-ID | <29897294.1014.1331222704653.JavaMail.geo-discussion-forums@vblb5> |
| NNTP-Posting-Host | 4.28.51.130 |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | quoted-printable |
| X-Trace | posting.google.com 1331222704 19471 127.0.0.1 (8 Mar 2012 16:05:04 GMT) |
| X-Complaints-To | groups-abuse@google.com |
| NNTP-Posting-Date | Thu, 8 Mar 2012 16:05:04 +0000 (UTC) |
| Complaints-To | groups-abuse@google.com |
| Injection-Info | glegroupsg2000goo.googlegroups.com; posting-host=4.28.51.130; posting-account=6DX0cgkAAAAoDsfrvrkw7olQC-OfHI_P |
| User-Agent | G2/1.0 |
| X-Received-Bytes | 2554 |
| Xref | csiph.com comp.lang.basic.visual.misc:887 |
Hi folks,

I recently had a large text file, approaching 7GB in size, dropped on me. Its contents are supposed to be delimited text field data from a database. Its prohibitive size will not let me open it in a robust text editor, so I've just sampled the first 32K of it by opening it as a Binary file with 'Get & Put'. This at least allowed me to see what I was dealing with. The entity that provided the data has disclaimed all responsibility for it, so I cannot ask for the data in another format.

The little 32K subset turned out to be UTF-8 encoded text with the EF BB BF header (the byte-order mark), and it consists of some 166 fields of delimited data. At least some subset of this data will eventually need to be loaded into an older legal database which only supports ANSI text.

I've tried loading the entire thing into an Office 2010 Access database, but because the text is UTF-8 encoded, Access seems to insist that it is loading an XML document and errors out during the load. My hope was to export only the fields we need in ANSI format.

Because UTF-8 is a variable-width encoding rather than double-byte Unicode all of the time (best I can tell from my research), I cannot simply step thru the data and consistently ignore the 'extra' byte. I experimented with 'StrConv' but had no success getting ANSI text out of sampled pieces of text.

My goal is to step thru this text file and export some more manageable 2GB ANSI segments, or some such approach. Can anyone offer any suggestions on how I can achieve my goal?

TIA!
~Steve
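For illustration only, the goal described above (step through the UTF-8 file and write ANSI segments of bounded size) might be sketched as follows. This is Python rather than the Visual Basic the post asks about, and it rests on assumptions the post does not confirm: that "ANSI" means Windows code page 1252, that records end at line breaks (no line breaks embedded inside fields), and that characters with no ANSI equivalent may be replaced with "?". The function name `split_utf8_to_ansi` and the output naming scheme are hypothetical.

```python
def split_utf8_to_ansi(src_path, out_prefix, max_bytes=2 * 1024**3, ansi="cp1252"):
    """Stream a UTF-8 file line by line into ANSI segment files.

    Assumptions (not confirmed by the post): "ANSI" is code page 1252,
    and every record ends at a line break.
    Returns the list of segment paths that were written.
    """
    paths = []
    out = None
    written = 0
    # "utf-8-sig" decodes UTF-8 and skips the EF BB BF byte-order mark if present.
    # newline="" keeps the original line endings (e.g. CR LF) untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        for line in src:
            # Characters that do not exist in the ANSI code page become "?".
            data = line.encode(ansi, errors="replace")
            # Start a new segment before a line would push the current one over the limit.
            if out is None or (written > 0 and written + len(data) > max_bytes):
                if out is not None:
                    out.close()
                path = f"{out_prefix}{len(paths) + 1:03d}.txt"
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(data)
            written += len(data)
    if out is not None:
        out.close()
    return paths
```

Because the file is decoded as text and read one line at a time, a multi-byte UTF-8 character is never split across reads, and only one line is held in memory at once. A call such as `split_utf8_to_ansi("big.txt", "part_")` (both names hypothetical) would write `part_001.txt`, `part_002.txt`, and so on. Selecting only the needed fields would be a further step on each decoded line before it is encoded.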
How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 08:05 -0800
Re: How to handle LARGE UTF-8 file Deanna Earley <dee.earley@icode.co.uk> - 2012-03-08 16:55 +0000
Re: How to handle LARGE UTF-8 file "Bob Butler" <bob_butler@cox.invalid> - 2012-03-08 10:13 -0800
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 10:49 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:00 -0500
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 16:05 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-08 17:51 -0800
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-08 23:32 -0500
Re: How to handle LARGE UTF-8 file Schmidt <sss@online.de> - 2012-03-09 07:32 +0100
Re: How to handle LARGE UTF-8 file "Farnsworth" <nospam@nospam.com> - 2012-03-09 13:40 -0500
Re: How to handle LARGE UTF-8 file stevegdula@yahoo.com - 2012-03-14 08:54 -0700