| Started by | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| First post | 2015-09-23 10:07 -0600 |
| Last post | 2015-09-23 10:07 -0600 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Readlines returns non ASCII character Ian Kelly <ian.g.kelly@gmail.com> - 2015-09-23 10:07 -0600
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-09-23 10:07 -0600 |
| Subject | Re: Readlines returns non ASCII character |
| Message-ID | <mailman.106.1443024499.28679.python-list@python.org> |
On Wed, Sep 23, 2015 at 6:47 AM, SANKAR . <shankarphy@gmail.com> wrote:
> Hi all,
>
> I am not a expert programmer but I have to extract information from a large
> file.
> I used codecs.open(..) with UTF16 encoding to read this file. It could
> read all the lines in the file but returns with the non Ascii characters.
> Below are 5 sample lines. How do I avoid having this non Ascii items. Is
> there a better way to read this?

I suspect that what you want is not "non-ASCII" but just to read the file without all the mojibake, which is likely an indication that you're using the wrong encoding. Do you know that UTF-16 is actually the encoding of the file?

Based on the spaces that appear between adjacent characters, I would guess that this is probably in a 32-bit encoding, perhaps UTF-32. On the other hand, the repeated 0x00ff 0x00fe 0x00ff are very curious; I don't see how that could be valid UTF-32.

Are you sure that this is a text file and not some proprietary binary data format?
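
One way to check the encoding question is to look at the first few raw bytes of the file and compare them against the standard byte order marks. The following is only a rough sketch, not something from the original thread: the helper names `guess_encoding_from_bom` and `sniff` are made up for illustration, and a file without a BOM (or a binary file) will simply come back as `None`.

```python
import codecs

# Known byte order marks. The 4-byte UTF-32 marks are listed before the
# 2-byte UTF-16 marks because BOM_UTF32_LE begins with BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head):
    """Return a codec name if `head` (bytes) starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff(path, count=16):
    """Read the first `count` raw bytes of a file and guess from the BOM."""
    with open(path, "rb") as f:  # binary mode: no decoding is attempted
        head = f.read(count)
    return head, guess_encoding_from_bom(head)
```

Calling `sniff()` on the file and printing the returned bytes shows what is really at the start of it. If no BOM matches, or the bytes look nothing like text, that supports the suspicion that the file is not plain UTF-16 text. Note that this is a heuristic: a little-endian UTF-16 file whose first character is U+0000 would be reported as UTF-32.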