comp.lang.python, thread #11463
| Started by | Artie Ziff <artie.ziff@gmail.com> |
|---|---|
| First post | 2011-08-15 08:20 -0700 |
| Last post | 2011-08-16 17:32 -0700 |
| Articles | 2 (2 participants) |
string to unicode Artie Ziff <artie.ziff@gmail.com> - 2011-08-15 08:20 -0700
Re: string to unicode Tim Roberts <timr@probo.com> - 2011-08-16 17:32 -0700
| From | Artie Ziff <artie.ziff@gmail.com> |
|---|---|
| Date | 2011-08-15 08:20 -0700 |
| Subject | string to unicode |
| Message-ID | <mailman.12.1313421630.27778.python-list@python.org> |
If I am using the standard csv library to read the contents of a CSV file that contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a Python Unicode method such as decode or encode to transform this string type into a Python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get it from the file? Perhaps I need to open the file with particular attributes? Thanks!
| From | Tim Roberts <timr@probo.com> |
|---|---|
| Date | 2011-08-16 17:32 -0700 |
| Message-ID | <1r2m47h2fsmoe1cd9tnng4rdlf3l8cf39m@4ax.com> |
| In reply to | #11463 |
Artie Ziff <artie.ziff@gmail.com> wrote:
>
>if I am using the standard csv library to read contents of a csv file
>which contains Unicode strings (short example:
>'\xe8\x9f\x92\xe8\x9b\x87'),
You need to be rather precise when talking about this. That's not a
"Unicode string" in Python terms. It's an 8-bit string. It might be
UTF-8-encoded text. If so, its six bytes form two three-byte sequences,
which map to two Unicode code points, U+87D2 and U+86C7, both CJK
ideograms. Is that what you expected?
C:\Dev\videology\sw\viewer>python
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = '\xe8\x9f\x92\xe8\x9b\x87'
>>> x.decode('utf8')
u'\u87d2\u86c7'
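The mapping from six bytes to two code points can also be checked by hand. In UTF-8, a three-byte sequence has the form 1110xxxx 10yyyyyy 10zzzzzz, and the code point is the 16 bits xxxx yyyyyy zzzzzz concatenated. The following is a minimal Python 3 sketch that compares a manual decode with the built-in bytes.decode. The helper name decode_3byte_utf8 is hypothetical, and it handles only three-byte sequences, with no checks for overlong forms or surrogates.

```python
def decode_3byte_utf8(data: bytes) -> list:
    """Decode bytes made only of 3-byte UTF-8 sequences into code points.

    Hypothetical helper for illustration: it checks the lead/continuation
    bit patterns but not overlong encodings or surrogate code points.
    """
    if len(data) % 3 != 0:
        raise ValueError("expected a whole number of 3-byte sequences")
    code_points = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        # Lead byte must be 1110xxxx; continuation bytes must be 10xxxxxx.
        if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
            raise ValueError(f"not a 3-byte UTF-8 sequence at offset {i}")
        # Concatenate the 4 + 6 + 6 payload bits into one 16-bit code point.
        cp = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
        code_points.append(cp)
    return code_points


raw = b'\xe8\x9f\x92\xe8\x9b\x87'  # the bytes from the question
manual = decode_3byte_utf8(raw)
builtin = [ord(ch) for ch in raw.decode('utf-8')]  # Python 3 str, not bytes
print([f"U+{cp:04X}" for cp in manual])  # ['U+87D2', 'U+86C7']
assert manual == builtin
```

For example, E8 9F 92 contributes the payload bits 1000, 011111 and 010010, which together give 0x87D2.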
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
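The original question also asks whether the file must be opened with particular attributes. The session above uses Python 2.7. In Python 3, the csv module works on text, so the usual approach is to pass the encoding to open() and let the file object do the decoding. You still have to know or assume the encoding, because a plain CSV file carries no encoding declaration. A leading byte-order mark, if present, can hint at one. The following is a minimal sketch that assumes the file is UTF-8. The helper name read_csv_rows is hypothetical, and a temporary file stands in for the real CSV.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding='utf-8'):
    """Read all rows of a CSV file as lists of already-decoded str values.

    Hypothetical helper: assumes the caller knows the file's encoding.
    newline='' is what the csv docs recommend for files given to csv.reader.
    """
    with open(path, newline='', encoding=encoding) as f:
        return list(csv.reader(f))


# Write a small sample file containing the UTF-8 bytes from the question.
raw = b'name,value\r\n\xe8\x9f\x92\xe8\x9b\x87,1\r\n'
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

try:
    rows = read_csv_rows(path)
    print(rows[0])  # ['name', 'value']
    print([f"U+{ord(ch):04X}" for ch in rows[1][0]])  # ['U+87D2', 'U+86C7']
finally:
    os.remove(path)  # clean up the temporary file
```

Passing encoding='utf-8-sig' instead would also strip a UTF-8 byte-order mark from the start of the file if one is present. If the guessed encoding is wrong, reading may raise UnicodeDecodeError or silently produce the wrong characters, depending on the encodings involved.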