

Newsgroup: comp.lang.python (thread #11463)

string to unicode

Started by: Artie Ziff <artie.ziff@gmail.com>
First post: 2011-08-15 08:20 -0700
Last post: 2011-08-16 17:32 -0700
Articles: 2 (2 participants)




#11463 — string to unicode

From: Artie Ziff <artie.ziff@gmail.com>
Date: 2011-08-15 08:20 -0700
Subject: string to unicode
Message-ID: <mailman.12.1313421630.27778.python-list@python.org>

if I am using the standard csv library to read contents of a csv file 
which contains Unicode strings (short example: 
'\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such 
as decode or encode to transform this string type into a python unicode 
type? Must I know the encoding (byte groupings) of the Unicode? Can I 
get this from the file? Perhaps I need to open the file with particular 
attributes?

thanks!



#11635

From: Tim Roberts <timr@probo.com>
Date: 2011-08-16 17:32 -0700
Message-ID: <1r2m47h2fsmoe1cd9tnng4rdlf3l8cf39m@4ax.com>
In reply to: #11463

Artie Ziff <artie.ziff@gmail.com> wrote:
>
>if I am using the standard csv library to read contents of a csv file 
>which contains Unicode strings (short example: 
>'\xe8\x9f\x92\xe8\x9b\x87'),

You need to be rather precise when talking about this.  That's not a
"Unicode string" in Python terms.  It's an 8-bit string.  It might be
UTF-8 encoded text.  If so, it maps to two Unicode code points, U+87D2 and
U+86C7, which are both CJK ideograms.  Is that what you expected?

  C:\Dev\videology\sw\viewer>python
  Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> x = '\xe8\x9f\x92\xe8\x9b\x87'
  >>> x.decode('utf8')
  u'\u87d2\u86c7'
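
For the remaining questions (must the encoding be known, and can the file be opened in a way that handles it), here is a minimal sketch. It assumes Python 3, where the csv module reads text rather than bytes, and it assumes the data really is UTF-8, which is only a guess based on the sample bytes. The CSV content and the names `csv_bytes`, `stream` and `rows` are hypothetical, made up for illustration.

```python
import csv
import io

# The bytes from the thread; the b prefix makes the byte-string type explicit.
raw = b'\xe8\x9f\x92\xe8\x9b\x87'

# decode() needs a codec name; UTF-8 is an assumption, not something
# the bytes themselves announce.
text = raw.decode('utf-8')
code_points = ['U+%04X' % ord(ch) for ch in text]
print(code_points)

# A wrong guess does not always raise an error: Latin-1 accepts every byte
# value, so it quietly yields six unrelated characters instead of two.
mojibake = raw.decode('latin-1')
print(len(text), len(mojibake))

# Hypothetical CSV content, kept in memory so the sketch is self-contained.
csv_bytes = b'id,name\n1,' + raw + b'\n'

# Wrap the byte stream in a text layer that decodes as UTF-8, then hand
# that to csv.reader, which yields already-decoded strings.
stream = io.TextIOWrapper(io.BytesIO(csv_bytes), encoding='utf-8', newline='')
rows = list(csv.reader(stream))
print(rows[1][1] == text)
```

For a file on disk, the Python 3 equivalent would be `open(path, encoding='utf-8', newline='')`. On Python 2.7, as used in the session above, the csv module returns 8-bit strings, so each cell would need its own `.decode('utf8')` call. A plain text file does not generally record its own encoding, so it usually has to be known or agreed on in advance.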
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.


