Groups > comp.lang.python > #49327 > unrolled thread
| Started by | darpan6aya <akshay.ksth@gmail.com> |
|---|---|
| First post | 2013-06-27 08:05 -0700 |
| Last post | 2013-06-27 12:28 -0400 |
| Articles | 4 — 3 participants |
Devnagari Unicode Conversion Issues darpan6aya <akshay.ksth@gmail.com> - 2013-06-27 08:05 -0700
Re: Devnagari Unicode Conversion Issues MRAB <python@mrabarnett.plus.com> - 2013-06-27 16:28 +0100
Re: Devnagari Unicode Conversion Issues darpan6aya <akshay.ksth@gmail.com> - 2013-06-27 08:39 -0700
Re: Devnagari Unicode Conversion Issues Dave Angel <davea@davea.name> - 2013-06-27 12:28 -0400
| From | darpan6aya <akshay.ksth@gmail.com> |
|---|---|
| Date | 2013-06-27 08:05 -0700 |
| Subject | Devnagari Unicode Conversion Issues |
| Message-ID | <c8ea987a-a493-4adc-a35d-11a82f1bd03a@googlegroups.com> |
How can I convert text of the following type
नेपाली
into devnagari unicode in Python 2.7?
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-06-27 16:28 +0100 |
| Message-ID | <mailman.3932.1372346917.3114.python-list@python.org> |
| In reply to | #49327 |
On 27/06/2013 16:05, darpan6aya wrote:
> How can i convert text of the following type
>
> नेपाली
>
> into devnagari unicode in Python 2.7?
>
Is that a bytestring? In other words, is its type 'str'?
If so, you need to decode it. That particular string is UTF-8:
>>> print "नेपाली".decode("utf-8")
नेपाली
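
A minimal sketch of the same decode step, with the bytes written out as escapes so the result does not depend on how a terminal or newsreader displays them. The variable names are illustrative. Treating the garbled form as the UTF-8 bytes viewed as Windows-1252 is an assumption based on the characters shown in this thread, not something stated in it.

```python
# UTF-8 bytes of the six code points in the word, written as escapes.
raw = b"\xe0\xa4\xa8\xe0\xa5\x87\xe0\xa4\xaa\xe0\xa4\xbe\xe0\xa4\xb2\xe0\xa5\x80"

# Decoding turns bytes into text (unicode in Python 2.7, str in Python 3).
text = raw.decode("utf-8")

# Assumption: the garbled form is the same bytes interpreted as cp1252.
garbled = raw.decode("cp1252")

# Undoing the wrong decode recovers the original bytes, and so the text.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repaired == text)
```

Encoding goes the other way, from text to bytes, which is why calling it on data that is already bytes does not help.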
| From | darpan6aya <akshay.ksth@gmail.com> |
|---|---|
| Date | 2013-06-27 08:39 -0700 |
| Message-ID | <02ea5055-7617-4db1-a3b7-82d155c6954d@googlegroups.com> |
| In reply to | #49327 |
That worked out. I was trying to encode it the entire time. Now I realise how silly I am. Thanks MRAB. Once Again. :D
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-06-27 12:28 -0400 |
| Message-ID | <mailman.3935.1372350553.3114.python-list@python.org> |
| In reply to | #49331 |
On 06/27/2013 11:39 AM, darpan6aya wrote:
> That worked out. I was trying to encode it the entire time.
> Now I realise how silly I am.
>
> Thanks MRAB. Once Again. :D
>

You're not silly, it's a complex question. MRAB is good at guessing which part is messing you up.

However, when you're writing a real Python program with a real text editor, and when you're not using a newsgroup in between to mangle or unmangle things, you have a few things to match up to get it right.

The file is just a bunch of bytes. Those bytes are put there by your editor and interpreted by the compiler. So if you have a non-ASCII character on your keyboard and you hit it, the editor will encode it (from Unicode to byte(s)) and put it in the file. If you tell the editor to use utf-8, then you also want to tell the compiler to decode it using utf-8. The most polite way to do that looks something like:

# -*- coding: <encoding-name> -*-

which for utf-8 is:

# -*- coding: utf-8 -*-

http://docs.python.org/release/2.7.5/reference/lexical_analysis.html#encoding-declarations

Once you've got that straight, you don't need to explicitly decode byte strings. You can just use

u"This is my string"

with whatever characters you need. As long as the declarations match, this should "just work."

If the data comes from a byte string other than a literal string, you might need the more verbose form (an explicit decode).

Your original message was sent in Western (ISO 8859-1), and MRAB's response was in utf-8, and my mail program decoded the string the same way. However, I don't know anything about Devnagari, so I can't say if it looked reasonable here.

--
DaveA
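
What Dave describes could look roughly like the sketch below, assuming the source file really is saved as UTF-8 so that it matches the declaration on the first line. The temporary file is only a hypothetical stand-in for data that arrives from outside the program; the names are illustrative.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file,
# and the editor must actually save the file in that encoding.
import io
import os
import tempfile

# Unicode literal: with a matching declaration, no explicit decode is needed.
word = u"नेपाली"

# Data from outside the source file still arrives as bytes.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, "wb") as f:
        f.write(word.encode("utf-8"))  # text -> bytes
    with io.open(path, "rb") as f:
        raw = f.read()                 # bytes, not yet text
finally:
    os.remove(path)

# The "more verbose form": an explicit decode of the byte string.
decoded = raw.decode("utf-8")

print(decoded == word)
```

The literal needs no decode because the compiler handles it from the declaration; the file contents do, because nothing tells Python what encoding those bytes are in.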