Re: Devnagari Unicode Conversion Issues

Date Thu, 27 Jun 2013 12:28:59 -0400
From Dave Angel <davea@davea.name>
Subject Re: Devnagari Unicode Conversion Issues
Newsgroups comp.lang.python


On 06/27/2013 11:39 AM, darpan6aya wrote:
> That worked out. I was trying to encode it the entire time.
> Now I realise how silly I am.
>
> Thanks MRAB. Once Again. :D
>

You're not silly, it's a complex question.  MRAB is good at guessing 
which part is messing you up.

However, when you're writing a real Python program with a real text 
editor, and when you're not using a newsgroup in between to mangle or 
unmangle things, you have a few things to match up to get it right.


The file is just a bunch of bytes.  Those bytes are put there by your 
editor, and interpreted by the compiler.  So if you have a non-ASCII 
character on your keyboard and you hit it, the editor will encode it 
(from Unicode to byte(s)) and put it in the file.  If you tell the 
editor to use utf-8, then you also want to tell the compiler to decode 
the file using utf-8.
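
As a rough sketch of that round trip (the editor writes bytes, the 
compiler reads them back), here is what matching and mismatching 
encodings do to the same bytes.  The sample word and the variable names 
are mine, not from the original post, and the text is written with \u 
escapes so the snippet itself stays pure ASCII.

```python
# Sketch only: "text" is a made-up sample, not the string from the thread.
text = u"\u0928\u092e\u0938\u094d\u0924\u0947"  # six Devanagari code points

# What an editor configured for utf-8 would put in the file: bytes.
utf8_bytes = text.encode("utf-8")

# Reading those bytes back with the same encoding recovers the text.
matched = utf8_bytes.decode("utf-8")

# Reading them back as ISO 8859-1 does not fail; it quietly produces
# one wrong character per byte.
mismatched = utf8_bytes.decode("iso-8859-1")

print(len(text), len(utf8_bytes))
print(matched == text, mismatched == text)
```

The mismatch raises no error, which is why this kind of problem tends to 
show up as garbage on the screen rather than as a traceback.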

The most polite way to do that is a comment on the first or second line 
of the file, which looks something like:
# -*- coding: <encoding-name> -*-
so for utf-8 that would be:
# -*- coding: utf-8 -*-

http://docs.python.org/release/2.7.5/reference/lexical_analysis.html#encoding-declarations
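
To see the declaration actually change what the compiler does, one can 
build two tiny source files in memory whose string literal has 
identical bytes and differs only in the declared encoding.  This is a 
sketch written for Python 3 (the u prefix is kept to match the thread); 
run_source is a hypothetical helper of mine, not anything from the 
Python docs.

```python
# Sketch only: the literal's bytes are the same in both runs; only the
# coding declaration at the top of the "file" differs.
literal_bytes = u"\u0928".encode("utf-8")  # one Devanagari letter, 3 bytes


def run_source(declared_encoding):
    """Compile source bytes that carry the given coding declaration and
    return the value the string literal ends up with."""
    header = ("# -*- coding: %s -*-\n" % declared_encoding).encode("ascii")
    source = header + b's = u"' + literal_bytes + b'"\n'
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]


right = run_source("utf-8")       # declaration matches the bytes
wrong = run_source("iso-8859-1")  # declaration does not match

print(len(right), len(wrong))
```

With the matching declaration the literal is a single character; with 
the mismatched one the same three bytes turn into three unrelated 
Western characters.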

Once you've got that straight, you don't need to explicitly decode byte 
strings.  You can just use
   u"This is my string"

with whatever characters you need.  As long as the declarations match, 
this should "just work."  If the data comes from a byte string other 
than a literal string (a file or a socket, say), you might still need 
the more verbose form, an explicit decode().
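
For bytes that don't come from a literal, a minimal sketch of the 
explicit form might look like this.  The temporary file, its name, and 
the sample text are assumptions of mine for illustration; io.open is 
used because it takes an encoding argument.

```python
import io
import os
import tempfile

# Sketch only: made-up sample text and a throwaway file.
text = u"\u0926\u0947\u0935"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write raw utf-8 bytes, as some other program might have done.
with io.open(path, "wb") as f:
    f.write(text.encode("utf-8"))

# Option 1: read bytes, then decode them explicitly.
with io.open(path, "rb") as f:
    raw = f.read()
decoded_by_hand = raw.decode("utf-8")

# Option 2: let io.open do the decoding while reading.
with io.open(path, "r", encoding="utf-8") as f:
    decoded_by_open = f.read()

print(decoded_by_hand == text, decoded_by_open == text)
```

Either way, the encoding named when reading has to be the one that was 
used when the bytes were written.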

Your original message was sent in Western (ISO 8859-1), and MRAB's 
response was in utf-8, and my mail program decoded the string the same 
way.  However, I don't know anything about Devnagari, so I can't say if 
it looked reasonable here.


-- 
DaveA


