


Re: can't get utf8 / unicode strings from embedded python

Date Mon, 26 Aug 2013 01:30:15 +0100
From MRAB <python@mrabarnett.plus.com>
Newsgroups comp.lang.python


On 25/08/2013 23:32, David M. Cotter wrote:
> i got it!!  OMG!  so sorry for the confusion, but i learned a lot,
> and i can share the result:
>
> the CORRECT code *was* what i had assumed.  the Python side has
> always been correct (no need to put "u" in front of strings, it is
> known that the bytes are utf8 bytes)
>
> it was my "run script" function which read in the file.  THAT was
> what was "reinterpreting" the utf8 bytes as macRoman (on both
> platforms).  correct code below:
>
When working with Unicode, what you should be doing is:

1. Specifying the encoding in the special comment line.

2. Setting the encoding of the source file.

3. Using Unicode string literals in the source file.

You're doing (1) and (2), but not (3).

If you want to pass UTF-8 to the C++, then encode the Unicode
string to bytes when you pass it. Using bytestring literals and relying
on the source file being UTF-8, like you're doing, is just asking for
trouble, as you've found out! :-)
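
A minimal sketch of the three points above plus the explicit encode step, assuming Python 2 (where the "u" prefix matters; it is also accepted by Python 3.3+). The names `title` and `to_utf8_for_host` are made up for illustration and are not from the original script:

```python
# -*- coding: utf-8 -*-
# (1) The special comment above declares the source encoding.
# (2) The file itself must really be saved as UTF-8 for that to be true.

# (3) A Unicode string literal: this is text, not bytes in some encoding.
title = u"café"


def to_utf8_for_host(text):
    """Encode text to UTF-8 bytes at the Python/C++ boundary.

    Hypothetical helper: the real hand-off depends on how the host
    application receives strings from the embedded interpreter.
    """
    return text.encode("utf-8")


# Encode only at the point where the string leaves Python.
payload = to_utf8_for_host(title)
```

Here `title` is four characters long, while `payload` is five bytes, because the "é" takes two bytes in UTF-8. The C++ side gets UTF-8 because the script asked for it explicitly, not because of how the source file happened to be saved.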



