Re: can't get utf8 / unicode strings from embedded python

References <fbeee40a-bc8a-4cef-abe7-2b2d54f59625@googlegroups.com> <d3e52d5b-84c9-4cb4-84bf-cbdd886425b1@googlegroups.com>
From Benjamin Kaplan <benjamin.kaplan@case.edu>
Date 2013-08-24 12:45 -0700
Subject Re: can't get utf8 / unicode strings from embedded python
Newsgroups comp.lang.python
Message-ID <mailman.200.1377373941.19984.python-list@python.org>


On Sat, Aug 24, 2013 at 9:47 AM, David M. Cotter <me@davecotter.com> wrote:
>
> > What _are_ you using?
> i have scripts in a file, that i am invoking into my embedded python within a C++ program.  there is no terminal involved.  the "print" statement has been redirected (via sys.stdout) to my custom print class, which does not specify "encoding", so i tried the suggestion above to set it:
>
> static const char *s_RedirectScript =
>         "import " kEmbeddedModuleName "\n"
>         "import sys\n"
>         "\n"
>         "class CustomPrintClass:\n"
>         "       def write(self, stuff):\n"
>         "               " kEmbeddedModuleName "." kCustomPrint "(stuff)\n"
>         "class CustomErrClass:\n"
>         "       def write(self, stuff):\n"
>         "               " kEmbeddedModuleName "." kCustomErr "(stuff)\n"
>         "sys.stdout = CustomPrintClass()\n"
>         "sys.stderr = CustomErrClass()\n"
>         "sys.stdout.encoding = 'UTF-8'\n"
>         "sys.stderr.encoding = 'UTF-8'\n";
>
>
> but it didn't help.
>
> I'm still getting back a string that is a utf-8 string of characters that, if converted to "macRoman" and then interpreted as UTF8, shows the original, correct string.  who is specifying macRoman, and where, and how do i tell whoever that is that i really *really* want utf8?
> --

If you're running this from a C++ program, then you aren't getting
back characters. You're getting back bytes. If you treat them as
UTF-8, they'll work properly. The only thing wrong is the text editor
you're using to open the file afterwards: since you aren't specifying
an encoding, it's assuming MacRoman. You can try putting the UTF-8 BOM
(it's not really a byte order mark, since UTF-8 has no byte order; it
only acts as a signature) at the front of the file. The bytes 0xEF
0xBB 0xBF are used by some editors to identify a file as UTF-8.
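
To make that concrete, here is a rough sketch (Python 3 syntax, which is
an assumption; the original scripts look like Python 2) of what happens
when UTF-8 bytes are read as MacRoman, and how the signature bytes could
be prepended. The helper names misread_as_macroman and
with_utf8_signature are made up for illustration and are not part of
any API.

```python
import codecs

def misread_as_macroman(data: bytes) -> str:
    """Decode bytes the way an editor assuming MacRoman would."""
    return data.decode("mac_roman")

def with_utf8_signature(data: bytes) -> bytes:
    """Prepend the UTF-8 signature (0xEF 0xBB 0xBF) unless already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

text = "caf\u00e9"               # 'café'
raw = text.encode("utf-8")       # the bytes the C++ side receives: b'caf\xc3\xa9'

garbled = misread_as_macroman(raw)
print(garbled)                   # caf√©  (two MacRoman characters instead of é)

# The bytes themselves were fine all along: decoding them as UTF-8 works.
print(raw.decode("utf-8") == text)

# Bytes to write at the start of the output file so that editors which
# look for the signature treat the file as UTF-8.
signed = with_utf8_signature(raw)
print(signed[:3] == b"\xef\xbb\xbf")
```

Whether a given editor honours the signature varies, so this is
something to try rather than a guaranteed fix.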
