| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Subject | Re: can't get utf8 / unicode strings from embedded python |
| Date | 2013-08-25 14:59 -0400 |
| References | <fbeee40a-bc8a-4cef-abe7-2b2d54f59625@googlegroups.com> <d6250c5d-ff7d-46ae-9e0a-1c51a6e9b7dc@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.223.1377457194.19984.python-list@python.org> |
On 8/25/2013 1:57 PM, David M. Cotter wrote:
> i'm sorry this is so confusing, let me try to re-state the problem in as clear a way as i can.
>
> I have a C++ program, with very well tested unicode support. All logging is done in utf8. I have conversion routines that work flawlessly, so i can assure you there is nothing wrong with logging and unicode support in the underlying program.

> I am embedding python 2.7 into the program, and extending python with routines in my C++ program.

If you want 'well-tested' (correct) unicode support from Python, use 3.3. Unicode in 2.x is somewhat buggy and definitely flakey. The first fix was to make unicode *the* text type, in 3.0. The second was to redesign the internals in 3.3. It is possible that 2.7 is too broken for what you want to do.

> I have a script, encoded in utf8, and *marked* as utf8 with this line:
> # -*- coding: utf-8 -*-
>
> In that script, i have inline unicode text.

The example scripts that you posted pictures of do *not* have unicode text. They have bytestring literals with (encoded) non-ascii chars inside them. This is not a great idea. I am not sure what bytes you end up with. Apparently, not what you expect. To make them 'unicode text', you must prepend the literals with 'u'. Didn't someone say this before?

> When I pass that text to my C++ program, the Python interpreter decides that these bytes are macRoman, and handily "converts" them to unicode. To compensate, i must "convert" these "macRoman" characters encoded as utf8, back to macRoman, then "interpret" them as utf8. In this way i can recover the original unicode.
>
> When i return a unicode string back to python, i must do the reverse so that Python gets back what it expects.
>
> This is not related to printing, or sys.stdout, it does happen with that too but focusing on that is a red-herring. Let's focus on just passing a string into C++ then back out.
>
> This would all actually make sense IF my script was marked as being "macRoman" even tho i entered UTF8 Characters, but that is not the case.
>
> Let's prove my statements. Here is the script, *interpreted* as MacRoman:
> http://karaoke.kjams.com/screenshots/bugs/python_unicode/script_as_macroman.png

Why are you posting pictures of code, instead of the (runnable) code itself, as you did with C code?

--
Terry Jan Reedy
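
For readers who want something runnable, here is a minimal sketch of the two points discussed above: the difference between unicode text and UTF-8 bytes, and the MacRoman round trip described in the quoted text. It is an illustration only, not code from this thread. The sample word is made up, and the sketch assumes that Python's standard `mac_roman` codec matches the "macRoman" interpretation being described. It builds the UTF-8 bytes explicitly so that it behaves the same under 2.7 and 3.x, instead of depending on how a 2.x source literal is read.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; any non-ASCII string would show the same effect.
text = u'caf\u00e9'                 # unicode text: 4 code points

# The same text as UTF-8 bytes. This is what a plain '...' literal holds
# in a 2.x script whose source file is saved as UTF-8.
utf8_bytes = text.encode('utf-8')   # 5 bytes: the e-acute takes two

# Decoding those bytes with the wrong codec yields mojibake, not an error,
# because mac_roman assigns a character to every byte value.
misread = utf8_bytes.decode('mac_roman')

# The compensation described in the quoted text: turn the misread
# characters back into the original bytes, then decode them as UTF-8.
recovered = misread.encode('mac_roman').decode('utf-8')

print(len(text), len(utf8_bytes))
print(repr(misread))
print(recovered == text)
```

If the script's literals are written with the `u` prefix and the declared source encoding is honored when the code is compiled, the interpreter should produce the `text` value directly and no such compensation is needed.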