can't get utf8 / unicode strings from embedded python

Newsgroups comp.lang.python
Date 2013-08-23 13:49 -0700
Message-ID <fbeee40a-bc8a-4cef-abe7-2b2d54f59625@googlegroups.com> (permalink)
Subject can't get utf8 / unicode strings from embedded python
From "David M. Cotter" <me@davecotter.com>


note: everything works great if i use ASCII, but:

in my utf8-encoded script i have this:

>	print "frøânçïé"

in my embedded C++ i have this:

PyObject*	CPython_Script::print(PyObject *args)
{
	PyObject		*resultObjP	= NULL;
	const char		*utf8_strZ	= NULL;
	
	if (PyArg_ParseTuple(args, "s", &utf8_strZ)) {
		Log(utf8_strZ, false);

		resultObjP = Py_None;
		Py_INCREF(resultObjP);
	}
	
	return resultObjP;
}

Now, i know that my Log() can print utf8 (it has for years, and is very well debugged)

but what it *actually* prints is this:

>	print "frøânçïé"
--> frøânçïé

another method i use looks like this:
>	kj_commands.menu("控件", "同步滑帧", "全局无滑帧")
or
>	kj_commands.menu(u"控件", u"同步滑帧", u"全局无滑帧")

and in my C++ i have:

SuperString		ScPyObject::GetAs_String()
{
	SuperString		str;
	
	if (PyUnicode_Check(i_objP)) {
		#if 1
		//	method 1
		{
			ScPyObject		utf8Str(PyUnicode_AsUTF8String(i_objP));
			
			str = utf8Str.GetAs_String();
		}
		#elif 0
		//	method 2
		{
			UTF8Char		*uniZ = (UTF8Char *)PyUnicode_AS_UNICODE(i_objP);
		
			str.assign(&uniZ[0], &uniZ[PyUnicode_GET_DATA_SIZE(i_objP)], kCFStringEncodingUTF16);
		}
		#else
		//	method 3
		{
			UTF32Vec			charVec(32768); CF_ASSERT(sizeof(UTF32Vec::value_type) == sizeof(wchar_t));
			PyUnicodeObject		*uniObjP = (PyUnicodeObject *)(i_objP);
			Py_ssize_t			sizeL(PyUnicode_AsWideChar(uniObjP, (wchar_t *)&charVec[0], charVec.size()));
			
			charVec.resize(sizeL);
			charVec.push_back(0);
			str.Set(SuperString(&charVec[0]));
		}
		#endif
	} else {
		str.Set(uc(PyString_AsString(i_objP)));
	}
	
	Log(str.utf8Z());
	
	return str;
}


for the string, "控件", i get:
--> 控件

for the *unicode* string, u"控件", with methods 1, 2, and 3, i get the same thing:
--> 控件

okay so what am i doing wrong???
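
One way to narrow this down: both garbled outputs shown above are exactly what you get when correct UTF-8 bytes are decoded as Windows-1252 (cp1252) before being displayed. The sketch below reproduces that in Python. `misread_utf8_as_cp1252` is a hypothetical helper for illustration only, and where such a cp1252 decoding would actually happen (in Log(), in SuperString, or in whatever displays the log) is an assumption that the post itself does not settle.

```python
# -*- coding: utf-8 -*-
# Diagnostic sketch, not part of the original program.
# Assumption: the text leaves Python as correct UTF-8 bytes and is later
# decoded as Windows-1252 (cp1252) somewhere before it is displayed.

def misread_utf8_as_cp1252(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")

latin = u"fr\u00f8\u00e2n\u00e7\u00ef\u00e9"  # frøânçïé
cjk = u"\u63a7\u4ef6"                          # 控件

garbled_latin = misread_utf8_as_cp1252(latin)  # frÃ¸Ã¢nÃ§Ã¯Ã©
garbled_cjk = misread_utf8_as_cp1252(cjk)      # æŽ§ä»¶

# Undoing the misreading recovers the original text, which suggests the
# underlying bytes were valid UTF-8 all along.
assert garbled_latin.encode("cp1252").decode("utf-8") == latin
assert garbled_cjk.encode("cp1252").decode("utf-8") == cjk
```

If this matches what you see, the bytes coming out of Python are probably already valid UTF-8, and the step worth checking is whichever one interprets those bytes afterwards.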


