Groups > comp.lang.python > #6674 > unrolled thread

sqlalchemy and Unicode strings: errormessage

Started by	Wolfgang Meiners <WolfgangMeiners01@web.de>
First post	2011-05-30 23:40 +0200
Last post	2011-06-02 04:38 +1000
Articles	13 — 6 participants

  sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-30 23:40 +0200
    Re: sqlalchemy and Unicode strings: errormessage Chris Withers <chris@simplistix.co.uk> - 2011-05-31 10:55 +0100
      Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:47 +0200
    Re: sqlalchemy and Unicode strings: errormessage Daniel Kluev <dan.kluev@gmail.com> - 2011-05-31 22:32 +1100
      Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:45 +0200
        Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 18:10 +0200
        Re: sqlalchemy and Unicode strings: errormessage Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-05-31 09:42 -0700
        RE: sqlalchemy and Unicode strings: errormessage "Prasad, Ramit" <ramit.prasad@jpmchase.com> - 2011-05-31 12:31 -0400
        Re: sqlalchemy and Unicode strings: errormessage Chris Angelico <rosuav@gmail.com> - 2011-06-01 03:19 +1000
        Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 21:52 +0200
          Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-01 07:56 +1000
            Re: Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-06-01 19:29 +0200
              Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-02 04:38 +1000

#6674 — sqlalchemy and Unicode strings: errormessage

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-05-30 23:40 +0200
Subject	sqlalchemy and Unicode strings: errormessage
Message-ID	<4de40ee8$0$6623$9b4e6d93@newsspool2.arcor-online.net>

Hi,

I am trying to build an application using sqlalchemy.

In principle, I have this structure:

#==============================================

from sqlalchemy import *
from sqlalchemy.orm import *

metadata = MetaData('sqlite://')
a_table = Table('tf_lehrer', metadata,
    Column('id', Integer, primary_key=True),
    Column('Kuerzel', Text),
    Column('Name', Text))

class A_class(object):
    def __init__(self, Kuerzel, Name):
        self.Kuerzel=Kuerzel
        self.Name=Name

mapper(A_class, a_table)

A_record = A_class('BUM', 'Bäumer')

Session = sessionmaker()
session = Session()

session.add(A_record)

session.flush()

#================================================

At the moment it runs up to the line

session.flush()

where I get the following error message:

sqlalchemy.exc.ProgrammingError: (ProgrammingError) You must not use
8-bit bytestrings unless you use a text_factory that can interpret 8-bit
bytestrings (like text_factory = str). It is highly recommended that you
instead just switch your application to Unicode strings. u'INSERT INTO
tf_lehrer ("Kuerzel", "Name") VALUES (?, ?)' ('BUM', 'B\xc3\xa4umer')

But where can I switch my application to Unicode strings?

Thank you for any hints
Wolfgang

#6721

From	Chris Withers <chris@simplistix.co.uk>
Date	2011-05-31 10:55 +0100
Message-ID	<mailman.2313.1306835718.9059.python-list@python.org>
In reply to	#6674

Hi Wolfgang,

On 30/05/2011 22:40, Wolfgang Meiners wrote:
> I am trying to build an application using sqlalchemy.

You're likely to get much better help here:

http://www.sqlalchemy.org/support.html#mailinglist

When you post there, make sure you include:

- what python version you're using
- what sqlalchemy version you're using

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk

#6734

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-05-31 17:47 +0200
Message-ID	<4de50d82$0$6538$9b4e6d93@newsspool4.arcor-online.net>
In reply to	#6721

On 31.05.11 11:55, Chris Withers wrote:
> Hi Wolfgang,
> 
> On 30/05/2011 22:40, Wolfgang Meiners wrote:
>> I am trying to build an application using sqlalchemy.
> 
> You're likely to get much better help here:
> 
> http://www.sqlalchemy.org/support.html#mailinglist
> 
> When you post there, make sure you include:
> 
> - what python version you're using
> - what sqlalchemy version you're using
> 
> cheers,
> 
> Chris
> 

Thank you for pointing me to this list. I will have a look at it. At the
moment I think I am really struggling with Python and utf8.

Wolfgang

#6724

From	Daniel Kluev <dan.kluev@gmail.com>
Date	2011-05-31 22:32 +1100
Message-ID	<mailman.2315.1306841548.9059.python-list@python.org>
In reply to	#6674

On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> metadata = MetaData('sqlite://')
> a_table = Table('tf_lehrer', metadata,
>    Column('id', Integer, primary_key=True),
>    Column('Kuerzel', Text),
>    Column('Name', Text))

Use UnicodeText instead of Text.

> A_record = A_class('BUM', 'Bäumer')

If this is python2.x, use u'Bäumer' instead.


-- 
With best regards,
Daniel Kluev
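
A minimal sketch of the same idea for readers on Python 3, where every string literal is already Unicode text. It is not the SQLAlchemy code from the thread: it assumes the standard-library sqlite3 module and reuses the table and column names from the original post purely for illustration. The point it shows is that the bytes are decoded once, before they reach the database layer.

```python
import sqlite3

# The same bytes that appear in the error message: UTF-8 encoded, not yet text.
raw_name = b"B\xc3\xa4umer"

# Decode at the boundary so the database layer only ever sees text.
name = raw_name.decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tf_lehrer (id INTEGER PRIMARY KEY, Kuerzel TEXT, Name TEXT)"
)
conn.execute("INSERT INTO tf_lehrer (Kuerzel, Name) VALUES (?, ?)", ("BUM", name))
row = conn.execute("SELECT Kuerzel, Name FROM tf_lehrer").fetchone()
conn.close()

# ascii() keeps the output printable on a terminal with any encoding.
print(ascii(row))
```

The value read back is text again, so no further decoding is needed on the way out.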

#6732

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-05-31 17:45 +0200
Message-ID	<4de50cfd$0$6538$9b4e6d93@newsspool4.arcor-online.net>
In reply to	#6724

On 31.05.11 13:32, Daniel Kluev wrote:
> On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> metadata = MetaData('sqlite://')
>> a_table = Table('tf_lehrer', metadata,
>>    Column('id', Integer, primary_key=True),
>>    Column('Kuerzel', Text),
>>    Column('Name', Text))
> 
> Use UnicodeText instead of Text.
> 
>> A_record = A_class('BUM', 'Bäumer')
> 
> If this is python2.x, use u'Bäumer' instead.
> 
> 

Thank you Daniel.
So I came a little bit closer to the solution. Actually, I don't write the
strings in a Python program; I read them from a file, which is
utf8-encoded.

So I changed the lines

    for line in open(file,'r'):
        line = line.strip()

first to

    for line in open(file,'r'):
        line = unicode(line.strip())

and finally to

    for line in open(file,'r'):
        line = unicode(line.strip(),'utf8')

and now I really get utf8-strings. It works, but I don't know why it
works. To me it looks like I am changing a utf8-string into a utf8-string.

By the way: when I run a Python program from Eclipse, then

print sys.getdefaultencoding()

returns utf-8

and when I run the same Python program from the command line, then

print sys.getdefaultencoding()

returns ascii

but my locale is set to
$ locale
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"

I think utf8 is somewhat confusing in Python, at least to me.

Wolfgang
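
A short sketch of why the second attempt failed and the third worked, written for Python 3, where bytes.decode() plays the role of Python 2's unicode(). In Python 2, unicode(line) without an encoding falls back to the default encoding, which was ASCII on the command line in the post above. The sample bytes are an assumption standing in for one line of the file.

```python
raw = b"B\xc3\xa4umer\n"  # one line of a UTF-8 encoded file, still bytes

# Decoding with the wrong codec fails: the byte 0xC3 is not valid ASCII.
try:
    raw.decode("ascii")
    ascii_worked = True
except UnicodeDecodeError:
    ascii_worked = False

# Naming the codec that was used to write the file succeeds.
text = raw.strip().decode("utf-8")

print(ascii_worked)
# Seven bytes become six characters: the two bytes C3 A4 are one letter.
print(len(raw.strip()), len(text))
```

So the conversion is not from a utf8-string to a utf8-string; it is from bytes that happen to be UTF-8 encoded to text.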

#6735

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-05-31 18:10 +0200
Message-ID	<4de51303$0$6557$9b4e6d93@newsspool4.arcor-online.net>
In reply to	#6732

I just found a second method at
http://docs.python.org/howto/unicode

You can use the codecs module and then simply write

import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print repr(line)

Wolfgang
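
A runnable variation on the snippet above, assuming io.open(), which exists in Python 2.6+ and is the same function as the built-in open() in Python 3. The temporary file and its "Kuerzel;Name" layout are made up for the demonstration and are not taken from the thread.

```python
import io
import os
import tempfile

# Hypothetical sample file, written as UTF-8 bytes for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("BUM;Bäumer\nMUE;Müller\n".encode("utf-8"))

# io.open decodes each line while reading, so the loop body only sees text.
with io.open(path, encoding="utf-8") as f:
    names = [line.strip().split(";")[1] for line in f]

os.remove(path)

# ascii() keeps the output printable on a terminal with any encoding.
print(ascii(names))
```

Compared with decoding each line by hand, this keeps the encoding in one place, at the point where the file is opened.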

#6738

From	Benjamin Kaplan <benjamin.kaplan@case.edu>
Date	2011-05-31 09:42 -0700
Message-ID	<mailman.2321.1306860238.9059.python-list@python.org>
In reply to	#6732

On Tue, May 31, 2011 at 8:45 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> On 31.05.11 13:32, Daniel Kluev wrote:
>> On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
>> <WolfgangMeiners01@web.de> wrote:
>>> metadata = MetaData('sqlite://')
>>> a_table = Table('tf_lehrer', metadata,
>>>    Column('id', Integer, primary_key=True),
>>>    Column('Kuerzel', Text),
>>>    Column('Name', Text))
>>
>> Use UnicodeText instead of Text.
>>
>>> A_record = A_class('BUM', 'Bäumer')
>>
>> If this is python2.x, use u'Bäumer' instead.
>>
>>
>
> Thank you Daniel.
> So i came a little bit closer to the solution. Actually i dont write the
> strings in a python program but i read them from a file, which is
> utf8-encoded.
>
> So i changed the lines
>
>    for line in open(file,'r'):
>        line = line.strip()
>
> first to
>
>    for line in open(file,'r'):
>        line = unicode(line.strip())
>
> and finally to
>
>    for line in open(file,'r'):
>        line = unicode(line.strip(),'utf8')
>
> and now i get really utf8-strings. It does work but i dont know why it
> works. For me it looks like i change an utf8-string to an utf8-string.
>

There's no such thing as a UTF-8 string. You have a list of bytes
(byte string) and you have a list of characters (unicode). UTF-8 is a
function that can convert bytes into characters (and the reverse). You
may recognize that the list of bytes was encoded using UTF-8 but the
computer does not unless you explicitly tell it to. Does that help
clear it up?
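
To make that concrete, here is a small Python 3 sketch (an illustration, not part of the original reply). The same seven bytes decode without error under two different codecs, and only one result is the intended name, which is why the program has to be told which encoding was used.

```python
data = b"B\xc3\xa4umer"  # seven bytes; nothing in them says "UTF-8"

as_utf8 = data.decode("utf-8")      # the intended reading: six characters
as_latin1 = data.decode("latin-1")  # also succeeds, but yields seven characters

# ascii() keeps the output printable on a terminal with any encoding.
print(ascii(as_utf8))
print(ascii(as_latin1))

# Encoding with the matching codec gives back the original bytes.
print(as_utf8.encode("utf-8") == data)
```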

#6741

From	"Prasad, Ramit" <ramit.prasad@jpmchase.com>
Date	2011-05-31 12:31 -0400
Message-ID	<mailman.2322.1306860978.9059.python-list@python.org>
In reply to	#6732

>line = unicode(line.strip(),'utf8')
>and now i get really utf8-strings. It does work but i dont know why it works. For me it looks like i change an utf8-string to an utf8-string.


I would like to point out that UTF-8 is not exactly "Unicode". From what I understand, Unicode is a standard while UTF-8 is like an implementation of that standard (called an encoding). Being able to convert to Unicode (the standard) should mean you are then able to convert to any encoding that supports the Unicode characters used.

As you can see below a string in UTF-8 is actually not Unicode. (decode converts to Unicode, encode converts away from Unicode)

>>> type(u'test'.encode('utf8'))
<type 'str'>
>>> type('test'.decode('utf8'))
<type 'unicode'>
>>> type('test'.encode('utf8'))
<type 'str'>
>>> type(u'test')
<type 'unicode'>


Ramit



Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423
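
For comparison, a sketch of how the session above looks in Python 3, where the two types are called str (text) and bytes. This is an added illustration under that assumption, not output from the original poster.

```python
text = "test"
data = text.encode("utf8")

print(type(data))                 # encoding text gives bytes
print(type(data.decode("utf8")))  # decoding bytes gives text

# In Python 3 the two directions are no longer interchangeable:
print(hasattr(text, "decode"))  # str has no decode method
print(hasattr(data, "encode"))  # bytes has no encode method
```

The third call in the Python 2 session, 'test'.encode('utf8') on a byte string, only works there because Python 2 first decodes the bytes implicitly as ASCII; Python 3 removed that implicit step.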



#6746

From	Chris Angelico <rosuav@gmail.com>
Date	2011-06-01 03:19 +1000
Message-ID	<mailman.2326.1306862395.9059.python-list@python.org>
In reply to	#6732

On Wed, Jun 1, 2011 at 2:31 AM, Prasad, Ramit <ramit.prasad@jpmchase.com> wrote:
>>line = unicode(line.strip(),'utf8')
>>and now i get really utf8-strings. It does work but i dont know why it works. For me it looks like i change an utf8-string to an utf8-string.
>
>
> I would like to point out that UTF-8 is not exactly "Unicode". From what I understand, Unicode is a standard while UTF-8 is like an implementation of that standard (called an encoding). Being able to convert to Unicode (the standard) should mean you are then able to convert to any encoding that supports the Unicode characters used.

Unicode defines characters; UTF-8 is one way (of many) to represent
those characters in bytes. UTF-16 and UTF-32 are other ways of
representing those characters in bytes, and internally, Python
probably uses one of them - but there is no guarantee, and you should
never need to know. Unicode strings can be stored in memory and
manipulated in various ways, but they're a high level construct on par
with lists and dictionaries - they can't be stored on disk or
transmitted to another computer without using an encoding system.

UTF-8 is an efficient way to translate Unicode text consisting
primarily of low codepoint characters into bytes. It's not so much an
implementation of Unicode as a means of converting a mythical concept
of "Unicode characters" into a concrete stream of bytes.

Hope that clarifies things a little!

Chris Angelico
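
A small Python 3 sketch of the point about several byte representations for the same characters. The little-endian codec names are chosen here only to avoid the byte-order mark that plain "utf-16" and "utf-32" prepend; that choice is an assumption of this example.

```python
name = "Bäumer"  # six characters, one of them non-ASCII

# Number of bytes each encoding needs for the same six characters.
sizes = {
    codec: len(name.encode(codec))
    for codec in ("utf-8", "utf-16-le", "utf-32-le")
}
print(sizes)

# All three byte sequences decode back to the same text.
round_trips = all(name.encode(codec).decode(codec) == name for codec in sizes)
print(round_trips)
```

UTF-8 needs one byte per ASCII character and two for the ä, which is why it is compact for text made up mostly of low code points.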

#6754 — Thanks for all responses

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-05-31 21:52 +0200
Subject	Thanks for all responses
Message-ID	<4de546f8$0$6556$9b4e6d93@newsspool4.arcor-online.net>
In reply to	#6732

I think it helped me very much to understand the problem.

So if I deal with non-ASCII strings, I have a 'list of bytes' and need an
encoding to interpret this list and transform it into a meaningful unicode
string. Decoding does the opposite.

Whenever I 'cross the border' of my program, I have to encode the 'list
of bytes' to a unicode string or decode the unicode string to a 'list
of bytes' which is meaningful to the world outside.

So 'encode early, decode late' means doing it as near to the border as
possible, and to encode/decode I need a coding system, for example 'utf8'.

That means there should be an encoding/decoding possibility for every
interface I can use: files, stdin, stdout, stderr, gui (these should be the
most important ones).

While trying to understand this, I wrote the following program. Maybe
someone can give me a hint on how to print correctly:

######################################################
#! python
# -*- coding: utf-8 -*-

class EncTest:
    def __init__(self,Name=None):
        self.Name=unicode(Name, encoding='utf8')

    def __repr__(self):
        return u'My name is %s' % self.Name

if __name__ == '__main__':

    a = EncTest('Müller')

    # this does work
    print a.__repr__()

    # throws an error if default encoding is ascii
    # but works if default encoding is utf8
    print a

    # throws an error because a is not a string
    print unicode(a, encoding='utf8')
######################################################

Wolfgang

#6759 — Re: Thanks for all responses

From	Chris Angelico <rosuav@gmail.com>
Date	2011-06-01 07:56 +1000
Subject	Re: Thanks for all responses
Message-ID	<mailman.2337.1306878988.9059.python-list@python.org>
In reply to	#6754

On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> Whenever i 'cross the border' of my program, i have to encode the 'list
> of bytes' to an unicode string or decode the unicode string to a 'list
> of bytes' which is meaningful to the world outside.

Most people use "encode" and "decode" the other way around; you encode
a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
you're correct.

> So encode early, decode lately means, to do it as near to the border as
> possible and to encode/decode i need a coding system, for example 'utf8'

Correct on both counts.

> That means, there should be an encoding/decoding possibility to every
> interface i can use: files, stdin, stdout, stderr, gui (should be the
> most important ones).

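In Python 3, the file objects returned by open() in text mode have an
encoding; if you don't pass one, the default depends on the platform
(often, but not always, UTF-8), so it is safest to name it explicitly.
In Python 2, open() gives you byte strings, and codecs.open() or
io.open() does the decoding for you. GUI work depends on your GUI
toolkit, and might well accept Unicode strings directly - check the docs.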

>    def __repr__(self):
>        return u'My name is %s' % self.Name

This means that repr() will return a Unicode string.

>    # this does work
>    print a.__repr__()
>
>    # throws an error if default encoding is ascii
>    # but works if default encoding is utf8
>    print a
>
>    # throws an error because a is not a string
>    print unicode(a, encoding='utf8')

The __repr__ function is supposed to return a string object (a byte
str) in Python 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
for that and other advice on writing __repr__. The problems you're
seeing are a result of the built-in repr() function calling
a.__repr__() and then converting the Unicode return value to a str
with the default encoding, which is ASCII unless something changed it.

This would work:
    def __repr__(self):
        return (u'My name is %s' % self.Name).encode('utf8')

Alternatively, migrate to Python 3, where the default is Unicode
strings. I tested this in Python 3.2 on Windows, but it should work on
anything in the 3.x branch:

class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name

if __name__ == '__main__':

    a = NoEnc('Müller')

    # this will still work (print is now a function, not a statement)
    print(a.__repr__())

    # this will work in Python 3.x
    print(a)

    # 'unicode' has been renamed to 'str', but a is not a bytes object,
    # so this makes no sense; it raises TypeError, hence commented out:
    # print(str(a, encoding='utf8'))

    # to convert it to UTF-8, convert it to a string with str() or
    # repr(), then encode and print:
    print(str(a).encode('utf8'))
############################

Note that the last one will probably not do what you expect. The
Python 3 'print' function (it's not a statement any more, so you need
parentheses around its argument) wants a Unicode string, so you don't
need to encode it. When you encode a Unicode string as in the last
example, it returns a bytes string (an array of bytes), which looks
like this: b'My name is M\xc3\xbcller'  The print function wants
Unicode, though, so it takes this unexpected object and calls str() on
it, hence the odd display.

Hope that helps!

Chris Angelico
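
A compact Python 3 sketch of the last two points, added as an illustration. The class name Person is hypothetical; the behaviour shown is that __repr__ returns text directly, and that calling str() on a bytes object yields its b'...' representation rather than decoded text.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # In Python 3 this returns text (str); no encoding step is needed.
        return "My name is %s" % self.name


a = Person("Müller")
as_text = str(a)                   # str() falls back to __repr__
as_bytes = as_text.encode("utf8")  # bytes, suitable for a file or socket

# ascii() keeps the output printable on a terminal with any encoding.
print(ascii(as_text))

# str() on bytes does not decode; it produces the b'...' display form.
print(str(as_bytes))
```

Encoding is therefore something to do when writing to a binary file or socket, not before calling print().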

#6808 — Re: Thanks for all responses

From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Date	2011-06-01 19:29 +0200
Subject	Re: Thanks for all responses
Message-ID	<4de67709$0$6572$9b4e6d93@newsspool3.arcor-online.net>
In reply to	#6759

On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> Whenever i 'cross the border' of my program, i have to encode the 'list
>> of bytes' to an unicode string or decode the unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
> 
> Most people use "encode" and "decode" the other way around; you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.

OK. I think I will adapt to the majority on this point.
I think I mixed up
unicodestring=unicode(bytestring,encoding='utf8')
and
bytestring=u'unicodestring'.encode('utf8')

> 
>> So encode early, decode lately means, to do it as near to the border as
>> possible and to encode/decode i need a coding system, for example 'utf8'
> 

I think I should change this to 'decode early, encode late'.

> Correct on both counts.
> 
>> That means, there should be an encoding/decoding possibility to every
>> interface i can use: files, stdin, stdout, stderr, gui (should be the
>> most important ones).
> 
> The file objects (as returned by open()) have an encoding, which
> (IMHO) defaults to "utf8". GUI work depends on your GUI toolkit, and
> might well accept Unicode strings directly - check the docs.
> 
>>    def __repr__(self):
>>        return u'My name is %s' % self.Name
> 
> This means that repr() will return a Unicode string.
> 
>>    # this does work
>>    print a.__repr__()
>>
>>    # throws an error if default encoding is ascii
>>    # but works if default encoding is utf8
>>    print a
>>
>>    # throws an error because a is not a string
>>    print unicode(a, encoding='utf8')
> 
> The __repr__ function is supposed to return a string object, in Python
> 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing are a result of the built-in repr() function calling
> a.__repr__() and then treating the return value as an ASCII str, not a
> Unicode string.
> 
> This would work:
>     def __repr__(self):
>         return (u'My name is %s' % self.Name).encode('utf8')
> 
> Alternatively, migrate to Python 3, where the default is Unicode
> strings. I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
> 
> class NoEnc:
> 	def __init__(self,Name=None):
> 		self.Name=Name
> 	def __repr__(self):
> 		return 'My name is %s' % self.Name
> 
> if __name__ == '__main__':
> 
>    a = NoEnc('Müller')
> 
>    # this will still work (print is now a function, not a statement)
>    print(a.__repr__())
> 
>    # this will work in Python 3.x
>    print(a)
> 
>    # 'unicode' has been renamed to 'str', but it's already unicode so
> this makes no sense
>    print(str(a, encoding='utf8'))
> 
>    # to convert it to UTF-8, convert it to a string with str() or
> repr() and then print:
>    print(str(a).encode('utf8'))
> ############################
> 
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes string (an array of bytes), which looks
> like this: b'My name is M\xc3\xbcller'  The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
> 
> Hope that helps!

Yes, it helped a lot. One last question here: if I have a free choice and
I don't know either Python 2 or Python 3 very well, which would be the
recommended choice?

> 
> Chris Angelico

Wolfgang
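
The corrected rule, 'decode early, encode late', can be sketched in Python 3 as follows. The helper name shout and its upper-casing step are hypothetical, chosen only to have some text processing between the two boundaries.

```python
def shout(raw_line, encoding="utf-8"):
    """Hypothetical helper: bytes in, bytes out, text in between."""
    text = raw_line.decode(encoding)  # decode early, at the input boundary
    text = text.strip().upper()       # all processing happens on text
    return text.encode(encoding)      # encode late, at the output boundary


result = shout(b"  M\xc3\xbcller\n")
print(result)
```

Working on text in the middle matters here: upper-casing the raw bytes would leave the two bytes of the ü untouched, while upper-casing the decoded text turns it into Ü.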

#6814 — Re: Thanks for all responses

From	Chris Angelico <rosuav@gmail.com>
Date	2011-06-02 04:38 +1000
Subject	Re: Thanks for all responses
Message-ID	<mailman.2377.1306953519.9059.python-list@python.org>
In reply to	#6808

On Thu, Jun 2, 2011 at 3:29 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> Yes it helped a lot. One last question here: When i have free choice and
> i dont know Python 2 and Python 3 very good: What would be the
> recommended choice?

Generally, Python 3. Unless there's something you really need in
Python 2 (a module that isn't available in 3.x, for instance, or
you're deploying to a site that doesn't have Python 3 installed), it's
worth going with the newer one.

Chris Angelico
