Groups > comp.lang.python > #6674 > unrolled thread
| Started by | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| First post | 2011-05-30 23:40 +0200 |
| Last post | 2011-06-02 04:38 +1000 |
| Articles | 13 — 6 participants |
sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-30 23:40 +0200
Re: sqlalchemy and Unicode strings: errormessage Chris Withers <chris@simplistix.co.uk> - 2011-05-31 10:55 +0100
Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:47 +0200
Re: sqlalchemy and Unicode strings: errormessage Daniel Kluev <dan.kluev@gmail.com> - 2011-05-31 22:32 +1100
Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:45 +0200
Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 18:10 +0200
Re: sqlalchemy and Unicode strings: errormessage Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-05-31 09:42 -0700
RE: sqlalchemy and Unicode strings: errormessage "Prasad, Ramit" <ramit.prasad@jpmchase.com> - 2011-05-31 12:31 -0400
Re: sqlalchemy and Unicode strings: errormessage Chris Angelico <rosuav@gmail.com> - 2011-06-01 03:19 +1000
Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 21:52 +0200
Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-01 07:56 +1000
Re: Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-06-01 19:29 +0200
Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-02 04:38 +1000
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-05-30 23:40 +0200 |
| Subject | sqlalchemy and Unicode strings: errormessage |
| Message-ID | <4de40ee8$0$6623$9b4e6d93@newsspool2.arcor-online.net> |
Hi,
I am trying to build an application using sqlalchemy.
In principle I have this structure:
#==============================================
from sqlalchemy import *
from sqlalchemy.orm import *

metadata = MetaData('sqlite://')
a_table = Table('tf_lehrer', metadata,
    Column('id', Integer, primary_key=True),
    Column('Kuerzel', Text),
    Column('Name', Text))

class A_class(object):
    def __init__(self, Kuerzel, Name):
        self.Kuerzel = Kuerzel
        self.Name = Name

mapper(A_class, a_table)
A_record = A_class('BUM', 'Bäumer')
Session = sessionmaker()
session = Session()
session.add(A_record)
session.flush()
#================================================
At the moment it runs up to the line
    session.flush()
where I get the following error message:
sqlalchemy.exc.ProgrammingError: (ProgrammingError) You must not use
8-bit bytestrings unless you use a text_factory that can interpret 8-bit
bytestrings (like text_factory = str). It is highly recommended that you
instead just switch your application to Unicode strings. u'INSERT INTO
tf_lehrer ("Kuerzel", "Name") VALUES (?, ?)' ('BUM', 'B\xc3\xa4umer')
But where can I switch my application to Unicode strings?
Thank you for all hints
Wolfgang
| From | Chris Withers <chris@simplistix.co.uk> |
|---|---|
| Date | 2011-05-31 10:55 +0100 |
| Message-ID | <mailman.2313.1306835718.9059.python-list@python.org> |
| In reply to | #6674 |
Hi Wolfgang,
On 30/05/2011 22:40, Wolfgang Meiners wrote:
> I am trying to build an application using sqlalchemy.
You're likely to get much better help here:
http://www.sqlalchemy.org/support.html#mailinglist
When you post there, make sure you include:
- what python version you're using
- what sqlalchemy version you're using
cheers,
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-05-31 17:47 +0200 |
| Message-ID | <4de50d82$0$6538$9b4e6d93@newsspool4.arcor-online.net> |
| In reply to | #6721 |
On 31.05.11 11:55, Chris Withers wrote:
> Hi Wolfgang,
>
> On 30/05/2011 22:40, Wolfgang Meiners wrote:
>> I am trying to build an application using sqlalchemy.
>
> You're likely to get much better help here:
>
> http://www.sqlalchemy.org/support.html#mailinglist
>
> When you post there, make sure you include:
>
> - what python version you're using
> - what sqlalchemy version you're using
>
> cheers,
>
> Chris
>
Thank you for pointing me to this list. I will have a look at it.
At the moment I think I am really struggling with Python and UTF-8.
Wolfgang
| From | Daniel Kluev <dan.kluev@gmail.com> |
|---|---|
| Date | 2011-05-31 22:32 +1100 |
| Message-ID | <mailman.2315.1306841548.9059.python-list@python.org> |
| In reply to | #6674 |
On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> metadata = MetaData('sqlite://')
> a_table = Table('tf_lehrer', metadata,
> Column('id', Integer, primary_key=True),
> Column('Kuerzel', Text),
> Column('Name', Text))
Use UnicodeText instead of Text.
> A_record = A_class('BUM', 'Bäumer')
If this is python2.x, use u'Bäumer' instead.
--
With best regards,
Daniel Kluev
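The advice above amounts to handing the database layer text rather than encoded bytes. The following is only a minimal sketch of that idea, under two assumptions that are not part of the thread: it is written for Python 3, and it uses the standard-library sqlite3 module directly instead of SQLAlchemy. The table and column names are taken from the original post; the byte string is the one shown in the error message.

```python
import sqlite3

# Bytes as they would arrive from a UTF-8 encoded source
# (the same byte sequence that appears in the error message).
raw_name = b'B\xc3\xa4umer'

# Decode once, at the boundary, so the database layer only sees text.
name = raw_name.decode('utf-8')

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE tf_lehrer '
    '(id INTEGER PRIMARY KEY, "Kuerzel" TEXT, "Name" TEXT)'
)
conn.execute(
    'INSERT INTO tf_lehrer ("Kuerzel", "Name") VALUES (?, ?)',
    ('BUM', name),
)
stored = conn.execute('SELECT "Name" FROM tf_lehrer').fetchone()[0]
conn.close()

# The value comes back as text, identical to what was inserted.
print(stored == name)
```

In Python 2 the equivalent step is the u'...' literal or an explicit decode, as suggested above.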
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-05-31 17:45 +0200 |
| Message-ID | <4de50cfd$0$6538$9b4e6d93@newsspool4.arcor-online.net> |
| In reply to | #6724 |
On 31.05.11 13:32, Daniel Kluev wrote:
> On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> metadata = MetaData('sqlite://')
>> a_table = Table('tf_lehrer', metadata,
>> Column('id', Integer, primary_key=True),
>> Column('Kuerzel', Text),
>> Column('Name', Text))
>
> Use UnicodeText instead of Text.
>
>> A_record = A_class('BUM', 'Bäumer')
>
> If this is python2.x, use u'Bäumer' instead.
>
>
Thank you Daniel.
So I came a little bit closer to the solution. Actually I don't write the
strings in a Python program; I read them from a file, which is
UTF-8 encoded.
So I changed the lines
for line in open(file,'r'):
    line = line.strip()
first to
for line in open(file,'r'):
    line = unicode(line.strip())
and finally to
for line in open(file,'r'):
    line = unicode(line.strip(),'utf8')
and now I really get utf8-strings. It does work but I don't know why it
works. To me it looks like I change a utf8-string to a utf8-string.
By the way: when I run a Python program from Eclipse, then
    print sys.getdefaultencoding()
returns utf-8
and when I run the same Python program from the command line, then
    print sys.getdefaultencoding()
returns ascii
but my locale is set to
$ locale
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
I think, utf8 is somewhat confusing in python - at least to me.
Wolfgang
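The observation about sys.getdefaultencoding() may be easier to follow once the different "encodings" an interpreter reports are printed side by side. In Python 2, sys.getdefaultencoding() names the codec used for implicit str/unicode conversions and is 'ascii' unless the site configuration changes it; it is not derived from the locale. The sketch below assumes Python 3, where that value is fixed to 'utf-8', and the variable names are only illustrative.

```python
import locale
import sys

# Codec used for implicit conversions; fixed to 'utf-8' in Python 3.
default_codec = sys.getdefaultencoding()

# Codec suggested by the user's locale settings (LANG, LC_ALL, ...).
locale_codec = locale.getpreferredencoding(False)

# Codec attached to the standard output stream, if it has one.
stdout_codec = getattr(sys.stdout, 'encoding', None)

print('default codec:', default_codec)
print('locale codec :', locale_codec)
print('stdout codec :', stdout_codec)
```

These three values are independent of each other, which is one reason the same program can behave differently in an IDE and on the command line.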
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-05-31 18:10 +0200 |
| Message-ID | <4de51303$0$6557$9b4e6d93@newsspool4.arcor-online.net> |
| In reply to | #6732 |
I just found a second method on
http://docs.python.org/howto/unicode
you can use the codecs module and then simply write
import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print repr(line)
Wolfgang
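For comparison, here is a small sketch of the two reading styles discussed in this thread, decoding each line by hand versus letting the file object decode. It assumes Python 3, where the built-in open() accepts an encoding argument; the temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical sample file, written as raw UTF-8 bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write('B\u00e4umer\nM\u00fcller\n'.encode('utf-8'))
    path = tmp.name

# Variant 1: read bytes and decode every line explicitly.
with open(path, 'rb') as f:
    manual = [raw.strip().decode('utf-8') for raw in f]

# Variant 2: tell the file object which encoding to use.
with open(path, encoding='utf-8') as f:
    automatic = [line.strip() for line in f]

os.remove(path)

# Both variants yield the same list of text strings.
print(manual == automatic)
```

Either way the decoding happens exactly once, right where the data enters the program.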
| From | Benjamin Kaplan <benjamin.kaplan@case.edu> |
|---|---|
| Date | 2011-05-31 09:42 -0700 |
| Message-ID | <mailman.2321.1306860238.9059.python-list@python.org> |
| In reply to | #6732 |
On Tue, May 31, 2011 at 8:45 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> On 31.05.11 13:32, Daniel Kluev wrote:
>> On Tue, May 31, 2011 at 8:40 AM, Wolfgang Meiners
>> <WolfgangMeiners01@web.de> wrote:
>>> metadata = MetaData('sqlite://')
>>> a_table = Table('tf_lehrer', metadata,
>>> Column('id', Integer, primary_key=True),
>>> Column('Kuerzel', Text),
>>> Column('Name', Text))
>>
>> Use UnicodeText instead of Text.
>>
>>> A_record = A_class('BUM', 'Bäumer')
>>
>> If this is python2.x, use u'Bäumer' instead.
>>
>>
>
> Thank you Daniel.
> So i came a little bit closer to the solution. Actually i dont write the
> strings in a python program but i read them from a file, which is
> utf8-encoded.
>
> So i changed the lines
>
> for line in open(file,'r'):
> line = line.strip()
>
> first to
>
> for line in open(file,'r'):
> line = unicode(line.strip())
>
> and finally to
>
> for line in open(file,'r'):
> line = unicode(line.strip(),'utf8')
>
> and now i get really utf8-strings. It does work but i dont know why it
> works. For me it looks like i change an utf8-string to an utf8-string.
>
There's no such thing as a UTF-8 string. You have a list of bytes
(byte string) and you have a list of characters (unicode). UTF-8 is a
function that can convert bytes into characters (and the reverse). You
may recognize that the list of bytes was encoded using UTF-8 but the
computer does not unless you explicitly tell it to. Does that help
clear it up?
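The point that bytes carry no encoding of their own can be illustrated with a short sketch. It assumes Python 3 (where bytes and str are separate types) and reuses the byte sequence from the error message; the choice of Latin-1 as the "wrong" codec is only for illustration.

```python
# Seven bytes; nothing in them says which encoding produced them.
data = b'B\xc3\xa4umer'

# Interpreted as UTF-8, two of the bytes form one character.
as_utf8 = data.decode('utf-8')

# Interpreted as Latin-1, every byte becomes its own character.
as_latin1 = data.decode('latin-1')

print(len(data), len(as_utf8), len(as_latin1))  # prints: 7 6 7
print(as_utf8 == as_latin1)                     # prints: False

# Encoding with the matching codec recovers the original bytes.
print(as_utf8.encode('utf-8') == data)          # prints: True
```

The same bytes give two different texts, so the program has to be told which interpretation is intended.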
| From | "Prasad, Ramit" <ramit.prasad@jpmchase.com> |
|---|---|
| Date | 2011-05-31 12:31 -0400 |
| Message-ID | <mailman.2322.1306860978.9059.python-list@python.org> |
| In reply to | #6732 |
>line = unicode(line.strip(),'utf8')
>and now i get really utf8-strings. It does work but i dont know why it works. For me it looks like i change an utf8-string to an utf8-string.
I would like to point out that UTF-8 is not exactly "Unicode". From what I understand, Unicode is a standard while UTF-8 is like an implementation of that standard (called an encoding). Being able to convert to Unicode (the standard) should mean you are then able to convert to any encoding that supports the Unicode characters used.
As you can see below a string in UTF-8 is actually not Unicode. (decode converts to Unicode, encode converts away from Unicode)
>>> type(u'test'.encode('utf8'))
<type 'str'>
>>> type('test'.decode('utf8'))
<type 'unicode'>
>>> type('test'.encode('utf8'))
<type 'str'>
>>> type(u'test')
<type 'unicode'>
Ramit
Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-06-01 03:19 +1000 |
| Message-ID | <mailman.2326.1306862395.9059.python-list@python.org> |
| In reply to | #6732 |
On Wed, Jun 1, 2011 at 2:31 AM, Prasad, Ramit <ramit.prasad@jpmchase.com> wrote:
>>line = unicode(line.strip(),'utf8')
>>and now i get really utf8-strings. It does work but i dont know why it works. For me it looks like i change an utf8-string to an utf8-string.
>
> I would like to point out that UTF-8 is not exactly "Unicode". From what I understand, Unicode is a standard while UTF-8 is like an implementation of that standard (called an encoding). Being able to convert to Unicode (the standard) should mean you are then able to convert to any encoding that supports the Unicode characters used.
Unicode defines characters; UTF-8 is one way (of many) to represent those characters in bytes. UTF-16 and UTF-32 are other ways of representing those characters in bytes, and internally, Python probably uses one of them - but there is no guarantee, and you should never need to know.
Unicode strings can be stored in memory and manipulated in various ways, but they're a high level construct on par with lists and dictionaries - they can't be stored on disk or transmitted to another computer without using an encoding system.
UTF-8 is an efficient way to translate Unicode text consisting primarily of low codepoint characters into bytes. It's not so much an implementation of Unicode as a means of converting a mythical concept of "Unicode characters" into a concrete stream of bytes.
Hope that clarifies things a little!
Chris Angelico
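As a rough illustration of "one set of characters, several byte representations", the sketch below encodes the same text with three codecs and compares the sizes. It assumes Python 3; the little-endian codec names are chosen only so that no byte-order mark is added to the output.

```python
text = 'M\u00fcller'  # six characters, one of them non-ASCII

sizes = {}
for codec in ('utf-8', 'utf-16-le', 'utf-32-le'):
    encoded = text.encode(codec)
    sizes[codec] = len(encoded)
    # Decoding with the same codec always gives the text back.
    assert encoded.decode(codec) == text

print(sizes)  # prints: {'utf-8': 7, 'utf-16-le': 12, 'utf-32-le': 24}
```

The text is the same in all three cases; only the bytes differ, and UTF-8 is the shortest here because most of the characters are ASCII.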
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-05-31 21:52 +0200 |
| Subject | Thanks for all responses |
| Message-ID | <4de546f8$0$6556$9b4e6d93@newsspool4.arcor-online.net> |
| In reply to | #6732 |
I think it helped me very much to understand the problem.
So if I deal with non-ASCII strings, I have a 'list of bytes' and need an
encoding to interpret this list and transform it to a meaningful unicode
string. Decoding does the opposite.
Whenever I 'cross the border' of my program, I have to encode the 'list
of bytes' to a unicode string or decode the unicode string to a 'list
of bytes' which is meaningful to the world outside.
So encode early, decode lately means to do it as near to the border as
possible, and to encode/decode I need a coding system, for example 'utf8'.
That means there should be an encoding/decoding possibility for every
interface I can use: files, stdin, stdout, stderr, gui (should be the
most important ones).
While trying to understand this, I wrote the following program. Maybe
someone can give me a hint on how to print correctly:
######################################################
#! python
# -*- coding: utf-8 -*-
class EncTest:
    def __init__(self,Name=None):
        self.Name=unicode(Name, encoding='utf8')
    def __repr__(self):
        return u'My name is %s' % self.Name
if __name__ == '__main__':
    a = EncTest('Müller')
    # this does work
    print a.__repr__()
    # throws an error if default encoding is ascii
    # but works if default encoding is utf8
    print a
    # throws an error because a is not a string
    print unicode(a, encoding='utf8')
######################################################
Wolfgang
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-06-01 07:56 +1000 |
| Subject | Re: Thanks for all responses |
| Message-ID | <mailman.2337.1306878988.9059.python-list@python.org> |
| In reply to | #6754 |
On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
<WolfgangMeiners01@web.de> wrote:
> Whenever i 'cross the border' of my program, i have to encode the 'list
> of bytes' to an unicode string or decode the unicode string to a 'list
> of bytes' which is meaningful to the world outside.
Most people use "encode" and "decode" the other way around; you encode
a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
you're correct.
> So encode early, decode lately means, to do it as near to the border as
> possible and to encode/decode i need a coding system, for example 'utf8'
Correct on both counts.
> That means, there should be an encoding/decoding possibility to every
> interface i can use: files, stdin, stdout, stderr, gui (should be the
> most important ones).
The file objects (as returned by open()) have an encoding, which
(IMHO) defaults to "utf8". GUI work depends on your GUI toolkit, and
might well accept Unicode strings directly - check the docs.
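A minimal sketch of "decode on the way in, work with text, encode on the way out" could look like the following. It assumes Python 3 and uses in-memory byte streams in place of real files; the shout() helper is hypothetical. Since the default encoding of open() has historically depended on the platform locale in Python 3, the sketch passes the encoding explicitly instead of relying on a default.

```python
import io

def shout(text):
    """Hypothetical processing step that only ever sees text."""
    return text.upper()

# Stand-ins for an input and an output file, holding raw bytes.
incoming = io.BytesIO('m\u00fcller\n'.encode('utf-8'))
outgoing = io.BytesIO()

# The wrappers decode on read and encode on write;
# newline='' switches off platform-specific newline translation.
reader = io.TextIOWrapper(incoming, encoding='utf-8', newline='')
writer = io.TextIOWrapper(outgoing, encoding='utf-8', newline='')

for line in reader:
    writer.write(shout(line))
writer.flush()

result = outgoing.getvalue()
print(result)  # prints: b'M\xc3\x9cLLER\n'
```

Everything between the reader and the writer deals with text only, so the encoding is named in exactly two places.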
> def __repr__(self):
> return u'My name is %s' % self.Name
This means that repr() will return a Unicode string.
> # this does work
> print a.__repr__()
>
> # throws an error if default encoding is ascii
> # but works if default encoding is utf8
> print a
>
> # throws an error because a is not a string
> print unicode(a, encoding='utf8')
The __repr__ function is supposed to return a string object, in Python
2. See http://docs.python.org/reference/datamodel.html#object.__repr__
for that and other advice on writing __repr__. The problems you're
seeing are a result of the built-in repr() function calling
a.__repr__() and then treating the return value as an ASCII str, not a
Unicode string.
This would work:
    def __repr__(self):
        return (u'My name is %s' % self.Name).encode('utf8')
Alternatively, migrate to Python 3, where the default is Unicode
strings. I tested this in Python 3.2 on Windows, but it should work on
anything in the 3.x branch:
class NoEnc:
    def __init__(self,Name=None):
        self.Name=Name
    def __repr__(self):
        return 'My name is %s' % self.Name
if __name__ == '__main__':
    a = NoEnc('Müller')
    # this will still work (print is now a function, not a statement)
    print(a.__repr__())
    # this will work in Python 3.x
    print(a)
    # 'unicode' has been renamed to 'str', but it's already unicode so
    # this makes no sense (it raises TypeError, so it is commented out)
    # print(str(a, encoding='utf8'))
    # to convert it to UTF-8, convert it to a string with str() or
    # repr() and then print:
    print(str(a).encode('utf8'))
############################
Note that the last one will probably not do what you expect. The
Python 3 'print' function (it's not a statement any more, so you need
parentheses around its argument) wants a Unicode string, so you don't
need to encode it. When you encode a Unicode string as in the last
example, it returns a bytes string (an array of bytes), which looks
like this: b'My name is M\xc3\xbcller' The print function wants
Unicode, though, so it takes this unexpected object and calls str() on
it, hence the odd display.
Hope that helps!
Chris Angelico
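The "odd display" described above can be reproduced without printing anything unusual to the terminal. This sketch assumes Python 3 and uses a trimmed-down copy of the NoEnc class; it captures what print() would show for the encoded bytes, and what happens when str() is asked to decode an object that is not bytes.

```python
class NoEnc:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'My name is %s' % self.name

a = NoEnc('M\u00fcller')

# Encoding the text gives a bytes object.
encoded = str(a).encode('utf8')

# print(encoded) displays str(encoded), i.e. the bytes literal form.
shown = str(encoded)

# Asking str() to decode something that is not bytes fails.
try:
    str(a, encoding='utf8')
    decode_error = None
except TypeError as exc:
    decode_error = exc

print(shown)                        # prints: b'My name is M\xc3\xbcller'
print(type(decode_error).__name__)  # prints: TypeError
```

So in Python 3 the text can simply be passed to print() as it is; encoding is only needed when bytes are actually required.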
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
|---|---|
| Date | 2011-06-01 19:29 +0200 |
| Subject | Re: Thanks for all responses |
| Message-ID | <4de67709$0$6572$9b4e6d93@newsspool3.arcor-online.net> |
| In reply to | #6759 |
On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> Whenever i 'cross the border' of my program, i have to encode the 'list
>> of bytes' to an unicode string or decode the unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
>
> Most people use "encode" and "decode" the other way around; you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.
OK. I think I will adapt to the majority on this point.
I think I mixed up
unicodestring=unicode(bytestring,encoding='utf8')
and
bytestring=u'unicodestring'.encode('utf8')
>
>> So encode early, decode lately means, to do it as near to the border as
>> possible and to encode/decode i need a coding system, for example 'utf8'
>
I think I should change this to decode early, encode late.
> Correct on both counts.
>
>> That means, there should be an encoding/decoding possibility to every
>> interface i can use: files, stdin, stdout, stderr, gui (should be the
>> most important ones).
>
> The file objects (as returned by open()) have an encoding, which
> (IMHO) defaults to "utf8". GUI work depends on your GUI toolkit, and
> might well accept Unicode strings directly - check the docs.
>
>> def __repr__(self):
>> return u'My name is %s' % self.Name
>
> This means that repr() will return a Unicode string.
>
>> # this does work
>> print a.__repr__()
>>
>> # throws an error if default encoding is ascii
>> # but works if default encoding is utf8
>> print a
>>
>> # throws an error because a is not a string
>> print unicode(a, encoding='utf8')
>
> The __repr__ function is supposed to return a string object, in Python
> 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing are a result of the built-in repr() function calling
> a.__repr__() and then treating the return value as an ASCII str, not a
> Unicode string.
>
> This would work:
> def __repr__(self):
> return (u'My name is %s' % self.Name).encode('utf8')
>
> Alternatively, migrate to Python 3, where the default is Unicode
> strings. I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
>
> class NoEnc:
> def __init__(self,Name=None):
> self.Name=Name
> def __repr__(self):
> return 'My name is %s' % self.Name
>
> if __name__ == '__main__':
>
> a = NoEnc('Müller')
>
> # this will still work (print is now a function, not a statement)
> print(a.__repr__())
>
> # this will work in Python 3.x
> print(a)
>
> # 'unicode' has been renamed to 'str', but it's already unicode so
> this makes no sense
> print(str(a, encoding='utf8'))
>
> # to convert it to UTF-8, convert it to a string with str() or
> repr() and then print:
> print(str(a).encode('utf8'))
> ############################
>
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes string (an array of bytes), which looks
> like this: b'My name is M\xc3\xbcller' The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
>
> Hope that helps!
Yes, it helped a lot. One last question here: when I have a free choice and
I don't know Python 2 and Python 3 very well, what would be the
recommended choice?
>
> Chris Angelico
Wolfgang
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-06-02 04:38 +1000 |
| Subject | Re: Thanks for all responses |
| Message-ID | <mailman.2377.1306953519.9059.python-list@python.org> |
| In reply to | #6808 |
On Thu, Jun 2, 2011 at 3:29 AM, Wolfgang Meiners <WolfgangMeiners01@web.de> wrote:
> Yes it helped a lot. One last question here: When i have free choice and
> i dont know Python 2 and Python 3 very good: What would be the
> recommended choice?
Generally, Python 3. Unless there's something you really need in Python 2 (a module that isn't available in 3.x, for instance, or you're deploying to a site that doesn't have Python 3 installed), it's worth going with the newer one.
Chris Angelico