Re: Problem with psycopg2, bytea, and memoryview

Started by: "Frank Millman" <frank@chagford.com>
First post: 2013-07-31 13:43 +0200
Last post: 2013-08-01 10:03 +0200
Articles: 3 (2 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Problem with psycopg2, bytea, and memoryview "Frank Millman" <frank@chagford.com> - 2013-07-31 13:43 +0200
    Re: Problem with psycopg2, bytea, and memoryview Neil Cerutti <neilc@norwich.edu> - 2013-07-31 14:08 +0000
      Re: Problem with psycopg2, bytea, and memoryview "Frank Millman" <frank@chagford.com> - 2013-08-01 10:03 +0200

#51641 — Re: Problem with psycopg2, bytea, and memoryview

From: "Frank Millman" <frank@chagford.com>
Date: 2013-07-31 13:43 +0200
Subject: Re: Problem with psycopg2, bytea, and memoryview
Message-ID: <mailman.10.1375270989.1251.python-list@python.org>
"Antoine Pitrou" <solipsis@pitrou.net> wrote in message 
news:loom.20130731T114936-455@post.gmane.org...
> Frank Millman <frank <at> chagford.com> writes:
>>
>> I have some binary data (a gzipped xml object) that I want to store in a
>> database. For PostgreSQL I use a column with datatype 'bytea', which is
>> their recommended way of storing binary strings.
>>
>> I use psycopg2 to access the database. It returns binary data in the form
>> of a python 'memoryview'.
>>
> [...]
>>
>> Using MS SQL Server and pyodbc, it returns a byte string, not a
>> memoryview, and it does compare equal with the original.
>>
>> I can hack my program to use tobytes(), but it would add complication,
>> and it would be database-specific. I would prefer a cleaner solution.
>
> Just cast the result to bytes (`bytes(row[1])`). It will work both with
> bytes and memoryview objects.
>
> Regards
>
> Antoine.
>

Thanks for that, Antoine. It is an improvement over tobytes(), but I am
afraid it is still not ideal for my purposes.

At present, I loop over a range of columns, comparing 'before' and 'after' 
values, without worrying about their types. Strings are returned as str, 
integers are returned as int, etc. Now I will have to check the type of each 
column before deciding whether to cast to 'bytes'.

Can anyone explain *why* the results do not compare equal? If I understood 
the problem, I might be able to find a workaround.
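
One database-neutral way to keep such a loop free of per-column type checks is to pass every value through a small normalizing helper before comparing. The sketch below is only an illustration: the names `normalize` and `changed_columns` are hypothetical, and the `cast("c")` call merely stands in for a driver that hands back a memoryview.

```python
def normalize(value):
    """Return bytes for a memoryview and every other value unchanged."""
    if isinstance(value, memoryview):
        return value.tobytes()
    return value


def changed_columns(before, after):
    """Yield the indexes of columns whose normalized values differ."""
    for index, (old, new) in enumerate(zip(before, after)):
        if normalize(old) != normalize(new):
            yield index


# Hypothetical rows: the 'after' row carries the binary column as a memoryview.
before = ("abc", 42, b"\x1f\x8b payload")
after = ("abc", 42, memoryview(b"\x1f\x8b payload").cast("c"))

print(list(changed_columns(before, after)))  # prints []
```

The type check lives in one place, so the comparison loop itself stays the same for str, int, bytes and memoryview columns.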

Frank




#51659

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-07-31 14:08 +0000
Message-ID: <b5sk3cFkiq8U1@mid.individual.net>
In reply to: #51641
On 2013-07-31, Frank Millman <frank@chagford.com> wrote:
>
> "Antoine Pitrou" <solipsis@pitrou.net> wrote in message 
> news:loom.20130731T114936-455@post.gmane.org...
>> Frank Millman <frank <at> chagford.com> writes:
>>>
>>> I have some binary data (a gzipped xml object) that I want to store in a
>>> database. For PostgreSQL I use a column with datatype 'bytea', which is
>>> their recommended way of storing binary strings.
>>>
>>> I use psycopg2 to access the database. It returns binary data
>>> in the form of a python 'memoryview'.
>>>
>> [...]
>>>
>>> Using MS SQL Server and pyodbc, it returns a byte string, not
>>> a memoryview, and it does compare equal with the original.
>>>
>>> I can hack my program to use tobytes(), but it would add
>>> complication, and it would be database-specific. I would
>>> prefer a cleaner solution.
>>
>> Just cast the result to bytes (`bytes(row[1])`). It will work
>> both with bytes and memoryview objects.
>
> Thanks for that, Antoine. It is an improvement over tobytes(),
> but I am afraid it is still not ideal for my purposes.
>
> At present, I loop over a range of columns, comparing 'before'
> and 'after' values, without worrying about their types. Strings
> are returned as str, integers are returned as int, etc. Now I
> will have to check the type of each column before deciding
> whether to cast to 'bytes'.
>
> Can anyone explain *why* the results do not compare equal? If I
> understood the problem, I might be able to find a workaround.

A memoryview will compare equal to another object that supports
the buffer protocol when the format and shape are also equal. The
database must be returning chunks of binary data in a different
shape or format than you are writing it.

Perhaps psycopg2 is returning a chunk of ints when you have
written a chunk of bytes. Check the .format and .shape members of
the return value to see.

>>> x = memoryview(b"12345")
>>> x.format
'B'
>>> x.shape
(5,)
>>> x == b"12345"
True

My guess is you're getting format "I" from psycopg2. Hopefully
there's a way to coerce your desired "B" format interpretation of
the raw data using psycopg2's API.

-- 
Neil Cerutti



#51723

From: "Frank Millman" <frank@chagford.com>
Date: 2013-08-01 10:03 +0200
Message-ID: <mailman.70.1375344202.1251.python-list@python.org>
In reply to: #51659
"Neil Cerutti" <neilc@norwich.edu> wrote in message 
news:b5sk3cFkiq8U1@mid.individual.net...
> On 2013-07-31, Frank Millman <frank@chagford.com> wrote:
>>
>>
>> Can anyone explain *why* the results do not compare equal? If I
>> understood the problem, I might be able to find a workaround.
>
> A memoryview will compare equal to another object that supports
> the buffer protocol when the format and shape are also equal. The
> database must be returning chunks of binary data in a different
> shape or format than you are writing it.
>
> Perhaps psycopg2 is returning a chunk of ints when you have
> written a chunk of bytes. Check the .format and .shape members of
> the return value to see.
>
> >>> x = memoryview(b"12345")
> >>> x.format
> 'B'
> >>> x.shape
> (5,)
> >>> x == b"12345"
> True
>
> My guess is you're getting format "I" from psycopg2. Hopefully
> there's a way to coerce your desired "B" format interpretation of
> the raw data using psycopg2's API.
>

Thanks very much for the explanation, Neil.

I tried what you suggested, and the object returned by psycopg2 has a format 
of 'c' and a shape of (5,).
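
A format of 'c' with the same shape would account for the mismatch, at least as CPython 3.3 and later behave to my understanding: when the two formats differ, memoryview equality compares the unpacked items, and a 'c' item unpacks to a one-byte bytes object while a 'B' item unpacks to an int. The sketch below reproduces this without a database; `cast("c")` is only a stand-in for whatever psycopg2 actually returned.

```python
data = b"12345"

as_uint8 = memoryview(data)           # format 'B': items are ints
as_char = memoryview(data).cast("c")  # format 'c': items are 1-byte bytes

print(as_uint8.format, as_uint8.shape, as_uint8.tolist())
print(as_char.format, as_char.shape, as_char.tolist())

# Same raw bytes and same shape, but the items themselves differ
# (49 versus b"1"), so only the 'B' view compares equal to the bytes object.
print(as_uint8 == data)
print(as_char == data)

# Copying the raw buffer out sidesteps the format entirely.
print(bytes(as_char) == data)
```

This is also why `bytes(...)` and `tobytes()` both work as fixes: they compare the underlying buffer contents rather than the unpacked items.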

I don't know what it means, but luckily I have found a workaround. I 
enquired on the psycopg2 list, and someone explained how I can create an 
extension that forces it to return 'bytes' instead of a 'memoryview'. I 
tested it and it works. Problem solved :-)
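
The thread does not show the extension itself, so the following is a guess at what such a typecaster could look like, not the code Frank was given. It relies on `psycopg2.extensions.new_type`, `psycopg2.extensions.register_type` and `psycopg2.BINARY`; the names `make_bytes_caster`, `register_bytea_as_bytes` and `BYTEA_AS_BYTES` are hypothetical. The psycopg2 import is kept inside the registration function so the wrapper can be tried without the driver installed.

```python
def make_bytes_caster(base_cast):
    """Wrap a typecaster so that memoryview results come back as bytes."""
    def cast(value, cursor):
        result = base_cast(value, cursor)
        # SQL NULL arrives as None and must stay None.
        return None if result is None else bytes(result)
    return cast


def register_bytea_as_bytes():
    """Register the wrapper globally for bytea columns (needs psycopg2)."""
    import psycopg2
    import psycopg2.extensions

    bytea_as_bytes = psycopg2.extensions.new_type(
        psycopg2.BINARY.values,              # OIDs handled by psycopg2.BINARY
        "BYTEA_AS_BYTES",                    # hypothetical name for the type
        make_bytes_caster(psycopg2.BINARY),  # reuse the stock bytea parsing
    )
    psycopg2.extensions.register_type(bytea_as_bytes)
    return bytea_as_bytes
```

Calling `register_bytea_as_bytes()` once at start-up would apply the conversion to every connection; `register_type` also accepts a connection or cursor as a second argument if a narrower scope is preferred.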

For the record, I passed on to the psycopg2 list the suggestion from Antoine
and Terry that psycopg2 be changed to return 'bytes'. It will be interesting
to see if anyone responds.

Thanks again to all for your help.

Frank



