Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
From: "Joseph L. Casale" <jcasale@activenetwerx.com>
To: =?iso-8859-1?Q?Michael_Str=F6der?= <michael@stroeder.com>, "python-list@python.org" <python-list@python.org>
Subject: RE: Ldap module and base64 oncoding
Thread-Topic: Ldap module and base64 oncoding
Thread-Index: AQHOWkoD2gcVydFyA0eXB9j8yy7c/ZkYU51Q
Date: Mon, 27 May 2013 05:15:01 +0000
References: <mailman.2077.1369429274.3114.python-list@python.org>, <knt87q$cm3$1@dont-email.me> <mailman.2178.1369585225.3114.python-list@python.org>, <kntomp$jsn$1@dont-email.me>
In-Reply-To: <kntomp$jsn$1@dont-email.me>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2231.1369631792.3114.python-list@python.org>
Lines: 118
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:46166

Hi Michael,

> Processing LDIF is one thing, doing LDAP operations another.
>
> LDIF itself is meant to be ASCII-clean. But each attribute value can carry any
> byte sequence (e.g. attribute 'jpegPhoto'). There's no further processing by
> module LDIF - it simply returns byte sequences.
>
> The access protocol LDAPv3 mandates UTF-8 encoding for Unicode strings on the
> wire if attribute syntax is DirectoryString, IA5String (mainly ASCII) or similar.
>
> So if your LDIF input returns UTF-16 encoded attribute values for e.g.
> attribute 'cn' or 'o' or another attribute not being of OctetString or Binary
> syntax something's wrong with the producer of the LDIF data.

That could be. I am using MS's ldifde.exe to dump a Domino and an AD directory for
comparative processing. The problem is that I don't have much control over the data in
the directory, and I do know that DNs have non-ASCII characters unique to the

> I wonder what the string really is. At least the base64-encoding you provided
> before decodes as UTF-8 but I'm not sure whether it's the right sequence of
> Unicode code points you're expecting.
>
> >>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
> u'det\\3310wbb\\pg'
>
> I still can't figure out what you're really doing though. I'd recommend to
> strip down your operations to a very simple test code snippet illustrating the
> issue and post that here.

I have removed all of my likely broken attempts at working with this data and will
soon have some simple code, but at this point I may have an indication of what is
awry with my data.

After parsing the data for a user, I simply take a value from the LDIF file and write
it back out to another file, which fails. The value parsed is:

officestreetaddress:: T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ==

  File "C:\Python27\lib\site-packages\ldif.py", line 202, in unparse
    self._unparseChangeRecord(record)
  File "C:\Python27\lib\site-packages\ldif.py", line 181, in _unparseChangeRecord
    self._unparseAttrTypeandValue(mod_type,mod_val)
  File "C:\Python27\lib\site-packages\ldif.py", line 142, in _unparseAttrTypeandValue
    self._unfoldLDIFLine(':: '.join([attr_type,base64.encodestring(attr_value).replace('\n','')]))
  File "C:\Python27\lib\base64.py", line 315, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 7: ordinal not in range(128)

> c:\python27\lib\base64.py(315)encodestring()
-> pieces.append(binascii.b2a_base64(chunk))
(Pdb) l
310     def encodestring(s):
311         """Encode a string into multiple lines of base-64 data."""
312         pieces = []
313         for i in range(0, len(s), MAXBINSIZE):
314             chunk = s[i : i + MAXBINSIZE]
315  ->         pieces.append(binascii.b2a_base64(chunk))
316         return "".join(pieces)
317
318
319     def decodestring(s):
320         """Decode a string."""
(Pdb) args
s = Otto-Meßmer-Straße 1

Moving up a frame or two and looking at the entry dict, I see a modlist entry of
('streetAddress', [u'Otto-Me\xdfmer-Stra\xdfe 1']), which is correct:

In [2]: 'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='.decode('base64').decode('utf-8')
Out[2]: u'Otto-Me\xdfmer-Stra\xdfe 1'

Looking at the stack trace, I think I see the issue: the value reaches
base64.encodestring as a unicode object, so it gets implicitly encoded as ASCII.
Encoding it to UTF-8 first works:
(Pdb) import base64
(Pdb) base64.encodestring(u'Otto-Me\xdfmer-Stra\xdfe 1'.encode('utf-8')).replace('\n','')
'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='

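For anyone following along on Python 3, here is a minimal sketch of the same round trip. The session above is Python 2.7; on Python 3, `base64.b64encode` only accepts bytes, so the explicit UTF-8 step is required rather than optional. The helper names `encode_ldif_value` and `decode_ldif_value` are made up for illustration and are not part of python-ldap.

```python
import base64


def encode_ldif_value(text):
    """Encode a text value to UTF-8 bytes, then to base64 ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_ldif_value(b64_text):
    """Reverse of encode_ldif_value: base64 -> UTF-8 bytes -> text."""
    return base64.b64decode(b64_text).decode("utf-8")


street = "Otto-Me\xdfmer-Stra\xdfe 1"
encoded = encode_ldif_value(street)
print(encoded)                               # the base64 form of the value
print(decode_ldif_value(encoded) == street)  # round trip check
```

The encoded string should match the `officestreetaddress::` value from the LDIF dump, which is the same check as the pdb session above.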
I now have exactly the value I started with. Two changes have eliminated all the errors:
making sure that wherever I handle the original values I return UTF-8-decoded objects
for use in a modlist that is written out later, and subclassing LDIFWriter and overriding
_unparseAttrTypeandValue to do the encoding.

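To illustrate the idea behind the override without depending on python-ldap, here is a rough standalone sketch in Python 3. It is not the actual subclass and not the library's code: `needs_base64` and `unparse_attr` are hypothetical names, the safety check is a simplified reading of the LDIF SAFE-STRING rule, and line folding is left out entirely.

```python
import base64


def needs_base64(raw):
    """Simplified check (an assumption, loosely following RFC 2849):
    base64 is used unless the bytes are plain, safely printable ASCII."""
    if not raw:
        return False
    # A leading space, colon or '<', or a trailing space, forces base64.
    if raw[0:1] in (b" ", b":", b"<") or raw[-1:] == b" ":
        return True
    # So does any non-ASCII byte, NUL, LF or CR.
    return any(b > 127 or b in (0, 10, 13) for b in raw)


def unparse_attr(attr_type, attr_value):
    """Return one unfolded LDIF line for a single attribute value."""
    # Text is encoded to UTF-8 first; bytes are passed through untouched.
    raw = attr_value.encode("utf-8") if isinstance(attr_value, str) else attr_value
    if needs_base64(raw):
        return "%s:: %s" % (attr_type, base64.b64encode(raw).decode("ascii"))
    return "%s: %s" % (attr_type, raw.decode("ascii"))


print(unparse_attr("streetAddress", "Otto-Me\xdfmer-Stra\xdfe 1"))
print(unparse_attr("cn", "plain"))
```

The point of doing the encoding in one place is that the rest of the code can keep decoded text in the modlist and never has to care which values end up base64-encoded on output.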
What finally remains is that ldifde.exe's output contains what looks like U+00BF, an
inverted question mark, for some values. Otherwise this issue looks solved.

Thanks for everything,
jlc