| From | "Joseph L. Casale" <jcasale@activenetwerx.com> |
|---|---|
| Subject | RE: Ldap module and base64 oncoding |
| Date | 2013-05-27 05:15 +0000 |
| References | <knt87q$cm3$1@dont-email.me> <kntomp$jsn$1@dont-email.me> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2231.1369631792.3114.python-list@python.org> (permalink) |
Hi Michael,
> Processing LDIF is one thing, doing LDAP operations another.
>
> LDIF itself is meant to be ASCII-clean. But each attribute value can carry any
> byte sequence (e.g. attribute 'jpegPhoto'). There's no further processing by
> module LDIF - it simply returns byte sequences.
>
> The access protocol LDAPv3 mandates UTF-8 encoding for Unicode strings on the
> wire if attribute syntax is DirectoryString, IA5String (mainly ASCII) or similar.
>
> So if your LDIF input returns UTF-16 encoded attribute values for e.g.
> attribute 'cn' or 'o' or another attribute not being of OctetString or Binary
> syntax something's wrong with the producer of the LDIF data.
That could be. I am using Microsoft's ldifde.exe to dump a Domino and an AD directory for
comparative processing. The problem is that I don't have much control over the data in
the directory, and I do know that DNs have non-ASCII characters unique to the
> I wonder what the string really is. At least the base64-encoding you provided
> before decodes as UTF-8 but I'm not sure whether it's the right sequence of
> Unicode code points you're expecting.
>
> >>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
> u'det\\3310wbb\\pg'
>
> I still can't figure out what you're really doing though. I'd recommend to
> strip down your operations to a very simple test code snippet illustrating the
> issue and post that here.
I have removed all of my likely broken attempts at working with this data and will
soon have some simple code to post, but at this point I may have an indication of what is
awry with my data.
After parsing the data for a user, I simply take a value from the LDIF file and write
it back out to another file, which fails. The value parsed is:
officestreetaddress:: T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ==
  File "C:\Python27\lib\site-packages\ldif.py", line 202, in unparse
    self._unparseChangeRecord(record)
  File "C:\Python27\lib\site-packages\ldif.py", line 181, in _unparseChangeRecord
    self._unparseAttrTypeandValue(mod_type,mod_val)
  File "C:\Python27\lib\site-packages\ldif.py", line 142, in _unparseAttrTypeandValue
    self._unfoldLDIFLine(':: '.join([attr_type,base64.encodestring(attr_value).replace('\n','')]))
  File "C:\Python27\lib\base64.py", line 315, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 7: ordinal not in range(128)
> c:\python27\lib\base64.py(315)encodestring()
-> pieces.append(binascii.b2a_base64(chunk))
(Pdb) l
310     def encodestring(s):
311         """Encode a string into multiple lines of base-64 data."""
312         pieces = []
313         for i in range(0, len(s), MAXBINSIZE):
314             chunk = s[i : i + MAXBINSIZE]
315  ->         pieces.append(binascii.b2a_base64(chunk))
316         return "".join(pieces)
317
318
319     def decodestring(s):
320         """Decode a string."""
(Pdb) args
s = Otto-Meßmer-Straße 1
So moving up a frame or two and looking at the entry dict, I see a modlist entry of:
('streetAddress', [u'Otto-Me\xdfmer-Stra\xdfe 1']) which is correct:
In [2]: 'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='.decode('base64').decode('utf-8')
Out[2]: u'Otto-Me\xdfmer-Stra\xdfe 1'
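For anyone reading this on Python 3, where the 'base64' string codec no longer exists, the same check can be sketched with the base64 module. This is only an illustrative equivalent of the session above, not what was actually run:

```python
import base64

# Base64 payload copied from the officestreetaddress line above.
encoded = "T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=="

raw = base64.b64decode(encoded)   # UTF-8 encoded byte string
text = raw.decode("utf-8")        # text (unicode) object

print(repr(raw))
print(text)
```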
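Looking at the stack trace, I think I see the issue: the value reaches base64.encodestring
as a unicode object, so it is implicitly encoded as ASCII, which fails on u'\xdf'. Encoding
it to UTF-8 first works: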
(Pdb) import base64
(Pdb) base64.encodestring(u'Otto-Me\xdfmer-Stra\xdfe 1'.encode('utf-8')).replace('\n','')
'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='
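The failure can be reproduced outside of ldif.py. On Python 2, handing a unicode object to base64.encodestring triggers an implicit ASCII encode, and that is what raises at u'\xdf'. A rough sketch in Python 3 syntax follows, with the ASCII step made explicit (Python 3 itself would reject the text object with a TypeError instead):

```python
import base64

value = "Otto-Me\xdfmer-Stra\xdfe 1"  # text object, as found in the modlist

# Python 2 effectively did this behind the scenes before base64-encoding.
failed_at = None
try:
    value.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_at = exc.start  # index of the first non-ASCII character

# Encoding to UTF-8 first gives base64 a byte string to work on.
encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
print(ascii_failed, failed_at, encoded)
```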
I now have the exact value I started with. Making sure that wherever I handle the original
values I return UTF-8 decoded objects for use in a modlist to write later, and subclassing
LDIFWriter and overriding _unparseAttrTypeandValue to do the encoding, has
eliminated all the errors.
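The encoding step in the override boils down to something like the following. This is a standalone sketch rather than the actual subclass: `ldif_attr_line` is a hypothetical helper, not part of the ldif module, and it ignores line folding and the cases where plain `attr: value` output would be allowed.

```python
import base64


def ldif_attr_line(attr_type, attr_value):
    """Return an 'attr:: base64' line for one attribute value.

    Hypothetical helper for illustration only: text is encoded to
    UTF-8 first, byte strings are passed through unchanged.
    """
    if isinstance(attr_value, str):
        attr_value = attr_value.encode("utf-8")
    b64 = base64.b64encode(attr_value).decode("ascii")
    return ":: ".join([attr_type, b64])


line = ldif_attr_line("streetAddress", "Otto-Me\xdfmer-Stra\xdfe 1")
print(line)
```

Accepting both text and byte strings means binary attributes such as jpegPhoto would go through the same path untouched.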
What finally remains is ldifde.exe's output of what looks like U+00BF, an inverted question
mark, for some values; otherwise this issue looks solved.
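To get a list of the affected values, a scan along these lines might help. It is only a sketch: `find_suspect_values` and the sample entry are made up for illustration, and it assumes entries are dicts mapping attribute names to lists of values, which may be text or UTF-8 byte strings.

```python
def find_suspect_values(entry):
    """Return (attribute, value) pairs whose text contains U+00BF."""
    hits = []
    for attr, values in entry.items():
        for value in values:
            if isinstance(value, bytes):
                # Undecodable bytes become U+FFFD rather than raising.
                value = value.decode("utf-8", errors="replace")
            if "\u00bf" in value:
                hits.append((attr, value))
    return hits


# Hypothetical sample data, not taken from the real directory dump.
sample = {
    "streetAddress": ["Otto-Me\xdfmer-Stra\xdfe 1"],
    "cn": [b"Some\xc2\xbfName"],
}
suspects = find_suspect_values(sample)
print(suspects)
```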
Thanks for everything,
jlc