| Path | csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <jcasale@activenetwerx.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| From | "Joseph L. Casale" <jcasale@activenetwerx.com> |
| To | Michael Ströder <michael@stroeder.com>, "python-list@python.org" <python-list@python.org> |
| Subject | RE: Ldap module and base64 oncoding |
| Thread-Topic | Ldap module and base64 oncoding |
| Thread-Index | AQHOWkoD2gcVydFyA0eXB9j8yy7c/ZkYU51Q |
| Date | Mon, 27 May 2013 05:15:01 +0000 |
| References | <knt87q$cm3$1@dont-email.me> <kntomp$jsn$1@dont-email.me> |
| In-Reply-To | <kntomp$jsn$1@dont-email.me> |
| Accept-Language | en-US |
| Content-Language | en-US |
| X-MS-Has-Attach | |
| X-MS-TNEF-Correlator | |
| x-originating-ip | [172.18.0.200] |
| Content-Type | text/plain; charset="iso-8859-1" |
| Content-Transfer-Encoding | quoted-printable |
| MIME-Version | 1.0 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2231.1369631792.3114.python-list@python.org> (permalink) |
| Lines | 118 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1369631792 news.xs4all.nl 15863 [2001:888:2000:d::a6]:42263 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:46166 |
Hi Michael,
> Processing LDIF is one thing, doing LDAP operations another.
>
> LDIF itself is meant to be ASCII-clean. But each attribute value can carry any
> byte sequence (e.g. attribute 'jpegPhoto'). There's no further processing by
> module LDIF - it simply returns byte sequences.
>
> The access protocol LDAPv3 mandates UTF-8 encoding for Unicode strings on the
> wire if attribute syntax is DirectoryString, IA5String (mainly ASCII) or similar.
>
> So if your LDIF input returns UTF-16 encoded attribute values for e.g.
> attribute 'cn' or 'o' or another attribute not being of OctetString or Binary
> syntax something's wrong with the producer of the LDIF data.
That could be. I am using Microsoft's ldifde.exe to dump a Domino and an AD directory for
comparative processing. The problem is that I don't have much control over the data in
the directory, and I do know that DNs have non-ASCII characters unique to the
> I wonder what the string really is. At least the base64-encoding you provided
> before decodes as UTF-8 but I'm not sure whether it's the right sequence of
> Unicode code points you're expecting.
>
> >>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
> u'det\\3310wbb\\pg'
>
> I still can't figure out what you're really doing though. I'd recommend to
> strip down your operations to a very simple test code snippet illustrating the
> issue and post that here.
So I have removed all my likely broken attempts at working with this data and will
soon have some simple code to post, but at this point I may have an indication of what is
awry with my data.
After parsing the data for a user, I simply take a value from the LDIF file and write
it back out to another LDIF file, which fails. The value parsed is:
officestreetaddress:: T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ==
  File "C:\Python27\lib\site-packages\ldif.py", line 202, in unparse
    self._unparseChangeRecord(record)
  File "C:\Python27\lib\site-packages\ldif.py", line 181, in _unparseChangeRecord
    self._unparseAttrTypeandValue(mod_type,mod_val)
  File "C:\Python27\lib\site-packages\ldif.py", line 142, in _unparseAttrTypeandValue
    self._unfoldLDIFLine(':: '.join([attr_type,base64.encodestring(attr_value).replace('\n','')]))
  File "C:\Python27\lib\base64.py", line 315, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 7: ordinal not in range(128)
> c:\python27\lib\base64.py(315)encodestring()
-> pieces.append(binascii.b2a_base64(chunk))
(Pdb) l
310     def encodestring(s):
311         """Encode a string into multiple lines of base-64 data."""
312         pieces = []
313         for i in range(0, len(s), MAXBINSIZE):
314             chunk = s[i : i + MAXBINSIZE]
315  ->         pieces.append(binascii.b2a_base64(chunk))
316         return "".join(pieces)
317
318
319     def decodestring(s):
320         """Decode a string."""
(Pdb) args
s = Otto-Meßmer-Straße 1
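A minimal sketch of why that call fails, assuming (as the traceback suggests) that Python 2 implicitly converts the unicode object with the ASCII codec before base64-encoding it. The snippet is written for Python 3, where that conversion has to be spelled out; `value` is just the string from the session above.

```python
# Python 3 sketch: reproduce the failing conversion explicitly.
value = u'Otto-Me\xdfmer-Stra\xdfe 1'

failed_at = None
offending = None
try:
    # Assumption: Python 2 attempts this conversion implicitly when a
    # unicode object is handed to binascii.b2a_base64.
    value.encode('ascii')
except UnicodeEncodeError as exc:
    failed_at = exc.start              # index of the first non-ASCII character
    offending = exc.object[exc.start]  # the character that could not be encoded

print(failed_at, repr(offending))
```

The reported position matches the "position 7" in the traceback, which points at the first 'ß' in the value.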
Moving up a frame or two and looking at the entry dict, I see a modlist entry of
('streetAddress', [u'Otto-Me\xdfmer-Stra\xdfe 1']), which is the correct value:
In [2]: 'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='.decode('base64').decode('utf-8')
Out[2]: u'Otto-Me\xdfmer-Stra\xdfe 1'
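Looking at the stack trace, I think I see the issue: the value reaches base64.encodestring
as a unicode object rather than as a UTF-8 byte string. Encoding it first works: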
(Pdb) import base64
(Pdb) base64.encodestring(u'Otto-Me\xdfmer-Stra\xdfe 1'.encode('utf-8')).replace('\n','')
'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='
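For readers on Python 3, the same round trip can be sketched as follows. This is only an illustration of the decode/encode steps from the session above, using `base64.b64decode` and `base64.b64encode` in place of the Python 2 `'base64'` codec and `encodestring`; the variable names are mine.

```python
import base64

encoded = 'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='

# base64 text -> raw bytes -> text (the attribute value is UTF-8 encoded)
text = base64.b64decode(encoded).decode('utf-8')

# text -> UTF-8 bytes -> base64 text, the step the writer needs to perform
round_tripped = base64.b64encode(text.encode('utf-8')).decode('ascii')

print(text)
print(round_tripped == encoded)
```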
I now have the exact value I started with. Two changes have eliminated all the errors:
wherever I handle the original values, I return UTF-8 decoded (unicode) objects for use in
a modlist to write later, and I subclass LDIFWriter and override _unparseAttrTypeandValue
to do the encoding.
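The override itself is not shown here, so the following is only a rough, self-contained sketch of the idea in Python 3: encode text to UTF-8 bytes immediately before base64-encoding it. The helpers `to_utf8_bytes` and `base64_attr_line` are hypothetical names, not part of the ldif module, and LDIF line folding is deliberately left out.

```python
import base64


def to_utf8_bytes(value):
    """Return bytes for an attribute value, encoding text as UTF-8."""
    if isinstance(value, bytes):
        return value  # already a byte sequence, leave it untouched
    return value.encode('utf-8')


def base64_attr_line(attr_type, attr_value):
    """Build an 'attr:: <base64>' line (hypothetical helper, no folding)."""
    payload = base64.b64encode(to_utf8_bytes(attr_value)).decode('ascii')
    return ':: '.join([attr_type, payload])


line = base64_attr_line('streetAddress', u'Otto-Me\xdfmer-Stra\xdfe 1')
print(line)
```

Doing the encoding in one place at write time means the rest of the code can keep working with decoded text, which is the split described above.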
What finally remains is that ldifde.exe outputs what looks like U+00BF, an inverted question
mark, for some values. Otherwise this issue looks solved.
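One way to find the affected values might be to decode each base64 value and look for U+00BF. This is a sketch in Python 3 under the assumption that the values are UTF-8 encoded; `contains_inverted_question_mark` is a hypothetical helper and `sample_suspect` is made-up test data, not output from ldifde.exe.

```python
import base64


def contains_inverted_question_mark(b64_value):
    """Return True if the decoded value contains U+00BF."""
    # errors='replace' keeps the scan going even if a value is not valid UTF-8
    text = base64.b64decode(b64_value).decode('utf-8', errors='replace')
    return u'\u00bf' in text


sample_ok = 'T3R0by1NZcOfbWVyLVN0cmHDn2UgMQ=='
# Made-up value containing U+00BF, for illustration only
sample_suspect = base64.b64encode(u'caf\u00bf'.encode('utf-8')).decode('ascii')

print(contains_inverted_question_mark(sample_ok))
print(contains_inverted_question_mark(sample_suspect))
```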
Thanks for everything,
jlc
Ldap module and base64 oncoding "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-05-24 21:00 +0000
Re: Ldap module and base64 oncoding Michael Ströder <michael@stroeder.com> - 2013-05-26 17:07 +0200
RE: Ldap module and base64 oncoding "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-05-26 16:19 +0000
Re: Ldap module and base64 oncoding Michael Ströder <michael@stroeder.com> - 2013-05-26 21:48 +0200
RE: Ldap module and base64 oncoding "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-05-27 05:15 +0000
Re: Ldap module and base64 oncoding Michael Ströder <michael@stroeder.com> - 2013-05-27 09:56 +0200
RE: Ldap module and base64 oncoding "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-05-28 00:12 +0000
Re: Ldap module and base64 oncoding Michael Ströder <michael@stroeder.com> - 2013-05-28 09:45 +0200
Re: Ldap module and base64 oncoding dieter <dieter@handshake.de> - 2013-05-27 08:04 +0200