Newsgroups: perl.ldap
To: perl-ldap@perl.org
Subject: Re: (Net::LDAP) Automatically convert attributes into utf8 when writting
Date: Sat, 29 Aug 2015 13:54:15 +0200
Message-ID: <1722046.SSPNH8pE0O@moth>
Organization: ADPM
In-Reply-To: <668301440502635@web4g.yandex.ru>
References: <3262561440500066@web15g.yandex.ru> <55DC4BB5.6030501@keutel.de> <668301440502635@web4g.yandex.ru>
From: peter@adpm.de (Peter Marschall)

Hi,

On Tuesday, 25. August 2015 13:37:15 pe rl wrote:
> They are not necessary when reading/searching in the ldap server, since
> Net::LDAP already has a "raw" option in the constructor to automatically
> encode/decode strings. It is working for us, and the only change required
> has been to add the "raw" option to the constructor.

I think you misinterpret the purpose of the raw option.
Its goal is to convert the byte strings coming from the LDAP server that
represent UTF-8 encoded directory strings from byte semantics to Perl
scalars with character semantics; values of attributes matching the regex
given to raw are left untouched as binary data.

On the other hand, perl-ldap expects scalars in character semantics when it
comes to writing directory strings to an LDAP server.
It is not perl-ldap's job to translate between scalars in Perl's character
semantics and the various input or output encodings of your application.

> The problem appears when writing to the ldap server. I have started to
> modify our code with utf8::encode(), by adding it to every attribute in all
> of our functions. The problem is that it is very inefficient, since I will
> have to modify every attribute that appears in our programs. We have a lot
> of functions that create/modify/delete entries in the ldap server, so I
> will have to change a lot of code to manually encode attribs to utf8, and
> then test all of the changes.

As said above, it is not perl-ldap's job to translate between scalars in
Perl's character semantics and the various input or output encodings of your
application. This is the application's task.

If you - as you write - need to convert every attribute using utf8::encode(),
then your application seems to use a mixture of byte & character semantics.
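
To illustrate what such a mixture looks like, here is a minimal sketch using
only core Perl. The sample name and the variable names are assumptions made up
for this example; they do not come from the original poster's code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes of "Müller" as they might arrive
# from an input source that was never decoded ("\xC3\xBC" encodes U+00FC).
my $bytes = "M\xC3\xBCller";

# Byte semantics: Perl sees 7 separate bytes.
my $byte_len = length $bytes;

# Character semantics: after decoding once, Perl sees 6 characters.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

# Encoding a string that already holds UTF-8 bytes encodes it a second
# time: the two bytes of the umlaut become four, giving 9 bytes in total.
my $double     = encode('UTF-8', $bytes);
my $double_len = length $double;

printf "bytes=%d chars=%d double-encoded=%d\n",
    $byte_len, $char_len, $double_len;
# prints "bytes=7 chars=6 double-encoded=9"
```

A program in which scalars like $bytes and $chars both travel through the
same functions is what I mean by a mixture of byte & character semantics.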
In that case please do yourself a favour and switch over to character
semantics by correctly converting input to character semantics at the point
where it enters your application:
- for file & console input you can use the ":encoding(...)" layer to make
  sure you get character semantics instead of byte semantics
- for @ARGV a simple
      $_ = Encode::decode('UTF-8', $_) for @ARGV;
  should be sufficient.

You may also have a look at the 'utf8::all' package that does a lot of the
above for you automatically.

Please read the perlunicode manual page for more detailed information.

Best
Peter

-- 
Peter Marschall
peter@adpm.de
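
The two input conversions suggested above can be sketched as follows. This is
only an illustration under stated assumptions: decode_args() and read_lines()
are hypothetical helper names, and the in-memory "file" and the sample
argument stand in for real input.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode command-line style arguments from UTF-8 bytes to characters.
# decode_args() is a hypothetical helper name, not part of any module.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Read lines through an ":encoding(UTF-8)" layer so that every line has
# character semantics as soon as the program sees it.
# read_lines() is likewise a hypothetical helper.
sub read_lines {
    my ($source) = @_;    # a file name, or a reference to a byte string
    open my $fh, '<:encoding(UTF-8)', $source
        or die "cannot open input: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Stand-ins for real input: raw UTF-8 bytes as a shell or a file delivers them.
my @args  = decode_args("J\xC3\xBCrgen");
my $file  = "cn: J\xC3\xBCrgen\nsn: M\xC3\xBCller\n";
my @lines = read_lines(\$file);

# Output needs an encoding layer as well, otherwise Perl warns about
# wide characters or writes Latin-1 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @args, @lines;
```

With input decoded like this, the scalars handed on to the rest of the
application are all in character semantics, so no per-attribute
utf8::encode() calls should be needed in the code that builds the entries.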