From: Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid>
Newsgroups: comp.lang.java.security
Subject: Re: X500Principal and UTF-16 encoded certificates
Date: Fri, 22 Apr 2011 17:35:56 +0200
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <ios78t$c2b$1@dont-email.me>
References: <f3317f71-49c9-448d-9baa-8cb439a19b4b@l36g2000vbp.googlegroups.com> <ed8d8950-6fb4-4082-800f-1609258ceb96@hd10g2000vbb.googlegroups.com>

On 21/04/2011 17:27, Yosi Izaq allegedly wrote:
> On Apr 21, 4:22 pm, Yosi Izaq<izaq...@gmail.com>  wrote:
>> Hi,
>>
>> I have a java application that parses certificates. It works perfectly
>> for certificates that have their fields encoded in UTF-8.
>> It doesn't work well for UTF-16 encoding. While debugging the problem
>> I've found that getName(X500Principal.RFC2253) function returns the
>> name with extra 0x00 bytes (as if it confuses the first byte of UTF-16
>> to be a UTF-8 byte).
>>
>> I've also found in Java doc (http://download.oracle.com/javase/1.4.2/
>> docs/api/javax/security/auth/x500/
>> X500Principal.html#getName(java.lang.String) ) that:
>> "If "RFC2253" is specified as the format, this method emits the
>> attribute type keywords defined in RFC 2253 (CN, L, ST, O, OU, C,
>> STREET, DC, UID). Any other attribute type is emitted as an OID. Under
>> a strict reading, RFC 2253 only specifies a UTF-8 string
>> representation. The String returned by this method is the Unicode
>> string achieved by decoding this UTF-8 representation."
>> This is consistent with the behavior that I've observed.
>>
>> I would like to ask what are my options for correctly parsing the name
>> value in accordance with RFC2253 when encoded in UTF-16?
>>
>> TIA,
>> Yosi
>
> Just an update, rfc2253 (http://www.ietf.org/rfc/rfc2253.txt) states
> its objective as "UTF-8 String Representation of Distinguished
> Names". Clearly, the legacy code I'm dealing with didn't take this
> into account.
> I'm currently experimenting with rfc1779 (http://www.ietf.org/rfc/
> rfc1779.txt?number=1779) using all manner of UTF-16 encoded
> certificate subjects.
> Is there any specific reason why
> X500Principal:getName(X500Principal.RFC2253) may be preferable to
> X500Principal:getName(X500Principal.RFC1779)?
>
> 10x,
> Yosi

I doubt your finding, for the very simple reason that
X500Principal#getName returns a String, not a byte[]. So your extra null
bytes would have to come from whichever part of your code transforms that
String into a byte[], or possibly from X500Principal#getEncoded(). The
problem may also lie with the input, i.e. if the X500Principal instance
is created using the byte[] or java.io.InputStream c'tor.
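
To illustrate the point about where zero bytes can come from, here is a
minimal sketch. The class name, the helper methods and the subject are
made up for the example, and it assumes a JDK with
java.nio.charset.StandardCharsets (Java 7 or later). It takes the String
returned by getName and encodes it twice; only the choice of charset
differs.

```java
import java.nio.charset.StandardCharsets;
import javax.security.auth.x500.X500Principal;

class DnBytesDemo {

    // Hypothetical helper: parse a DN string and return its RFC 2253 form.
    static String rfc2253Name(String dn) {
        return new X500Principal(dn).getName(X500Principal.RFC2253);
    }

    // Hypothetical helper: count the 0x00 bytes in an encoded form.
    static int countZeroBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up subject, ASCII only to keep the byte counts easy to read.
        String name = rfc2253Name("CN=Test User, O=Example");

        // Same String, two different encodings chosen by the caller.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = name.getBytes(StandardCharsets.UTF_16BE);

        System.out.println("chars:               " + name.length());
        System.out.println("UTF-8 zero bytes:    " + countZeroBytes(utf8));
        System.out.println("UTF-16BE zero bytes: " + countZeroBytes(utf16));
    }
}
```

For an all-ASCII name the UTF-16BE array carries one zero byte per
character and the UTF-8 array carries none, so zero bytes showing up
downstream say more about the String-to-byte[] step than about getName.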

I would suggest you post an SSCCE <http://sscce.org/>.

AFAIK, there is no intrinsic reason to prefer RFC2253 over RFC1779,
although the former appears to me to be the more widespread these days.
I would say it boils down to what the entity you communicate with (be it
a library or a third party) understands.
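
If you want to compare the two formats for your own subjects, a rough
sketch along these lines should do. The class name, the helper and the
subject are made up for the example. It prints both renderings and then
parses each one back, relying on the documented behaviour that the
String c'tor accepts either grammar.

```java
import javax.security.auth.x500.X500Principal;

class DnFormatDemo {

    // Hypothetical helper: render one principal in the given format.
    static String render(X500Principal principal, String format) {
        return principal.getName(format);
    }

    public static void main(String[] args) {
        // Made-up subject using only keywords known to both RFCs.
        X500Principal subject =
                new X500Principal("CN=Test User, O=Example, C=US");

        String rfc2253 = render(subject, X500Principal.RFC2253);
        String rfc1779 = render(subject, X500Principal.RFC1779);

        System.out.println("RFC2253: " + rfc2253);
        System.out.println("RFC1779: " + rfc1779);

        // Parse each rendering back and compare with the original.
        System.out.println(subject.equals(new X500Principal(rfc2253)));
        System.out.println(subject.equals(new X500Principal(rfc1779)));
    }
}
```

Both strings describe the same name; what differs is the textual form,
which is why the choice mostly depends on the consumer.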

-- 
DF.
An escaped convict once said to me:
"Alcatraz is the place to be"