Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!69.16.185.21.MISMATCH!npeer03.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post01.iad!not-for-mail
From: lbrt chx _ gemale
Newsgroups: comp.lang.java.programmer
Subject: number of bytes for each (uni)code point while using utf-8 as encoding ...
In-Reply-To: <1341965282.664308@nntp.aceinnovative.com>
X-Newsreader: NetComponents
Organization: Acecape, Inc.
Organization: Newshosting.com - Highest quality at a great price! www.newshosting.com
X-Complaints-To: abuse(at)newshosting.com
Message-ID: <1342045748.366554@nntp.aceinnovative.com>
Cache-Post-Path: nntp.aceinnovative.com!unknown@p70-44.acedsl.com
X-Cache: nntpcache 3.0.1 (see http://www.nntpcache.org/)
Date: 11 Jul 2012 22:29:08 GMT
Lines: 47
X-Received-Bytes: 2358
Xref: csiph.com comp.lang.java.programmer:15952

~
 OK, in case someone is looking for something like that. There was one small statement that could (and should!) be optimized if you want the compiler to inline your code. There is no if statement at all, so the sanity checks should be done in the calling env.
~
// __
class UniKd00{
// __ unicode.org/versions/Unicode6.1.0/
// __ exclusive upper limits for 1- to 6-byte sequences; limits 5 and 6 come from the
// __ original 31-bit UTF-8 design, Unicode itself ends at U+10FFFF (4 bytes)
 private final long[] lKpPntLims = new long[]{ 128 , 2048 , 65536 , 2097152 , 67108864 , 2147483648L };
// __
 public final long lLastUniKd = lKpPntLims[lKpPntLims.length - 1];
// __
 public final String aUniKdVer = "6.1.0";
// __
 UniKd00(){}
// __ ((lKdPnt > -1) && (lKdPnt < lLastUniKd)) should be checked in calling env
// __ fewer conditional statements -> more inlin[e|able] by the compiler
 public final int getKdPntLBytes(long lKdPnt){
  int iByts = 0;
  boolean Is = false;
  for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){
   Is = (lKdPnt < lKpPntLims[iByts]);
  }// on exit iByts is in [1, lKpPntLims.length]
  return(iByts);
 }
}
~
 and the test harness in the calling env. looks like this:
~
 lKdPnt = (long)" ... get() codepoint";
 if((lKdPnt > -1) && (lKdPnt < UniKd.lLastUniKd)){
  System.out.printf("// __ |%2d|%10d|%1d|\n", l, lKdPnt, UniKd.getKdPntLBytes(lKdPnt));
 }
 else{
  throw new IOException("// __ Code point not mapped by Unicode Standard " + UniKd.aUniKdVer + "! lKdPnt: |" + lKdPnt + "|");
 }
~
> Would you also disclose why you need that information btw. what you want to do with it? I don't see the use case.
~
 Well, I was probably so deep into the things I mentioned that I "naturally" thought it should be part of the API.
~
 lbrtchx
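
~
 In case someone wants to cross-check the table lookup against what the JDK encoder actually emits, here is a minimal sketch. The names Utf8LenCheck, utf8Length and jdkLength are made up for illustration and are not part of the class above. It also assumes the table is cut down to the four limits that cover the Unicode range up to U+10FFFF, instead of the six limits used in getKdPntLBytes. Lone surrogates (U+D800 to U+DFFF) are deliberately left out of the samples, since they are not encodable on their own.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LenCheck {
    // Exclusive upper limits for 1-, 2-, 3- and 4-byte UTF-8 sequences.
    // Assumption: only the range up to U+10FFFF is covered.
    private static final int[] LIMITS = { 0x80, 0x800, 0x10000, 0x110000 };

    // Table lookup in the spirit of getKdPntLBytes.
    // The caller must make sure that 0 <= cp <= 0x10FFFF.
    public static int utf8Length(int cp) {
        int n = 0;
        boolean found = false;
        for (; n < LIMITS.length && !found; ++n) {
            found = cp < LIMITS[n];
        }
        return n; // in [1, LIMITS.length]
    }

    // Number of bytes the JDK's UTF-8 encoder produces for one code point.
    public static int jdkLength(int cp) {
        return new String(Character.toChars(cp))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Values on both sides of each limit, plus the last code point.
        int[] samples = { 0x41, 0x7F, 0x80, 0x7FF, 0x800, 0x20AC,
                          0xFFFF, 0x10000, 0x1F600, 0x10FFFF };
        for (int cp : samples) {
            System.out.printf("U+%06X table=%d jdk=%d%n",
                    cp, utf8Length(cp), jdkLength(cp));
        }
    }
}
```

 If the two columns ever disagree for a valid code point, the limits table is the first place to look.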