Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!npeer03.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad!not-for-mail
From: lbrt chx _ gemale
Newsgroups: comp.lang.java.programmer
Subject: number of bytes for each (uni)code point while using utf-8 as encoding ...
In-Reply-To: <1341965282.664308@nntp.aceinnovative.com>
X-Newsreader: NetComponents
Organization: Acecape, Inc.
Organization: Newshosting.com - Highest quality at a great price! www.newshosting.com
X-Complaints-To: abuse(at)newshosting.com
Message-ID: <1342030685.407730@nntp.aceinnovative.com>
Cache-Post-Path: nntp.aceinnovative.com!unknown@p70-44.acedsl.com
X-Cache: nntpcache 3.0.1 (see http://www.nntpcache.org/)
Date: 11 Jul 2012 18:18:05 GMT
Lines: 36
X-Received-Bytes: 1946
Xref: csiph.com comp.lang.java.programmer:15943

>> how to get the length of the sequence of bytes defining a code point

>Use a look up table.
~ 
 Yes, rossum, this is what I was trying to get around ;-)
~ 
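// __ unicode.org/versions/Unicode6.1.0/
// __ exclusive upper limits of the code points encoded in 1, 2, 3 and 4 bytes;
// __ Unicode 6.1.0 (like RFC 3629) ends at U+10FFFF, so the old 5- and 6-byte forms are left out
 private final long[] lKpPntLims = new long[]{ 
           128
        , 2048
       , 65536
     , 1114112
 };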

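// __ needs: import java.io.IOException;
// __ returns the number of bytes UTF-8 uses for the code point lKdPnt
 private final int getKdPntLBytes(long lKdPnt) throws IOException{
  int iByts = 0;
  boolean Is = false;
  if(lKdPnt >= 0){ // negative values are not code points
   for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){
    Is = (lKdPnt < lKpPntLims[iByts]);
   }// on a match iByts has already been incremented: iByts in [1, lKpPntLims.length]
  }
  if(!Is){ throw new IOException("// __ Code point not mapped by Unicode Standard 6.1.0! lKdPnt: |" + lKdPnt + "|"); }
  return(iByts);
 }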
~ 
 The thing is that the constant casting gets expensive, and even if you declare a method (and all its functional context) to be final, you have no guarantee that the compiler or the JIT will inline it.
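~ 
 One way to sidestep both the table and the long casts might be a plain comparison chain on an int code point. This is only a minimal sketch, assuming the U+10FFFF limit of Unicode 6.1.0; the class and method names (Utf8Len, utf8Length) are made up for illustration and are not part of any API:

```java
public final class Utf8Len {
    private Utf8Len() {}

    /**
     * Number of bytes UTF-8 needs for the given code point (1 to 4).
     * Hypothetical helper, not a JDK method.
     * Note: surrogate code points (U+D800..U+DFFF) pass the validity
     * check and are counted as 3 bytes, although well-formed UTF-8
     * never contains them.
     */
    public static int utf8Length(int codePoint) {
        // Rejects negative values and anything above U+10FFFF
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;      // U+0000..U+007F
        if (codePoint < 0x800) return 2;     // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF
        return 4;                            // U+10000..U+10FFFF
    }
}
```

 With int arguments there is nothing to cast, and a method this small is a reasonable candidate for inlining, though that still remains up to the JIT.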
~ 
 I still think that this functionality should be part of the API, or maybe I just haven't found it there. I have even found silly off-by-one errors in presumably committed code:
~ 
 http://code.google.com/p/xbird/source/browse/trunk/xbird-open/main/src/java/xbird/util/codec/UTF8Codec.java
~ 
 and yes, I let them know
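~ 
 For cross-checking a hand-written length function, the stock API can at least encode a single code point and count the resulting bytes. A minimal sketch, assuming Java 7 for StandardCharsets; the class and method names (Utf8LenCheck, encodedLength) are hypothetical, and since it allocates on every call it is meant for tests rather than hot loops:

```java
import java.nio.charset.StandardCharsets;

public final class Utf8LenCheck {
    private Utf8LenCheck() {}

    /**
     * Encodes one code point with the JDK's UTF-8 encoder and returns
     * the byte count. Hypothetical helper for test code only.
     * Caveat: a lone surrogate (U+D800..U+DFFF) cannot be encoded, and
     * String.getBytes substitutes a replacement byte for it, so results
     * for surrogates are not meaningful.
     */
    public static int encodedLength(int codePoint) {
        // Character.toChars throws IllegalArgumentException
        // if codePoint is not a valid Unicode code point
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```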
~ 
 lbrtchx