


number of bytes for each (uni)code point while using utf-8 as encoding ...

From lbrt chx _ gemale
Newsgroups comp.lang.java.programmer
Subject number of bytes for each (uni)code point while using utf-8 as encoding ...
Organization Acecape, Inc.
Organization Newshosting.com - Highest quality at a great price! www.newshosting.com
Message-ID <1342030685.407730@nntp.aceinnovative.com> (permalink)
Date 2012-07-11 18:18 +0000



>> how to get the length of the sequence of bytes defining a code point

>Use a look up table.
~ 
 Yes, rossum, this is what I was trying to get around ;-)
~ 
// __ unicode.org/versions/Unicode6.1.0/
 private final long[] lKpPntLims = new long[]{ 
           128
        , 2048
       , 65536
     , 2097152
    , 67108864
  , 2147483648L
 };

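// __ returns the number of bytes needed to encode lKdPnt: 1 to lKpPntLims.length
 private final int getKdPntLBytes(long lKdPnt) throws IOException{
  // a negative value would otherwise pass the first (< 128) test and be reported as 1 byte
  if(lKdPnt < 0){ throw new IOException("// __ Negative code point! lKdPnt: |" + lKdPnt + "|"); }
  int iByts = 0;
  boolean Is = false;
  // stop at the first limit above lKdPnt; the final ++iByts turns that index into a byte count
  for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){
   Is = (lKdPnt < lKpPntLims[iByts]);
  }// iByts (0, lKpPntLims.length]
  if(!Is){ throw new IOException("// __ Code point not mapped by Unicode Standard 6.1.0! lKdPnt: |" + lKdPnt + "|"); }
  return(iByts);
 }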
~ 
 The thing is that the constant casting gets expensive, and even if you declare a function (and all its functional context) to be final, you have no guarantee that the compiler will inline it.
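
 One way to sidestep both the long casts and the array is a plain chain of int comparisons. This is only a sketch, and it assumes that standard UTF-8 is all that matters (code points up to U+10FFFF, at most 4 bytes, instead of the 6-byte limits in the table above). The class and method names are made up for illustration, and inlining by the JIT is still not guaranteed:

```java
// Hypothetical helper, not part of the JDK: UTF-8 length of a single code point.
final class Utf8Length {
    private Utf8Length() {}

    /** Returns the number of bytes (1 to 4) that UTF-8 uses for the given code point. */
    static int utf8Length(int codePoint) {
        // Rejects negative values and anything above U+10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        // Note: surrogate code points (U+D800..U+DFFF) fall through to 3 here,
        // although well-formed UTF-8 never encodes them on their own.
        if (codePoint < 0x80) return 1;     // up to 7 bits
        if (codePoint < 0x800) return 2;    // up to 11 bits
        if (codePoint < 0x10000) return 3;  // up to 16 bits
        return 4;                           // up to U+10FFFF
    }
}
```

 Working on int throughout means a value from String.codePointAt can be passed in without any cast.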
~ 
 IMO, this functionality should be part of the API, or else I just haven't found the way to get at it. I have even found silly off-by-one errors in presumably committed code:
~ 
 http://code.google.com/p/xbird/source/browse/trunk/xbird-open/main/src/java/xbird/util/codec/UTF8Codec.java
~ 
 and yes, I let them know
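
 Short of a dedicated API call, the JDK's own encoder can at least serve as a reference when checking a hand-written table or someone else's codec for off-by-one mistakes. A minimal sketch, assuming Java 7 or later for StandardCharsets; the class and method names are hypothetical, and the String allocation makes it unsuitable for a hot loop:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical cross-check helper: lets the JDK's UTF-8 encoder do the counting.
final class Utf8LengthCheck {
    private Utf8LengthCheck() {}

    /** Encodes one code point with the JDK encoder and returns the resulting byte count. */
    static int encodedLength(int codePoint) {
        // Character.toChars yields one char, or a surrogate pair above U+FFFF,
        // and throws IllegalArgumentException for values that are not code points.
        String s = new String(Character.toChars(codePoint));
        // Caveat: a lone surrogate is not encodable, so getBytes substitutes
        // a replacement and the count is not meaningful for that range.
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

 Comparing this against a table-driven count at each boundary (0x7F/0x80, 0x7FF/0x800, 0xFFFF/0x10000) is enough to expose a limit that is off by one.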
~ 
 lbrtchx 
