number of bytes for each (uni)code point while using utf-8 as encoding ...

From lbrt chx _ gemale
Newsgroups comp.lang.java.programmer
Subject number of bytes for each (uni)code point while using utf-8 as encoding ...
Organization Acecape, Inc.
Organization Newshosting.com - Highest quality at a great price! www.newshosting.com
Message-ID <1342045748.366554@nntp.aceinnovative.com> (permalink)
Date 2012-07-11 22:29 +0000


~ 
 OK, in case someone is looking for something like this. There was one small statement that could (and should!) be optimized if you want the compiler to inline your code. The method has no if statements at all (only the loop condition), so the sanity checks have to be done in the calling environment.
~ 
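// __ number of bytes UTF-8 needs for one code point
class UniKd00{
// __ unicode.org/versions/Unicode6.1.0/
// __ exclusive upper limits for 1..6 bytes (original UTF-8 scheme, up to 2^31);
// __ Unicode itself ends at U+10FFFF, so valid code points never need more than 4 bytes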
 private final long[] lKpPntLims = new long[]{ 
           128
        , 2048
       , 65536
     , 2097152
    , 67108864
  , 2147483648L
 };

// __ exclusive upper bound accepted by getKdPntLBytes (2^31), to be used in the caller's range check
 public final long lLastUniKd = lKpPntLims[lKpPntLims.length - 1];

// __ 
 public final String aUniKdVer = "6.1.0";

// __ 
 UniKd00(){}

// __ ((lKdPnt > -1) && (lKdPnt < lLastUniKd)) should be checked in calling env
// __ fewer conditional statements -> more inlinable by the compiler
 public final int getKdPntLBytes(long lKdPnt){
  int iByts = 0;
  boolean Is = false;
  // __ stop at the first limit above lKdPnt; the ++iByts after that hit turns the index into a byte count
  for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){ Is = (lKdPnt < lKpPntLims[iByts]); }// iByts in [1, lKpPntLims.length] on exit
  return(iByts);
 }
}
~ 
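 A loop-free variant, as a sketch only: the class above still runs a for loop with a compound condition. If the aim is no branching in the source at all, one option is to index a small table by the number of significant bits in the code point. Utf8LenTable, BYTES_BY_BITS and byteCount are made-up names for illustration, not part of the class above, and the caller is still assumed to guarantee 0 <= cp < 2147483648 (anything else indexes outside the table). Whether the JIT actually inlines this any better has not been measured here.

```java
// Hypothetical sketch: names are illustrative, not from the original post.
final class Utf8LenTable {
    // Index = number of significant bits in the code point (0..31),
    // value = bytes needed by the original 1-to-6-byte UTF-8 scheme.
    private static final byte[] BYTES_BY_BITS = {
        1, 1, 1, 1, 1, 1, 1, 1,   //  0..7  bits -> 1 byte  (< 128)
        2, 2, 2, 2,               //  8..11 bits -> 2 bytes (< 2048)
        3, 3, 3, 3, 3,            // 12..16 bits -> 3 bytes (< 65536)
        4, 4, 4, 4, 4,            // 17..21 bits -> 4 bytes (< 2097152)
        5, 5, 5, 5, 5,            // 22..26 bits -> 5 bytes (< 67108864)
        6, 6, 6, 6, 6             // 27..31 bits -> 6 bytes (< 2147483648)
    };

    private Utf8LenTable() {}

    // Assumption: the caller has already checked 0 <= cp < 2^31.
    // A negative or larger value makes the array access throw.
    static int byteCount(long cp) {
        // 64 - numberOfLeadingZeros(cp) is the bit length of cp (0 for cp == 0).
        return BYTES_BY_BITS[64 - Long.numberOfLeadingZeros(cp)];
    }

    public static void main(String[] args) {
        long[] samples = {0L, 127L, 128L, 0x20ACL, 0x10FFFFL, 2147483647L};
        for (long cp : samples) {
            System.out.printf("// __ |%10d|%1d|%n", cp, byteCount(cp));
        }
    }
}
```

 The array access still carries the JVM's implicit bounds check, so this only removes branches from the source, not necessarily from the machine code. For code points that Unicode actually defines (up to U+10FFFF) only the entries for 1 to 4 bytes are ever reached.
~ 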
 The test harness in the calling environment looks like this:
~ 
    lKdPnt = (long)" ... get() codepoint";
    if((lKdPnt > -1) && (lKdPnt < UniKd.lLastUniKd)){
     System.out.printf("// __ |%2d|%10d|%1d|\n", l, lKdPnt, UniKd.getKdPntLBytes(lKdPnt));
    }
    else{ throw new IOException("// __ Code point not mapped by Unicode Standard " + UniKd.aUniKdVer + "! lKdPnt: |" + lKdPnt + "|"); }
~ 
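 To compare the table with what the JDK itself emits, a self-contained harness could look like the sketch below. Utf8LenCheck, tableBytes and jdkBytes are hypothetical names: tableBytes repeats the loop from getKdPntLBytes with the same limits, and jdkBytes encodes one code point with String.getBytes(StandardCharsets.UTF_8). The samples are assumed to be valid Unicode scalar values; Character.toChars rejects anything above U+10FFFF, and a lone surrogate (U+D800..U+DFFF) cannot be encoded as UTF-8 on its own, so neither is included.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names are illustrative, not from the original post.
final class Utf8LenCheck {
    // Same exclusive limits as lKpPntLims in the post.
    private static final long[] LIMITS = {128, 2048, 65536, 2097152, 67108864, 2147483648L};

    private Utf8LenCheck() {}

    // Same loop as getKdPntLBytes: count limits until the first one above cp.
    static int tableBytes(long cp) {
        int n = 0;
        boolean found = false;
        for (; n < LIMITS.length && !found; ++n) {
            found = cp < LIMITS[n];
        }
        return n;
    }

    // Bytes the JDK's UTF-8 encoder produces for one valid, non-surrogate code point.
    static int jdkBytes(int cp) {
        String s = new String(Character.toChars(cp));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x24, 0x7F, 0x80, 0x7FF, 0x800, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("// __ |U+%06X|table=%d|jdk=%d|%n", cp, tableBytes(cp), jdkBytes(cp));
        }
    }
}
```

 The two columns should agree for every sample, since the first four limits match the 1-to-4-byte ranges of current UTF-8.
~ 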
> Would you also disclose why you need that information btw. what you want to do with it?  I don't see the use case.
~ 
 Well, I was probably so deep into those things (the ones I mentioned) that I "naturally" thought it should be part of the API.
~ 
 lbrtchx
