| From | lbrt chx _ gemale |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | number of bytes for each (uni)code point while using utf-8 as encoding ... |
| Organization | Acecape, Inc. |
| Organization | Newshosting.com - Highest quality at a great price! www.newshosting.com |
| Message-ID | <1342045748.366554@nntp.aceinnovative.com> (permalink) |
| Date | 2012-07-11 22:29 +0000 |
~
OK, in case someone is looking for something like this: there was one small statement that could (and should!) be optimized if you want the compiler to inline your code. The method has no if/else branches at all, only the loop condition, so the sanity checks should be done in the calling environment.
~
// __
class UniKd00{
  // __ unicode.org/versions/Unicode6.1.0/
  // __ exclusive upper limits for 1- to 6-byte sequences (original UTF-8 scheme);
  // __ Unicode itself ends at U+10FFFF, so mapped code points need at most 4 bytes
  private final long[] lKpPntLims = new long[]{
      128
    , 2048
    , 65536
    , 2097152
    , 67108864
    , 2147483648L
  };
  // __ exclusive upper bound accepted by getKdPntLBytes
  public final long lLastUniKd = lKpPntLims[lKpPntLims.length - 1];
  // __
  public final String aUniKdVer = "6.1.0";
  // __
  UniKd00(){}
  // __ ((lKdPnt > -1) && (lKdPnt < lLastUniKd)) should be checked in calling env
  // __ fewer conditional statements -> more inlineable by the compiler
  public final int getKdPntLBytes(long lKdPnt){
    int iByts = 0;
    boolean Is = false;
    // __ stops one past the first limit that exceeds lKdPnt
    for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){ Is = (lKdPnt < lKpPntLims[iByts]); }
    // __ on exit iByts is in [1, lKpPntLims.length], i.e. the number of bytes
    return(iByts);
  }
}
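The remark about fewer conditional statements can be pushed one step further. What follows is a minimal sketch and not the code from the post: it swaps the table scan for sign-bit arithmetic, covers only the four ranges that Unicode actually uses (up to U+10FFFF), and assumes the caller has already validated the input. The class name `BranchFreeUtf8Len` and both method names are made up for illustration. Whether a given JIT inlines it any better is not something this sketch demonstrates; it only cross-checks the result against the JDK's own UTF-8 encoder.

```java
import java.nio.charset.StandardCharsets;

class BranchFreeUtf8Len {
    // Assumes 0 <= cp <= 0x10FFFF; the caller validates, as in the post.
    // Each term is 1 when cp exceeds the threshold: the subtraction goes
    // negative and >>> 31 extracts the sign bit.
    static int utf8Length(int cp) {
        return 1
            + ((0x7F - cp) >>> 31)
            + ((0x7FF - cp) >>> 31)
            + ((0xFFFF - cp) >>> 31);
    }

    // Reference value: let the JDK encode the code point and count the bytes.
    static int jdkLength(int cp) {
        return new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
            // Lone surrogates are not encodable in UTF-8; the JDK substitutes '?' for them.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) continue;
            if (utf8Length(cp) != jdkLength(cp)) {
                throw new AssertionError("mismatch at U+" + Integer.toHexString(cp));
            }
        }
        System.out.println("all non-surrogate code points agree");
    }
}
```

The surrogate range is skipped in the comparison because those code points only occur in pairs in UTF-16 and have no UTF-8 encoding of their own.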
~
And the test harness in the calling environment looks like this:
~
// __ UniKd is an instance of UniKd00; l is the caller's running index
lKdPnt = (long)" ... get() codepoint";
if((lKdPnt > -1) && (lKdPnt < UniKd.lLastUniKd)){
  System.out.printf("// __ |%2d|%10d|%1d|\n", l, lKdPnt, UniKd.getKdPntLBytes(lKdPnt));
}
else{ throw new IOException("// __ Code point not mapped by Unicode Standard " + UniKd.aUniKdVer + "! lKdPnt: |" + lKdPnt + "|"); }
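For anyone who wants to run the harness as is, here is one way it might be fleshed out. This is a sketch under assumptions, not the original test code: the post does not say where the code points come from, so `String.codePoints()` is used purely as an example source; `CodePointHarness`, `utf8Length` and `report` are hypothetical names; and the table is cut down to the four ranges below U+110000, with `Character.isValidCodePoint` doing the range check in the caller.

```java
import java.io.IOException;

class CodePointHarness {
    // Exclusive upper bounds for 1- to 4-byte UTF-8 sequences (Unicode range only).
    private static final int[] LIMITS = {0x80, 0x800, 0x10000, 0x110000};

    // Hypothetical stand-in for getKdPntLBytes: same threshold scan, int-based.
    static int utf8Length(int cp) {
        int n = 0;
        boolean found = false;
        for (; n < LIMITS.length && !found; ++n) {
            found = cp < LIMITS[n];
        }
        return n;
    }

    // The range check lives in the caller, as in the post.
    // Prints one row per code point and returns the total UTF-8 byte count.
    static int report(int[] codePoints) throws IOException {
        int total = 0;
        for (int i = 0; i < codePoints.length; ++i) {
            int cp = codePoints[i];
            if (!Character.isValidCodePoint(cp)) {
                throw new IOException("// __ Code point outside the Unicode range! |" + cp + "|");
            }
            int len = utf8Length(cp);
            System.out.printf("// __ |%2d|%10d|%1d|%n", i, cp, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // U+0061, U+00E9, U+20AC, U+1F600 (the last one written as a surrogate pair)
        String text = "a\u00E9\u20AC\uD83D\uDE00";
        int total = report(text.codePoints().toArray());
        System.out.println("total bytes: " + total);
    }
}
```

Note that `Character.isValidCodePoint` also accepts the surrogate range U+D800..U+DFFF, which has no UTF-8 encoding of its own, so a stricter caller may want to reject those values as well.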
~
> Would you also disclose why you need that information, or rather what you want to do with it? I don't see the use case.
~
Well, I was probably so into the things I mentioned that I "naturally" thought it should be part of the API.
~
lbrtchx