| From | lbrt chx _ gemale |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | number of bytes for each (uni)code point while using utf-8 as encoding ... |
| Organization | Acecape, Inc. |
| Organization | Newshosting.com - Highest quality at a great price! www.newshosting.com |
| Message-ID | <1342030685.407730@nntp.aceinnovative.com> (permalink) |
| Date | 2012-07-11 18:18 +0000 |
>> how to get the length of the sequence of bytes defining a code point
>Use a look up table.
~
Yes, rossum, this is what I was trying to get around ;-)
~
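// __ unicode.org/versions/Unicode6.1.0/
// Exclusive upper bounds for code points that fit in 1, 2, ... 6 bytes.
// The entries past 4 bytes cover the legacy 5- and 6-byte forms; Unicode itself stops at U+10FFFF.
private final long[] lKpPntLims = new long[]{
128
, 2048
, 65536
, 2097152
, 67108864
, 2147483648L
};
// __
private final int getKdPntLBytes(long lKdPnt) throws IOException{
int iByts = 0;
boolean Is = false;
for(; ((iByts < lKpPntLims.length) && !Is); ++iByts){
Is = (lKdPnt < lKpPntLims[iByts]);
}// on a match iByts is the byte count, in [1, lKpPntLims.length]
// negative values would otherwise match the first limit and be reported as 1 byte
if(!Is || (lKdPnt < 0)){ throw new IOException("// __ Code point not mapped by Unicode Standard 6.1.0! lKdPnt: |" + lKdPnt + "|"); }
return(iByts);
}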
~
The thing is that the constant casting gets expensive, and even if you declare a function (and all its functional context) to be final, you have no guarantee that the compiler will inline it.
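~
For comparison, here is a minimal sketch of a table-free variant that stays in int arithmetic, so no long casts are involved. It assumes only the four UTF-8 lengths that Unicode allows (up to U+10FFFF) and rejects surrogates. The class and method names (Utf8Len, utf8Length) are made up for illustration and are not part of any API. Whether the JIT inlines it is still not guaranteed.

```java
// Hypothetical helper, not from the JDK and not from the code above.
final class Utf8Len {
    private Utf8Len() {}

    /** Number of bytes UTF-8 needs to encode one Unicode scalar value. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > Character.MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        // Surrogates (U+D800..U+DFFF) have no UTF-8 encoding of their own.
        if (codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE) {
            throw new IllegalArgumentException("surrogate: " + codePoint);
        }
        if (codePoint < 0x80) return 1;     // U+0000..U+007F
        if (codePoint < 0x800) return 2;    // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;  // U+0800..U+FFFF
        return 4;                           // U+10000..U+10FFFF
    }
}
```

For example, Utf8Len.utf8Length("\u20AC".codePointAt(0)) gives 3 for the euro sign.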
~
IMO, this functionality should be part of the API, or I just haven't found a way around it. I have even found silly off-by-one errors in presumably committed code:
~
http://code.google.com/p/xbird/source/browse/trunk/xbird-open/main/src/java/xbird/util/codec/UTF8Codec.java
~
and yes, I let them know
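~
Off-by-one mistakes of that kind tend to sit exactly at the length boundaries, so one way to catch them is to compare a hand-written count with what the JDK's own UTF-8 encoder produces on each side of every boundary. The sketch below does that. The class and method names are hypothetical, and the encoder route allocates a String and a byte array per call, so it is meant for checking, not for hot paths.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical checker; names are illustrative only.
final class Utf8BoundaryCheck {
    private Utf8BoundaryCheck() {}

    /** Byte count as produced by the JDK's UTF-8 encoder (allocates). */
    static int jdkLength(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    /** Branch-only count, valid for non-surrogate code points 0..0x10FFFF. */
    static int branchLength(int codePoint) {
        return codePoint < 0x80 ? 1
             : codePoint < 0x800 ? 2
             : codePoint < 0x10000 ? 3
             : 4;
    }

    /** True if both counts agree on each side of every length boundary. */
    static boolean boundariesAgree() {
        // Surrogates are skipped: a lone surrogate has no UTF-8 encoding.
        int[] edges = {0x0, 0x7F, 0x80, 0x7FF, 0x800, 0xD7FF,
                       0xE000, 0xFFFF, 0x10000, 0x10FFFF};
        for (int cp : edges) {
            if (jdkLength(cp) != branchLength(cp)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling Utf8BoundaryCheck.boundariesAgree() should flag a comparison written as <= where < was meant, since the edge values on either side of each limit are in the list.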
~
lbrtchx