


number of bytes for each (uni)code point while using utf-8 as encoding ...

From lbrt chx _ gemale
Newsgroups comp.lang.java.programmer
Subject number of bytes for each (uni)code point while using utf-8 as encoding ...
Organization Acecape, Inc.
Organization Newshosting.com - Highest quality at a great price! www.newshosting.com
Message-ID <1341915690.235464@nntp.aceinnovative.com>
Date 2012-07-10 10:21 +0000



~ 
 You can iterate through all the (Unicode) code points in a file encoded as UTF-8 (or any other encoding) like this:
~ 
 ...
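// __ needs java.io.File, java.io.FileInputStream, java.nio.CharBuffer, java.nio.MappedByteBuffer,
// __ java.nio.channels.FileChannel and java.nio.charset.{Charset, CharsetDecoder, CodingErrorAction}
// __ strict decoder: report malformed or unmappable input instead of silently replacing it
    String aOEnc = "UTF-8";
    Charset InChrSt = Charset.forName(aOEnc);
    CharsetDecoder InDec = InChrSt.newDecoder();
    InDec.onMalformedInput(CodingErrorAction.REPORT);
    InDec.onUnmappableCharacter(CodingErrorAction.REPORT);
// __ map the whole file read-only and decode it in one go
    FileInputStream FIS = new FileInputStream(new File(<file path as string>));
    FileChannel IFlChnl = FIS.getChannel();
    MappedByteBuffer MptBytBfr = IFlChnl.map(FileChannel.MapMode.READ_ONLY, 0, IFlChnl.size());
    CharBuffer MptChrBfr = InDec.decode(MptBytBfr);
// __ length() is the number of chars still remaining and shrinks with every get(),
// __ so a counter compared against it stops halfway; test hasRemaining() instead
    while (MptChrBfr.hasRemaining()){
     MptChrBfr.get();
    }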
 ...
~ 
 Each time you get() a Unicode code point from the buffer, you consume from 1 to 4 bytes of the file, and the sum of all those "lengths" should equal the file length in bytes, right?
~ 
 I am using the (new) NIO in Java 7, and I wonder whether Sun made changes that make it hard to get the number of bytes a Unicode code point needs.
~ 
 How can you get the number of bytes you "get()"?
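~ 
 One possible way to approach this, as a minimal sketch rather than a definitive answer: CharBuffer.get() returns a single char (a UTF-16 code unit), not a code point and not bytes, so the byte count has to be derived from the code point value itself. The class name Utf8Lengths and the helpers utf8Length and totalUtf8Bytes are hypothetical names, not part of the JDK. The sketch assumes the text was decoded with CodingErrorAction.REPORT, so it contains no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Lengths {

    // Number of bytes UTF-8 needs for one Unicode code point (hypothetical helper).
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;      // U+0000..U+007F
        if (codePoint < 0x800) return 2;     // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF
        return 4;                            // U+10000..U+10FFFF
    }

    // Walks the text code point by code point (not char by char)
    // and sums the UTF-8 length of each code point.
    static long totalUtf8Bytes(CharSequence text) {
        long total = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            total += utf8Length(cp);
            i += Character.charCount(cp);    // 2 chars for a surrogate pair, else 1
        }
        return total;
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes)
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(totalUtf8Bytes(s));                          // 10
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 10
    }
}
```

 A decoded CharBuffer is itself a CharSequence, so it can be passed to totalUtf8Bytes directly, as long as this happens before any get() calls advance its position. For a file that decodes without errors, the result can then be compared with the file size in bytes.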
~ 
 thank you
 lbrtchx



Thread

number of bytes for each (uni)code point while using utf-8 as encoding ... lbrt chx _ gemale - 2012-07-10 10:21 +0000
  Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-07-10 20:13 +0200
  Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Roedy Green <see_website@mindprod.com.invalid> - 2012-07-11 19:04 -0700
  Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Jason Bailey <Jason.Bailey@sas.com> - 2012-07-12 10:43 -0400
