From: lbrt chx _ gemale
Newsgroups: comp.lang.java.programmer
Subject: number of bytes for each (uni)code point while using utf-8 as encoding ...
Date: 10 Jul 2012 10:21:30 GMT
Message-ID: <1341915690.235464@nntp.aceinnovative.com>

~
 You may iterate through all Unicode code points in a file encoded as UTF-8 (or any other encoding) like this:
~
 import java.io.*;
 import java.nio.*;
 import java.nio.channels.FileChannel;
 import java.nio.charset.*;
 ...
 // __ decoder that reports malformed or unmappable input instead of silently replacing it
 String aOEnc = "UTF-8";
 Charset InChrSt = Charset.forName(aOEnc);
 CharsetDecoder InDec = InChrSt.newDecoder();
 InDec.onMalformedInput(CodingErrorAction.REPORT);
 InDec.onUnmappableCharacter(CodingErrorAction.REPORT);
 // __ map the whole file read-only and decode it in one pass
 FileInputStream FIS = new FileInputStream(new File(/* path to the input file */));
 FileChannel IFlChnl = FIS.getChannel();
 MappedByteBuffer MptBytBfr = IFlChnl.map(FileChannel.MapMode.READ_ONLY, 0, IFlChnl.size());
 CharBuffer MptChrBfr = InDec.decode(MptBytBfr);
 // __ length() shrinks with every get(), so loop on hasRemaining() instead of a counter
 while (MptChrBfr.hasRemaining()){
  MptChrBfr.get(); // read the next char from the decoded buffer
 }
 ...
~
 Each time you get() a Unicode code point from the buffer, you will get from 1 to 4 bytes, and the sum of all "lengths" should equal the file length in bytes, right?
~
 I am using the (new) NIO in Java 7, and I wonder whether Sun made changes that make it hard to get the number of bytes a Unicode code point needs.
~
 How can you get the number of bytes you "get()"?
~
 thank you
 lbrtchx
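
~
 One possible way to answer the last question, sketched under a few assumptions: CharBuffer.get() hands back one Java char (a UTF-16 code unit), not the bytes it was decoded from, so the byte count has to be derived from the code point value. UTF-8 uses 1 byte below U+0080, 2 bytes below U+0800, 3 bytes below U+10000 and 4 bytes above that. The class name Utf8Lengths and both method names are made up for this illustration; they are not part of java.nio. The sketch also assumes the text came out of a decoder set to CodingErrorAction.REPORT, so it contains no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the JDK.
public class Utf8Lengths {

    // Number of bytes UTF-8 uses to encode a single Unicode code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > Character.MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;      // U+0000..U+007F
        if (codePoint < 0x800) return 2;     // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF
        return 4;                            // U+10000..U+10FFFF
    }

    // Sum of the per-code-point UTF-8 lengths of already-decoded text.
    // A CharBuffer can be passed here because it implements CharSequence.
    static long totalUtf8Length(CharSequence text) {
        long total = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i); // joins a surrogate pair into one code point
            total += utf8Length(cp);
            i += Character.charCount(cp);            // 1 char for the BMP, 2 for supplementary
        }
        return total;
    }

    public static void main(String[] args) {
        // A, e with acute accent, euro sign, musical G clef (U+1D11E, a surrogate pair)
        String sample = "A\u00E9\u20AC\uD834\uDD1E";
        for (int i = 0; i < sample.length(); ) {
            int cp = sample.codePointAt(i);
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
            i += Character.charCount(cp);
        }
        // Cross-check the sum against the length of the encoded form.
        int encoded = sample.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(totalUtf8Length(sample) == encoded);
    }
}
```

 With the decoded buffer from the post, totalUtf8Length(MptChrBfr) should then be comparable to IFlChnl.size(). Note that a leading byte order mark, if the file has one, is decoded as U+FEFF and counts as 3 bytes.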