Re: number of bytes for each (uni)code point while using utf-8 as encoding ...

From Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: number of bytes for each (uni)code point while using utf-8 as encoding ...
Date 2012-07-10 20:13 +0200
Organization A noiseless patient Spider
Message-ID <jthrd2$p5g$1@dont-email.me>
References <1341915690.235464@nntp.aceinnovative.com>


On 10/07/2012 12:21, lbrt chx _ gemale allegedly wrote:
> number of bytes for each (uni)code point while using utf-8 as encoding ...
> <snip />
>  each time you get() a unicode point from the buffer, you will get from 1 to 4 bytes and the sum of all "lengths" should equal the file length in bytes, right?
> ~ 
>  I am using the (new) NIO in Java 7 and I wonder if Sun made changes which make it hard to get the number of bytes a Unicode code point needs
> ~ 
>  How can you get the number of bytes you "get()"?

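Well, UTF-8 always encodes the same code point to the same (number of)
bytes, doesn't it? So you could just build a map code point -> size
/a priori/; the size depends only on the range the code point falls in.
(Mind that a Java char is a UTF-16 code unit, so a code point outside the
BMP takes two chars.)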
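
A minimal sketch of such an a priori size lookup, assuming you iterate
over code points rather than chars. The class and method names
(Utf8Sizes, bytesForCodePoint, bytesForText) are made up for
illustration; only the java.lang.Character calls are real API.

```java
final class Utf8Sizes {
    private Utf8Sizes() {}

    /** Number of bytes UTF-8 uses for a single Unicode code point. */
    static int bytesForCodePoint(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;     // U+0000..U+007F (ASCII)
        if (codePoint < 0x800) return 2;    // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;  // U+0800..U+FFFF (rest of the BMP)
        return 4;                           // U+10000..U+10FFFF
    }

    /**
     * Sum of the per-code-point sizes for a whole text.
     * Assumption: the text is well-formed UTF-16. A lone surrogate is
     * counted as 3 bytes here, whereas Java's encoder would replace it.
     */
    static long bytesForText(CharSequence text) {
        long total = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            total += bytesForCodePoint(cp);
            i += Character.charCount(cp); // 2 chars for a surrogate pair
        }
        return total;
    }
}
```

For well-formed text, bytesForText(s) should agree with
s.getBytes(StandardCharsets.UTF_8).length, without allocating the byte
array.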

But really, what's the use? Knowing how big in bytes your text will be?
It's probably just as cheap to write the text to a Writer backed by a
counting /dev/null OutputStream, i.e. one that discards the bytes and
only keeps a tally of them.
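
Roughly what I have in mind, as a sketch rather than tested production
code. CountingNullOutputStream and encodedSize are names I made up; the
java.io and java.nio.charset classes are the standard ones.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Discards everything written to it and only counts the bytes. */
final class CountingNullOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++; // one byte per call, contents ignored
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        count += len; // bulk writes: count without copying
    }

    long getCount() {
        return count;
    }

    /** UTF-8 encoded size of the text, measured by actually encoding it. */
    static long encodedSize(String text) {
        CountingNullOutputStream sink = new CountingNullOutputStream();
        // Closing the writer flushes the encoder, so every byte is counted.
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // The sink itself never throws; treat this as a programming error.
            throw new IllegalStateException(e);
        }
        return sink.getCount();
    }
}
```

This gives you the total only, not the size of each code point, but
the total is usually what one is after.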

-- 
DF.
