| From | Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: number of bytes for each (uni)code point while using utf-8 as encoding ... |
| Date | 2012-07-10 20:13 +0200 |
| Organization | A noiseless patient Spider |
| Message-ID | <jthrd2$p5g$1@dont-email.me> |
| References | <1341915690.235464@nntp.aceinnovative.com> |
On 10/07/2012 12:21, lbrt chx _ gemale allegedly wrote:
> number of bytes for each (uni)code point while using utf-8 as encoding ...
> <snip />
> each time you get() a unicode point from the buffer, you will get from 1 to 4 bytes and the sum of all "lengths" should equal the file length in bytes, right?
> ~
> I am using the (new) nio in java 7 and I wonder if sun made changes which make hard getting lengths of bytes a unicode point needs
> ~
> How can you get the number of bytes you "get()"?

Well, UTF-8 always encodes the same char to the same (number of) bytes, doesn't it? So you could just build a map char -> size /a priori/.

But really, what's the use? Knowing how big in bytes your text will be? It's probably just as cheap to write the text to a Writer backed by a counting /dev/null OutputStream.

-- 
DF.
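Instead of a precomputed map, the UTF-8 size of a code point follows directly from its numeric range: 1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, and 4 above that. The sketch below assumes well-formed text (no unpaired surrogates); the class and method names are hypothetical. It walks the text by code point rather than by `char`, because a supplementary character occupies two `char`s (a surrogate pair) in a `String` or `CharBuffer` but is encoded as one 4-byte UTF-8 sequence. Counting per `char` therefore gives the wrong total.

```java
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8Length {

    // Bytes UTF-8 needs for one code point.
    // Assumes a valid Unicode scalar value (lone surrogates are not handled).
    static int utf8Bytes(int codePoint) {
        if (codePoint <= 0x7F) return 1;    // U+0000..U+007F
        if (codePoint <= 0x7FF) return 2;   // U+0080..U+07FF
        if (codePoint <= 0xFFFF) return 3;  // U+0800..U+FFFF
        return 4;                           // U+10000..U+10FFFF
    }

    // Sums the per-code-point sizes. Works on any CharSequence,
    // including a java.nio.CharBuffer. A surrogate pair is read as a
    // single code point, so it counts once (4 bytes), not twice.
    static long utf8Length(CharSequence s) {
        long total = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            total += utf8Bytes(cp);
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return total;
    }

    public static void main(String[] args) {
        // 'A', e-acute, euro sign, and U+1F600 (stored as a surrogate pair)
        String text = "A\u00e9\u20ac\uD83D\uDE00";
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Bytes(cp));
            i += Character.charCount(cp);
        }
        long computed = utf8Length(CharBuffer.wrap(text));
        long actual = text.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(computed + " == " + actual);
    }
}
```

The final line compares the computed sum with the length of the real UTF-8 encoding. For well-formed input the two should match.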
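The "counting /dev/null OutputStream" approach can be sketched as follows, assuming all you need is the total. `CountingNullOutputStream` is a hypothetical helper written for this example. Comparable classes exist in third-party libraries such as Apache Commons IO and Guava (`CountingOutputStream`). Because the Writer's charset encoder does the UTF-8 work, the count equals what would actually be written to a file, surrogate pairs included, and no per-character bookkeeping is needed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: discards every byte written to it but counts them.
public class CountingNullOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++; // drop the byte, just count it
    }

    @Override
    public void write(byte[] b, int off, int len) {
        count += len; // bulk writes: count without copying
    }

    public long getCount() {
        return count;
    }

    // Encodes text as UTF-8 into the counting sink and returns the byte count.
    static long utf8ByteCount(String text) throws IOException {
        CountingNullOutputStream sink = new CountingNullOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(text);
        } // close() flushes the encoder's buffered bytes into the sink
        return sink.getCount();
    }

    public static void main(String[] args) throws IOException {
        String text = "A\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8ByteCount(text));
    }
}
```

Reading the count only after the Writer is closed matters: `OutputStreamWriter` buffers encoded bytes, so the count can lag behind until the writer is flushed or closed.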
Thread:
- number of bytes for each (uni)code point while using utf-8 as encoding ... lbrt chx _ gemale - 2012-07-10 10:21 +0000
- Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-07-10 20:13 +0200
- Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Roedy Green <see_website@mindprod.com.invalid> - 2012-07-11 19:04 -0700
- Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Jason Bailey <Jason.Bailey@sas.com> - 2012-07-12 10:43 -0400