Re: number of bytes for each (uni)code point while using utf-8 as encoding ...

From	rossum <rossum48@coldmail.com>
Newsgroups	comp.lang.java.programmer
Subject	Re: number of bytes for each (uni)code point while using utf-8 as encoding ...
Date	2012-07-11 16:09 +0100
Message-ID	<0t4rv7d9lokdbm0287lf7h76u41a0qunvu@4ax.com>
References	<1341965282.664308@nntp.aceinnovative.com>

On 11 Jul 2012 00:08:02 GMT, lbrt chx _ gemale wrote:

> how to get the length of the sequence of bytes defining a code point
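Use a lookup table:

Start Code Point   End Code Point   Num Bytes
----------------   --------------   ---------
     U+0000           U+007F            1
     U+0080           U+07FF            2
     U+0800           U+FFFF            3
     U+10000          U+1FFFFF          4
     U+200000         U+3FFFFFF         5
     U+4000000        U+7FFFFFFF        6

The last two rows come from the original UTF-8 definition (RFC 2279).
RFC 3629 cut UTF-8 back to the Unicode range, which ends at U+10FFFF,
so in practice you will never see more than 4 bytes per code point.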

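In Java the table boils down to a few range checks. Here is a minimal
sketch, not a definitive implementation: the class name Utf8Length and
the method utf8Length are made up for illustration, and I have assumed
the modern U+10FFFF limit rather than the 5 and 6 byte forms.

```java
final class Utf8Length {

    /**
     * Returns how many bytes UTF-8 needs for one code point.
     * Hypothetical helper, assuming the current limit of U+10FFFF.
     * Lone surrogates (U+D800..U+DFFF) are not valid in UTF-8; this
     * sketch does not reject them and reports 3 for that range.
     */
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                    "Not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0x7F) {
            return 1;
        }
        if (codePoint <= 0x7FF) {
            return 2;
        }
        if (codePoint <= 0xFFFF) {
            return 3;
        }
        return 4;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign, and one character outside the BMP
        String s = "a\u00e9\u20ac\uD83D\uDE00";

        // Walk the string by code point, not by char, so that a
        // surrogate pair is counted as a single code point.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
            i += Character.charCount(cp);
        }
    }
}
```

Iterate with codePointAt and Character.charCount as shown; looping over
individual chars would treat each half of a surrogate pair separately
and give the wrong count.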

rossum
