


number of bytes for each (uni)code point while using utf-8 as encoding ...

Started by: lbrt chx _ gemale
First post: 2012-07-11 00:08 +0000
Last post: 2012-07-11 14:05 -0700
Articles: 4 (4 participants)




#15933 — number of bytes for each (uni)code point while using utf-8 as encoding ...

From: lbrt chx _ gemale
Date: 2012-07-11 00:08 +0000
Subject: number of bytes for each (uni)code point while using utf-8 as encoding ...
Message-ID: <1341965282.664308@nntp.aceinnovative.com>
~ 
 I obviously, and I would say very clearly, meant that a file's encoding is either set incorrectly by its authors or corrupted in transit. (I never said anything about failing disks ...)
~ 
 Sometimes we technical people sound like lawyers/politicians trying to correct people's minds and/or trying to prove something to oneself
~ 
 What I asked is an entirely technical question, namely: how to get the length of the sequence of bytes defining a code point
~ 
 lbrtchx



#15940

From: rossum <rossum48@coldmail.com>
Date: 2012-07-11 16:09 +0100
Message-ID: <0t4rv7d9lokdbm0287lf7h76u41a0qunvu@4ax.com>
In reply to: #15933
On 11 Jul 2012 00:08:02 GMT, lbrt chx _ gemale wrote:

> how to get the length of the sequence of bytes defining a code point
Use a lookup table.

Start Code Point   End Code Point   Num Bytes  
----------------   --------------   ---------
     U+0000           U+007F            1
     U+0080           U+07FF            2
     U+0800           U+FFFF            3
     U+10000          U+1FFFFF          4
     U+200000         U+3FFFFFF         5
     U+4000000        U+7FFFFFFF        6
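
For Java specifically, the table above can be sketched as a small method. This is only an illustration, not something posted in the thread: the class name Utf8Length and the method name utf8Length are made up for the example. It assumes the current UTF-8 definition (RFC 3629), which stops at U+10FFFF, so the 5-byte and 6-byte rows and everything above U+10FFFF in the 4-byte row never occur. The main method cross-checks the table against the JDK's own encoder for a few sample characters.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    /**
     * Number of bytes UTF-8 uses to encode one code point, per the ranges
     * in the lookup table, limited to U+10FFFF (assumption: RFC 3629 rules).
     * Surrogate values U+D800..U+DFFF fall in the 3-byte range here, but they
     * are not valid scalar values and a strict encoder will not emit them.
     */
    static int utf8Length(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0x7F) return 1;
        if (codePoint <= 0x7FF) return 2;
        if (codePoint <= 0xFFFF) return 3;
        return 4;
    }

    public static void main(String[] args) {
        // Sample text: 'a', e-acute, euro sign, and one supplementary character (U+1F600).
        String s = "a\u00E9\u20AC\uD83D\uDE00";
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // Cross-check: let the JDK encode this single code point and count the bytes.
            int viaEncoder = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            System.out.printf("U+%04X table=%d encoder=%d%n", cp, utf8Length(cp), viaEncoder);
            i += Character.charCount(cp); // advance by 1 or 2 chars (surrogate pair)
        }
    }
}
```

Iterating with codePointAt and Character.charCount rather than charAt matters here, because a Java char is a UTF-16 code unit and supplementary characters occupy two of them.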


rossum



#15944

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-07-11 22:03 +0200
Message-ID: <a664fsFnrhU1@mid.individual.net>
In reply to: #15933
On 11.07.2012 02:08, lbrt chx _ gemale wrote:
> What I
> asked is an entirely technical question, namely; how to get the
> length of the sequence of bytes defining a code point ~ lbrtchx

Would you also disclose why you need that information, or rather what you
want to do with it?  I don't see the use case.

And please try to keep the thread together - it's quite tedious to 
follow a discussion spread across a number of threads.  Thank you!

Cheers

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



#15946

From: Lew <lewbloch@gmail.com>
Date: 2012-07-11 14:05 -0700
Message-ID: <6d3b2b50-0404-40fa-b611-7cf242b51c4f@googlegroups.com>
In reply to: #15933
On Tuesday, July 10, 2012 5:08:02 PM UTC-7, (unknown) wrote:
> ~ 
>  I obviously and I would say -very clearly- meant a -file's encoding- is either incorrectly set by authors or is corrupted in transit. (I never said anything about failing disks ...)

And those two cases were answered in your other thread. 
Obviously, and very clearly.

Drop your attitude.

Um, please.

> ~ 
>  Sometimes we technical people sound like lawyers/politicians trying to correct peoples' minds and/or trying to prove something to one self

Is that what you're doing?

> ~ 
>  What I asked is an entirely technical question, namely; how to get the length of the sequence of bytes defining a code point

And what was answered was a set of entirely technical responses, 
namely how to get the length of the sequence of bytes 
defining a code point.

What is your problem?

-- 
Lew




