Groups > comp.lang.java.programmer > #15933 > unrolled thread
| Started by | lbrt chx _ gemale |
|---|---|
| First post | 2012-07-11 00:08 +0000 |
| Last post | 2012-07-11 14:05 -0700 |
| Articles | 4 — 4 participants |
number of bytes for each (uni)code point while using utf-8 as encoding ... lbrt chx _ gemale - 2012-07-11 00:08 +0000
Re: number of bytes for each (uni)code point while using utf-8 as encoding ... rossum <rossum48@coldmail.com> - 2012-07-11 16:09 +0100
Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Robert Klemme <shortcutter@googlemail.com> - 2012-07-11 22:03 +0200
Re: number of bytes for each (uni)code point while using utf-8 as encoding ... Lew <lewbloch@gmail.com> - 2012-07-11 14:05 -0700
| From | lbrt chx _ gemale |
|---|---|
| Date | 2012-07-11 00:08 +0000 |
| Subject | number of bytes for each (uni)code point while using utf-8 as encoding ... |
| Message-ID | <1341965282.664308@nntp.aceinnovative.com> |
~ I obviously, and I would say very clearly, meant that a file's encoding is either incorrectly set by its authors or corrupted in transit. (I never said anything about failing disks ...)

~ Sometimes we technical people sound like lawyers/politicians trying to correct people's minds and/or trying to prove something to oneself.

~ What I asked is an entirely technical question, namely: how to get the length of the sequence of bytes defining a code point.

~ lbrtchx
| From | rossum <rossum48@coldmail.com> |
|---|---|
| Date | 2012-07-11 16:09 +0100 |
| Message-ID | <0t4rv7d9lokdbm0287lf7h76u41a0qunvu@4ax.com> |
| In reply to | #15933 |
On 11 Jul 2012 00:08:02 GMT, lbrt chx _ gemale wrote:
> how to get the length of the sequence of bytes defining a code point
Use a look up table.

| Start Code Point | End Code Point | Num Bytes |
|---|---|---|
| U+0000 | U+007F | 1 |
| U+0080 | U+07FF | 2 |
| U+0800 | U+FFFF | 3 |
| U+10000 | U+1FFFFF | 4 |
| U+200000 | U+3FFFFFF | 5 |
| U+4000000 | U+7FFFFFFF | 6 |

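
For readers who want the table in code, here is a minimal Java sketch; it is not part of the original post. The class and method names (`Utf8Length`, `utf8Length`, `sequenceLength`) are made up for illustration. It assumes the current UTF-8 definition (RFC 3629), which stops at U+10FFFF, so the 5- and 6-byte rows of the table never occur; Java's `int` code points have the same upper limit. Surrogate values (U+D800 to U+DFFF) fall into the 3-byte range here, although a strict encoder would reject them.

```java
final class Utf8Length {

    // Number of bytes UTF-8 needs for one code point (first four table rows).
    static int utf8Length(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint <= 0x7F) return 1;
        if (codePoint <= 0x7FF) return 2;
        if (codePoint <= 0xFFFF) return 3;
        return 4; // U+10000 .. U+10FFFF
    }

    // Length of an encoded sequence, judged from its first byte only.
    // Returns -1 for a continuation byte or a byte that cannot start a sequence.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;                    // treat the byte as unsigned
        if (b <= 0x7F) return 1;                // 0xxxxxxx
        if (b >= 0xC2 && b <= 0xDF) return 2;   // 110xxxxx (0xC0, 0xC1 are overlong)
        if (b >= 0xE0 && b <= 0xEF) return 3;   // 1110xxxx
        if (b >= 0xF0 && b <= 0xF4) return 4;   // 11110xxx, capped at U+10FFFF
        return -1;
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0x00E9, 0x20AC, 0x1F600 };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

`utf8Length` answers the question when the code point is already decoded; `sequenceLength` is the variant to use when reading raw bytes from a file, since the first byte alone determines how many bytes follow.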
rossum
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-07-11 22:03 +0200 |
| Message-ID | <a664fsFnrhU1@mid.individual.net> |
| In reply to | #15933 |
On 11.07.2012 02:08, lbrt chx _ gemale wrote:

> What I asked is an entirely technical question, namely; how to get the
> length of the sequence of bytes defining a code point ~ lbrtchx

Would you also disclose why you need that information, or rather what you want to do with it? I don't see the use case.

And please try to keep the thread together: it's quite tedious to follow a discussion spread across a number of threads. Thank you!

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
| From | Lew <lewbloch@gmail.com> |
|---|---|
| Date | 2012-07-11 14:05 -0700 |
| Message-ID | <6d3b2b50-0404-40fa-b611-7cf242b51c4f@googlegroups.com> |
| In reply to | #15933 |
On Tuesday, July 10, 2012 5:08:02 PM UTC-7, (unknown) wrote:

> ~
> I obviously and I would say -very clearly- meant a -file's encoding- is either incorrectly set by authors or is corrupted in transit. (I never said anything about failing disks ...)

And those two cases were answered in your other thread. Obviously, and very clearly.

Drop your attitude. Um, please.

> ~
> Sometimes we technical people sound like lawyers/politicians trying to correct peoples' minds and/or trying to prove something to one self

Is that what you're doing?

> ~
> What I asked is an entirely technical question, namely; how to get the length of the sequence of bytes defining a code point

And what was answered was a set of entirely technical responses, namely how to get the length of the sequence of bytes defining a code point.

What is your problem?

--
Lew