Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #48038

Re: A few questiosn about encoding

From Dennis Lee Bieber <wlfraed@ix.netcom.com>
Subject Re: A few questiosn about encoding
Date 2013-06-13 18:46 -0400
Organization IISS Elusive Unicorn
References <6dfa3707-80f4-407a-a109-66dbb0130513@googlegroups.com> <mailman.2923.1370797972.3114.python-list@python.org> <kp9drh$1o0t$1@news.ntua.gr>
Newsgroups comp.lang.python
Message-ID <mailman.3238.1371163584.3114.python-list@python.org> (permalink)

Show all headers | View raw


On Wed, 12 Jun 2013 09:09:05 +0000 (UTC), ???????? ??????
<support@superhost.gr> declaimed the following:

>>> (*) infact UTF8 also indicates the end of each character
>
>> Up to a point.  The initial byte encodes the length and the top few
>> bits, but the subsequent octets aren’t distinguishable as final in
>> isolation.  0x80-0xBF can all be either medial or final.
>
>
>So, the first high-bits are a directive that UTF-8 uses to know how many 
>bytes each character is being represented as.
>
>0-127 codepoints(characters) use 1 bit to signify they need 1 bit for 
>storage and the rest 7 bits to actually store the character ?
>
	Not quite... The leading bit is a 0 -> which means 0..127 are sent
as-is, no manipulation.

>while
>
>128-256 codepoints(characters) use 2 bit to signify they need 2 bits for 
>storage and the rest 14 bits to actually store the character ?
>
	128..255 -- in what encoding? These all have the leading bit with a
value of 1. In 8-bit encodings (ISO-Latin-1) the meaning of those values is
inherent in the specified encoding and they are sent as-is.

	BUT, in UTF-8, a byte with a leading 1-bit signals that the byte
identifies a multi-byte sequence. CF:
https://en.wikipedia.org/wiki/UTF-8#Description

	So anything that starts with bits 110 is a two byte sequence (and the
second byte must start with bits 10 to be valid)

	1110 starts a three byte sequence, 11110 starts a four byte sequence...
Basically, count the number of leading 1-bits before a 0 bit, and that
tells you how many bytes are in the multi-byte sequence -- and all bytes
that start with 10 are supposed to be the continuations of a multibyte set
(and not a signal that this is a 1-byte entry -- those only have a leading
0)

>Isn't 14 bits way to many to store a character ? 

Original UTF-8 allowed for 31-bits to specify a character in the Unicode
set. It used 6 bytes -- 48 bits total, but 7 bits of the first byte were
the flag (6 leading 1 bits and a 0 bit), and two bits (leading 10) of each
continuation.

	
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

A few questiosn about encoding Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 03:44 -0700
  Re: A few questiosn about encoding Fábio Santos <fabiosantosart@gmail.com> - 2013-06-09 13:18 +0100
  Re: A few questiosn about encoding Nobody <nobody@nowhere.com> - 2013-06-09 18:01 +0100
  Re: A few questiosn about encoding Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-06-09 19:12 +0200
    Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 09:09 +0000
      Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-12 09:24 +0000
        Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 14:23 +0300
          Re: A few questiosn about encoding Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-06-12 14:52 +0200
          Re: A few questiosn about encoding Nobody <nobody@nowhere.com> - 2013-06-12 21:30 +0100
            Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 01:40 +0000
              Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 12:01 +1000
                Re: A few questiosn about encoding Nobody <nobody@nowhere.com> - 2013-06-13 11:02 +0100
            Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 09:21 +0300
              Re: A few questiosn about encoding jmfauth <wxjmfauth@gmail.com> - 2013-06-12 23:28 -0700
              Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 16:48 +1000
          Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 00:13 +0000
            Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 09:09 +0300
              Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 07:11 +0000
                Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 10:42 +0300
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 17:58 +1000
                Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 11:08 +0300
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 18:20 +1000
                Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 12:41 +0300
                Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 11:49 +0000
                Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 17:19 +0300
                Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 11:00 +1000
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 09:59 +0300
                Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 20:14 +1000
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 16:58 +0300
                Re: A few questiosn about encoding Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-14 11:21 -0400
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 18:26 +0300
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-15 03:03 +1000
                Re: A few questiosn about encoding Walter Hurry <walterhurry@lavabit.com> - 2013-06-14 23:32 +0000
                Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-15 10:26 +1000
                Re: A few questiosn about encoding Denis McMahon <denismfmcmahon@gmail.com> - 2013-06-15 06:34 +0000
                Re: A few questiosn about encoding Grant Edwards <invalid@invalid.invalid> - 2013-06-15 14:44 +0000
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 17:49 +0300
                Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-15 15:30 +0000
                Re: A few questiosn about encoding Roy Smith <roy@panix.com> - 2013-06-15 10:59 -0400
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 18:14 +0300
                Re: A few questiosn about encoding Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-15 11:35 -0400
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 22:26 +0300
                Re: A few questiosn about encoding Benjamin Schollnick <benjamin@schollnick.net> - 2013-06-15 16:35 -0400
                Re: A few questiosn about encoding Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-06-16 15:45 +0200
                Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 09:36 +0200
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 10:49 +0300
                Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 10:22 +0200
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 11:37 +0300
                Don't feed the troll... (was: Re: A few questiosn about encoding) Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 11:06 +0200
                Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 12:32 +0300
                Re: Don't feed the troll... Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 13:09 +0200
                Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:36 +0300
                Re: Don't feed the troll... Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-14 08:44 -0400
                Re: Don't feed the troll... Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 15:25 +0200
                Re: Don't feed the troll... Neil Cerutti <neilc@norwich.edu> - 2013-06-14 15:54 +0000
                Re: Don't feed the troll... Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 12:15 +0200
                Re: Don't feed the troll... Guy Scree <nobody@nowhere.com> - 2013-06-14 18:50 -0400
                Re: Don't feed the troll... Denis McMahon <denismfmcmahon@gmail.com> - 2013-06-15 06:31 +0000
                Re: Don't feed the troll... Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-06-15 13:04 -0400
                Re: Don't feed the troll... Guy Scree <nobody@nowhere.com> - 2013-06-17 16:15 -0400
                Re: Don't feed the troll... Chris Angelico <rosuav@gmail.com> - 2013-06-18 07:46 +1000
                Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 20:19 +1000
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:41 +0300
                Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Fábio Santos <fabiosantosart@gmail.com> - 2013-06-14 11:20 +0100
                Re: Don't feed the troll... (was: Re: A few questiosn about encoding) rusi <rustompmody@gmail.com> - 2013-06-14 04:51 -0700
                Re: Don't feed the help-vampire rusi <rustompmody@gmail.com> - 2013-06-14 05:09 -0700
                Re: Don't feed the help-vampire Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 14:31 +0200
                Re: Don't feed the help-vampire Ian Kelly <ian.g.kelly@gmail.com> - 2013-06-14 10:51 -0600
                Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:50 +0300
                Re: Don't feed the troll... Zero Piraeus <schesis@gmail.com> - 2013-06-14 09:33 -0400
                Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:45 +0300
                Re: Don't feed the troll... Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 14:58 +0200
                Re: Don't feed the troll... Fábio Santos <fabiosantosart@gmail.com> - 2013-06-14 14:25 +0100
                Re: Don't feed the troll... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-14 17:12 +0100
                Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 12:50 +0200
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:59 +0300
                Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 15:52 +0200
                Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-15 10:28 +1000
                Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-17 08:49 +0200
                Re: Don't feed the troll... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-14 12:57 +0100
                Re: Don't feed the troll... (was: Re: A few questiosn about encoding) "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 13:13 -0400
                Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Chris Angelico <rosuav@gmail.com> - 2013-06-15 03:31 +1000
                Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Grant Edwards <invalid@invalid.invalid> - 2013-06-14 19:40 +0000
                Re: Don't feed the troll "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 13:56 -0400
                Re: Don't feed the troll Tim Chase <python.list@tim.thechases.com> - 2013-06-14 14:00 -0500
                Re: Don't feed the troll "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 15:17 -0400
                Re: Don't feed the troll... Ben Finney <ben+python@benfinney.id.au> - 2013-06-15 10:42 +1000
                Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-19 18:46 -0700
                Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-20 06:26 +0000
                Re: A few questiosn about encoding MRAB <python@mrabarnett.plus.com> - 2013-06-20 12:43 +0100
                Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-20 09:27 -0700
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 02:37 +1000
                Re: A few questiosn about encoding MRAB <python@mrabarnett.plus.com> - 2013-06-20 18:17 +0100
                Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-23 08:51 -0700
                Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-23 16:30 +0000
                Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-25 13:16 -0700
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 03:21 +1000
                Re: A few questiosn about encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-20 20:43 +0100
                Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-20 06:40 -0700
                Re: A few questiosn about encoding Andrew Berg <robotsondrugs@gmail.com> - 2013-06-20 09:04 -0500
                Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-20 08:12 -0700
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 01:26 +1000
                Re: A few questiosn about encoding Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-06-20 20:25 +0300
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 01:28 +1000
                Re: A few questiosn about encoding Andreas Perstinger <andipersti@gmail.com> - 2013-06-20 19:08 +0200
        Re: A few questiosn about encoding Dave Angel <davea@davea.name> - 2013-06-12 08:43 -0400
      Re: A few questiosn about encoding Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-06-13 18:46 -0400
        Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 08:34 +0300
          Re: A few questiosn about encoding Zero Piraeus <schesis@gmail.com> - 2013-06-14 02:00 -0400
            Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 10:28 +0300

csiph-web