Groups > comp.lang.python > #47322 > unrolled thread
| Started by | Cameron Simpson <cs@zip.com.au> |
|---|---|
| First post | 2013-06-07 18:53 +1000 |
| Last post | 2013-06-10 13:28 -0700 |
| Articles | 8 on this page of 68 — 14 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-07 18:53 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) alex23 <wuwei23@gmail.com> - 2013-06-07 02:41 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 04:53 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-07 15:29 +0100
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 11:52 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Zero Piraeus <schesis@gmail.com> - 2013-06-07 15:31 -0400
Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-07 21:45 +0100
Re: Changing filenames from Greeklish => Greek (subprocess complain) Zero Piraeus <schesis@gmail.com> - 2013-06-07 19:24 -0400
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-08 12:52 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 23:49 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-08 16:58 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-08 07:26 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-08 17:40 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-08 17:32 +0100
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 09:53 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 10:35 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-08 18:48 +0100
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-07 15:33 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-08 12:49 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 21:01 +0300
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-08 19:01 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 14:14 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 08:32 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 07:46 +0300
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 06:25 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 18:02 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:03 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 14:21 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-09 08:10 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 01:11 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-09 04:47 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-08 22:09 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 06:45 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-09 00:00 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 08:15 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:14 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 03:32 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 19:16 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 12:36 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-09 10:25 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Lele Gaifax <lele@metapensiero.it> - 2013-06-09 10:55 +0200
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:08 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Lele Gaifax <lele@metapensiero.it> - 2013-06-09 11:20 +0200
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:38 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-09 14:24 +0200
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 13:13 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-06-09 13:05 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:42 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 03:37 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Larry Hudson <orgnut@yahoo.com> - 2013-06-10 00:51 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 01:11 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Larry Hudson <orgnut@yahoo.com> - 2013-06-11 00:20 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 11:50 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 05:18 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:00 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 19:12 +1000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:20 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-06-09 13:01 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 12:31 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-10 00:10 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-10 10:15 +0200
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 01:54 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 02:59 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-10 12:42 +0200
Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-10 11:59 +0000
Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 07:27 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) jmfauth <wxjmfauth@gmail.com> - 2013-06-10 12:48 -0700
Re: Changing filenames from Greeklish => Greek (subprocess complain) Ned Batchelder <ned@nedbatchelder.com> - 2013-06-10 13:28 -0700
| From | Andreas Perstinger <andipersti@gmail.com> |
|---|---|
| Date | 2013-06-10 10:15 +0200 |
| Message-ID | <mailman.2959.1370852149.3114.python-list@python.org> |
| In reply to | #47522 |
On 10.06.2013 09:10, nagia.retsina@gmail.com wrote:
> On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
>
>> py> c = 'α'
>> py> ord(c)
>> 945
>
> The number 945 is the ordinal value of the character 'α' in the Unicode charset, correct?
Yes, the Unicode character set is just a big list of characters. Counting
from 0, entry number 945 in that list (that is, the 946th entry) happens to be 'α'.
> The command in the python interactive session to show me how many bytes
> this character will take upon encoding to utf-8 is:
>
>>>> s = 'α'
>>>> s.encode('utf-8')
> b'\xce\xb1'
>
> I see that the encoding of this char takes 2 bytes. But why two exactly?
That's how the encoding is designed. Haven't you read the Wikipedia
article which was already mentioned several times?
> How do I calculate how many bits are needed to store this char into bytes?
You need to understand how UTF-8 works. Read the Wikipedia article.
> Trying to do the same here but it gave me no bytes back.
>
>>>> s = 'a'
>>>> s.encode('utf-8')
> b'a'
The encode method returns a bytes object. Its length will tell you how
many bytes there are:
>>> len(b'a')
1
>>> len(b'\xce\xb1')
2
The Python interpreter will display byte values as ASCII characters if
they are printable (only values below 128 can be; all others are shown as \x escapes):
>>> ord(b'a')
97
>>> hex(97)
'0x61'
>>> b'\x61' == b'a'
True
The Python designers have decided to use b'a' instead of b'\x61'.
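The point about display versus content can be checked directly. The following is only a small sketch (the variable names are made up for illustration): it prints the same bytes objects as Python's default repr, as a list of integers, and as hex digits, to show that only the presentation differs.

```python
# Sketch: one bytes object, three ways of looking at it.
ascii_bytes = "a".encode("utf-8")   # a single byte with value 97
greek_bytes = "α".encode("utf-8")   # two bytes with values 206 and 177

for raw in (ascii_bytes, greek_bytes):
    print(repr(raw))    # default display: printable ASCII shown as characters
    print(list(raw))    # the integer value of every byte
    print(raw.hex())    # every byte as two hexadecimal digits

# The literal b'\x61' and the literal b'a' build the very same object.
same_object_value = (b"\x61" == ascii_bytes)
print(same_object_value)
```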
>>py> c.encode('utf-8')
>> b'\xce\xb1'
>
> 2 bytes here. why 2?
Same as your first question.
>> py> c.encode('utf-16be')
>> b'\x03\xb1'
>
> 2 bytes here also. but why 3 different bytes? the ordinal value of
> char 'a' is the same in unicode. the encoding system just takes the
> ordinal value and encodes it, but since it uses 2 bytes, should these 2 bytes
> be the same?
'utf-16be' is a different encoding scheme, thus it uses other rules to
determine how each character is translated into a byte sequence.
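As a rough illustration of "different rules, different bytes", this sketch encodes the same one-character string with the four codecs that appear in this thread and reports the length of each result. The dictionary name `encoded` is just a helper chosen for this example.

```python
# Sketch: one code point (945, 'α'), four encodings, four byte sequences.
c = "α"
codecs_to_try = ["utf-8", "utf-16be", "utf-32be", "iso-8859-7"]

encoded = {name: c.encode(name) for name in codecs_to_try}

for name, raw in encoded.items():
    # The code point never changes; only its byte representation does.
    print(f"{name:<11} {len(raw)} byte(s)  {raw!r}")
```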
>> py> c.encode('iso-8859-7')
>> b'\xe1'
>
> And also does '\x' mean that the value is being represented in hex?
> and when i bin(6) i see '0b1000001'
>
> I should expect to see 8 bits of 1s and 0s. what is the 'b' trying to say?
>
'\x' is an escape sequence and means that the following two characters
should be interpreted as a number in hexadecimal notation (see also the
table of allowed escape sequences:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
).
'0b' tells you that the number is printed in binary notation.
Leading zeros are usually discarded when a number is printed:
>>> bin(70)
'0b1000110'
>>> 0b100110 == 0b00100110
True
>>> 0b100110 == 0b0000000000100110
True
It's the same with decimal notation. You wouldn't say 00123 is different
from 123, would you?
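If you do want to see all eight bits, you can ask for zero padding explicitly. This is a minimal sketch using the built-in format(); the variable names are only for illustration.

```python
# Sketch: bin() drops leading zeros, format() can put them back.
n = 70

plain = bin(n)                   # '0b1000110' (7 digits after the prefix)
padded = format(n, "08b")        # 8 binary digits, zero padded, no prefix
prefixed = format(n, "#010b")    # same, but with the '0b' prefix (10 chars total)

print(plain, padded, prefixed)

# Padding changes the text only, never the number it stands for.
round_trip = int(padded, 2)
print(round_trip == n)
```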
Bye, Andreas
| From | Νικόλαος Κούρας <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-06-10 01:54 -0700 |
| Message-ID | <c6c9a67a-8eab-41e3-b8bb-d013fd7805b5@googlegroups.com> |
| In reply to | #47532 |
On Monday, 10 June 2013 11:15:38 AM UTC+3, Andreas Perstinger wrote:
What is the difference between len('nikos') and len(b'nikos')?
The first being the length of the string nikos in characters, while the second is the length of an ???
> The python interpreter will represent all values below 256 as ASCII
> characters if they are printable:
> >>> ord(b'a')
> 97
> >>> hex(97)
> '0x61'
> >>> b'\x61' == b'a'
> True
> The Python designers have decided to use b'a' instead of b'\x61'.
Are b'a' and b'\x61' the bytestrings of char 'a' after utf-8 encoding?
In my opinion ord(b'a') should give an error:
ord('a') should return the ordinal value of char 'a', not ord(b'a').
| From | Νικόλαος Κούρας <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-06-10 02:59 -0700 |
| Message-ID | <349f7474-fce3-4891-8eb2-92fc53606fb2@googlegroups.com> |
| In reply to | #47535 |
> >>>> s = 'α'
> >>>> s.encode('utf-8')
> > b'\xce\xb1'
'b' stands for binary, right?
b'\xce\xb1' = we are looking at a byte in hexadecimal format?
If yes, how could we see it in binary and decimal representation?
> > I see that the encoding of this char takes 2 bytes. But why two exactly?
> > How do i calculate how many bits are needed to store this char into bytes?
> Because utf-8 takes 1 to 4 bytes to encode characters
Since 2^8 = 256, utf-8 should store the first 256 chars of the unicode charset using 1 byte.
Also, since 2^16 = 65536, utf-8 should store the first 65536 chars of the unicode charset using 2 bytes, and so on.
But I know that this is not the case.
But I don't understand why.
> >>>> s = 'a'
> >>>> s.encode('utf-8')
> > b'a'
> utf-8 takes ASCII as it is, as 1 byte. They are the same
EBCDIC, ASCII and Unicode are character sets, correct?
iso-8859-1, iso-8859-7, utf-8, utf-16, utf-32 and so on are encoding methods, right?
| From | Andreas Perstinger <andipersti@gmail.com> |
|---|---|
| Date | 2013-06-10 12:42 +0200 |
| Message-ID | <mailman.2962.1370860953.3114.python-list@python.org> |
| In reply to | #47539 |
On 10.06.2013 11:59, Νικόλαος Κούρας wrote:
>> >>>> s = 'α'
>> >>>> s.encode('utf-8')
>> > b'\xce\xb1'
>
> 'b' stands for binary right?
No, here it stands for bytes:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
> b'\xce\xb1' = we are looking at a byte in hexadecimal format?
No, b'\xce\xb1' represents a bytes object containing 2 bytes.
Yes, each byte is represented in hexadecimal format.
> if yes how could we see it in binary and decimal representation?
>>> s = b'\xce\xb1'
>>> s[0]
206
>>> bin(s[0])
'0b11001110'
>>> s[1]
177
>>> bin(s[1])
'0b10110001'
A bytes object is a sequence of bytes (= integer values) and supports
indexing.
http://docs.python.org/3/library/stdtypes.html#bytes
> Since 2^8 = 256, utf-8 should store the first 256 chars of unicode
> charset using 1 byte.
>
> Also since 2^16 = 65536, utf-8 should store the first 65536 chars of
> unicode charset using 2 bytes and so on.
>
> But i know that this is not the case. But i don't understand why.
Because your method doesn't work.
If you use all possible 256 bit-combinations to represent a valid
character, how do you decide where to stop in a sequence of bytes?
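To make that concrete, here is a rough sketch of how UTF-8 spends some bits of each byte on structure, so that a decoder always knows where a character starts and how long it is. The two function names are invented for this illustration, and the encoder deliberately handles only the two-byte range (code points 0x80 to 0x7FF), which is where 'α' (945) lives; Python's own codec of course covers all lengths.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point from the range 0x80..0x7FF as UTF-8."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles two-byte sequences")
    lead = 0b11000000 | (code_point >> 6)         # 110xxxxx: the upper 5 bits
    cont = 0b10000000 | (code_point & 0b111111)   # 10xxxxxx: the lower 6 bits
    return bytes([lead, cont])


def sequence_length(first_byte: int) -> int:
    """Tell from the first byte alone how many bytes the character uses."""
    if first_byte < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: start of a 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: start of a 4-byte sequence
        return 4
    raise ValueError("10xxxxxx is a continuation byte, not a start byte")


alpha = encode_two_byte(945)
print(alpha, [format(b, "08b") for b in alpha])
print(sequence_length(alpha[0]), sequence_length(ord("a")))
```

Because only 5 + 6 = 11 bits of a two-byte sequence carry the code point, two bytes cover 2048 values rather than 65536; the remaining bits are what answers the "where do I stop?" question.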
>> >>>> s = 'a'
>> >>>> s.encode('utf-8')
>> > b'a'
>> utf-8 takes ASCII as it is, as 1 byte. They are the same
>
> EBCDIC and ASCII and Unicode are character sets, correct?
>
> iso-8859-1, iso-8859-7, utf-8, utf-16, utf-32 and so on are encoding methods, right?
>
Look at http://www.unicode.org/glossary/ for an explanation of all the
terms.
Bye, Andreas
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-06-10 11:59 +0000 |
| Message-ID | <51b5bf86$0$29997$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #47522 |
On Mon, 10 Jun 2013 00:10:38 -0700, nagia.retsina wrote:
> On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
>
>> py> c = 'α'
>> py> ord(c)
>> 945
>
> The number 945 is the ordinal value of the character 'α' in the Unicode
> charset, correct?
Correct.
> The command in the python interactive session to show me how many bytes
> this character will take upon encoding to utf-8 is:
>
>>>> s = 'α'
>>>> s.encode('utf-8')
> b'\xce\xb1'
>
> I see that the encoding of this char takes 2 bytes. But why two exactly?
Because that's how UTF-8 works. If it was a different encoding, it might
be 4 bytes, or 2, or 1, or 101, or 7, or 3. But it is UTF-8, so it takes
2 bytes. If you want to understand how UTF-8 works, look it up on
Wikipedia.
> How do i calculate how many bits are needed to store this char into
> bytes?
Every byte is made of 8 bits. There are two bytes. So multiply 8 by 2.
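That multiplication can be written as a one-line helper. This is just a sketch; `bits_needed` is a name made up here, not anything from the standard library.

```python
def bits_needed(text: str, encoding: str = "utf-8") -> int:
    """Number of bits the encoded form of text occupies: 8 per byte."""
    return 8 * len(text.encode(encoding))


print(bits_needed("α"))               # two bytes in UTF-8
print(bits_needed("a"))               # one byte in UTF-8
print(bits_needed("α", "utf-32be"))   # four bytes in UTF-32
```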
> Trying to do the same here but it gave me no bytes back.
>
>>>> s = 'a'
>>>> s.encode('utf-8')
> b'a'
There is a byte there. The byte is printed by Python as b'a', which in my
opinion is a design mistake. That makes it look like a string, but it is
not a string, and would be better printed as b'\x61'. But regardless of
the display, it is still a single byte.
>>py> c.encode('utf-8')
>> b'\xce\xb1'
>
> 2 bytes here. why 2?
Because that's how UTF-8 works.
>> py> c.encode('utf-16be')
>> b'\x03\xb1'
>
> 2 bytes here also. but why 3 different bytes?
Because it is a different encoding.
> the ordinal value of char 'a' is the same in unicode.
The same as what?
> the encoding system just takes the ordinal value and encodes it, but
> since it uses 2 bytes, should these 2 bytes be the same?
No.
That's like saying that since a dog in Germany has four legs and one
head, and a dog in France has four legs and one head, dog should be
spelled "Hund" in both Germany and France.
Different encodings are like different languages. They spell the same
word differently.
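To push the language analogy one step further, here is a small sketch (variable names chosen for this example only): bytes written in one "language" read back correctly only when decoded with the same one. Decoding with a different codec either raises an error or, as here, silently yields different characters.

```python
# Sketch: the same character, spelled in two encodings.
raw_utf8 = "α".encode("utf-8")          # b'\xce\xb1'
raw_greek = "α".encode("iso-8859-7")    # b'\xe1'

# Each spelling decodes back to 'α' with its own codec.
print(raw_utf8.decode("utf-8"), raw_greek.decode("iso-8859-7"))

# Reading the UTF-8 spelling as ISO-8859-7 gives two unrelated characters.
misread = raw_utf8.decode("iso-8859-7")
print(len(misread), misread != "α")
```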
>> py> c.encode('utf-32be')
>> b'\x00\x00\x03\xb1'
>
> every char here takes exactly 4 bytes to be stored. okey.
>
>> py> c.encode('iso-8859-7')
>> b'\xe1'
>
> And also does '\x' mean that the value is being represented in hex?
> and when i bin(6) i see '0b1000001'
>
> I should expect to see 8 bits of 1s and 0s. what is the 'b' trying to
> say?
"b" for Binary.
Just like 0o1234 uses octal, "o" for Octal.
And 0x123EF uses hexadecimal. "x" for heXadecimal.
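A quick sketch to show that the prefix changes only the notation, not the number. The value 65 is picked for the example because its binary form is the '0b1000001' string quoted above.

```python
# Sketch: one integer, written with three different literal prefixes.
value = 65

print(bin(value), oct(value), hex(value))

# All three literals below denote the same int as decimal 65.
same_number = (0b1000001 == 0o101 == 0x41 == value)
print(same_number)

# int() with an explicit base reads such digit strings back.
parsed = [int("1000001", 2), int("101", 8), int("41", 16)]
print(parsed)
```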
--
Steven
| From | Νικόλαος Κούρας <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-06-10 07:27 -0700 |
| Message-ID | <49c31f79-1258-47a3-af11-d61b63b9db78@googlegroups.com> |
| In reply to | #47550 |
On Monday, 10 June 2013 2:59:03 PM UTC+3, Steven D'Aprano wrote:
> On Mon, 10 Jun 2013 00:10:38 -0700, nagia.retsina wrote:
>
> > On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
> >
> >> py> c = 'α'
> >> py> ord(c)
> >> 945
> >
> > The number 945 is the ordinal value of the character 'α' in the Unicode
> > charset, correct?
>
> Correct.
>
> > The command in the python interactive session to show me how many bytes
> > this character will take upon encoding to utf-8 is:
> >
> >>>> s = 'α'
> >>>> s.encode('utf-8')
> > b'\xce\xb1'
> >
> > I see that the encoding of this char takes 2 bytes. But why two exactly?
>
> Because that's how UTF-8 works. If it was a different encoding, it might
> be 4 bytes, or 2, or 1, or 101, or 7, or 3. But it is UTF-8, so it takes
> 2 bytes. If you want to understand how UTF-8 works, look it up on
> Wikipedia.
>
> > How do i calculate how many bits are needed to store this char into
> > bytes?
>
> Every byte is made of 8 bits. There are two bytes. So multiply 8 by 2.
>
> > Trying to do the same here but it gave me no bytes back.
> >
> >>>> s = 'a'
> >>>> s.encode('utf-8')
> > b'a'
>
> There is a byte there. The byte is printed by Python as b'a', which in my
> opinion is a design mistake. That makes it look like a string, but it is
> not a string, and would be better printed as b'\x61'. But regardless of
> the display, it is still a single byte.
Perhaps, for the ASCII chars up to 127, Python thinks it is better for humans to read the character representation of the stored byte instead of its hex value. Just a guess.
> Just like 0o1234 uses octal, "o" for Octal.
> And 0x123EF uses hexadecimal. "x" for heXadecimal.
Why the leading zero before octal's 'o', hex's 'x' and binary's 'b'?
I am not going to tire you any more, because I have exhausted myself for two days now trying to get my head around this.
Please confirm I have understood correctly:
I did, but the docs confuse me even more. Can you please put it simply?
Unicode, as I understand it, was created out of the need for a character set bigger than ASCII, which could hold only 128 chars (codes 0 to 127; extended versions of it go up to 256), one that would be able to hold all the world's symbols.
ASCII and Unicode are character sets.
Everything else seems to be an encoding system that works upon those character sets.
If what I said is true, the last thing that still confuses me is that
iso-8859-7 (256 chars) seems like a character set and an encoding method too.
Can it be both, or is iso-8859-7 an encoding method of the Unicode character set, just as UTF-8 is also an encoding method of Unicode?
| From | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| Date | 2013-06-10 12:48 -0700 |
| Message-ID | <d82ed1a4-22b6-4484-8e76-07f901d407cf@q9g2000vbj.googlegroups.com> |
| In reply to | #47564 |
-----
A coding scheme works with three sets. A *unique* set of CHARACTERS, a *unique* set of CODE POINTS and a *unique* set of ENCODED CODE POINTS, unicode or not.
The relation between the set of characters and the set of the code points is a *human* table, created with a sheet of paper and a pencil, a deliberate choice of characters with integers as "labels".
The relation between the set of the code points and the set of encoded code points is a "mathematical" operation.
In the case of an "8bits" coding scheme, like iso-XXX, this operation is a no-op, the relation is an identity. Shortly: set of code points == set of encoded code points.
In the case of unicode, The Unicode consortium endorses three such mathematical operations called UTF-8, UTF-16 and UTF-32 where UTF means Unicode Transformation Format, a confusing wording meaning at the same time, the process and the result of the process. This Unicode Transformation does not produce bytes, it produces words/chunks/tokens of *bits* with lengths 8, 16, 32, called Unicode Transformation Units (from this the names UTF-8, -16, -32). At this level, only a structure has been defined (there is no computing).
Very important, a healthy coding scheme works conceptually only with this *unique* set of encoded code points, not with bytes, characters or code points.
The last step, the machine implementation: it is up to the processor, the compiler, the language to implement all these Unicode Transformation Units with of course their related specificities: char, w_char, int, long, endianness, rune (Go language), ...
Not too over-simplified or not too over-complicated and enough to understand one, if not THE, design mistake of the flexible string representation.
jmf
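For readers who want to see the 8-, 16- and 32-bit units side by side, here is a neutral sketch that merely counts them for a given string. The helper name `count_units` is invented for this example, and the unit counts are derived from Python's standard codecs by dividing the byte length by the unit size; it takes no position on the design question raised above.

```python
def count_units(text: str) -> dict:
    """Count code points and the 8/16/32-bit units of the three UTFs."""
    return {
        "code points": len(text),
        "utf-8 units": len(text.encode("utf-8")),            # 8 bits each
        "utf-16 units": len(text.encode("utf-16-be")) // 2,  # 16 bits each
        "utf-32 units": len(text.encode("utf-32-be")) // 4,  # 32 bits each
    }


print(count_units("α"))            # a character from the Basic Multilingual Plane
print(count_units("\U0001F600"))   # a character outside it (needs a surrogate pair in UTF-16)
```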
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2013-06-10 13:28 -0700 |
| Message-ID | <2ea2a1c2-5df8-4c86-bb29-474925498f5f@googlegroups.com> |
| In reply to | #47594 |
On Monday, June 10, 2013 3:48:08 PM UTC-4, jmfauth wrote:
> -----
>
> A coding scheme works with three sets. A *unique* set
> of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
> set of ENCODED CODE POINTS, unicode or not.
>
> The relation between the set of characters and the set of the
> code points is a *human* table, created with a sheet of paper
> and a pencil, a deliberate choice of characters with integers
> as "labels".
>
> The relation between the set of the code points and the
> set of encoded code points is a "mathematical" operation.
>
> In the case of an "8bits" coding scheme, like iso-XXX,
> this operation is a no-op, the relation is an identity.
> Shortly: set of code points == set of encoded code points.
>
> In the case of unicode, The Unicode consortium endorses
> three such mathematical operations called UTF-8, UTF-16 and
> UTF-32 where UTF means Unicode Transformation Format, a
> confusing wording meaning at the same time, the process
> and the result of the process. This Unicode Transformation does
> not produce bytes, it produces words/chunks/tokens of *bits* with
> lengths 8, 16, 32, called Unicode Transformation Units (from this
> the names UTF-8, -16, -32). At this level, only a structure has
> been defined (there is no computing).
This is a really good description of the issues involved with character sets and encodings, thanks.
> Very important, a healthy
> coding scheme works conceptually only with this *unique* set
> of encoded code points, not with bytes, characters or code points.
>
You don't explain why it is important to work with encoded code points. What's wrong with working with code points?
>
> The last step, the machine implementation: it is up to the
> processor, the compiler, the language to implement all these
> Unicode Transformation Units with of course their related
> specificities: char, w_char, int, long, endianness, rune (Go
> language), ...
>
> Not too over-simplified or not too over-complicated and enough
> to understand one, if not THE, design mistake of the flexible
> string representation.
>
> jmf
Again you've made the claim that the flexible string representation is a mistake. But you haven't said WHY. I can't tell if you are trolling us, or are deluded, or genuinely don't understand what you are talking about. Some day you might explain yourself. I look forward to it.
--Ned.