

Groups > comp.lang.python > #47322 > unrolled thread

Re: Changing filenames from Greeklish => Greek (subprocess complain)

Started by: Cameron Simpson <cs@zip.com.au>
First post: 2013-06-07 18:53 +1000
Last post: 2013-06-10 13:28 -0700
Articles: 8 on this page of 68, 14 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-07 18:53 +1000
    Re: Changing filenames from Greeklish => Greek (subprocess complain) alex23 <wuwei23@gmail.com> - 2013-06-07 02:41 -0700
    Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 04:53 -0700
      Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-07 15:29 +0100
        Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 11:52 -0700
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Zero Piraeus <schesis@gmail.com> - 2013-06-07 15:31 -0400
          Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-07 21:45 +0100
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Zero Piraeus <schesis@gmail.com> - 2013-06-07 19:24 -0400
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-08 12:52 +1000
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-07 23:49 -0700
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-08 16:58 +1000
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-08 07:26 +0000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-08 17:40 +1000
              Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-08 17:32 +0100
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 09:53 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 10:35 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) MRAB <python@mrabarnett.plus.com> - 2013-06-08 18:48 +0100
      Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-07 15:33 +0000
      Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-08 12:49 +1000
      Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 21:01 +0300
        Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-08 19:01 +0000
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 14:14 -0700
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 08:32 +1000
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 07:46 +0300
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 06:25 +0000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 18:02 +1000
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:03 -0700
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-08 14:21 -0700
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-09 08:10 +1000
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 01:11 -0700
      Re: Changing filenames from Greeklish => Greek (subprocess complain) Chris Angelico <rosuav@gmail.com> - 2013-06-09 04:47 +1000
        Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-08 22:09 -0700
          Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 06:45 +0000
            Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-09 00:00 -0700
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 08:15 +0000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:14 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 03:32 -0700
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 19:16 +1000
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 12:36 +0000
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-09 10:25 -0700
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Lele Gaifax <lele@metapensiero.it> - 2013-06-09 10:55 +0200
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:08 -0700
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Lele Gaifax <lele@metapensiero.it> - 2013-06-09 11:20 +0200
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:38 -0700
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-09 14:24 +0200
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 13:13 +0000
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-06-09 13:05 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:42 -0700
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 03:37 -0700
                      Re: Changing filenames from Greeklish => Greek (subprocess complain) Larry Hudson <orgnut@yahoo.com> - 2013-06-10 00:51 -0700
                        Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 01:11 -0700
                          Re: Changing filenames from Greeklish => Greek (subprocess complain) Larry Hudson <orgnut@yahoo.com> - 2013-06-11 00:20 -0700
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 11:50 +0000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 05:18 -0700
            Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:00 -0700
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Cameron Simpson <cs@zip.com.au> - 2013-06-09 19:12 +1000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-09 02:20 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-06-09 13:01 -0700
              Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-09 12:31 +0000
                Re: Changing filenames from Greeklish => Greek (subprocess complain) nagia.retsina@gmail.com - 2013-06-10 00:10 -0700
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-10 10:15 +0200
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 01:54 -0700
                      Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 02:59 -0700
                        Re: Changing filenames from Greeklish => Greek (subprocess complain) Andreas Perstinger <andipersti@gmail.com> - 2013-06-10 12:42 +0200
                  Re: Changing filenames from Greeklish => Greek (subprocess complain) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-10 11:59 +0000
                    Re: Changing filenames from Greeklish => Greek (subprocess complain) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 07:27 -0700
                      Re: Changing filenames from Greeklish => Greek (subprocess complain) jmfauth <wxjmfauth@gmail.com> - 2013-06-10 12:48 -0700
                        Re: Changing filenames from Greeklish => Greek (subprocess complain) Ned Batchelder <ned@nedbatchelder.com> - 2013-06-10 13:28 -0700



#47532

From: Andreas Perstinger <andipersti@gmail.com>
Date: 2013-06-10 10:15 +0200
Message-ID: <mailman.2959.1370852149.3114.python-list@python.org>
In reply to: #47522
On 10.06.2013 09:10, nagia.retsina@gmail.com wrote:
> On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
>
>> py> c = 'α'
>> py> ord(c)
>> 945
>
> The number 945 is the ordinal value of the character 'α' in the Unicode charset, correct?

Yes, the Unicode character set is essentially a big numbered list of characters.
'α' has the number 945 (code point U+03B1); counting from 0, that makes it the
946th entry in the list.
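
As a small illustration of the point above (a sketch only; the variable names are made up for this example), `ord` and `chr` convert between a character and its number in that list, and the standard `unicodedata` module gives the official name:

```python
import unicodedata

ch = 'α'
code_point = ord(ch)              # the integer label of the character
print(code_point)                 # 945
print(hex(code_point))            # 0x3b1, conventionally written U+03B1
print(chr(code_point) == ch)      # True: chr() is the inverse of ord()
print(unicodedata.name(ch))       # GREEK SMALL LETTER ALPHA
```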

> The command in the python interactive session to show me how many bytes
> this character will take upon encoding to utf-8 is:
>
>>>> s = 'α'
>>>> s.encode('utf-8')
> b'\xce\xb1'
>
> I see that the encoding of this char takes 2 bytes. But why two exactly?

That's how the encoding is designed. Haven't you read the Wikipedia 
article which was already mentioned several times?

> How do I calculate how many bits are needed to store this char into bytes?

You need to understand how UTF-8 works. Read the Wikipedia article.
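
For readers who want the short version of that article, the byte count follows from the size of the code point. The sketch below is not from the thread; `utf8_length` is a hypothetical helper that applies the UTF-8 range boundaries and checks itself against Python's own encoder:

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for one character, derived from its code point."""
    cp = ord(ch)
    if cp < 0x80:        # fits in 7 bits:  0xxxxxxx
        return 1
    if cp < 0x800:       # fits in 11 bits: 110xxxxx 10xxxxxx
        return 2
    if cp < 0x10000:     # fits in 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return 3
    return 4             # up to 21 bits:   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

for ch in ('a', 'α', '€', '𝄞'):
    # Cross-check the rule against the real encoder.
    assert utf8_length(ch) == len(ch.encode('utf-8'))
    print(repr(ch), ord(ch), utf8_length(ch))
```

'α' is code point 945, which is above 127 and below 2048, so it lands in the two-byte range.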

> Trying to do the same here but it gave me no bytes back.
>
>>>> s = 'a'
>>>> s.encode('utf-8')
> b'a'

The encode method returns a bytes object. Its length will tell you how 
many bytes there are:

 >>> len(b'a')
1
 >>> len(b'\xce\xb1')
2

The Python interpreter will display byte values below 128 as ASCII 
characters if they are printable; all other values are shown as \xNN escapes:

 >>> ord(b'a')
97
 >>> hex(97)
'0x61'
 >>> b'\x61' == b'a'
True

The Python designers have decided to use b'a' instead of b'\x61'.
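
A short sketch of that display rule (the sample bytes are chosen only for illustration): printable ASCII is shown as the character, everything else as an escape, while the underlying integers are the same either way.

```python
# One printable ASCII byte, the two bytes of 'α' in UTF-8, and a newline.
data = bytes([0x61, 0xce, 0xb1, 0x0a])

print(data)          # b'a\xce\xb1\n'
print(list(data))    # [97, 206, 177, 10]
print(data.hex())    # 61ceb10a
```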

>>py> c.encode('utf-8')
>> b'\xce\xb1'
>
> 2 bytes here. why 2?

Same as your first question.

>> py> c.encode('utf-16be')
>> b'\x03\xb1'
>
> 2 bytes here also. But why 3 different bytes? The ordinal value of
> char 'a' is the same in Unicode. The encoding system just takes the
> ordinal value and encodes it, but since it uses 2 bytes, shouldn't these
> 2 bytes be the same?

'utf-16be' is a different encoding scheme, thus it uses other rules to 
determine how each character is translated into a byte sequence.
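
To make the "other rules" concrete, here is a sketch (not from the thread) that builds both byte sequences for 'α' by hand from the same code point. It assumes a code point in the two-byte UTF-8 range and below 0x10000 for UTF-16:

```python
cp = ord('α')    # 945 == 0x03B1 == 0b011_1011_0001

# UTF-16BE: a code point below 0x10000 is stored as one 16-bit unit, high byte first.
utf16be = bytes([cp >> 8, cp & 0xFF])

# UTF-8, two-byte form: 110xxxxx 10xxxxxx, where the x bits carry the code point.
utf8 = bytes([0b11000000 | (cp >> 6), 0b10000000 | (cp & 0b00111111)])

print(utf16be)   # b'\x03\xb1'
print(utf8)      # b'\xce\xb1'
```

The second byte happens to be 0xb1 in both cases, but the first differs because UTF-8 spends bits on the `110` and `10` markers.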

>> py> c.encode('iso-8859-7')
>> b'\xe1'
>
> And does '\x' mean that the value is being represented in hex?
> And when I bin(65) I see '0b1000001'.
>
> I would expect to see 8 bits of 1s and 0s. What is the 'b' trying to say?
>
'\x' is an escape sequence and means that the following two characters 
should be interpreted as a number in hexadecimal notation (see also the 
table of allowed escape sequences: 
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals 
).

'0b' tells you that the number is printed in binary notation.
Leading zeros are usually discarded when a number is printed:
 >>> bin(70)
'0b1000110'
 >>> 0b100110 == 0b00100110
True
 >>> 0b100110 == 0b0000000000100110
True

It's the same with decimal notation. You wouldn't say 00123 is different 
from 123, would you?
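
If you do want to see all 8 bits, you can ask for zero padding explicitly. A minimal sketch using the built-in `format`:

```python
n = 70
print(bin(n))               # 0b1000110
print(format(n, '08b'))     # 01000110   (padded to 8 digits, no prefix)
print(format(n, '#010b'))   # 0b01000110 (width 10 includes the '0b' prefix)
```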

Bye, Andreas



#47535

From: Νικόλαος Κούρας <nikos.gr33k@gmail.com>
Date: 2013-06-10 01:54 -0700
Message-ID: <c6c9a67a-8eab-41e3-b8bb-d013fd7805b5@googlegroups.com>
In reply to: #47532
On Monday, 10 June 2013 11:15:38 AM UTC+3, Andreas Perstinger wrote:

What is the difference between len('nikos') and len(b'nikos')?
The first being the length of the string nikos in characters, while the second is the length of a ???


> The Python interpreter will display byte values below 128 as ASCII 
> characters if they are printable; all other values are shown as \xNN escapes:

>  >>> ord(b'a')
> 97
>  >>> hex(97)
> '0x61'
>  >>> b'\x61' == b'a'
> True
> The Python designers have decided to use b'a' instead of b'\x61'.

Are b'a' and b'\x61' the bytestrings of char 'a' after utf-8 encoding?

In my opinion ord(b'a') should give an error:

ord('a') should return the ordinal value of char 'a', not ord(b'a').



#47539

From: Νικόλαος Κούρας <nikos.gr33k@gmail.com>
Date: 2013-06-10 02:59 -0700
Message-ID: <349f7474-fce3-4891-8eb2-92fc53606fb2@googlegroups.com>
In reply to: #47535
> >>>> s = 'α' 
> >>>> s.encode('utf-8') 
> > b'\xce\xb1' 

'b' stands for binary, right? 
b'\xce\xb1' = are we looking at a byte in hexadecimal format? 
If yes, how could we see it in binary and decimal representation? 
  
> > I see that the encoding of this char takes 2 bytes. But why two exactly? 
> > How do i calculate how many bits are needed to store this char into bytes? 
  
> Because utf-8 takes 1 to 4 bytes to encode characters 

Since 2^8 = 256, utf-8 should store the first 256 chars of the Unicode charset using 1 byte. 

Also, since 2^16 = 65536, utf-8 should store the first 65536 chars of the Unicode charset using 2 bytes, and so on. 

But I know that this is not the case. 
But I don't understand why. 


> >>>> s = 'a' 
> >>>> s.encode('utf-8') 
> > b'a' 
> utf-8 takes ASCII as it is, as 1 byte. They are the same 

EBCDIC, ASCII and Unicode are character sets, correct? 

iso-8859-1, iso-8859-7, utf-8, utf-16, utf-32 and so on are encoding methods, right?



#47545

From: Andreas Perstinger <andipersti@gmail.com>
Date: 2013-06-10 12:42 +0200
Message-ID: <mailman.2962.1370860953.3114.python-list@python.org>
In reply to: #47539
On 10.06.2013 11:59, Νικόλαος Κούρας wrote:
>> >>>> s = 'α'
>> >>>> s.encode('utf-8')
>> > b'\xce\xb1'
>
> 'b' stands for binary right?

No, here it stands for bytes:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

>   b'\xce\xb1' = we are looking at a byte in a hexadecimal format?

No, b'\xce\xb1' represents a bytes object containing 2 bytes.
Yes, each byte is represented in hexadecimal format.

> If yes, how could we see it in binary and decimal representation?

 >>> s = b'\xce\xb1'
 >>> s[0]
206
 >>> bin(s[0])
'0b11001110'
 >>> s[1]
177
 >>> bin(s[1])
'0b10110001'

A bytes object is a sequence of bytes (= integer values) and supports 
indexing.
http://docs.python.org/3/library/stdtypes.html#bytes
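
Because iterating over a bytes object yields integers, the same idea can be written as a loop. A minimal sketch (the `rows` list is only there to collect the three views of each byte):

```python
encoded = 'α'.encode('utf-8')

# Each item is an int in range(256); show it as decimal, 8-bit binary and hex.
rows = [(byte, format(byte, '08b'), format(byte, '02x')) for byte in encoded]
for decimal, binary, hexa in rows:
    print(decimal, binary, hexa)
# 206 11001110 ce
# 177 10110001 b1
```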

> Since 2^8 = 256, utf-8 should store the first 256 chars of the Unicode
> charset using 1 byte.
>
> Also, since 2^16 = 65536, utf-8 should store the first 65536 chars of the
> Unicode charset using 2 bytes, and so on.
>
> But I know that this is not the case. But I don't understand why.

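Because your method doesn't work.
If you use all 256 possible bit combinations of a byte to represent a valid 
character, how do you decide where one character stops and the next begins 
in a sequence of bytes? UTF-8 reserves the leading bits of every byte to 
mark whether it starts a character or continues one, which leaves 7 bits 
for a 1-byte character and 11 bits for a 2-byte character.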
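
To illustrate how those marker bits answer the question, here is a sketch of a splitter that finds character boundaries by looking only at the first byte of each character. `split_utf8` is a hypothetical helper written for this example; it assumes valid UTF-8 input and does not check the continuation bytes:

```python
def split_utf8(data):
    """Split valid UTF-8 bytes into per-character chunks using only the lead byte."""
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0b10000000:        # 0xxxxxxx: a single-byte (ASCII) character
            size = 1
        elif lead >> 5 == 0b110:     # 110xxxxx: start of a 2-byte character
            size = 2
        elif lead >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte character
            size = 3
        elif lead >> 3 == 0b11110:   # 11110xxx: start of a 4-byte character
            size = 4
        else:                        # 10xxxxxx can only continue a character
            raise ValueError('not a valid UTF-8 lead byte')
        chunks.append(data[i:i + size])
        i += size
    return chunks

chunks = split_utf8('aα€'.encode('utf-8'))
print(chunks)   # [b'a', b'\xce\xb1', b'\xe2\x82\xac']
```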

>> >>>> s = 'a'
>> >>>> s.encode('utf-8')
>> > b'a'
>> utf-8 takes ASCII as it is, as 1 byte. They are the same
>
> EBCDIC, ASCII and Unicode are character sets, correct?
>
> iso-8859-1, iso-8859-7, utf-8, utf-16, utf-32 and so on are encoding methods, right?
>

Look at http://www.unicode.org/glossary/ for an explanation of all the 
terms.

Bye, Andreas



#47550

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-06-10 11:59 +0000
Message-ID: <51b5bf86$0$29997$c3e8da3$5496439d@news.astraweb.com>
In reply to: #47522
On Mon, 10 Jun 2013 00:10:38 -0700, nagia.retsina wrote:

> On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
> 
>> py> c = 'α'
>> py> ord(c)
>> 945
> 
> The number 945 is the ordinal value of the character 'α' in the Unicode
> charset, correct?

Correct.


> The command in the python interactive session to show me how many bytes
> this character will take upon encoding to utf-8 is:
> 
>>>> s = 'α'
>>>> s.encode('utf-8')
> b'\xce\xb1'
> 
> I see that the encoding of this char takes 2 bytes. But why two exactly?

Because that's how UTF-8 works. If it was a different encoding, it might 
be 4 bytes, or 2, or 1, or 101, or 7, or 3. But it is UTF-8, so it takes 
2 bytes. If you want to understand how UTF-8 works, look it up on 
Wikipedia. 


> How do i calculate how many bits are needed to store this char into
> bytes?

Every byte is made of 8 bits. There are two bytes. So multiply 8 by 2.


> Trying to do the same here but it gave me no bytes back.
> 
>>>> s = 'a'
>>>> s.encode('utf-8')
> b'a'

There is a byte there. The byte is printed by Python as b'a', which in my 
opinion is a design mistake. That makes it look like a string, but it is 
not a string, and would be better printed as b'\x61'. But regardless of 
the display, it is still a single byte.

 
>>py> c.encode('utf-8')
>> b'\xce\xb1'
> 
> 2 bytes here. why 2?

Because that's how UTF-8 works.


>> py> c.encode('utf-16be')
>> b'\x03\xb1'
> 
> 2 bytes here also. But why 3 different bytes? 

Because it is a different encoding.


> the ordinal value of char 'a' is the same in unicode.

The same as what?


> The encoding system just takes the ordinal value and encodes it, but 
> since it uses 2 bytes, shouldn't these 2 bytes be the same?

No.

That's like saying that since a dog in Germany has four legs and one 
head, and a dog in France has four legs and one head, dog should be 
spelled "Hund" in both Germany and France.

Different encodings are like different languages. They spell the same 
word differently.
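
The analogy can be checked directly. In this sketch (not from the original post; the `spellings` dict is just a convenience), the same character is "spelled" four ways, each spelling reads back correctly in its own "language", and reading it in the wrong one gives something else:

```python
# One character, four different byte spellings.
spellings = {codec: 'α'.encode(codec)
             for codec in ('utf-8', 'utf-16be', 'utf-32be', 'iso-8859-7')}

for codec, raw in spellings.items():
    # Decoding with the codec that produced the bytes recovers the character.
    assert raw.decode(codec) == 'α'
    print(codec, raw)

# Decoding UTF-8 bytes with a different codec does not fail here,
# but it yields two unrelated characters instead of 'α'.
wrong = spellings['utf-8'].decode('iso-8859-7')
print(len(wrong), wrong == 'α')   # 2 False
```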


>> py> c.encode('utf-32be')
>> b'\x00\x00\x03\xb1'
> 
> Every char here takes exactly 4 bytes to be stored. Okay.
> 
>> py> c.encode('iso-8859-7')
>> b'\xe1'
> 
> And does '\x' mean that the value is being represented in hex?
> And when I bin(65) I see '0b1000001'.
> 
> I would expect to see 8 bits of 1s and 0s. What is the 'b' trying to
> say?

"b" for Binary.

Just like 0o1234 uses octal, "o" for Octal.

And 0x123EF uses hexadecimal. "x" for heXadecimal.
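
A quick sketch showing that the three prefixes are only different ways of writing the same integer, both as literals and as strings:

```python
# The same number written as binary, octal and hexadecimal literals.
print(0b1000001, 0o101, 0x41)                  # 65 65 65

# Going the other way: bin(), oct() and hex() return prefixed strings.
print(bin(65), oct(65), hex(65))               # 0b1000001 0o101 0x41

# int() parses such strings when told the base; the prefix is optional.
print(int('1000001', 2), int('0x41', 16))      # 65 65
```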



-- 
Steven



#47564

From: Νικόλαος Κούρας <nikos.gr33k@gmail.com>
Date: 2013-06-10 07:27 -0700
Message-ID: <49c31f79-1258-47a3-af11-d61b63b9db78@googlegroups.com>
In reply to: #47550
On Monday, 10 June 2013 2:59:03 PM UTC+3, Steven D'Aprano wrote:
> On Mon, 10 Jun 2013 00:10:38 -0700, nagia.retsina wrote:
> 
> > On Sunday, 9 June 2013 3:31:44 PM UTC+3, Steven D'Aprano wrote:
> > 
> >> py> c = 'α'
> >> py> ord(c)
> >> 945
> > 
> > The number 945 is the ordinal value of the character 'α' in the Unicode
> > charset, correct?
> 
> Correct.
> 
> > The command in the python interactive session to show me how many bytes
> > this character will take upon encoding to utf-8 is:
> > 
> >>>> s = 'α'
> >>>> s.encode('utf-8')
> > b'\xce\xb1'
> > 
> > I see that the encoding of this char takes 2 bytes. But why two exactly?
> 
> Because that's how UTF-8 works. If it was a different encoding, it might 
> be 4 bytes, or 2, or 1, or 101, or 7, or 3. But it is UTF-8, so it takes 
> 2 bytes. If you want to understand how UTF-8 works, look it up on 
> Wikipedia. 
> 
> > How do i calculate how many bits are needed to store this char into
> > bytes?
> 
> Every byte is made of 8 bits. There are two bytes. So multiply 8 by 2.
> 
> > Trying to do the same here but it gave me no bytes back.
> > 
> >>>> s = 'a'
> >>>> s.encode('utf-8')
> > b'a'
> 
> There is a byte there. The byte is printed by Python as b'a', which in my 
> opinion is a design mistake. That makes it look like a string, but it is 
> not a string, and would be better printed as b'\x61'. But regardless of 
> the display, it is still a single byte.


Perhaps, for the ASCII characters up to 127, Python thinks it is better for humans to read the character representation of the stored byte instead of the hex value. Just a guess.

> Just like 0o1234 uses octal, "o" for Octal.
> And 0x123EF uses hexadecimal. "x" for heXadecimal.

Why the leading zero before octal's 'o', hex's 'x' and binary's 'b'?


I am not going to tire you any more, because I have exhausted myself for two days now trying to get my head around this.

Please confirm I have understood correctly:

I did read the docs, but they confuse me even more. Can you please put it simply?

Unicode, as I understand it, was created out of the need for a bigger character set than ASCII, which can hold only 128 characters (and extended versions of it up to 256), one that would be able to hold all the world's symbols.

ASCII and Unicode are character sets.

Everything else seems to be an encoding system that works upon those character sets.

If what I said is true, the last thing that still confuses me is that

iso-8859-7 (256 chars) seems like a character set and an encoding method too.
Can it be both, or is iso-8859-7 an encoding method of the Unicode character set, similar to how UTF-8 is also an encoding method of Unicode?



#47594

From: jmfauth <wxjmfauth@gmail.com>
Date: 2013-06-10 12:48 -0700
Message-ID: <d82ed1a4-22b6-4484-8e76-07f901d407cf@q9g2000vbj.googlegroups.com>
In reply to: #47564
-----

A coding scheme works with three sets. A *unique* set
of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
set of ENCODED CODE POINTS, unicode or not.

The relation between the set of characters and the set of the
code points is a *human* table, created with a sheet of paper
and a pencil, a deliberate choice of characters with integers
as "labels".

The relation between the set of the code points and the
set of encoded code points is a "mathematical" operation.

In the case of an "8bits" coding scheme, like iso-XXX,
this operation is a no-op, the relation is an identity.
Shortly: set of code points == set of encoded code points.
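
One way to read this paragraph in Python terms (an interpretation offered as a sketch, not necessarily what the author had in mind): for iso-8859-1 the byte value equals the Unicode code point, while iso-8859-7 stores the position in its own 256-entry table, which differs from the Unicode number.

```python
# iso-8859-1: decoding every possible byte gives the code point with the same value.
all_bytes = bytes(range(256))
text = all_bytes.decode('iso-8859-1')
identity = [ord(c) for c in text] == list(all_bytes)
print(identity)                       # True

# iso-8859-7: 'α' is entry 0xE1 of that table and is stored as that single byte,
# although its Unicode code point is 0x3B1.
alpha_byte = 'α'.encode('iso-8859-7')
print(alpha_byte, hex(ord('α')))      # b'\xe1' 0x3b1
```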

In the case of unicode, The Unicode consortium endorses
three such mathematical operations called UTF-8, UTF-16 and
UTF-32 where UTF means Unicode Transformation Format, a
confusing wording meaning at the same time, the process
and the result of the process. This Unicode Transformation does
not produce bytes, it produces words/chunks/tokens of *bits* with
lengths 8, 16, 32, called Unicode Transformation Units (from this
the names UTF-8, -16, -32). At this level, only a structure has
been defined (there is no computing). Very important, a healthy
coding scheme works conceptually only with this *unique* set
of encoded code points, not with bytes, characters or code points.
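
The 8-, 16- and 32-bit units can be counted from Python. In this sketch `code_units` is a hypothetical helper; it uses the little-endian codecs only so that no byte order mark is added to the output:

```python
def code_units(text, bits):
    """Count the 8-, 16- or 32-bit units the matching UTF form needs for text."""
    codec = {8: 'utf-8', 16: 'utf-16-le', 32: 'utf-32-le'}[bits]
    return len(text.encode(codec)) // (bits // 8)

for ch in ('a', 'α', '𝄞'):
    print(repr(ch), [code_units(ch, bits) for bits in (8, 16, 32)])
# 'a' [1, 1, 1]
# 'α' [2, 1, 1]
# '𝄞' [4, 2, 1]
```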

The last step, the machine implementation: it is up to the
processor, the compiler, the language to implement all these
Unicode Transformation Units with of course their related
specificities: char, wchar_t, int, long, endianness, rune (Go
language), ...

Not too over-simplified or not too over-complicated and enough
to understand one, if not THE, design mistake of the flexible
string representation.

jmf



#47601

From: Ned Batchelder <ned@nedbatchelder.com>
Date: 2013-06-10 13:28 -0700
Message-ID: <2ea2a1c2-5df8-4c86-bb29-474925498f5f@googlegroups.com>
In reply to: #47594
On Monday, June 10, 2013 3:48:08 PM UTC-4, jmfauth wrote:
> -----
> 
> A coding scheme works with three sets. A *unique* set
> of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
> set of ENCODED CODE POINTS, unicode or not.
> 
> The relation between the set of characters and the set of the
> code points is a *human* table, created with a sheet of paper
> and a pencil, a deliberate choice of characters with integers
> as "labels".
> 
> The relation between the set of the code points and the
> set of encoded code points is a "mathematical" operation.
> 
> In the case of an "8bits" coding scheme, like iso-XXX,
> this operation is a no-op, the relation is an identity.
> Shortly: set of code points == set of encoded code points.
> 
> In the case of unicode, The Unicode consortium endorses
> three such mathematical operations called UTF-8, UTF-16 and
> UTF-32 where UTF means Unicode Transformation Format, a
> confusing wording meaning at the same time, the process
> and the result of the process. This Unicode Transformation does
> not produce bytes, it produces words/chunks/tokens of *bits* with
> lengths 8, 16, 32, called Unicode Transformation Units (from this
> the names UTF-8, -16, -32). At this level, only a structure has
> been defined (there is no computing). 

This is a really good description of the issues involved with character sets and encodings, thanks.

> Very important, a healthy
> coding scheme works conceptually only with this *unique* set
> of encoded code points, not with bytes, characters or code points.
> 

You don't explain why it is important to work with encoded code points.  What's wrong with working with code points?

> 
> The last step, the machine implementation: it is up to the
> processor, the compiler, the language to implement all these
> Unicode Transformation Units with of course their related
> specificities: char, wchar_t, int, long, endianness, rune (Go
> language), ...
> 
> Not too over-simplified or not too over-complicated and enough
> to understand one, if not THE, design mistake of the flexible
> string representation.
> 
> jmf

Again you've made the claim that the flexible string representation is a mistake.  But you haven't said WHY.  I can't tell if you are trolling us, or are deluded, or genuinely don't understand what you are talking about.

Some day you might explain yourself. I look forward to it.

--Ned.


