X-Received: by 10.224.36.66 with SMTP id s2mr4494602qad.6.1370726041166; Sat, 08 Jun 2013 14:14:01 -0700 (PDT)
X-Received: by 10.49.0.200 with SMTP id 8mr180829qeg.38.1370726041148; Sat, 08 Jun 2013 14:14:01 -0700 (PDT)
Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!ch1no3212252qab.0!news-out.google.com!y6ni1323qax.0!nntp.google.com!ch1no3212244qab.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.python
Date: Sat, 8 Jun 2013 14:14:01 -0700 (PDT)
In-Reply-To: <51b37fa4$0$29966$c3e8da3$5496439d@news.astraweb.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=79.103.41.173; posting-account=DYJQ-woAAACEPH85Au2BhUVfFTfSfVa4
NNTP-Posting-Host: 79.103.41.173
References: <7d8da6c9-fb92-4329-b207-4280f29ba664@googlegroups.com> <20130608024931.GA77888@cskk.homeip.net> <mailman.2891.1370714502.3114.python-list@python.org> <51b37fa4$0$29966$c3e8da3$5496439d@news.astraweb.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e1cfd5ed-798d-44fa-8bf7-17f3549a288e@googlegroups.com>
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
From: Νικόλαος Κούρας <nikos.gr33k@gmail.com>
Injection-Date: Sat, 08 Jun 2013 21:14:01 +0000
Content-Type: text/plain; charset=ISO-8859-7
Content-Transfer-Encoding: quoted-printable
Xref: csiph.com comp.lang.python:47406

On Saturday, 8 June 2013 10:01:57 PM UTC+3, Steven D'Aprano wrote:

> ASCII actually needs 7 bits to store a character. Since computers are
> optimized to work with bytes, not bits, normally ASCII characters are
> stored in a single byte, with one bit wasted.
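The "one bit wasted" point can be checked with a small sketch, assuming Python 3 (the variable names are only for illustration). It prints each ASCII byte in binary so the unused leading bit is visible:

```python
# Assumes Python 3. Every ASCII code is below 128, so it fits in 7 bits.
text = "ASCII"
encoded = text.encode("ascii")      # a bytes object, one byte per character

for char, byte in zip(text, encoded):
    # format(byte, "08b") shows all 8 bits; the leading bit is always 0
    print(char, byte, format(byte, "08b"))

# The widest value in this sample needs exactly 7 bits.
widest = max(byte.bit_length() for byte in encoded)
print(widest)
```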

So ASCII and Unicode are 2 encoding systems currently in use.
How should I imagine them, visualize them?
Like tables: 'A' = 65, 'B' = 66 and so on?

But if I do, then that would be the visualization of a 'charset', not of an encoding system.
What is the difference between an encoding system and a charset?

EBCDIC - ASCII - Unicode = all of them are encoding systems

Greek ISO - Latin ISO - UTF-8 - UTF-16 = all of them are charsets.

What are these differences? I can't imagine them all; I can only imagine charsets, not encoding systems.
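One common way to picture the two ideas (terminology varies, so take this as an assumption rather than a definition): the character table assigns a number to each character, and an encoding decides which bytes represent that number. A minimal Python 3 sketch using the Greek letter alpha:

```python
# Assumes Python 3. "\u03b1" is GREEK SMALL LETTER ALPHA.
ch = "\u03b1"

# The table view: one character, one number (its code point).
code_point = ord(ch)                      # 945, i.e. 0x3B1

# The encoding view: the same character becomes different bytes
# depending on which encoding is asked to store it.
as_greek_iso = ch.encode("iso8859_7")     # b'\xe1'      (1 byte)
as_utf8 = ch.encode("utf-8")              # b'\xce\xb1'  (2 bytes)
as_utf16 = ch.encode("utf-16-le")         # b'\xb1\x03'  (2 bytes)

print(code_point, as_greek_iso, as_utf8, as_utf16)
```

The number 945 never changes; only the bytes written out do.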

Why does Python interpret all given strings as Unicode by default and not as ASCII? Because the former supports many positions while ASCII has only 127 positions, and hence can interpret only 127 different characters?


> "Narrow" Unicode uses two bytes per character. Since two bytes is only
> enough for about 65,000 characters, not 1,000,000+, the rest of the
> characters are stored as pairs of two-byte "surrogates".

Does "surrogate" literally mean a replacement?
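To make the surrogate idea concrete, here is a rough sketch, assuming Python 3 and the standard UTF-16 arithmetic. The sample character U+1F600 is only an illustration of a code point too large for two bytes. The pair of two-byte units is computed by hand and compared with what the built-in codec produces:

```python
# Assumes Python 3. U+1F600 lies above 0xFFFF, so it cannot fit in one
# 16-bit unit and must be split into a surrogate pair in UTF-16.
ch = "\U0001F600"
cp = ord(ch)

offset = cp - 0x10000                 # what remains is a 20-bit number
high = 0xD800 + (offset >> 10)        # top 10 bits    -> high surrogate
low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate

manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(hex(high), hex(low))            # 0xd83d 0xde00
print(manual == ch.encode("utf-16-be"))
```

So the two surrogates stand in, together, for one character that is too big for a single two-byte slot.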


> Latin-1 is similar, except there are 256 positions. Greek ISO-8859-7 is
> also similar, also 256 positions, but the characters are different. And
> so on, with dozens of charsets.

Latin has to display English chars (capital, small) + numbers + symbols. That would be 127, so why 256?

Greek = all of the above plus Greek chars, no?
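A quick way to see how the two 256-position tables relate, sketched in Python 3 (the codec names "latin-1" and "iso8859_7" are the standard library's): positions below 128 agree with ASCII in both, while the upper half is filled differently.

```python
# Assumes Python 3. Two bytes: 0x41 (lower half) and 0xE1 (upper half).
raw = bytes([0x41, 0xE1])

as_latin1 = raw.decode("latin-1")     # 'A' followed by a-acute (U+00E1)
as_greek = raw.decode("iso8859_7")    # 'A' followed by Greek alpha (U+03B1)

print(as_latin1, as_greek)

# The first character is the same in both; the second one differs.
print(as_latin1[0] == as_greek[0], as_latin1[1] == as_greek[1])
```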

> And then there is Unicode, which includes *every* character in all of
> those dozens of charsets. It has 1114111 positions (most are currently
> unfilled).

Shouldn't the number of positions that Unicode has to use equal the sum of all available characters of all the languages of the world, plus numbers and special chars? Why 1,000,000+? Why the need for so many positions? The narrow Unicode format (2 bytes) can cover all of the world's symbols.
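The sizes involved can be checked directly. A small sketch, assuming Python 3.3 or later, where sys.maxunicode is always 0x10FFFF (the sample character is again just an illustration):

```python
# Assumes Python 3.3+, where sys.maxunicode is 0x10FFFF on every build.
import sys

highest = sys.maxunicode          # 1114111, the largest code point
total = highest + 1               # 1114112 positions, counting position 0
two_byte_limit = 2 ** 16          # 65536 values fit in two bytes

# A character whose position is beyond what two bytes can count.
beyond = ord("\U0001F600")        # 128512

print(highest, total, two_byte_limit, beyond > two_byte_limit)
```

So at least some characters in current use already sit above the two-byte limit.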

> An encoding is simply a program that takes a character and returns a
> byte, or vice versa. For instance, the ASCII encoding will take character
> 'A'. That is found at position 65, which is 0x41 in hexadecimal, so the
> ASCII encoding turns character 'A' into byte 0x41, and vice versa.
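The round trip described in the quote looks roughly like this in Python 3. It is a sketch only; the last part shows what happens when a character has no position in the ASCII table:

```python
# Assumes Python 3. Character -> byte with the ASCII codec, and back.
byte_value = "A".encode("ascii")          # b'A', a single byte 0x41
back = bytes([0x41]).decode("ascii")      # 'A' again

print(byte_value[0] == 0x41, back)

# Greek alpha has no slot in ASCII, so the codec refuses to encode it.
try:
    "\u03b1".encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)
```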

Why do you say ASCII turns a character into hex format and not binary format?
Isn't the latter the way bytes are stored on the HDD, like 010101111010101 etc.?
Are they stored as hex instead, or did you just say so to avoid printing 0s and 1s?
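For comparison, the same byte can be written out in decimal, hex and binary. A minimal Python 3 sketch, where the three strings are just different spellings of one stored value:

```python
# Assumes Python 3. One byte, three ways of writing it down.
value = "A".encode("ascii")[0]        # the integer stored in the byte

decimal = str(value)                  # '65'
hexa = format(value, "#04x")          # '0x41'
binary = format(value, "08b")         # '01000001'

print(decimal, hexa, binary)

# All three notations name the same number.
print(65 == 0x41 == 0b01000001)
```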