From: Νικόλαος Κούρας
Newsgroups: comp.lang.python
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
Date: Sat, 8 Jun 2013 14:14:01 -0700 (PDT)
In-Reply-To: <51b37fa4$0$29966$c3e8da3$5496439d@news.astraweb.com>
References: <7d8da6c9-fb92-4329-b207-4280f29ba664@googlegroups.com> <20130608024931.GA77888@cskk.homeip.net> <51b37fa4$0$29966$c3e8da3$5496439d@news.astraweb.com>

On Saturday, 8 June 2013 at 10:01:57 PM UTC+3, Steven D'Aprano wrote:

> ASCII actually needs 7 bits to store a character. Since computers are
> optimized to work with bytes, not bits, normally ASCII characters are
> stored in a single byte, with one bit wasted.

So ASCII and Unicode are two encoding systems currently in use.
How should I imagine them, visualize them? Like tables: 'A' = 65, 'B' = 66, and so on?

But if I do, then that would be the visualization of a 'charset', not of an encoding system.
What is the difference between an encoding system and a charset?
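To make the question concrete, here is a minimal Python 3 sketch (an illustration only, not something taken from the thread). It assumes the usual reading of the two terms: the "table" view gives each character a number, and an encoding decides which bytes store that number. The variable names are made up for the example.

```python
# Illustrative sketch (Python 3): a character's number versus its stored bytes.
ch = "α"  # GREEK SMALL LETTER ALPHA

# The "table" view: each character has a number (its Unicode code point).
code_point = ord(ch)                      # 945, i.e. U+03B1

# The "encoding" view: the same character becomes different bytes
# depending on which encoding is used to store it.
as_greek_iso = ch.encode("iso-8859-7")    # one byte
as_utf8 = ch.encode("utf-8")              # two bytes
as_utf16 = ch.encode("utf-16-be")         # two bytes, big-endian, no BOM

# A character in the ASCII range is stored as the single byte 0x41 in ASCII.
ascii_a = "A".encode("ascii")

print(code_point, as_greek_iso, as_utf8, as_utf16, ascii_a)
```

The number for "α" stays the same throughout; only the bytes change with the chosen encoding.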
EBCDIC, ASCII, Unicode = all of them are encoding systems.
Greek ISO, Latin ISO, UTF-8, UTF-16 = all of them are charsets.

What are the differences? I can't picture them all; I can only picture charsets, not encoding systems.

Why does Python interpret all given strings as Unicode by default and not as ASCII? Is it because the former supports many positions while ASCII has only 127 positions, and hence can represent only 127 different characters?

> "Narrow" Unicode uses two bytes per character. Since two bytes is only
> enough for about 65,000 characters, not 1,000,000+, the rest of the
> characters are stored as pairs of two-byte "surrogates".

Does "surrogate" literally mean a replacement?

> Latin-1 is similar, except there are 256 positions. Greek ISO-8859-7 is
> also similar, also 256 positions, but the characters are different. And
> so on, with dozens of charsets.

Latin has to display English characters (capital and small) + numbers + symbols. That would be 127, so why 256?
Greek = all of the above plus Greek characters, no?

> And then there is Unicode, which includes *every* character in all of
> those dozens of charsets. It has 1114111 positions (most are currently
> unfilled).

Shouldn't the number of positions Unicode uses equal the sum of all available characters of all the languages of the world, plus numbers and special characters? Why 1,000,000+? Why the need for so many positions? The narrow Unicode format (2 bytes) can cover all of the world's symbols.

> An encoding is simply a program that takes a character and returns a
> byte, or vice versa. For instance, the ASCII encoding will take character
> 'A'. That is found at position 65, which is 0x41 in hexadecimal, so the
> ASCII encoding turns character 'A' into byte 0x41, and vice versa.

Why do you say ASCII turns a character into hex format and not into binary format?
Isn't the latter the way bytes are stored on the HDD, like 010101111010101 etc.?
Are they stored as hex instead, or did you just say so to avoid printing 0s and 1s?
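As a rough illustration of the hex-versus-binary question and of the "surrogates" mentioned in the quote, here is a small Python 3 sketch (an assumption-labelled example, not from the thread). The G clef character is just a convenient sample of a character above the two-byte range; the variable names are made up.

```python
# Illustrative sketch (Python 3): hex and binary are two spellings of one byte.
byte = "A".encode("ascii")[0]       # the integer 65

as_decimal = str(byte)              # '65'
as_hex = format(byte, "#04x")       # '0x41'
as_binary = format(byte, "08b")     # '01000001'

# All three spellings denote the same stored value.
same_value = int(as_hex, 16) == int(as_binary, 2) == byte

# A character beyond the two-byte range, stored in UTF-16 as a surrogate pair.
clef = "\U0001D11E"                 # MUSICAL SYMBOL G CLEF (sample character)
units = clef.encode("utf-16-be")    # four bytes in total
pair = [units[0:2].hex(), units[2:4].hex()]   # the two 2-byte code units

print(as_decimal, as_hex, as_binary, same_value, pair)
```

The byte on disk is the same either way; hex is only a shorter way of writing the same eight bits.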