From: Νικόλαος Κούρας
Newsgroups: comp.lang.python
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
Date: Sun, 9 Jun 2013 02:08:48 -0700 (PDT)

On Sunday, 9 June 2013 11:55:43 a.m. UTC+3, Lele Gaifax wrote:
> Steven D'Aprano writes:
>
> > On Sat, 08 Jun 2013 22:09:57 -0700, nagia.retsina wrote:
> >
> >> chr('A') would give me the mapping of this char, the number 65 while
> >> ord(65) would output the char 'A' likewise.
> >
> > Correct. Python uses Unicode, where code-point 65 ("ordinal value 65")
> > means letter "A".
>
> Actually, that's the other way around:
>
> >>> chr(65)
> 'A'
> >>> ord('A')
> 65
>
> >> What would happen if we try to re-encode bytes on the disk? Like
> >> trying:
> >>
> >> s = "νίκος"
> >> utf8_bytes = s.encode('utf-8')
> >> greek_bytes = utf8_bytes.encode('iso-8859-7')
> >>
> >> Can we re-encode twice, or as many times as we want, and then decode
> >> back the same way?
> >
> > Of course. Bytes have no memory of where they came from, or what they are
> > used for. All you are doing is flipping bits on a memory chip, or on a
> > hard drive. So long as *you* remember which encoding is the right one,
> > there is no problem. If you forget, and start using the wrong one, you
> > will get garbage characters, mojibake, or errors.
>
> Uhm, no: "encode" transforms a Unicode string into an array of bytes,
> "decode" does the opposite transformation. You cannot do the former on
> an "arbitrary" array of bytes:
>
> >>> s = "νίκος"
> >>> utf8_bytes = s.encode('utf-8')
> >>> greek_bytes = utf8_bytes.encode('iso-8859-7')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'bytes' object has no attribute 'encode'

So something encoded into bytes cannot be re-encoded to some other bytes.

How about a string, I wonder?

s = "νίκος"
what_are_these_bytes = s.encode('iso-8859-7').encode('utf-8')
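A minimal sketch of the points in this thread, assuming Python 3 (it is an illustration, not code from any of the posters): chr()/ord() are inverses, str.encode() produces bytes, bytes have no .encode() so chaining encode() calls fails even when the first call starts from a str, a misspelled codec name such as 'iso-8859-7' typed as 'iso-8869-7' raises LookupError, and converting bytes from one encoding to another means decoding back to str first. The helper name `transcode` is hypothetical.

```python
# Minimal sketch (assumes Python 3): how str and bytes relate.

s = "νίκος"  # a str: a sequence of Unicode code points

# chr() maps an integer code point to a one-character str; ord() reverses it.
assert chr(65) == "A"
assert ord("A") == 65

# str.encode() turns text into bytes using a named codec.
utf8_bytes = s.encode("utf-8")        # each of these Greek letters takes 2 bytes
greek_bytes = s.encode("iso-8859-7")  # each Greek letter takes 1 byte

# bytes objects have no .encode() method, so chaining encode() calls fails,
# even though the first call started from a str.
chained_error = None
try:
    s.encode("iso-8859-7").encode("utf-8")
except AttributeError as exc:
    chained_error = exc

# A misspelled codec name fails earlier, with LookupError.
typo_error = None
try:
    s.encode("iso-8869-7")
except LookupError as exc:
    typo_error = exc


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: decode bytes from codec `src`, re-encode as `dst`."""
    return data.decode(src).encode(dst)


# Converting between encodings means going back through str.
assert transcode(greek_bytes, "iso-8859-7", "utf-8") == utf8_bytes
assert transcode(utf8_bytes, "utf-8", "iso-8859-7") == greek_bytes

# Decoding with the wrong codec may not raise at all; here it silently
# produces mojibake instead of the original text.
mojibake = utf8_bytes.decode("iso-8859-7")
assert mojibake != s
```

The only route from one byte encoding to another is bytes -> str -> bytes, which is why the sketch funnels everything through `decode` followed by `encode`; it also shows that Steven's warning about "garbage characters" applies on the decode side, where picking the wrong codec can succeed silently.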