From: Benjamin Kaplan
Date: Sun, 9 Jun 2013 13:05:49 -0700
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
Newsgroups: comp.lang.python

On Sun, Jun 9, 2013 at 2:38 AM, Νικόλαος Κούρας wrote:
> On Sunday, 9 June 2013 12:20:58 PM UTC+3, Lele Gaifax wrote:
>
>> > How about a string i wonder?
>> > s = "νίκος"
>> > what_are these_bytes = s.encode('iso-8869-7').encode(utf-8')
>
>> Ignoring the usual syntax error, this is just a variant of the code I
>> posted: "s.encode('iso-8869-7')" produces a bytes instance which
>> *cannot* be "re-encoded" again in whatever encoding.
>
> s = 'a'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> a (we got the original character back)
> ================================
> s = 'α'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>
> Why this error? because 'a' ordinal value > 127 ?

No. You get that error because the bytes are not UTF-8. They are ISO-8859-7. In ISO-8859-7, 'α' encodes to the single byte 0xE1. In UTF-8, however, a 0xE1 byte announces the start of a three-byte sequence, so the decoder runs out of data after the first byte. For ASCII characters (ord(x) < 128), ISO-8859-7 and UTF-8 produce exactly the same bytes, which is why 'a' survived the round trip. For anything else, they differ. If you were to decode it as ISO-8859-1, it would succeed, but you would get the character "á" back instead of α.

You're misunderstanding the decode function. Decode doesn't turn the data into a string with the specified encoding. It takes bytes that are *in* the specified encoding and turns them into a Python str, which uses Python's internal string representation. In Python 3.3, that representation doesn't even have a name because it's not a standard encoding. So you want the decode argument to match the encode argument.
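The explanation above can be checked with a short Python 3 sketch. The helper `try_decode` is a hypothetical name used only for this illustration. It catches `UnicodeDecodeError` so that every case can be shown in a single run.

```python
def try_decode(data: bytes, codec: str) -> str:
    """Decode data with codec; return the text, or the error message on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# ASCII characters encode to the same single byte in ISO-8859-7 and UTF-8.
ascii_bytes = 'a'.encode('iso-8859-7')
print(ascii_bytes)                             # b'a'
print(try_decode(ascii_bytes, 'utf-8'))        # a

# Greek small alpha is the single byte 0xE1 in ISO-8859-7 ...
greek_bytes = 'α'.encode('iso-8859-7')
print(greek_bytes)                             # b'\xe1'

# ... but in UTF-8, 0xE1 starts a three-byte sequence, so decoding fails
# with the same "unexpected end of data" error quoted in the post.
print(try_decode(greek_bytes, 'utf-8'))

# ISO-8859-1 accepts every byte value, but maps 0xE1 to a different character.
print(try_decode(greek_bytes, 'iso-8859-1'))   # á

# Decoding with the same codec that was used for encoding round-trips correctly.
print(try_decode(greek_bytes, 'iso-8859-7'))   # α
```

Decoding with ISO-8859-1 never raises, because that codec maps all 256 byte values to characters. That is exactly why it can silently produce the wrong text instead of failing.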
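The point quoted from Lele Gaifax, that the result of `encode()` cannot be "re-encoded", can also be checked directly. The sketch below assumes Python 3. The function name `transcode` is hypothetical. It shows that the same text produces different bytes under the two codecs. It also shows that converting bytes from one encoding to another means decoding with the old codec first, then encoding with the new one.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst by way of a str."""
    text = data.decode(src)   # bytes in the src encoding -> str
    return text.encode(dst)   # str -> bytes in the dst encoding


word = '\u03bd\u03af\u03ba\u03bf\u03c2'  # 'νίκος', written with escapes
greek = word.encode('iso-8859-7')        # one byte per Greek letter
utf8 = word.encode('utf-8')              # two bytes per Greek letter

print(len(greek), len(utf8))             # 5 10

# bytes objects have decode() but no encode(), so chaining
# s.encode(...).encode(...) raises AttributeError in Python 3.
print(hasattr(greek, 'encode'))          # False

converted = transcode(greek, 'iso-8859-7', 'utf-8')
print(converted == utf8)                 # True
print(converted.decode('utf-8') == word) # True
```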