From: Michael Torrie
Date: Tue, 04 Jun 2013 09:07:19 -0600
Newsgroups: comp.lang.python
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)

On 06/04/2013 08:18 AM, Νικόλαος Κούρας wrote:
> No, brackets are all there. Just tried:
>
> # Compute a set of current fullpaths
> fullpaths = set()
> path = "/home/nikos/www/data/apps/"
>
> for root, dirs, files in os.walk(path):
>     for fullpath in files:
>         fullpaths.add( os.path.join(root, fullpath) )
>         print (fullpath )
>         print (fullpath.encode('iso-8859-7').decode('latin-1') )
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is wrong. You are converting unicode to iso-8859-7 bytes, then
trying to convert those bytes back to unicode by pretending they are
latin-1 bytes. Even if this worked, it would generate garbage.

> What are these 'surrogate' things?

It means that when you tried to decode Greek bytes using latin-1, some
invalid unicode letters were created (which is expected, since the bytes
are not latin-1, they are iso-8859-7!).

If you want the browser to use a particular encoding scheme (utf-8),
then you have to print out an HTTP header before you start printing your
other HTML data:

    print("Content-Type: text/html;charset=UTF-8\r\n")
    print("\r\n")
    print("html data goes here")
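
To make both points concrete, here is a rough sketch, not a drop-in fix for your script. The filename and the three helper functions are made up for illustration. It shows that decoding with a different codec than the one used for encoding returns the wrong text, and one way to build a response whose body bytes actually match the charset declared in the header.

```python
# Hypothetical filename, used only for illustration.
GREEK_NAME = "Καλημέρα.txt"


def mismatched_round_trip(text: str) -> str:
    """Encode as ISO-8859-7, then (wrongly) decode the bytes as Latin-1."""
    return text.encode("iso-8859-7").decode("latin-1")


def matched_round_trip(text: str) -> str:
    """Encode and decode with the same codec, so the text survives."""
    return text.encode("iso-8859-7").decode("iso-8859-7")


def cgi_response(html: str) -> bytes:
    """Build a minimal CGI response: header, blank line, UTF-8 body."""
    header = "Content-Type: text/html; charset=UTF-8\r\n\r\n"
    # The body is encoded with the same charset the header announces.
    return header.encode("ascii") + html.encode("utf-8")
```

Calling mismatched_round_trip(GREEK_NAME) gives back a string of the same length made of accented Latin characters, while matched_round_trip(GREEK_NAME) returns the original name. In a CGI script the bytes from cgi_response() would be written to standard output in binary mode, so that Python's own output encoding does not get in the way.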