Date: Sun, 9 Jun 2013 19:16:06 +1000
From: Cameron Simpson
To: "Steven D'Aprano"
Cc: python-list@python.org
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
Newsgroups: comp.lang.python

On 09Jun2013 08:15, Steven D'Aprano wrote:
| On Sun, 09 Jun 2013 00:00:53 -0700, nagia.retsina wrote:
| > path = b'/home/nikos/public_html/data/apps/'
| > files = os.listdir( path )
| >
| > for filename in files:
| >     # Compute 'path/to/filename'
| >     filepath_bytes = path + filename
| >     for encoding in ('utf-8', 'iso-8859-7', 'latin-1'):
| >         try:
| >             filepath = filepath_bytes.decode( encoding )
| >         except UnicodeDecodeError:
| >             continue
| >
| >         # Rename to something valid in UTF-8
| >         if encoding != 'utf-8':
| >             os.rename( filepath_bytes,
| >                        filepath.encode('utf-8') )
| >         assert os.path.exists( filepath )
| >         break
| >     else:
| >         # This only runs if we never reached the break
| >         raise ValueError(
| >             'unable to clean filename %r' % filepath_bytes )
|
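
A minimal, self-contained sketch of the same decode-then-check idea, assuming Python 3 on a POSIX system. The helpers decode_name and utf8_name_exists are hypothetical, not code from this thread. Two things it makes visible: latin-1 maps every byte 0x00-0xFF to a code point, so with latin-1 last in the list the ValueError branch above can never fire; and os.path.exists() given a str must first encode it with sys.getfilesystemencoding(), which raises exactly this kind of 'ascii' UnicodeEncodeError for Greek text if that encoding is ASCII (plausible under CGI with no locale set, but that is an assumption about Nikos's server, not something verified). Handing it explicit UTF-8 bytes skips that implicit step.

```python
import os
import tempfile

# Same candidate order as the quoted loop.
CANDIDATES = ('utf-8', 'iso-8859-7', 'latin-1')


def decode_name(raw):
    """Hypothetical helper: return (text, encoding) for the first
    candidate that decodes the bytes in raw without error."""
    for encoding in CANDIDATES:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 decodes every byte 0x00-0xFF, so with the list above this
    # line is unreachable; it only matters if latin-1 is removed.
    raise ValueError('unable to decode filename %r' % raw)


def utf8_name_exists(text):
    """Hypothetical helper: test existence via an explicit UTF-8 bytes path.

    os.path.exists(text) would first encode text with
    sys.getfilesystemencoding(); if that is 'ascii', Greek characters
    raise UnicodeEncodeError. Passing bytes skips that implicit step
    (POSIX only; assumes the on-disk names really are UTF-8).
    """
    return os.path.exists(text.encode('utf-8'))


if __name__ == '__main__':
    greek = 'Αρχείο.txt'  # hypothetical Greek filename

    # Greek encoded as ISO-8859-7 is not valid UTF-8, so the loop falls through.
    print(decode_name(greek.encode('iso-8859-7'))[1])  # iso-8859-7

    # Reproduce the error class from the traceback without touching the disk.
    try:
        greek.encode('ascii')
    except UnicodeEncodeError as exc:
        print(exc.reason)  # ordinal not in range(128)

    # Create a UTF-8-named file in a scratch directory and check it via bytes.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, greek)
        with open(path.encode('utf-8'), 'wb'):
            pass
        print(utf8_name_exists(path))  # True
```

Putting latin-1 last is what makes the fallback total; the trade-off is that any name that is neither UTF-8 nor ISO-8859-7 is silently reinterpreted as Latin-1 rather than reported.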
| Editing the traceback to get rid of unnecessary noise from the logging:
|
| Traceback (most recent call last):
|   File "/home/nikos/public_html/cgi-bin/files.py", line 83, in
|     assert os.path.exists( filepath )
|   File "/usr/local/lib/python3.3/genericpath.py", line 18, in exists
|     os.stat(path)
| UnicodeEncodeError: 'ascii' codec can't encode characters in position
| 34-37: ordinal not in range(128)
|
| > Why am I still receiving Unicode decode errors? With the help of you guys
| > we have written a procedure just to avoid this kind of decoding issue
| > and rename all greek_byted_filenames to utf-8_byted.
|
| That's a very good question. It works for me when I test it, so I cannot
| explain why it fails for you.

If he's lucky, the UnicodeEncodeError occurred while trying to print an
error message: printing a Greek Unicode string with ASCII as the output
encoding (the default when not writing to a tty, IIRC).

Cheers,
-- 
Cameron Simpson

I generally avoid temptation unless I can't resist it.
        - Mae West