From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
Date: Wed, 5 Jun 2013 07:47:17 +1000

On Wed, Jun 5, 2013 at 6:03 AM, Νικόλαος Κούρας wrote:
>>UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc5' in position 61: surrogates not allowed
>
> Does this indicate that I am reading the filenames in a different encoding than what they actually are? What if I try to use bytes for path specifications, and have Python decode them in 'utf-8'?
>
> fullpaths.add( os.path.join(root, fullpath).encode('utf-8') )

For some reason you have an invalid Unicode codepoint in your string. Fix that.

ChrisA
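
For illustration, here is a minimal sketch of how a codepoint like '\udcc5' can end up in a string and why encoding it then fails. It assumes the filename on disk contains the byte 0xC5, which is not valid UTF-8 on its own, and that the name was really written in ISO-8859-7. Both the sample bytes and that encoding are assumptions made for the example; the original post does not confirm either.

```python
# Hypothetical raw filename bytes (an assumption for this sketch):
# 0xC5 is a Greek capital epsilon in ISO-8859-7 but is not valid UTF-8 by itself.
raw = b"\xc5.txt"

# Decoding as UTF-8 with the 'surrogateescape' error handler maps each
# undecodable byte 0xNN to the lone surrogate U+DCNN, so 0xC5 becomes '\udcc5'.
name = raw.decode("utf-8", "surrogateescape")

# A plain UTF-8 encode refuses lone surrogates; this is the error in the traceback.
try:
    name.encode("utf-8")
    error_reason = None
except UnicodeEncodeError as exc:
    error_reason = exc.reason

# Encoding with the same error handler gives back the original bytes unchanged.
recovered = name.encode("utf-8", "surrogateescape")

# If the bytes really are ISO-8859-7 (assumed here), decoding them with that
# codec yields a proper string that can safely be encoded as UTF-8.
fixed = recovered.decode("iso-8859-7")
fixed_utf8 = fixed.encode("utf-8")
```

In other words, the surrogate usually marks a byte that could not be decoded as UTF-8, so calling .encode('utf-8') on the joined path only moves the failure. One way to fix it would be to find out which encoding the names were actually written in and rename the files so they are valid UTF-8.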