Newsgroups: comp.lang.python
Date: Thu, 6 Jun 2013 13:59:31 -0700 (PDT)
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)
From: Νικόλαος Κούρας

I'm very sorry for the continuous pastes. I didn't include the whole thing before.
Here it is:

import os
import pymysql

# NOTE: cur, host and lastvisit are defined earlier in the script.

#========================================================
# Get filenames of the apps directory as bytestrings
path = os.listdir( b'/home/nikos/public_html/data/apps/' )

# Iterate over all filenames in the apps directory
for filename in path:
    try:
        # Is this name encoded in utf-8?
        filename.decode('utf-8')
    except UnicodeDecodeError:
        # Decoding from UTF-8 failed, which means that the name is not valid utf-8.
        # It appears that this filename is encoded in greek-iso, so decode from that and re-encode to utf-8
        new_filename = filename.decode('iso-8859-7').encode('utf-8')

        # Rename the file from greek bytestring --> utf-8 bytestring.
        # Concatenate the variables themselves, not the literals b'filename' / b'new_filename'
        old_path = b'/home/nikos/public_html/data/apps/' + filename
        new_path = b'/home/nikos/public_html/data/apps/' + new_filename
        os.rename( old_path, new_path )


#========================================================
# Get filenames of the apps directory as unicode
path = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in path:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
        data = cur.fetchone()    # filename is unique, so there should only be one

        if not data:
            # First time for file; primary key is automatic, hit is defaulted
            cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
    except pymysql.ProgrammingError as e:
        print( repr(e) )


#========================================================
path = os.listdir( '/home/nikos/public_html/data/apps/' )
filenames = set()

# Build a set of filenames based on the objects of path dir
for filename in path:
    filenames.add( filename )

# Delete spurious
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for row in data:
    # Each row is a 1-tuple (url,), assuming the default tuple cursor
    url = row[0]
    if url not in filenames:
        cur.execute('''DELETE FROM files WHERE url = %s''', (url,) )
#========================================================

Only the bytestream error is left, and then I believe it's ready this time.
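
For what it's worth, the rename step can be pulled out into two small functions so the decode/re-encode logic can be checked without touching the disk. This is only a rough sketch, not the script itself: the names to_utf8_name and fix_greek_filenames are made up for illustration, and it assumes every name that is not valid UTF-8 really is ISO-8859-7.

```python
import os

def to_utf8_name(name: bytes, legacy_encoding: str = 'iso-8859-7') -> bytes:
    """Return `name` re-encoded as UTF-8 (hypothetical helper).

    Names that already decode as UTF-8 are returned unchanged. Anything
    else is ASSUMED to be in `legacy_encoding`.
    """
    try:
        name.decode('utf-8')
        return name
    except UnicodeDecodeError:
        return name.decode(legacy_encoding).encode('utf-8')

def fix_greek_filenames(directory: bytes) -> list:
    """Rename entries of `directory` whose names are not valid UTF-8.

    `directory` must be bytes so that os.listdir() returns raw bytes names.
    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for name in os.listdir(directory):
        new_name = to_utf8_name(name)
        if new_name != name:
            # Join the actual variables, never literals like b'filename'
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append((name, new_name))
    return renamed
```

Calling fix_greek_filenames( b'/home/nikos/public_html/data/apps/' ) would then stand in for the first loop. Because both the directory and the names are bytes, nothing gets implicitly decoded on the way to os.rename().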