Newsgroups: comp.lang.python
From: Νικόλαος Κούρας
To: Michael Weylandt
Cc: python-list@python.org
Date: Fri, 07 Jun 2013 11:10:47 +0300
Subject: Re: Changing filenames from Greeklish => Greek (subprocess complain)

On 7/6/2013 10:42 a.m., Michael Weylandt wrote:

> > os.rename( filepath_bytes filepath.encode('utf-8')
> Missing comma, which is, after all, just a matter of syntax so it can't matter, right?

I doubted that os.rename's arguments had to be comma-separated, but then I read the docs:

os.rename(src, dst)

Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file; there may be no way to implement an atomic rename when dst names an existing file.

Availability: Unix, Windows.

Indeed, it has to be:
os.rename( filepath_bytes, filepath.encode('utf-8') )

'mv source target' doesn't require commas (the shell separates arguments with whitespace), so I thought it was safe to assume that os.rename didn't either.
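In Python the two arguments of os.rename() are separated by a comma, like in any other function call. Below is a minimal sketch that runs in a throwaway temporary directory and passes bytes paths, as the rename loop in this post does. The filenames old_name.txt, new_name.txt and other.txt are made up for the example. It also shows os.replace() (available since Python 3.3). That function overwrites an existing destination on both Unix and Windows, whereas os.rename() raises OSError on Windows in that case.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)               # use bytes paths throughout
    src = os.path.join(tmp_bytes, b'old_name.txt')
    dst = os.path.join(tmp_bytes, b'new_name.txt')

    with open(src, 'wb') as f:
        f.write(b'first')

    # Two arguments, separated by a comma: os.rename(src, dst)
    os.rename(src, dst)

    # os.replace() overwrites dst if it already exists, on Unix and Windows alike
    other = os.path.join(tmp_bytes, b'other.txt')
    with open(other, 'wb') as f:
        f.write(b'second')
    os.replace(other, dst)

    listing = sorted(os.listdir(tmp_bytes))    # bytes in, bytes names out
    with open(dst, 'rb') as f:
        content = f.read()

print(listing, content)  # [b'new_name.txt'] b'second'
```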


I'm happy to announce that after correcting many idiotic errors, like commas, missing colons and variable declarations, this surrogate error is the last one I get.
I still don't understand what "surrogate" means. In everyday English it means "replacement".
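Here "surrogate" is a technical term. On Unix, Python 3 may return a filename as str even though the name's bytes are not valid in the filesystem encoding. In that case it decodes them with the 'surrogateescape' error handler (PEP 383), and each undecodable byte 0xNN becomes the lone surrogate code point U+DCNN. Lone surrogates are not real characters, so a strict UTF-8 encode refuses them. The traceback below shows PyMySQL doing exactly that: query.encode(charset). The sketch below assumes a hypothetical filename stored in ISO-8859-7.

```python
# Hypothetical on-disk name: Greek text stored in the legacy ISO-8859-7 encoding
raw = 'αρχείο.txt'.encode('iso-8859-7')

# Decode the way Python decodes undecodable filenames on Unix:
# every byte that is not valid UTF-8 becomes a code point in U+DC80..U+DCFF
name = raw.decode('utf-8', 'surrogateescape')
print(ascii(name))

# A strict UTF-8 encode, like the one PyMySQL performs on the query, rejects them
error_message = None
try:
    name.encode('utf-8')
except UnicodeEncodeError as err:
    error_message = str(err)
print(error_message)   # ... surrogates not allowed

# The same error handler turns the surrogates back into the original bytes
restored = name.encode('utf-8', 'surrogateescape')
print(restored == raw)   # True
```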
Here is the code:


#========================================================
import os

# Collect filenames of the path dir as bytes
filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )

# Iterate over all filenames in the path dir
for filename in filename_bytes:
    # Compute 'path/to/filename' in bytes from the loop variable
    filepath_bytes = b'/home/nikos/public_html/data/apps/' + filename
    try:
        # Already valid UTF-8: nothing to rename
        filepath = filepath_bytes.decode('utf-8')
    except UnicodeDecodeError:
        try:
            filepath = filepath_bytes.decode('iso-8859-7')

            # Rename current filename from greek bytes => utf-8 bytes
            os.rename( filepath_bytes, filepath.encode('utf-8') )
        except UnicodeDecodeError:
            print( 'I give up! This filename is unreadable!', filepath_bytes )
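The decode-then-rename logic of this loop can be moved into a small function that works only on the raw bytes name. That makes it easy to check without touching the disk. This is a sketch, and like the loop it assumes that every non-UTF-8 name was written in ISO-8859-7. The names to_utf8_name and rename_dir_to_utf8 are hypothetical.

```python
import os


def to_utf8_name(raw):
    """Return the UTF-8 bytes for a filename, or None if it is already UTF-8.

    Assumes any name that is not valid UTF-8 was written in ISO-8859-7.
    Raises UnicodeDecodeError if the bytes are not valid ISO-8859-7 either.
    """
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not UTF-8: reinterpret as ISO-8859-7 (may raise UnicodeDecodeError)
        return raw.decode('iso-8859-7').encode('utf-8')
    return None  # already valid UTF-8: no rename needed


def rename_dir_to_utf8(directory):
    """Rename every ISO-8859-7 filename in a bytes directory path to UTF-8."""
    for raw in os.listdir(directory):
        try:
            new = to_utf8_name(raw)
        except UnicodeDecodeError:
            print('I give up! This filename is unreadable!', raw)
            continue
        if new is not None:
            os.rename(os.path.join(directory, raw), os.path.join(directory, new))
```

The directory must be passed as bytes, for example rename_dir_to_utf8(b'/home/nikos/public_html/data/apps/'). Then os.listdir() hands back the undecoded names.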


#========================================================
import pymysql

# Assumes cur (a pymysql cursor), host and lastvisit are defined earlier in files.py

# Get filenames of the apps directory as unicode
# (on Unix, bytes that can't be decoded with the filesystem encoding
#  come back as lone surrogates such as '\udcce')
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Insert every filename that isn't in the database yet
for filename in filenames:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
        data = cur.fetchone()    # filename is unique, so there should be only one row

        if not data:
            # First time for file; primary key is automatic, hit is defaulted
            cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
    except pymysql.ProgrammingError as e:
        print( repr(e) )
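The SELECT and INSERT above are where a surrogate-carrying name reaches MySQL. One way to avoid the UnicodeEncodeError works under the same ISO-8859-7 assumption. First, turn the str from os.listdir() back into its raw bytes with os.fsencode(), which reverses the surrogate escaping. Then decode those bytes properly before querying. The helper name db_safe_name is hypothetical. This only changes what gets stored in the database. The file on disk keeps its ISO-8859-7 name, so renaming the files, as the first loop tries to do, is the more lasting fix.

```python
import os


def db_safe_name(name, legacy_encoding='iso-8859-7'):
    """Return a version of an os.listdir() str name that encodes cleanly as UTF-8.

    os.fsencode() turns any lone surrogates back into the original bytes;
    those bytes are then decoded as UTF-8, falling back to the assumed
    legacy encoding (which may still raise UnicodeDecodeError).
    """
    raw = os.fsencode(name)
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding)
```

Inside the loop it would be used as cur.execute('''SELECT url FROM files WHERE url = %s''', (db_safe_name(filename),) ), and likewise for the INSERT.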


#========================================================
# Build a set of the filenames currently in the apps dir
filenames = set( os.listdir( '/home/nikos/public_html/data/apps/' ) )

# Delete spurious rows: files that are in the database but no longer on disk
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
# (each row from fetchall() is a 1-tuple, so unpack the url from it)
for (url,) in data:
    if url not in filenames:
        cur.execute('''DELETE FROM files WHERE url = %s''', (url,) )
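The same cleanup can be written as a set difference between the urls in the database and the names on disk. This is a sketch: stale_urls is a hypothetical helper, and rows is assumed to be the list of 1-tuples that cursor.fetchall() returns for SELECT url FROM files.

```python
def stale_urls(rows, disk_names):
    """Return the urls stored in the database that are no longer on disk.

    rows: cursor.fetchall() result for 'SELECT url FROM files',
          i.e. 1-tuples such as [('a.txt',), ('b.txt',)]
    disk_names: filenames on disk, e.g. os.listdir(apps_dir)
    """
    db_urls = {url for (url,) in rows}
    return db_urls - set(disk_names)


rows = [('a.txt',), ('b.txt',), ('c.txt',)]
print(sorted(stale_urls(rows, ['a.txt', 'c.txt', 'new.txt'])))   # ['b.txt']
```

With a DB-API cursor such as PyMySQL's, the deletes could then be issued in one call: cur.executemany('DELETE FROM files WHERE url = %s', [(url,) for url in stale]).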



=================================

[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 88, in <module>
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]     cur.execute('''SELECT url FROM files WHERE url = %s''', filename )
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]     query = query.encode(charset)
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcce' in position 35: surrogates not allowed
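The numbers in the last log line can be checked. Suppose PyMySQL wraps a string parameter in single quotes before encoding the finished query (an assumption about its escaping). Then position 35 is the first character after url = ', so the surrogate '\udcce' is the very first character of the filename and stands for the raw byte 0xCE. That byte fits at least two situations, sketched below with single example letters. The first is a UTF-8 Greek name read with an ASCII filesystem encoding, since 0xCE is the first UTF-8 byte of letters such as Κ. The second is an ISO-8859-7 name starting with Ξ, read as UTF-8. The log alone does not show which one applies on this server.

```python
# The query text up to the start of the filename, assuming PyMySQL
# quotes string parameters with single quotes
prefix = "SELECT url FROM files WHERE url = '"
print(len(prefix))   # 35, the position reported in the error

# Case 1: a UTF-8 Greek name decoded with an ASCII filesystem encoding
kappa_utf8 = '\u039a'.encode('utf-8')          # Greek capital kappa: b'\xce\x9a'
case1 = kappa_utf8.decode('ascii', 'surrogateescape')

# Case 2: an ISO-8859-7 Greek name decoded as UTF-8
xi_greek = '\u039e'.encode('iso-8859-7')       # Greek capital xi: b'\xce'
case2 = xi_greek.decode('utf-8', 'surrogateescape')

print(ascii(case1), ascii(case2))   # both start with the surrogate for byte 0xCE
```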


