Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #47751

Turnign greek-iso filenames => utf-8 iso

From Νικόλαος Κούρας <support@superhost.gr>
Newsgroups comp.lang.python
Subject Turnign greek-iso filenames => utf-8 iso
Date 2013-06-12 08:02 +0000
Organization National Technical University of Athens, Greece
Message-ID <kp99ug$v4c$1@news.ntua.gr> (permalink)

Show all headers | View raw


#========================================================
# Collect directory and its filenames as bytes
path = b'/home/nikos/public_html/data/apps/'
files = os.listdir( path )

for filename in files:
	# Compute 'path/to/filename'
	filepath_bytes = path + filename
	
	for encoding in ('utf-8', 'iso-8859-7', 'latin-1'):
		try: 
			filepath = filepath_bytes.decode( encoding )
		except UnicodeDecodeError:
			continue
        
		# Rename to something valid in UTF-8 
		if encoding != 'utf-8': 
			os.rename( filepath_bytes, filepath.encode('utf-8') 
)

		assert os.path.exists( filepath.encode('utf-8') )
		break 
	else: 
		# This only runs if we never reached the break
		raise ValueError( 'unable to clean filename %r' % 
filepath_bytes ) 


# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Build a set of 'path/to/filename' based on the objects of path dir
filepaths = set()
for filename in filenames:
	filepaths.add( filename )

==================
# Load'em
for filename in filenames:
	try:
		# Check the presence of a file against the database and 
insert if it doesn't exist
		cur.execute('''SELECT url FROM files WHERE url = %s''', 
filename )
		data = cur.fetchone()

====================
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173] Traceback (most 
recent call last):, referer: http://superhost.gr/
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173]   File "/home/
nikos/public_html/cgi-bin/files.py", line 102, in <module>, referer: 
http://superhost.gr/
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173]     print
( filename ), referer: http://superhost.gr/
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173]   File "/usr/
local/lib/python3.3/codecs.py", line 355, in write, referer: http://
superhost.gr/
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173]     data, 
consumed = self.encode(object, self.errors), referer: http://superhost.gr/
[Wed Jun 12 10:56:56 2013] [error] [client 79.103.41.173] 
UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcce' in 
position 0: surrogates not allowed, referer: http://superhost.gr/
=====================

i tried to insert
print( filename )
sys.exit(0)

just before the execute
and the output is just Pacman.exe as seen in 

http://superhost.gr/?page=files.py

Seens the encoding precedure successfully turned all the filenames from 
greek-iso to utf-8 without failing, why woul it still be encoding issues 
when it comes to execute?

Back to comp.lang.python | Previous | NextNext in thread | Find similar | Unroll thread


Thread

Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 08:02 +0000
  Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-12 08:31 +0000
    Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 12:00 +0300
      Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-12 09:17 +0000
        Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 12:24 +0300
          Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-12 09:37 +0000
            Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 14:32 +0300
              Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 15:42 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-12 15:42 +0100
                Re: Turnign greek-iso filenames => utf-8 iso rusi <rustompmody@gmail.com> - 2013-06-12 09:14 -0700
                Re: Turnign greek-iso filenames => utf-8 iso Neil Cerutti <neilc@norwich.edu> - 2013-06-12 16:18 +0000
                Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 20:16 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 00:22 +0000
                Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 20:14 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 20:20 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 00:20 +0000
                Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 20:27 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 22:05 +0300
    Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 12:04 +0300
  Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-12 09:12 +0000
    Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-12 13:40 +0300
      Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 09:49 +0300
        Re: Turnign greek-iso filenames => utf-8 iso Chris Angelico <rosuav@gmail.com> - 2013-06-13 17:54 +1000
          Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 11:15 +0300
            Re: Turnign greek-iso filenames => utf-8 iso Chris Angelico <rosuav@gmail.com> - 2013-06-13 19:25 +1000
              Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 12:43 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Chris Angelico <rosuav@gmail.com> - 2013-06-14 00:05 +1000
        Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 17:28 +0300
        Re: Turnign greek-iso filenames => utf-8 iso Zero Piraeus <schesis@gmail.com> - 2013-06-13 10:16 -0400
          Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 19:20 +0300
            Re: Turnign greek-iso filenames => utf-8 iso Grant Edwards <invalid@invalid.invalid> - 2013-06-13 17:17 +0000
            Re: Turnign greek-iso filenames => utf-8 iso Zero Piraeus <schesis@gmail.com> - 2013-06-13 13:27 -0400
              Re: Turnign greek-iso filenames => utf-8 iso Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 20:48 +0300
                Re: Turnign greek-iso filenames => utf-8 iso Grant Edwards <invalid@invalid.invalid> - 2013-06-13 17:53 +0000
                Re: Turnign greek-iso filenames => utf-8 iso Chris Angelico <rosuav@gmail.com> - 2013-06-14 07:46 +1000
                Re: Turnign greek-iso filenames => utf-8 iso Dave Angel <davea@davea.name> - 2013-06-13 18:20 -0400
            Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-14 03:05 +0000
          Re: Turnign greek-iso filenames => utf-8 iso Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-14 01:28 +0000

csiph-web