| Header | Value |
|---|---|
| From | Νικόλαος Κούρας <nikos.gr33k@gmail.com> |
| Date | Mon, 10 Jun 2013 14:18:54 +0300 |
| To | python-list@python.org |
| Subject | files.py (weird encoding error) |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.3103.1371040092.3114.python-list@python.org> |
This all started when I used FileZilla to upload files with Greek filenames to my remote
Linux server, with PuTTY as the SSH client and greek-iso as the locale
encoding setting, because Win8 used that by default.
Everything works when the filenames in the directory are English.
If I rename an English filename to a Greek one, I get the error that
shows up at the end of my post.
I know you guys know Linux, and there is a good chance you know Python
too, so you can help me out.
Thank you.
[CODE]
import os
import pymysql

# NOTE: cur, host and lastvisit are set up earlier in files.py (not shown here)

#========================================================
# Collect directory and its filenames as bytes
path = b'/home/nikos/public_html/data/apps/'
files = os.listdir( path )

for filename in files:
    # Compute 'path/to/filename'
    filepath_bytes = path + filename
    for encoding in ('utf-8', 'iso-8859-7', 'latin-1'):
        try:
            filepath = filepath_bytes.decode( encoding )
        except UnicodeDecodeError:
            continue

        # Rename to something valid in UTF-8
        if encoding != 'utf-8':
            os.rename( filepath_bytes, filepath.encode('utf-8') )

        assert os.path.exists( filepath )
        break
    else:
        # This only runs if we never reached the break
        raise ValueError( 'unable to clean filename %r' % filepath_bytes )

#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
        data = cur.fetchone()

        if not data:
            # First time for file; primary key is automatic, hit is defaulted
            print( "iam here", filename + '\n' )
            cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
    except pymysql.ProgrammingError as e:
        print( repr(e) )

#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )
filepaths = set()

# Build a set of the filenames found in the path dir
for filename in filenames:
    filepaths.add( filename )

# Delete spurious
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for rec in data:
    # Each row is a tuple, so compare its url column rather than the row itself
    if rec[0] not in filepaths:
        cur.execute('''DELETE FROM files WHERE url = %s''', (rec[0],) )
[/CODE]
When trying to run the above I get:
[CODE]
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Original exception was:, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Traceback (most recent call last):, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] File "/home/nikos/public_html/cgi-bin/files.py", line 83, in <module>, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] assert os.path.exists( filepath ), referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] File "/usr/local/lib/python3.3/genericpath.py", line 18, in exists, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] os.stat(path), referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-37: ordinal not in range(128), refere
[/CODE]
Why am I still receiving Unicode encode errors?
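For what it's worth, the same message can be reproduced without the web server or the filesystem. This is only a sketch under an unverified assumption: that the `str` path is being turned back into bytes with the ASCII codec, as would happen if the CGI process runs without a UTF-8 locale. The filename `αβγδ.txt` is made up for illustration.

```python
# Hypothetical Greek filename, for illustration only
path = '/home/nikos/public_html/data/apps/'
filepath = path + 'αβγδ.txt'

# The directory part is 34 characters long, so the first character
# of the filename sits at position 34 of the full path
print(len(path))

error = None
try:
    # Assumption: this mimics what happens to a str path under an ASCII locale
    filepath.encode('ascii')
except UnicodeEncodeError as e:
    error = e
    print(e)

# The same path as UTF-8 bytes needs no implicit encoding step
filepath_bytes = filepath.encode('utf-8')
print(filepath_bytes)
```

The reported position range starts right where the filename begins, which is at least consistent with the Greek part of the name being what the ASCII codec rejects.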
I have written a procedure just to avoid decoding issues and rename all
greek_bytes filenames to utf-8_bytes.
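Stripped of the filesystem calls, the idea behind that procedure is roughly the sketch below. `to_utf8_name` is a hypothetical helper name, not something that exists in files.py; it works purely on bytes, so no locale setting is involved.

```python
def to_utf8_name(raw, encodings=('utf-8', 'iso-8859-7', 'latin-1')):
    """Return (utf8_bytes, encoding_used) for a raw filename given as bytes."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate
            continue
        return text.encode('utf-8'), encoding
    raise ValueError('unable to clean filename %r' % raw)


# A made-up name as it would look when stored in ISO-8859-7 bytes
greek_iso = 'αβγδ.txt'.encode('iso-8859-7')
print(to_utf8_name(greek_iso))

# Plain ASCII names are already valid UTF-8 and come back unchanged
print(to_utf8_name(b'plain.txt'))
```

Since latin-1 can decode any byte sequence, the `ValueError` branch is never reached with this list of encodings; it only matters if latin-1 is removed from the candidates.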
Can you help please?
files.py (weird encoding error) Νικόλαος Κούρας <nikos.gr33k@gmail.com> - 2013-06-10 14:18 +0300