Re: Unicode in cgi-script with apache2

Date 2014-08-17 11:40 +0200
From Dominique Ramaekers <dominique@ramaekers-stassart.be>
Subject Re: Unicode in cgi-script with apache2
References <mailman.13038.1408130249.18130.python-list@python.org> <satHv.195207$ze2.61877@fx28.am4> <mailman.13054.1408229123.18130.python-list@python.org> <lsp5ab$sjv$1@dont-email.me> <53f05ed9$0$30003$c3e8da3$5496439d@news.astraweb.com>
Newsgroups comp.lang.python
Message-ID <mailman.13061.1408268785.18130.python-list@python.org>


Wow, everybody keeps on chewing on this problem. As a bonus, I've 
reconfigured my server to do some tests.
http://cloudserver.ramaekers-stassart.be/test.html => is the file I want 
to read. Going to this url displays the file...
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => is the 
cgi-script of this test
http://cloudserver.ramaekers-stassart.be/wsgi => is the wsgi solution 
(but for now it just says 'Hello world'...)

----------------This configuration-----------------------------

dominique@cloudserver:/var/www/cgi-python$ cat /etc/default/locale
LANG="en_US.UTF-8"
LANGUAGE="en_US:"

dominique@cloudserver:/var/www/cgi-python$ cat 
/etc/apache2/sites-enabled/000-default.conf
<VirtualHost *:80>

     ServerAdmin dominique@ramaekers-stassart.be
     WSGIScriptAlias /wsgi /var/www/wsgi/application

     <Directory /var/www/wsgi>
             Order allow,deny
             Allow from all
     </Directory>

     DocumentRoot /var/www/html

     ScriptAlias /cgi-python /var/www/cgi-python/
     <Directory /var/www/cgi-python>
             Options ExecCGI
             SetHandler cgi-script
     </Directory>

     ErrorLog ${APACHE_LOG_DIR}/error.log
     CustomLog ${APACHE_LOG_DIR}/access.log combined

</VirtualHost>

dominique@cloudserver:/var/www/cgi-python$ cat encoding1
#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/html/test.html", "r")
for line in f:
     print(line,end='')

dominique@cloudserver:/var/www/cgi-python$ cat ../html/test.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Testing my cgi...</title>
</head>
<body>
<p>Ok, Testing my cgi... Lets try some characters: é ë ü</p>
</body>
</html>

dominique@cloudserver:/var/www/cgi-python$ file ../html/test.html
../html/test.html: HTML document, UTF-8 Unicode text

---------Start test----------------------
In browser: http://cloudserver.ramaekers-stassart.be/test.html => page 
displays ok (try it yourself...)

In terminal: => all goes well....
dominique@cloudserver:/var/www/cgi-python$ ./encoding1
Content-Type: text/html
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Testing my cgi...</title>
</head>
<body>
<p>Ok, Testing my cgi... Lets try some characters: é ë ü</p>
</body>
</html>

In the browser (firefox):
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => gives a 
blank page!

The error log says:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 6
[Sun Aug 17 11:09:21.102003 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:09:21.102129 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215:   File "/var/www/cgi-python/encoding1", 
line 7, in <module>
[Sun Aug 17 11:09:21.102149 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215:     for line in f:
[Sun Aug 17 11:09:21.102201 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215:   File 
"/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Sun Aug 17 11:09:21.102243 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215:     return codecs.ascii_decode(input, 
self.errors)[0]
[Sun Aug 17 11:09:21.102318 2014] [cgi:error] [pid 32146] [client 
84.194.120.161:36707] AH01215: UnicodeDecodeError: 'ascii' codec can't 
decode byte 0xc3 in position 162: ordinal not in range(128)

--------------Conclusion-----------------------------
In my current configuration, the bug is recreated!!!
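
A minimal sketch of what seems to be going on here, without Apache in the picture. On Python 3.4, open() without an encoding argument falls back to locale.getpreferredencoding(False), so a process started with an ASCII locale decodes the file as ASCII. The temporary file and the read_text helper are only for illustration, they are not part of my setup.

```python
import locale
import os
import tempfile


def read_text(path, encoding=None):
    """Read a text file; encoding=None lets Python pick the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# The encoding open() falls back to when none is given.
default_encoding = locale.getpreferredencoding(False)

# Throwaway UTF-8 file with the same kind of characters as test.html.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<p>é ë ü</p>".encode("utf-8"))
    sample_path = tmp.name

# Forcing ASCII mimics what the script does under Apache.
try:
    read_text(sample_path, encoding="ascii")
    ascii_result = "no error"
except UnicodeDecodeError as exc:
    ascii_result = str(exc)

# Naming the encoding explicitly makes the read independent of the locale.
utf8_text = read_text(sample_path, encoding="utf-8")
os.remove(sample_path)

print(default_encoding)
print(ascii_result)
print(ascii(utf8_text))  # ascii() keeps the output safe on an ASCII stdout
```

Running this in the terminal and as a cgi-script should show whether default_encoding differs between the two environments.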

-------------------Test 2: new configuration-----------------------------
I change the line f = open("/var/www/html/test.html", "r") into f = 
open("/var/www/html/test.html", "r", encoding="utf-8") and save the 
script as encoding2

In the terminal: => All ok

In the browser: => blank page!!!

Error log in apache:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 4
[Sun Aug 17 11:13:47.372353 2014] [cgi:error] [pid 32147] [client 
84.194.120.161:36711] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:13:47.372461 2014] [cgi:error] [pid 32147] [client 
84.194.120.161:36711] AH01215:   File "/var/www/cgi-python/encoding2", 
line 8, in <module>
[Sun Aug 17 11:13:47.372483 2014] [cgi:error] [pid 32147] [client 
84.194.120.161:36711] AH01215:     print(line,end='')
[Sun Aug 17 11:13:47.372572 2014] [cgi:error] [pid 32147] [client 
84.194.120.161:36711] AH01215: UnicodeEncodeError: 'ascii' codec can't 
encode character '\xe9' in position 51: ordinal not in range(128)

---------Conclusion------------------
Steven was right. It was a read error => with the encoding2 script the file 
is read in UTF-8. Though, I find it strange. The file is in UTF-8 and 
Python3 has UTF-8 as standard..... But reading the file is fixed.

Now the writing is still broken....
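
The write side can be sketched in the same way. print() goes through a text wrapper around the byte stream, and that wrapper has its own encoding. Below, make_stdout_like is a hypothetical helper that builds such a wrapper over an in-memory buffer, so the ASCII failure and the UTF-8 success can be compared without touching the real sys.stdout.

```python
import io


def make_stdout_like(encoding):
    """Return (text_stream, raw_buffer) mimicking sys.stdout with a given encoding."""
    raw = io.BytesIO()
    return io.TextIOWrapper(raw, encoding=encoding, newline="\n"), raw


line = "<p>caf\u00e9</p>\n"

# An ASCII stream cannot encode '\xe9', like stdout under Apache here.
ascii_out, _ = make_stdout_like("ascii")
try:
    ascii_out.write(line)
    ascii_out.flush()
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# The same text goes through once the wrapper encodes as UTF-8.
utf8_out, utf8_raw = make_stdout_like("utf-8")
utf8_out.write(line)
utf8_out.flush()
utf8_bytes = utf8_raw.getvalue()

print(ascii_error)
print(utf8_bytes)

# In a real script the equivalent would be (a code change, so not what I want):
# sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
```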

Here are some tests hinted before:

Tip from Steven => getting the encoding:
dominique@cloudserver:/var/www/cgi-python$ cat readencoding
#!/usr/bin/env python3
import sys
print("Content-Type: text/html")
print("")
print(sys.getfilesystemencoding())

Gives in the terminal: utf-8
Gives in the browser: ascii

Found the problem!!!!!

Now, why apache starts Python in ascii????
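
sys.getfilesystemencoding() is only one of several encodings Python picks at startup. A slightly wider diagnostic, as a sketch: encoding_report is a hypothetical helper, and the choice of environment variables to show is my assumption about what might be relevant.

```python
#!/usr/bin/env python3
import locale
import os
import sys


def encoding_report():
    """Collect the encodings Python chose and the variables that influence them."""
    return {
        "stdout": sys.stdout.encoding,                    # used by print()
        "filesystem": sys.getfilesystemencoding(),        # used for file names
        "preferred": locale.getpreferredencoding(False),  # default for open()
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    print("Content-Type: text/plain")
    print("")
    for name, value in sorted(encoding_report().items()):
        print("{}: {}".format(name, value))
```

Comparing the terminal output with the browser output should show which variables the process started by Apache actually receives.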

Putting the lines in my apache config:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf-8

Cleared my browser cache... No change.....

I removed these lines....
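
One thing to keep in mind: PYTHONIOENCODING only sets the encoding of stdin, stdout and stderr. It does not change sys.getfilesystemencoding() or the default used by open(), so the readencoding script above might keep reporting ascii even if the variable does reach the script. A sketch to check the effect of the variable outside Apache, where child_stdout_encoding is a hypothetical helper that starts a fresh interpreter with a modified environment:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a fresh Python with a modified environment and report its stdout encoding."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a known state
    env.update(extra_env)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


# C locale, as Apache may be using, plus the override from the config lines.
with_override = child_stdout_encoding({"LC_ALL": "C", "PYTHONIOENCODING": "utf-8"})
print(with_override)
```

If the same check from inside a cgi-script still reports ascii for sys.stdout.encoding, the variable is probably not arriving in the script's environment.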

If someone wants me to try more things, just post it. I'll try to 
process them all. I don't want to change the code. I want Apache-Python3 
to work in UTF-8 and not in ASCII. Fixing it in my code seems to me like 
a dirty fix...

For now I'm going on with wsgi and hope I don't get the same problem 
(but now I think I will :( ....)
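
For the wsgi route, a minimal sketch of how the same file could be served. A WSGI application returns bytes, so reading the file in binary mode avoids both the decode and the encode step. make_application is a hypothetical helper; the path is the test file from above.

```python
def make_application(path):
    """Build a WSGI callable that serves one UTF-8 file as raw bytes."""
    def application(environ, start_response):
        # Binary mode: no decoding, so the locale does not matter.
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return application


# mod_wsgi looks for a module-level callable named "application".
application = make_application("/var/www/html/test.html")
```

This only helps when the content is passed through unchanged. As soon as the text has to be processed as str, the encoding must again be named explicitly when reading and when encoding the response.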

Grtz

On 17-08-14 at 09:50, Steven D'Aprano wrote:
....
>
> I think you've got it. I've been assuming the problem was on *writing* the
> line. That's because the OP was insistent that the line failing was
>
>      [quoting Dominique]
>      The problem is, when python 'prints' to the apache interface, it
>      translates the string to ascii.
>
>
> but if you read the traceback, you're right, the problem is *reading* the
> file, not printing:
>
> [Sat Aug 16 23:12:42.158326 2014] [cgi:error] [pid 29327] [client
> 119.63.193.196:11110] AH01215: Traceback (most recent call last):
> [Sat Aug 16 23:12:42.158451 2014] [cgi:error] [pid 29327] [client
> 119.63.193.196:11110] AH01215:   File "/var/www/cgi-python/index.html",
> line 12, in <module>
> [Sat Aug 16 23:12:42.158473 2014] [cgi:error] [pid 29327] [client
> 119.63.193.196:11110] AH01215:     for line in f:
....
>
>> I wonder if specifying the binary data parameter and / or utf-8 encoding
>> when opening the file might help.
> We don't really know what encoding the index.html file is encoded in. It
> might be Latin-1, or cp-1252, or some other legacy encoding. But let's
> assume it's UTF-8.
>
> So why is Dominique's script reading it in ASCII? That's the key question. I
> have a sinking feeling that Apache may be running Python as a subprocess
> with the C locale, maybe. I don't know enough about cgi to be more than
> just guessing.
>
> Dominique, if you write:
>
> f = open("/var/www/cgi-data/index.html", "r", encoding='utf-8')
>
> the problem should go away (assuming index.html is valid UTF-8). If it
> doesn't, there's a very strange bug somewhere.
>
> Please try that, and see if it fixes the problem, or if the error goes to a
> different line.
.....
>
>> f = open( "/var/www/cgi-data/index.html", "r", encoding="utf-8" )
> That's the bunny!
>
> If you just want to hide the problem without fixing the underlying cause,
> add an argument errors="replace", which is ugly but at least lets you move
> on:
>
> py> b = "Hello ë ü world".encode('utf-8')
> py> print(b.decode('ascii', errors='replace'))
> Hello �� �� world
>
>
>


