| Started by | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| First post | 2014-08-15 20:10 +0200 |
| Last post | 2014-08-17 01:08 -0700 |
| Articles | 20 on this page of 23 — 9 participants |
Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-15 20:10 +0200
Re: Unicode in cgi-script with apache2 alister <alister.nospam.ware@ntlworld.com> - 2014-08-15 19:27 +0000
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 00:36 +0200
Re: Unicode in cgi-script with apache2 Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-17 02:50 +0000
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 07:32 +0200
Re: Unicode in cgi-script with apache2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-17 17:50 +1000
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 11:40 +0200
Re: Unicode in cgi-script with apache2 wxjmfauth@gmail.com - 2014-08-17 03:05 -0700
Re: Unicode in cgi-script with apache2 Peter Otten <__peter__@web.de> - 2014-08-17 13:04 +0200
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 13:34 +0200
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 14:02 +0200
Re: Unicode in cgi-script with apache2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-17 23:00 +1000
Re: Unicode in cgi-script with apache2 wxjmfauth@gmail.com - 2014-08-17 08:56 -0700
Re: Unicode in cgi-script with apache2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-08-17 13:35 +0100
Re: Unicode in cgi-script with apache2 Tony the Tiger <tony@tiger.invalid> - 2014-08-18 04:39 +0000
Re: Unicode in cgi-script with apache2 Peter Otten <__peter__@web.de> - 2014-08-17 15:12 +0200
Re: Unicode in cgi-script with apache2 Peter Otten <__peter__@web.de> - 2014-08-17 16:06 +0200
Re: Unicode in cgi-script with apache2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-17 15:54 +1000
Re: Unicode in cgi-script with apache2 John Gordon <gordon@panix.com> - 2014-08-15 19:32 +0000
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 00:39 +0200
Re: Unicode in cgi-script with apache2 Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-16 16:40 +0000
Re: Unicode in cgi-script with apache2 Dominique Ramaekers <dominique@ramaekers-stassart.be> - 2014-08-17 00:57 +0200
Re: Unicode in cgi-script with apache2 wxjmfauth@gmail.com - 2014-08-17 01:08 -0700
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-15 20:10 +0200 |
| Subject | Unicode in cgi-script with apache2 |
| Message-ID | <mailman.13038.1408130249.18130.python-list@python.org> |
Hi,
I've got a little script:
#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/cgi-data/index.html", "r")
for line in f:
    print(line,end='')
If I run the script in the terminal, it nicely prints the webpage
'index.html'.
If I access the script through a web browser, Apache gives an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
1791: ordinal not in range(128)
I've spent a whole afternoon reading forums and blogs, but I haven't
found a solution.
Can anyone help me?
Greetings,
Dominique.
| From | alister <alister.nospam.ware@ntlworld.com> |
|---|---|
| Date | 2014-08-15 19:27 +0000 |
| Message-ID | <satHv.195207$ze2.61877@fx28.am4> |
| In reply to | #76382 |
On Fri, 15 Aug 2014 20:10:25 +0200, Dominique Ramaekers wrote:
> Hi,
>
> I've got a little script:
>
> #!/usr/bin/env python3
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> f = open("/var/www/cgi-data/index.html", "r")
> for line in f:
>     print(line,end='')
>
> If I run the script in the terminal, it nicely prints the webpage
> 'index.html'.
>
> If I access the script through a web browser, Apache gives an error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 1791: ordinal not in range(128)
>
> I've spent a whole afternoon reading forums and blogs, but I haven't
> found a solution.
>
> Can anyone help me?
>
> Greetings,
>
> Dominique.
1) This is not the way to get Python to generate a web page. If you don't
want to use an existing framework (for example, if you are doing this as
an educational exercise), I suggest you google WSGI.
2) You need to encode your output strings into a format that Apache/HTML
protocols can support; UTF-8 is probably best here.
Change your print function to
print(line.encode('utf'),end='')
3) Ignore any subsequent advice from JMF; even when he is trying to help,
he is invariably wrong.
--
Freedom's just another word for nothing left to lose.
-- Kris Kristofferson, "Me and Bobby McGee"
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 00:36 +0200 |
| Message-ID | <mailman.13054.1408229123.18130.python-list@python.org> |
| In reply to | #76383 |
I found my problem; I will describe it in more detail at the bottom of this message...
But first...
Thanks Alister for the tips:
1) This evening, I researched WSGI. I found that WSGI is more
advanced than CGI, and I also think WSGI is more the Python way. I'm an
amateur playing around with my imagination on a small virtual server
(online cloudserver.ramaekers-stassart.be). I'm trying to build
something rather specific. I also like to keep things as basic as
possible. My first thought was not to use a framework, because with
a framework I wouldn't really know what the code is doing; to me, a
framework would be a black box. But after inspecting WSGI, I decided
not to make things harder for myself than they need to be. I
will work with a framework, and I think I'll bet on Falcon
(for its speed and small size, and it doesn't seem too difficult)... There
are a lot of frameworks, so if someone wants to point me to another
framework, I'm open to suggestions...
2) Your tip to use 'encode' did not solve the problem and created a new
one. My lines were wrapped in quotes with b'' prefixes and a lot of
\n's... and I still got the same error.
3) I didn't get the message from JMF, so...
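The quoted lines have a simple cause. `print()` converts its argument with `str()`, and `str()` of a `bytes` object is its repr (`b'...'`, with `\n` shown as an escape). The sketch below assumes Python 3. `repr_of_printed_bytes` and `write_utf8` are hypothetical helpers, not code from the thread. Encoded bytes have to go to a binary stream rather than through `print()`.

```python
import io


def repr_of_printed_bytes(line: str) -> str:
    """Capture what print() emits when it is handed an encoded (bytes) line."""
    captured = io.StringIO()
    # print() calls str() on the bytes object, which yields its repr
    print(line.encode("utf-8"), end="", file=captured)
    return captured.getvalue()


def write_utf8(line: str, binary_stream) -> None:
    """Write the UTF-8 bytes of line to a binary stream, bypassing the text layer."""
    binary_stream.write(line.encode("utf-8"))
```

In a CGI script, the binary stream would be `sys.stdout.buffer`. Flush `sys.stdout` first if you have already printed headers through it.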
What seems to be the problem:
My script was OK. I know this because in the terminal I got the expected
output. Python3 uses UTF-8 encoding as a standard. The problem is that when
Python 'prints' to the Apache interface, it translates the string to
ASCII (why, I never found out). Somewhere in the middle of my
index.html file there are letters like ë and ü. When Python tries to
translate these, it throws an error. If I delete these letters from
the file, the script works perfectly in a browser! In Python 2.7 the
script can easily be tweaked so the translation to ASCII isn't done, but
in Python3 it's a real pain in the a... I've read about people who
managed to force Python3 to 'print' to Apache in UTF-8, but none of
their solutions worked for me.
I think the Python developers don't focus on Python +
Apache + CGI (I think it only happens with Apache and not with other
HTTP servers). I don't think they do this intentionally, but I guess they
assume that if you use Python to build a web application, you also use
mod_wsgi or mod_python (in Apache)...
So I'll use WSGI. It's a little more work, but it seems really neat...
grtz
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2014-08-17 02:50 +0000 |
| Message-ID | <lsp5ab$sjv$1@dont-email.me> |
| In reply to | #76409 |
On Sun, 17 Aug 2014 00:36:14 +0200, Dominique Ramaekers wrote:
> What seems to be the problem:
> My Script was ok. I know this because in the terminal I got my expected
> output. Python3 uses UTF-8 coding as a standard. The problem is, when
> python 'prints' to the apache interface, it translates the string to
> ascii. (Why, I never found an answer).
Is the apache server running on a linux or a windows platform?
The problem may not be python, it may be the underlying OS. I wonder if
apache is spawning a process for python though, and if so whether it is
in some way constraining the character set available to stdout of the
spawned process.
From your other message, the error appears to be a python error on
reading the input file. For some reason python seems to be trying to
interpret the file it is reading as ascii.
I wonder if specifying the binary data parameter and / or utf-8 encoding
when opening the file might help.
eg:
f = open( "/var/www/cgi-data/index.html", "rb" )
f = open( "/var/www/cgi-data/index.html", "rb", encoding="utf-8" )
f = open( "/var/www/cgi-data/index.html", "r", encoding="utf-8" )
I've managed to drive down a bit further in the problem:
print() goes to sys.stdout
This is part of what the docs say about sys.stdout:
"""
The character encoding is platform-dependent. Under Windows, if the
stream is interactive (that is, if its isatty() method returns True), the
console codepage is used, otherwise the ANSI code page. Under other
platforms, the locale encoding is used (see locale.getpreferredencoding()).
Under all platforms though, you can override this value by setting the
PYTHONIOENCODING environment variable before starting Python.
"""
At this point, details of the OS become very significant. If your server
is running on a windows platform you may need to figure out how to make
apache set the PYTHONIOENCODING environment variable to "utf-8" (or
whatever else is appropriate) before calling the python script.
I believe that the following line in your httpd.conf may have the
required effect.
SetEnv PYTHONIOENCODING utf-8
Of course, if the file is not encoded as utf-8, but rather something
else, then use that as the encoding in the above suggestions. If the
server is not running windows, then I'm not sure where the problem might
be.
--
Denis McMahon, denismfmcmahon@gmail.com
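Denis says `PYTHONIOENCODING` must be set *before* Python starts. You can check this outside Apache by launching a fresh interpreter, which is roughly what a CGI server does. This sketch assumes Python 3.7+ (for `capture_output`). `child_encodings` is a hypothetical helper. On 3.7+ a plain C locale is usually coerced to UTF-8 (PEP 538/540), so the ASCII behaviour seen with Python 3.4 in this thread may not reproduce on newer versions.

```python
import os
import subprocess
import sys

# Probe executed inside the child interpreter
PROBE = ("import sys, locale; print(sys.stdout.encoding); "
         "print(locale.getpreferredencoding(False))")


def child_encodings(extra_env: dict) -> tuple:
    """Start a fresh Python with extra environment variables and return
    (stdout encoding, preferred locale encoding) as the child sees them."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    stdout_enc, preferred = result.stdout.split()
    return stdout_enc, preferred
```

For example, compare `child_encodings({"LC_ALL": "C"})` with `child_encodings({"PYTHONIOENCODING": "utf-8"})`. The variable only helps if it actually reaches the process environment that Apache creates.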
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 07:32 +0200 |
| Message-ID | <mailman.13058.1408253857.18130.python-list@python.org> |
| In reply to | #76413 |
* My system is a Linux box.
* I've tried using encoding="utf-8". It didn't fix things.
* That print() uses sys.stdout would explain it, but writing to sys.stdout directly isn't any better.
* My locale and the system-wide locale are UTF-8. Using SetEnv PYTHONIOENCODING utf-8 didn't fix things.
* The file is encoded in UTF-8...
I can't speak for anybody else, but in my search I don't think I read about anyone who had the problem on a Windows system. They all used Linux (various flavors) or OS X... This is the first time I've encountered a situation where Windows is better at encoding issues :P +1 for Microsoft...
I think that Apache (*nix versions) doesn't tell Python that it accepts UTF-8. Or Python doesn't listen properly... Maybe I should file a bug report with both projects?
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-08-17 17:50 +1000 |
| Message-ID | <53f05ed9$0$30003$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #76413 |
Denis McMahon wrote:
> From your other message, the error appears to be a python error on
> reading the input file. For some reason python seems to be trying to
> interpret the file it is reading as ascii.
Oh!!! /facepalm
I think you've got it. I've been assuming the problem was on *writing* the
line. That's because the OP was insistent that the line failing was
[quoting Dominique]
The problem is, when python 'prints' to the apache interface, it
translates the string to ascii.
but if you read the traceback, you're right, the problem is *reading* the
file, not printing:
[Sat Aug 16 23:12:42.158326 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: Traceback (most recent call last):
[Sat Aug 16 23:12:42.158451 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: File "/var/www/cgi-python/index.html", line 12, in <module>
[Sat Aug 16 23:12:42.158473 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: for line in f:
That's the line which is failing, reading the file. Which is then *decoded*.
Files contain bytes, which have to be decoded into text, and the decode is
assuming ASCII:
[Sat Aug 16 23:12:42.158526 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Sat Aug 16 23:12:42.158569 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: return codecs.ascii_decode(input, self.errors)[0]
[Sat Aug 16 23:12:42.158663 2014] [cgi:error] [pid 29327] [client 119.63.193.196:11110] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1791: ordinal not in range(128)
> I wonder if specifying the binary data parameter and / or utf-8 encoding
> when opening the file might help.
We don't really know what encoding the index.html file is encoded in. It
might be Latin-1, or cp-1252, or some other legacy encoding. But let's
assume it's UTF-8.
So why is Dominique's script reading it in ASCII? That's the key question. I
have a sinking feeling that Apache may be running Python as a subprocess
with the C locale, maybe. I don't know enough about cgi to be more than
just guessing.
Dominique, if you write:
f = open("/var/www/cgi-data/index.html", "r", encoding='utf-8')
the problem should go away (assuming index.html is valid UTF-8). If it
doesn't, there's a very strange bug somewhere.
Please try that, and see if it fixes the problem, or if the error goes to a
different line.
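Here is Steven's diagnosis in runnable form. The same UTF-8 bytes decode cleanly with an explicit `encoding='utf-8'`. With ASCII they fail at the first 0xC3 byte, which is what a locale-dependent default amounts to under an ASCII locale like the one the CGI process received here. In this sketch a temporary file stands in for index.html, and `make_sample` and `read_text` are hypothetical helpers.

```python
import os
import tempfile


def make_sample() -> str:
    """Write a small UTF-8 encoded HTML file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as f:
        f.write("<p>\u00e9 \u00eb \u00fc</p>\n".encode("utf-8"))
    return path


def read_text(path: str, encoding: str) -> str:
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    sample = make_sample()
    try:
        print(ascii(read_text(sample, "utf-8")))
        try:
            read_text(sample, "ascii")  # what an ASCII locale default amounts to
        except UnicodeDecodeError as exc:
            print("ascii fails:", exc)
    finally:
        os.remove(sample)
```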
> eg:
>
> f = open( "/var/www/cgi-data/index.html", "rb" )
No, you don't want that, since then reading the file will return bytes, not
text. Although I suppose the OP might just commit to using bytes
everywhere. Yuck.
> f = open( "/var/www/cgi-data/index.html", "rb", encoding="utf-8" )
That makes no sense. If you're reading in binary mode, there's no encoding.
Every byte represents itself.
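`open()` itself enforces Steven's objection to combining `"rb"` with `encoding=`: it rejects that combination with a `ValueError`. Binary mode returns `bytes`, which still need an explicit `.decode()`. In this small sketch `demo_modes` is a hypothetical helper, and a temporary file stands in for index.html.

```python
import os
import tempfile


def demo_modes(data: bytes) -> tuple:
    """Return (bytes read in binary mode, decoded text, error text for rb+encoding)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            raw = f.read()  # bytes: every byte represents itself
        text = raw.decode("utf-8")  # decoding becomes an explicit, separate step
        try:
            with open(path, "rb", encoding="utf-8"):
                pass
            message = ""
        except ValueError as exc:  # binary mode does not accept an encoding
            message = str(exc)
        return raw, text, message
    finally:
        os.remove(path)
```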
> f = open( "/var/www/cgi-data/index.html", "r", encoding="utf-8" )
That's the bunny!
If you just want to hide the problem without fixing the underlying cause,
add an argument errors="replace", which is ugly but at least lets you move
on:
py> b = "Hello ë ü world".encode('utf-8')
py> print(b.decode('ascii', errors='replace'))
Hello �� �� world
--
Steven
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 11:40 +0200 |
| Message-ID | <mailman.13061.1408268785.18130.python-list@python.org> |
| In reply to | #76416 |
Wow, everybody keeps on chewing on this problem. As a bonus, I've
reconfigured my server to do some tests.
http://cloudserver.ramaekers-stassart.be/test.html => is the file I want
to read. Going to this URL displays the file...
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => is the
CGI script for this test
http://cloudserver.ramaekers-stassart.be/wsgi => is the WSGI solution
(but for now it just says 'Hello world'...)
----------------This configuration-----------------------------
dominique@cloudserver:/var/www/cgi-python$ cat /etc/default/locale
LANG="en_US.UTF-8"
LANGUAGE="en_US:"
dominique@cloudserver:/var/www/cgi-python$ cat
/etc/apache2/sites-enabled/000-default.conf
<VirtualHost *:80>
    ServerAdmin dominique@ramaekers-stassart.be
    WSGIScriptAlias /wsgi /var/www/wsgi/application
    <Directory /var/www/wsgi>
        Order allow,deny
        Allow from all
    </Directory>
    DocumentRoot /var/www/html
    ScriptAlias /cgi-python /var/www/cgi-python/
    <Directory /var/www/cgi-python>
        Options ExecCGI
        SetHandler cgi-script
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
dominique@cloudserver:/var/www/cgi-python$ cat encoding1
#!/usr/bin/env python3
print("Content-Type: text/html")
print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
print("")
f = open("/var/www/html/test.html", "r")
for line in f:
    print(line,end='')
dominique@cloudserver:/var/www/cgi-python$ cat ../html/test.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Testing my cgi...</title>
</head>
<body>
<p>Ok, Testing my cgi... Lets try some characters: é ë ü</p>
</body>
</html>
dominique@cloudserver:/var/www/cgi-python$ file ../html/test.html
../html/test.html: HTML document, UTF-8 Unicode text
---------Start test----------------------
In browser: http://cloudserver.ramaekers-stassart.be/test.html => page
displays OK (try it yourself...)
In terminal: => all goes well....
dominique@cloudserver:/var/www/cgi-python$ ./encoding1
Content-Type: text/html
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Testing my cgi...</title>
</head>
<body>
<p>Ok, Testing my cgi... Lets try some characters: é ë ü</p>
</body>
</html>
In the browser (firefox):
http://cloudserver.ramaekers-stassart.be/cgi-python/encoding1 => gives a
blank page!
The error log says:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 6
[Sun Aug 17 11:09:21.102003 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:09:21.102129 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: File "/var/www/cgi-python/encoding1", line 7, in <module>
[Sun Aug 17 11:09:21.102149 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: for line in f:
[Sun Aug 17 11:09:21.102201 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Sun Aug 17 11:09:21.102243 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: return codecs.ascii_decode(input, self.errors)[0]
[Sun Aug 17 11:09:21.102318 2014] [cgi:error] [pid 32146] [client 84.194.120.161:36707] AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 162: ordinal not in range(128)
--------------Conclusion-----------------------------
In my current configuration, the bug is recreated!!!
-------------------Test 2: new configuration-----------------------------
I change the line f = open("/var/www/html/test.html", "r") into f =
open("/var/www/html/test.html", "r", encoding="utf-8") and save the
script as encoding2
In the terminal: => All ok
In the browser: => blank page!!!
Error log in apache:
root@cloudserver:~# cat /var/log/apache2/error.log | tail -n 4
[Sun Aug 17 11:13:47.372353 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: Traceback (most recent call last):
[Sun Aug 17 11:13:47.372461 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: File "/var/www/cgi-python/encoding2", line 8, in <module>
[Sun Aug 17 11:13:47.372483 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: print(line,end='')
[Sun Aug 17 11:13:47.372572 2014] [cgi:error] [pid 32147] [client 84.194.120.161:36711] AH01215: UnicodeEncodeError: 'ascii' codec can't encode character '\\xe9' in position 51: ordinal not in range(128)
---------Conclusion------------------
Steven was right. It was a read error => with the encoding2 script the file
is read as UTF-8. Though I find it strange: the file is in UTF-8 and
Python3 has UTF-8 as the default..... But reading the file is fixed.
Now the writing is still broken....
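One way to make the write side independent of the locale Apache passes in is to wrap the raw binary stdout in an explicit UTF-8 text layer and to declare the charset in the Content-Type header. This is only a sketch, not the fix the thread settled on (Dominique preferred a server-side setting). `serve` is a hypothetical name. On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` is a shorter alternative, but it does not exist in the Python 3.4 used here.

```python
import io


def serve(html_path: str, binary_out) -> None:
    """Emit CGI headers plus the UTF-8 file body to a binary stream."""
    # Explicit UTF-8 text layer: independent of LANG/LC_ALL in the CGI process
    out = io.TextIOWrapper(binary_out, encoding="utf-8", newline="\n")
    out.write("Content-Type: text/html; charset=utf-8\n")
    out.write("Cache-Control: no-cache, must-revalidate\n")  # HTTP/1.1
    out.write("Expires: Sat, 26 Jul 1997 05:00:00 GMT\n")  # date in the past
    out.write("\n")  # blank line ends the CGI headers
    # Explicit encoding on the read side as well, as in the encoding2 script
    with open(html_path, "r", encoding="utf-8") as f:
        for line in f:
            out.write(line)
    out.flush()
    out.detach()  # hand the binary stream back without closing it
```

In the CGI script itself this would be called as `serve("/var/www/html/test.html", sys.stdout.buffer)`.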
Here are some tests suggested earlier:
Tip from Steven => getting the encoding:
dominique@cloudserver:/var/www/cgi-python$ cat readencoding
#!/usr/bin/env python3
import sys
print("Content-Type: text/html")
print("")
print(sys.getfilesystemencoding())
Gives in the terminal: utf-8
Gives in the browser: ascii
Found the problem!!!!!
Now, why does Apache start Python in ASCII????
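Below is a slightly broader version of the readencoding test. It reports each encoding-related setting that could differ between the terminal and the CGI environment. In this sketch `encoding_report` is a hypothetical helper. Values are printed through `ascii()`, so the script stays safe even when the output encoding is ASCII.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect the encoding-related settings of the current process."""
    return {
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
        "sys.getfilesystemencoding": sys.getfilesystemencoding(),
        "LANG": os.environ.get("LANG", "<unset>"),
        "LC_ALL": os.environ.get("LC_ALL", "<unset>"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING", "<unset>"),
    }


if __name__ == "__main__":
    print("Content-Type: text/plain")
    print("")
    for key, value in encoding_report().items():
        print("%s: %s" % (key, ascii(value)))
```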
Putting the lines in my apache config:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf-8
Cleared my browser cache... No change.....
I removed these lines....
If someone wants me to try more things, just post it. I'll try to
process them all. I don't want to change the code. I want Apache-Python3
to work in UTF-8 and not in ASCII. Fixing it in my code seems to me like
a dirty fix...
For now I'm going on with WSGI and hope I don't get the same problem
(but now I think I will :( ....)
Grtz
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-08-17 03:05 -0700 |
| Message-ID | <406363a3-5616-477c-86c0-71e101bca5bb@googlegroups.com> |
| In reply to | #76416 |
On Sunday, 17 August 2014 09:50:48 UTC+2, Steven D'Aprano wrote:
> py> b = "Hello ë ü world".encode('utf-8')
> py> print(b.decode('ascii', errors='replace'))
> Hello �� �� world
=========
No. You are taking the problem the wrong way. This is
a typical situation where the produced code will work
correctly, but it will be "code that only works for me".
The mistake is that this way you are producing code
that is not suitable for the "system" that will host your
string.
In the present case, you are already assuming, prior to
any string manipulation, that the output should be utf-8.
D:\>c:\python32\python
Python 3.2.5 (default, May 15 2013, 23:06:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b = "Hello ë ü world".encode('utf-8')
>>> b
b'Hello \xc3\xab \xc3\xbc world'
>>> b.decode('ascii', 'replace')
'Hello \ufffd\ufffd \ufffd\ufffd world'
>>> print(b.decode('ascii', 'replace'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python32\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>
>>>
The proper way is to "prepare" your string prior to any
further manipulation (see my previous comment about
processes).
I'm explicitly using the code page cp850 and the
euro sign.
>>> u = "Hello ë ü world \u20ac\u20ac\u20ac"
>>> newu = u.encode('cp850', 'replace').decode('cp850')
>>> print(newu)
Hello ë ü world ???
>>> type(newu)
<class 'str'>
>>>
The replacement character now belongs to the set of
characters that are displayable.
It will never fail.
You can mimic the same behaviour with a web browser.
Create an HTML file in utf-8 containing characters
not belonging to iso-8859-1.
Display that file and change the encoding of the browser
to iso-8859-1.
You will see the browser "re-encode" the source with
a replacement char and only later re-display it. It is the
same process I gave above.
The key point is the detection, if doable, of the coding scheme
that should be used.
>>> import sys
>>> sys.stdout.encoding
'cp850'
>>>
My example is not Windows specific. On a gb**** Chinese
BSD or a KOI8 Russian Linux: same problem.
jmf
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-08-17 13:04 +0200 |
| Message-ID | <mailman.13062.1408273509.18130.python-list@python.org> |
| In reply to | #76416 |
Dominique Ramaekers wrote:
> Putting the lines in my apache config:
> AddDefaultCharset UTF-8
> SetEnv PYTHONIOENCODING utf-8
>
> Cleared my browser cache... No change.....
Did you restart Apache?
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 13:34 +0200 |
| Message-ID | <mailman.13063.1408275277.18130.python-list@python.org> |
| In reply to | #76416 |
Yes, even a full restart, not just a reload. I also put it in the <VirtualHost> section as well as in the main apache2.conf....
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 14:02 +0200 |
| Message-ID | <mailman.13064.1408276955.18130.python-list@python.org> |
| In reply to | #76416 |
As I suspected, if I check the encoding used under WSGI, I get:
ANSI_X3.4-1968
I found you can define the coding of the script with a special comment:
# -*- coding: utf-8 -*-
Now I don't get an error, but my special chars still don't display correctly.
The script:
# -*- coding: utf-8 -*-
import sys
def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World! é ü à ũ'
    #output = sys.getfilesystemencoding() #1
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
Gives in the browser as output:
Hello World! é ü à ũ
And if I check the encoding with the python script (uncommenting line
#1), I still get ANSI_X3.4-1968
This is really getting on my nerves.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-08-17 23:00 +1000 |
| Message-ID | <53f0a787$0$29991$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #76423 |
Dominique Ramaekers wrote:
> As I suspected, if I check the used encoding in wsgi I get:
> ANSI_X3.4-1968
That's another name for ASCII.
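A quick way to confirm this from Python itself, as a small sketch using only the standard `codecs` module:

```python
import codecs

# "ANSI_X3.4-1968" is the codeset name the C library reports in the plain
# "C"/"POSIX" locale; Python's codec registry treats it as an alias of ASCII.
print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

# Consequently, no non-ASCII character can be encoded with it.
try:
    "\u00e9".encode("ANSI_X3.4-1968")  # 'é'
except UnicodeEncodeError as exc:
    print("cannot encode:", ascii(exc.object[exc.start:exc.end]))
```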
> I found you can define the coding of the script with a special comment:
> # -*- coding: utf-8 -*-
Be careful. That just tells Python what encoding the source code file is in.
It is not used by print(), or reading/writing files, just when the compiler
reads the source code.
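To see that the declaration only affects how the source bytes are decoded, you can feed the same UTF-8 bytes to the built-in `compile()` under two different declarations. This is an illustrative sketch; `run_with_cookie` is a hypothetical helper, and neither call changes `sys.stdout.encoding`, which is what print() actually uses.

```python
def run_with_cookie(cookie, body="s = '\u00e9'\n"):
    """Compile UTF-8 encoded source bytes under a given PEP 263 declaration."""
    source = ("# -*- coding: %s -*-\n" % cookie).encode("ascii") + body.encode("utf-8")
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]

# ascii() keeps this demo's own output pure ASCII, whatever the locale.
print(ascii(run_with_cookie("utf-8")))    # '\xe9'      i.e. é, read correctly
print(ascii(run_with_cookie("latin-1")))  # '\xc3\xa9'  i.e. Ã©, bytes mis-read
```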
> Now I don't get an error, but my special characters still don't display correctly.
> The script:
> # -*- coding: utf-8 -*-
> import sys
> def application(environ, start_response):
>     status = '200 OK'
>     output = 'Hello World! é ü à ũ'
>     #output = sys.getfilesystemencoding() #1
>
>     response_headers = [('Content-type', 'text/plain'),
>                         ('Content-Length', str(len(output)))]
>     start_response(status, response_headers)
>
>     return [output]
>
> This gives the following output in the browser:
>
> Hello World! Ã© Ã¼ Ã  Å©
That looks like ordinary moji-bake. Your Python script takes the text
string 'Hello World! é ü à ũ', which encoded as UTF-8 gives you the bytes:
py> 'Hello World! é ü à ũ'.encode('utf-8')
b'Hello World! \xc3\xa9 \xc3\xbc \xc3\xa0 \xc5\xa9'
Decoding back using latin-1 gives:
py> 'Hello World! é ü à ũ'.encode('utf-8').decode('latin1')
'Hello World! Ã© Ã¼ Ã\xa0 Å©'
which appears to be exactly what you have. Why Latin-1 instead of ASCII?
Because the process has to output *something*, and Latin-1 is sometimes
called "extended ASCII".
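Because Latin-1 maps every byte to exactly one character, this particular moji-bake loses no information: encoding the garbled text back to Latin-1 recovers the original UTF-8 bytes. A short sketch showing that the bytes survive intact; it is a demonstration of the mismatch, not a fix, since the real cure is to make both ends agree on UTF-8.

```python
original = 'Hello World! é ü à ũ'

# What a reader that assumes Latin-1 makes of UTF-8 bytes.
garbled = original.encode('utf-8').decode('latin-1')
print(ascii(garbled))  # 'Hello World! \xc3\xa9 \xc3\xbc \xc3\xa0 \xc5\xa9'

# Undo the wrong decoding step: back to bytes via Latin-1, then decode as UTF-8.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == original)  # True
```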
I'm starting to fear a bug in Python 3.4, but since I have almost no
knowledge about wsgi and cgi, I can't be sure that this isn't just normal
expected behaviour :-(
--
Steven
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-08-17 08:56 -0700 |
| Message-ID | <7eb1e2f0-a3ae-4ee1-b6ff-f25abc3f535f@googlegroups.com> |
| In reply to | #76426 |
On Sunday, 17 August 2014 15:00:53 UTC+2, Steven D'Aprano wrote:
>
> I'm starting to fear a bug in Python 3.4, but since I have almost no
> knowledge about wsgi and cgi, I can't be sure that this isn't just normal
> expected behaviour :-(
>
Not Python 3.4. Python 3. It has failed from day zero.
Do you remember the story of the Greek guy with "his" Greek encoding on the server side?
jmf
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-08-17 13:35 +0100 |
| Message-ID | <mailman.13065.1408278931.18130.python-list@python.org> |
| In reply to | #76416 |
On 17/08/2014 13:02, Dominique Ramaekers wrote:
if style == TOP_POSTING:
    *plonk*
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | Tony the Tiger <tony@tiger.invalid> |
|---|---|
| Date | 2014-08-18 04:39 +0000 |
| Message-ID | <53f1837e$0$25650$b1db1813$ba2d9d20@news.astraweb.com> |
| In reply to | #76424 |
On Sun, 17 Aug 2014 13:35:15 +0100, Mark Lawrence wrote:
> if style == TOP_POSTING:
>     *plonk*
Hear hear!
/Grrr
--
___ ___
(\_--_/) | _ ._ _|_|_ _ |o _ _ ._
( 9 9 ) |(_)| |\/ |_| |(/_ ||(_|(/_|
stripes are forever - as overripe ferrets
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-08-17 15:12 +0200 |
| Message-ID | <mailman.13067.1408281166.18130.python-list@python.org> |
| In reply to | #76416 |
Dominique Ramaekers wrote:
> As I suspected, if I check the encoding used under WSGI, I get:
> ANSI_X3.4-1968
>
> I found you can define the coding of the script with a special comment:
> # -*- coding: utf-8 -*-
>
> Now I don't get an error, but my special characters still don't display correctly.
> The script:
> # -*- coding: utf-8 -*-
> import sys
> def application(environ, start_response):
>     status = '200 OK'
>     output = 'Hello World! é ü à ũ'
>     #output = sys.getfilesystemencoding() #1
>
>     response_headers = [('Content-type', 'text/plain'),
>                         ('Content-Length', str(len(output)))]
>     start_response(status, response_headers)
>
>     return [output]
>
> This gives the following output in the browser:
>
> Hello World! Ã© Ã¼ Ã  Å©
That's UTF-8 interpreted as Latin-1. Try specifying the charset in the
header:
...
response_headers = [('Content-type', 'text/plain; charset=utf-8'),
...
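Putting the charset header together with two other Python 3 details, a minimal sketch of the whole function might look like this. It assumes a PEP 3333 compliant server such as mod_wsgi: under Python 3 the response body must be an iterable of bytes, and Content-Length must be the byte count. The original `len(output)` counts characters, so for this string it reports 20 where the UTF-8 body is 24 bytes long.

```python
def application(environ, start_response):
    """Minimal PEP 3333 WSGI app returning UTF-8 text (a sketch)."""
    status = '200 OK'
    text = 'Hello World! é ü à ũ'

    # Encode explicitly: a Python 3 WSGI body is a sequence of bytes.
    body = text.encode('utf-8')

    response_headers = [
        # Tell the browser which encoding the bytes are in.
        ('Content-Type', 'text/plain; charset=utf-8'),
        # Content-Length counts bytes, not characters.
        ('Content-Length', str(len(body))),
    ]
    start_response(status, response_headers)
    return [body]
```

To try it without Apache, the standard library's `wsgiref.simple_server.make_server('', 8000, application).serve_forever()` can serve it locally.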
> And if I check the encoding with the Python script (by uncommenting line
> #1), I still get ANSI_X3.4-1968.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-08-17 16:06 +0200 |
| Message-ID | <mailman.13068.1408284413.18130.python-list@python.org> |
| In reply to | #76416 |
Dominique Ramaekers wrote:
> And if I check the encoding with the Python script (by uncommenting line
> #1), I still get ANSI_X3.4-1968.
That should not matter as long as
print(os.environ.get("PYTHONIOENCODING"))
prints
UTF-8
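A slightly broader version of this check, written as a small CGI script, also reports the other settings involved, so that a terminal run and an Apache run can be compared side by side. This is a sketch: the values it shows depend on the Python version and the Apache configuration, and it prints everything through ascii() so that the diagnostic itself cannot trip over the problem it is investigating.

```python
#!/usr/bin/env python3
import locale
import os
import sys

def encoding_report():
    """Collect the settings that decide how Python 3 encodes text I/O."""
    return [
        ("sys.stdout.encoding", getattr(sys.stdout, "encoding", None)),
        ("sys.getfilesystemencoding()", sys.getfilesystemencoding()),
        ("locale.getpreferredencoding(False)", locale.getpreferredencoding(False)),
        ("PYTHONIOENCODING", os.environ.get("PYTHONIOENCODING")),
        ("LANG", os.environ.get("LANG")),
        ("LC_ALL", os.environ.get("LC_ALL")),
    ]

if __name__ == "__main__":
    # CGI output: a header block, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for name, value in encoding_report():
        # ascii() keeps the report pure ASCII, whatever the locale.
        print(name, "=", ascii(value))
```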
If you do get the correct PYTHONIOENCODING, you should be able to replace the
corresponding SetEnv with
    SetEnv LANG en_US.UTF-8
or similar. If you don't get the expected value, the SetEnv is probably not
in the right place. In my experiments I put it into
    /etc/apache2/sites-enabled/000-default.conf
in an Apache installation I think I have not tinkered with before ;)
While looking around in the Apache configuration I also found the file
/etc/apache2/envvars. Here's an excerpt:
    ## The locale used by some modules like mod_dav
    export LANG=C
    ## Uncomment the following line to use the system default locale instead:
    #. /etc/default/locale
    export LANG
If you uncomment the line
    . /etc/default/locale
and replace
    SetEnv LANG en_US.UTF-8
with
    PassEnv LANG
you should get a similar effect assuming your system defaults to UTF-8.
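The claim that PYTHONIOENCODING takes precedence over the locale for Python's standard streams can be checked without Apache by launching a child interpreter with a controlled environment. A sketch using only the standard library; `child_stdout_encoding` is a hypothetical helper. Note that on Python 3.7 and later (PEP 538 and PEP 540) a bare C locale often no longer yields ASCII, so only the PYTHONIOENCODING result is version-independent.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(**overrides):
    """Return the canonical codec name of sys.stdout in a fresh interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a known state
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    name = result.stdout.decode("ascii").strip()
    return codecs.lookup(name).name  # e.g. 'ANSI_X3.4-1968' -> 'ascii'

# Simulate Apache's bare "C" environment, without and with PYTHONIOENCODING.
print(child_stdout_encoding(LANG="C", LC_ALL="C"))  # depends on Python version
print(child_stdout_encoding(LANG="C", LC_ALL="C", PYTHONIOENCODING="utf-8"))
```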
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-08-17 15:54 +1000 |
| Message-ID | <53f043af$0$29975$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #76409 |
Dominique Ramaekers wrote:
[...]
> 2) Your tip, to use 'encode' did not solve the problem and created a new
> one. My lines were encapsulated in quotes and I got a lot of \b's and
> \n's... and I still got the same error.
Just throwing random encode/decode calls into the mix is unlikely to fix the
problem. First, you need to find an Apache expert who can tell you what
encoding your Apache process is expecting. Hopefully it is UTF-8. Then you
need to confirm that your Python process is also using UTF-8.
Nearly all Unicode-related issues are due to mismatches between encodings in
different parts of the system. If only everyone could use UTF-8 for all
storage and transport layers, life would be so much simpler... but I digress.
[...]
> What seems to be the problem:
> My Script was ok. I know this because in the terminal I got my expected
> output.
Did you test it at the terminal with input including ë and ü?
> Python3 uses UTF-8 coding as a standard. The problem is, when
> python 'prints' to the apache interface, it translates the string to
> ascii. (Why, I never found an answer).
Try putting the lines:
    import sys
    print(sys.getfilesystemencoding())
at the start of your program, and see what it prints at the terminal and
what it prints under Apache. I predict that under Apache, it will say
something like "C locale" or "US ASCII". If so, *that* is your problem.
> Somewhere in the middle of my
> index.html file, there are letters like ë and ü. If Python tries to
> translate these, Python throws an error. If I delete these letters in
> the file, the script works perfectly in a browser! In Python2.7 the
> script can easily be tweaked so the translation to ascii isn't done,
Not quite. Under Python 2.7, you will likely get moji-bake. For instance, if
your index.html contains "ë ü π" stored in UTF-8, Python 2.7 will throw its
hands in the air, say "I have no idea what ASCII characters they are, let's
pretend it's some sort of Latin-1" and you'll get:
    Ã« Ã¼ Ï
instead. Or perhaps not. With Python 2.7, what you get is not quite random,
but it depends on the environment in some fairly obscure ways. Python 3 at
least raises an exception when there is a mismatch, instead of trying to
guess what you get.
> but
> in Python3, it's a real pain in the a... I've read about people who
> managed to force Python3 to 'print' to apache in UTF-8, but none of
> their solutions worked for me.
There is very little point in throwing random solutions at a problem if you
don't understand the problem. First you need to find out why Python is trying
to convert to ASCII. That's probably because of something Apache is doing.
Do you have an Apache technician you can ask?
--
Steven
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2014-08-15 19:32 +0000 |
| Message-ID | <lsln95$mvk$1@reader1.panix.com> |
| In reply to | #76382 |
In <mailman.13038.1408130249.18130.python-list@python.org> Dominique Ramaekers <dominique@ramaekers-stassart.be> writes:
> #!/usr/bin/env python3
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> f = open("/var/www/cgi-data/index.html", "r")
> for line in f:
>     print(line,end='')
> If I access the script through a web browser, apache gives an error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 1791: ordinal not in range(128)
The error traceback should display exactly where the error occurs within
the script. Which line is it?
--
John Gordon Imagine what it must be like for a real medical doctor to
gordon@panix.com watch 'House', or a real serial killer to watch 'Dexter'.
| From | Dominique Ramaekers <dominique@ramaekers-stassart.be> |
|---|---|
| Date | 2014-08-17 00:39 +0200 |
| Message-ID | <mailman.13055.1408229269.18130.python-list@python.org> |
| In reply to | #76384 |
Hi John,
The error is in the line "print(line,end='')"... and it only happens
when the script is started from a webbrowser. In the terminal, the
script works fine.
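Since the script works in a terminal (typically a UTF-8 locale) but fails under Apache (where the locale is typically "C", i.e. ASCII), one way to make it independent of the locale is to name the encoding explicitly on both sides: when opening index.html, and when writing to standard output. A sketch, assuming index.html is stored as UTF-8; `send_page` is a hypothetical helper, the `charset=utf-8` parameter is an addition to the original header, and the GATEWAY_INTERFACE check (a standard CGI meta-variable from RFC 3875) means the page is only sent when a web server runs the script as CGI.

```python
#!/usr/bin/env python3
import os
import sys

HTML_PATH = "/var/www/cgi-data/index.html"  # path from the original script

def send_page(path, out):
    """Write CGI headers and the UTF-8 file at `path` to the binary stream `out`."""
    headers = (
        "Content-Type: text/html; charset=utf-8\n"
        "Cache-Control: no-cache, must-revalidate\n"  # HTTP/1.1
        "Expires: Sat, 26 Jul 1997 05:00:00 GMT\n"   # date in the past
        "\n"
    )
    out.write(headers.encode("ascii"))
    # Decode the file explicitly as UTF-8 instead of the locale default,
    # which is ASCII when the script runs under the "C" locale.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # Encode explicitly too, bypassing sys.stdout's text layer.
            out.write(line.encode("utf-8"))
    out.flush()

# Only send the page when a web server invokes the script as CGI.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    send_page(HTML_PATH, sys.stdout.buffer)
```

If the page is only copied through unchanged, opening it in binary mode ("rb") and writing the bytes directly would avoid decoding altogether.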
See my previous mail for my findings after a lot of reading and trying...
Regards,
On 15-08-14 at 21:32, John Gordon wrote:
> In <mailman.13038.1408130249.18130.python-list@python.org> Dominique Ramaekers <dominique@ramaekers-stassart.be> writes:
>
>> #!/usr/bin/env python3
>> print("Content-Type: text/html")
>> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
>> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
>> print("")
>> f = open("/var/www/cgi-data/index.html", "r")
>> for line in f:
>>     print(line,end='')
>> If I access the script through a web browser, apache gives an error:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 1791: ordinal not in range(128)
> The error traceback should display exactly where the error occurs within
> the script. Which line is it?
>