Date: Sun, 17 Aug 2014 00:49:47 +0200
From: Dominique Ramaekers <dominique@ramaekers-stassart.be>
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Unicode in cgi-script with apache2
References: <53EE4D11.7040604@ramaekers-stassart.be> <lsnejt$fa$1@ger.gmane.org>
In-Reply-To: <lsnejt$fa$1@ger.gmane.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.13056.1408229389.18130.python-list@python.org>

Hi Peter,

Your code seems interesting.

I've tried using sys.stdout (in a slightly different form) but it gave 
the same error.

I also read about people who fixed the error by changing the server's
locale to en_US.UTF-8. The people who posted these fixes also said that
you can only use en_US.UTF-8 (and not, e.g., nl_BE.UTF-8)... Anyway, it
didn't work for me. And I find this a dirty fix, because I don't want to
use a US locale...
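
To see what Python actually picked up from the environment under Apache, a small diagnostic along these lines may help. It is only a sketch: the function name describe_encodings is made up for illustration, and the output depends entirely on the environment the script is started in.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python selected for text I/O in this process."""
    return {
        # Default for open() in text mode when no encoding= is given.
        "preferred": locale.getpreferredencoding(False),
        # Used by print(); may be None if stdout has been replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names, not for file contents.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print(name, "=", value)
```

If "preferred" or "stdout" reports an ASCII codec when the script runs under Apache but UTF-8 in the terminal, that difference would explain why the script only fails in the browser.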

Please excuse me for not trying out your specific solutions. I've already
started to implement WSGI over CGI. See my previous message...
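
For illustration, a WSGI version of the same page could look roughly like the sketch below. It is not the code from my previous message: make_app and HTML_PATH are names invented here, and it assumes that index.html is stored as UTF-8. The file is read and returned as bytes, so no decoding step is involved.

```python
# Path taken from the CGI script quoted below; adjust as needed.
HTML_PATH = "/var/www/cgi-data/index.html"


def make_app(path=HTML_PATH):
    """Build a WSGI application that serves one HTML file as raw bytes."""

    def application(environ, start_response):
        # Binary mode: the bytes are passed through unchanged.
        with open(path, "rb") as f:
            body = f.read()
        headers = [
            # Assumption: the file on disk is UTF-8 encoded.
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Cache-Control", "no-cache, must-revalidate"),
        ]
        start_response("200 OK", headers)
        return [body]

    return application


# WSGI servers conventionally look for a callable named "application".
application = make_app()
```

For a quick local test without Apache, the standard library's wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever() should be enough.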

Regards

On 16-08-14 at 13:17, Peter Otten wrote:
> Dominique Ramaekers wrote:
>
>> I've got a little script:
>>
>> #!/usr/bin/env python3
>> print("Content-Type: text/html")
>> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
>> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
>> print("")
>> f = open("/var/www/cgi-data/index.html", "r")
>> for line in f:
>>       print(line,end='')
>>
>> If I run the script in the terminal, it nicely prints the webpage
>> 'index.html'.
>>
>> If I access the script through a web browser, Apache gives an error:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 1791: ordinal not in range(128)
>>
>> I've spent a whole afternoon reading forums and blogs, but I don't have a
>> solution.
>>
>> Can anyone help me?
> If the input and output encoding are the same you can avoid the byte-to-text
> (and subsequent text-to-byte) conversion and serve the binary contents of
> the index.html file directly:
>
> #!/usr/bin/env python3
> import sys
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> sys.stdout.flush()
> with open("/var/www/cgi-data/index.html", "rb") as f:
>      for line in f:
>          sys.stdout.buffer.write(line)
>
> The flush() is necessary to write pending data before accessing the
> low-level stdout.buffer. Instead of the loop you can use any of these:
>
> sys.stdout.buffer.write(f.read()) # not for huge files, but should be OK for
>                                    # typical html file sizes
> sys.stdout.buffer.writelines(f)
> shutil.copyfileobj(f, sys.stdout.buffer) # needs "import shutil"; show off
>                                           # your knowledge of the stdlib ;)
>
>
> Alternatively you could choose an encoding via the locale:
>
> #!/usr/bin/env python3
> import locale
> locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> with open("/var/www/cgi-data/index.html") as f:
>      for line in f:
>          print(line, end='')
>
> Python should then use UTF-8 as the default for i/o and the resulting
> script looks more familiar.
>
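
A variant that does not depend on the locale at all is to name the encoding explicitly on both sides. As far as I understand, sys.stdout is created when the interpreter starts, so a later setlocale() call may not change its encoding, and that could be why the locale fix does not help everywhere. The sketch below assumes the file is UTF-8; the function name serve is invented for illustration.

```python
import io


def serve(src_path, out_buffer, encoding="utf-8"):
    """Write CGI headers and a text file to a binary stream.

    Both the input file and the output stream use an explicit encoding,
    so the result does not depend on the process locale.
    """
    # Wrap the binary stream; newline="" disables newline translation.
    out = io.TextIOWrapper(out_buffer, encoding=encoding, newline="")
    try:
        out.write("Content-Type: text/html; charset=%s\n" % encoding)
        out.write("Cache-Control: no-cache, must-revalidate\n")
        out.write("\n")  # blank line ends the header section
        with open(src_path, encoding=encoding, newline="") as f:
            for line in f:
                out.write(line)
        out.flush()
    finally:
        # Detach so that discarding the wrapper leaves out_buffer open.
        out.detach()
```

In a CGI script this would be called as serve("/var/www/cgi-data/index.html", sys.stdout.buffer) after "import sys", before anything else has been printed.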