Date: Wed, 01 May 2013 19:36:19 -0400
From: Ned Batchelder <ned@nedbatchelder.com>
To: cl@isbd.net
Subject: Re: How do I encode and decode this data to write to a file?
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1226.1367451385.3114.python-list@python.org>


On 4/29/2013 5:47 AM, cl@isbd.net wrote:
> If I understand correctly the encode() is saying that it can't
> understand the data in the html because there's a character 0xc3 in it.
> I *think* this means that the é is encoded in UTF-8 already in the
> incoming data stream (should be as my system is wholly UTF-8 as far as I
> know and I created the directory name).
>
> So how do I change the code so I don't get the error?  Do I just
> decode() the data first and then encode() it?
>

BTW, I did a presentation at PyCon 2012 that many people have found
helpful: "Pragmatic Unicode, or, How Do I Stop the Pain?":
http://nedbatchelder.com/text/unipain.html .  It explains the principles
at work here: the difference between bytes and Unicode text, and where
in a program to decode and encode.
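
As a rough illustration of the decode-then-encode idea asked about above, 
here is a minimal sketch in Python 3 syntax.  The sample bytes, the 
variable names, and the choice of UTF-8 are assumptions made for the 
example, not details taken from the original program:

```python
import io

# Bytes as they might arrive from outside the program.
# Assumption: the input is UTF-8, so "é" arrives as 0xc3 0xa9.
raw = b"caf\xc3\xa9"

# Decode once, at the input boundary, to get Unicode text.
text = raw.decode("utf-8")

# Inside the program, work only with text.
html = "<p>" + text + "</p>"

# Encode once, at the output boundary, when writing bytes out.
# io.BytesIO stands in for a file opened in binary mode.
buf = io.BytesIO()
buf.write(html.encode("utf-8"))
```

In Python 2, calling encode() on a byte string first decodes it implicitly 
as ASCII, which is the likely source of a complaint about 0xc3.  Decoding 
explicitly with the right encoding avoids that implicit step.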

--Ned.