From: Peter Pearson
Newsgroups: comp.lang.python
Subject: Re: How to manage accented characters in mail header?
Date: 4 Jan 2025 15:00:21 GMT

On Sat, 4 Jan 2025 14:31:24 +0000, Chris Green wrote:
> I have a Python script that filters my incoming E-Mail. It has been
> working OK (with various updates and improvements) for many years.
>
> I now have a minor new problem when handling E-Mail with a From: that
> has accented characters in it:-
>
>     From: Sébastien Crignon
>
> I use Python mailbox to parse the message:-
>
>     import mailbox
>     ...
>     ...
>     msg = mailbox.MaildirMessage(sys.stdin.buffer.read())
>
> Then various mailbox methods to get headers etc.
> I use the following to get the From: address:-
>
>     str(msg.get('from', "unknown")).lower()
>
> The result has the part with the accented character wrapped as follows:-
>
>     From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?=
>
> I know I have hit this issue before but I can't remember the fix. The
> problem I have now is that searching the above doesn't work as
> expected. Basically I just need to get rid of the ?utf-8? wrapped bit
> altogether as I'm only interested in the 'real' address. How can I
> easily remove the UTF8 section in a way that will work whether or not
> it's there?

This seemed to work for me:

    import email.header
    text, encoding = email.header.decode_header(some_string)[0]
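
For what it's worth, here is a slightly fuller sketch of the same idea, using only the standard library. It assumes the header value is available as a string. The helper names decode_mime_header and split_from are made up for illustration, and so is the address seb@example.com, since the real one does not appear in the post. decode_header() returns a list of (chunk, charset) pairs, and a chunk may be bytes or str, so the sketch handles both. It splits off the address with email.utils.parseaddr() before decoding the display name, which should behave the same whether or not an encoded word is present.

```python
import email.header
import email.utils


def decode_mime_header(raw):
    """Turn a header value that may contain RFC 2047 encoded-words
    (such as =?utf-8?B?...?=) into an ordinary str."""
    parts = []
    for chunk, charset in email.header.decode_header(str(raw)):
        if isinstance(chunk, bytes):
            # Encoded words (and plain text next to them) come back as bytes.
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Unknown charset label: fall back to a lossless 8-bit decode.
                parts.append(chunk.decode("latin-1"))
        else:
            # A header without any encoded-word comes back as a single str.
            parts.append(chunk)
    return "".join(parts)


def split_from(raw_from):
    """Return (display_name, lowercased_address) for a From: header value."""
    # Parse the raw value first, so that characters in the decoded name
    # (commas, angle brackets) cannot confuse the address parser.
    name, addr = email.utils.parseaddr(str(raw_from))
    return decode_mime_header(name), addr.lower()


if __name__ == "__main__":
    # Hypothetical header value; the address is invented for this example.
    sample = "=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <seb@example.com>"
    print(split_from(sample))
```

If only the 'real' address is wanted, the second element of the tuple is enough and the display name never needs decoding at all.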