Re: How to manage accented characters in mail header?

Path csiph.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From Peter Pearson <pkpearson@nowhere.invalid>
Newsgroups comp.lang.python
Subject Re: How to manage accented characters in mail header?
Date 4 Jan 2025 15:00:21 GMT
Lines 42
Message-ID <ltt0o4FlcuoU1@mid.individual.net> (permalink)
References <satn4l-6sqh.ln1@q957.zbmc.eu>
Mime-Version 1.0
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding 8bit
X-Trace individual.net T0YWnbVIlibo+Ml6/dCc/wdy4aN14wo7msBmPNWfqm1zB2ojjF
Cancel-Lock sha1:ZK70ezogtfHY+ofEI6aFFBGvWGQ= sha256:5nFDZMPYsaLT1LmT/6oaq64NL2MawKTp7hJczXoxMao=
User-Agent slrn/1.0.3 (Linux)
Xref csiph.com comp.lang.python:197142


On Sat, 4 Jan 2025 14:31:24 +0000, Chris Green <cl@isbd.net> wrote:
> I have a Python script that filters my incoming E-Mail.  It has been
> working OK (with various updates and improvements) for many years.
>
> I now have a minor new problem when handling E-Mail with a From: that
> has accented characters in it:-
>
>     From: Sébastien Crignon <sebastien.crignon@amvs.fr>
>
>
> I use Python mailbox to parse the message:-
>
>     import mailbox
>     ...
>     ...
>     msg = mailbox.MaildirMessage(sys.stdin.buffer.read())
>
> Then various mailbox methods to get headers etc.
> I use the following to get the From: address:-
>
>     str(msg.get('from', "unknown")).lower()
>
> The result has the part with the accented character wrapped as follows:-
>
>     From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>
>
>
> I know I have hit this issue before but I can't remember the fix. The
> problem I have now is that searching the above doesn't work as
> expected. Basically I just need to get rid of the ?utf-8? wrapped bit
> altogether as I'm only interested in the 'real' address.  How can I
> easily remove the UTF8 section in a way that will work whether or not
> it's there?

This seemed to work for me:

    import email.header
    text, encoding = email.header.decode_header(some_string)[0]
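
Note that decode_header() returns a list of (text, charset) chunks, and
element [0] of that list is only the first chunk (here the encoded
display name, not the address). If only the 'real' address is wanted,
one possible approach is to split the header with email.utils.parseaddr()
first and decode just the display name afterwards. A minimal sketch,
assuming a single address in the header; split_from_header is a
hypothetical helper name, not something from the script above:

```python
import email.header
import email.utils


def split_from_header(raw):
    """Return (display_name, address) for a single-address From: value.

    Hypothetical helper for illustration; works whether or not the
    header contains RFC 2047 encoded-words such as =?utf-8?B?...?=.
    """
    # Split first, so the encoded display name cannot confuse the
    # address parsing.
    name, address = email.utils.parseaddr(raw)

    # decode_header() yields (text, charset) chunks. Chunks are bytes
    # when an encoded-word was present and str otherwise.
    decoded = []
    for text, charset in email.header.decode_header(name):
        if isinstance(text, bytes):
            text = text.decode(charset or "ascii", errors="replace")
        decoded.append(text)

    return "".join(decoded), address.lower()


raw = "=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>"
name, address = split_from_header(raw)
print(name)
print(address)
```

The same call should also cope with a plain header such as
"Chris Green <cl@isbd.net>", since parseaddr() and decode_header() both
pass unencoded text through. An unrecognised charset name in the header
would still raise LookupError in this sketch.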


-- 
To email me, substitute nowhere->runbox, invalid->com.


Thread

How to manage accented characters in mail header? Chris Green <cl@isbd.net> - 2025-01-04 14:31 +0000
  Re: How to manage accented characters in mail header? Peter Pearson <pkpearson@nowhere.invalid> - 2025-01-04 15:00 +0000
  Re: How to manage accented characters in mail header? Chris Green <cl@isbd.net> - 2025-01-04 19:07 +0000
    Re: How to manage accented characters in mail header? "Peter J. Holzer" <hjp-python@hjp.at> - 2025-01-06 20:43 +0100
