Re: How to manage accented characters in mail header?

From	Chris Green <cl@isbd.net>
Newsgroups	comp.lang.python
Subject	Re: How to manage accented characters in mail header?
Date	2025-01-04 19:07 +0000
Message-ID	<dhdo4l-uvsi.ln1@q957.zbmc.eu>
References	<satn4l-6sqh.ln1@q957.zbmc.eu> <decode_header-20250104154914@ram.dialup.fu-berlin.de>

Stefan Ram <ram@zedat.fu-berlin.de> wrote:
> Chris Green <cl@isbd.net> wrote or quoted:
> >From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>
> 
>   In Python, decode_header from the email.header module returns a
>   list of parts, where each part is a tuple of (value, charset).
>   The value is normally bytes; it is a plain str only when the
>   header contains no encoded words at all. To combine the parts
>   into one string, loop through the list, decode each piece (if it
>   needs it), and join the results.
>   Here’s a straightforward example:
> 
> from email.header import decode_header
> 
> # Example header
> header_example = \
> 'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>'
> 
> # Decode the header
> decoded_parts = decode_header(header_example)
> 
> # Kick off an empty list for the decoded strings
> decoded_strings = []
> 
> for part, charset in decoded_parts:
>     if isinstance(part, bytes):
>         # Decode the bytes to a string using the charset
>         decoded_string = part.decode(charset or 'utf-8')
>     else:
>         # If it’s already a string, just roll with it
>         decoded_string = part
>     decoded_strings.append(decoded_string)
> 
> # Join the parts into a single string
> final_string = ''.join(decoded_strings)
> 
> print(final_string)  # From: Sébastien Crignon <sebastien.crignon@amvs.fr>
> 
>   Breakdown
> 
>   decode_header(header_example): This line takes your email header
>   and breaks it down into a list of tuples.
> 
>   Looping through decoded_parts: You check if each part is in
>   bytes. If it is, you decode it using the charset that came with
>   it (defaulting to 'utf-8' if the charset is None).
> 
>   Appending Decoded Strings: You toss each decoded part into a list.
> 
>   Joining Strings: Finally, you use ''.join(decoded_strings) to glue
>   all the decoded strings into a single, coherent piece.
> 
>   Just a Heads Up
> 
>   Keep an eye out for cases where the charset is None, which is what
>   you get for the unencoded parts of the header. For those, fall
>   back to 'utf-8' or another safe default.
> 
Thanks, I think! :-)

Is there a simple[r] way to extract just the 'real' address between
the <>?  That's all I actually need.  I think it has to be the last
chunk of the From:, doesn't it?
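
One possible sketch, assuming the From: value (without the "From: "
prefix) is already in a string: parseaddr from the standard library's
email.utils module splits it into a display name and the address
between the <>.  The variable names here are only illustrative.

```python
from email.utils import parseaddr

# Header value only, i.e. everything after "From: "
from_value = '=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>'

# parseaddr returns a (display name, address) tuple
display_name, address = parseaddr(from_value)

print(address)  # sebastien.crignon@amvs.fr
```

parseaddr does not decode the display name, so display_name stays as
the =?utf-8?B?...?= encoded word; if only the address is needed, the
accented characters never have to be dealt with.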


-- 
Chris Green