| From | Chris Green <cl@isbd.net> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: How to manage accented characters in mail header? |
| Date | 2025-01-04 19:07 +0000 |
| Message-ID | <dhdo4l-uvsi.ln1@q957.zbmc.eu> |
| References | <satn4l-6sqh.ln1@q957.zbmc.eu> <decode_header-20250104154914@ram.dialup.fu-berlin.de> |
Stefan Ram <ram@zedat.fu-berlin.de> wrote:
> Chris Green <cl@isbd.net> wrote or quoted:
> >From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>
>
> In Python, when you roll with decode_header from the email.header
> module, it spits out a list of parts, where each part is like
> a tuple of (decoded string, charset). To smash these decoded
> sections into one string, you’ll want to loop through the list,
> decode each piece (if it needs it), and then throw them together.
> Here’s a straightforward example of how to pull this off:
>
> from email.header import decode_header
>
> # Example header
> header_example = \
>     'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>'
>
> # Decode the header
> decoded_parts = decode_header(header_example)
>
> # Kick off an empty list for the decoded strings
> decoded_strings = []
>
> for part, charset in decoded_parts:
>     if isinstance(part, bytes):
>         # Decode the bytes to a string using the charset
>         decoded_string = part.decode(charset or 'utf-8')
>     else:
>         # If it’s already a string, just roll with it
>         decoded_string = part
>     decoded_strings.append(decoded_string)
>
> # Join the parts into a single string
> final_string = ''.join(decoded_strings)
>
> print(final_string)  # From: Sébastien Crignon <sebastien.crignon@amvs.fr>
>
> Breakdown
>
> decode_header(header_example): This line takes your email header
> and breaks it down into a list of tuples.
>
> Looping through decoded_parts: You check if each part is in
> bytes. If it is, you decode it using whatever charset it’s
> got (defaulting to 'utf-8' if it’s a little vague).
>
> Appending Decoded Strings: You toss each decoded part into a list.
>
> Joining Strings: Finally, you use ''.join(decoded_strings) to glue
> all the decoded strings into a single, coherent piece.
>
> Just a Heads Up
>
> Keep an eye out for cases where the charset might be None. In those
> moments, it’s smart to fall back to 'utf-8' or something safe.
>

Thanks, I think! :-)

Is there a simple[r] way to extract just the 'real' address between
the <>? That's all I actually need. I think it has to be the last
chunk of the From: header, doesn't it?

-- 
Chris Green
·
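The sketch below shows a simpler route to just the address, rather than relying on it being the last chunk. That works for this header, but not for every valid From: form, such as `addr (Name)`. It assumes the standard library's `email.utils.parseaddr` and `email.policy.default` are acceptable; neither appears in the quoted reply. `FROM_VALUE` and `RAW` are hypothetical names introduced only for illustration.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr

# The From: header value quoted in this thread (hypothetical variable name).
FROM_VALUE = ("=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= "
              "<sebastien.crignon@amvs.fr>")

# Option 1: parseaddr() splits 'Name <addr>' into (name, addr).
# The name comes back still RFC 2047-encoded, but the address
# part needs no decoding at all.
name, addr = parseaddr(FROM_VALUE)
print(addr)  # sebastien.crignon@amvs.fr

# Option 2: parse a whole message with policy=default. The From
# header then exposes .addresses, with encoded words already decoded.
# RAW is a hypothetical minimal message built for this sketch.
RAW = "From: " + FROM_VALUE + "\nSubject: test\n\nbody\n"
msg = message_from_string(RAW, policy=default)
sender = msg["From"].addresses[0]
print(sender.addr_spec)     # sebastien.crignon@amvs.fr
print(sender.display_name)  # Sébastien Crignon
```

Option 1 is enough if only the address matters. Option 2 also yields the decoded display name, so it covers the original accented-characters question without a manual decode_header loop.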
How to manage accented characters in mail header? Chris Green <cl@isbd.net> - 2025-01-04 14:31 +0000
Re: How to manage accented characters in mail header? Peter Pearson <pkpearson@nowhere.invalid> - 2025-01-04 15:00 +0000
Re: How to manage accented characters in mail header? Chris Green <cl@isbd.net> - 2025-01-04 19:07 +0000
Re: How to manage accented characters in mail header? "Peter J. Holzer" <hjp-python@hjp.at> - 2025-01-06 20:43 +0100