| Started by | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| First post | 2024-10-31 16:33 +0100 |
| Last post | 2024-11-02 08:44 +1100 |
| Articles | 17 — 7 participants |
Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-10-31 16:33 +0100
Re: Printing UTF-8 mail to terminal Left Right <olegsivokon@gmail.com> - 2024-10-31 17:38 +0100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-01 07:52 +0100
Re: Printing UTF-8 mail to terminal Inada Naoki <songofacandy@gmail.com> - 2024-11-03 12:08 +0900
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-04 11:48 +0100
Re: Printing UTF-8 mail to terminal (Posting On Python-List Prohibited) Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-10-31 19:35 +0000
Re: Printing UTF-8 mail to terminal Cameron Simpson <cs@cskk.id.au> - 2024-11-01 07:50 +1100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-01 08:11 +0100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-01 10:10 +0100
Re: Printing UTF-8 mail to terminal dieter.maurer@online.de - 2024-11-01 17:38 +0100
Re: Printing UTF-8 mail to terminal Cameron Simpson <cs@cskk.id.au> - 2024-11-02 08:47 +1100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-04 11:44 +0100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-04 11:57 +0100
Re: Printing UTF-8 mail to terminal "Loris Bennett" <loris.bennett@fu-berlin.de> - 2024-11-04 13:02 +0100
Re: Printing UTF-8 mail to terminal "Peter J. Holzer" <hjp-python@hjp.at> - 2024-11-05 21:39 +0100
Re: Printing UTF-8 mail to terminal Cameron Simpson <cs@cskk.id.au> - 2024-11-06 08:20 +1100
Re: Printing UTF-8 mail to terminal Cameron Simpson <cs@cskk.id.au> - 2024-11-02 08:44 +1100
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-10-31 16:33 +0100 |
| Subject | Printing UTF-8 mail to terminal |
| Message-ID | <878qu49tii.fsf@zedat.fu-berlin.de> |
Hi,
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
Subject: Übung
Sehr geehrter Herr Dr. Bennett,
Dies ist eine Übung.
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
if args.verbose:
    print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
I seem to remember that I had issues in the past with a Perl version of
a similar program. As far as I recall there was an issue with the fact
that the greeting is generated by querying a server, whereas the body is
being read from a file, which led to oddities when the two bits were
concatenated. But that might just have been a Perl thing.
Cheers,
Loris
--
This signature is currently under constuction.
| From | Left Right <olegsivokon@gmail.com> |
|---|---|
| Date | 2024-10-31 17:38 +0100 |
| Message-ID | <mailman.61.1730392745.4695.python-list@python.org> |
| In reply to | #196919 |
There's quite a lot of misuse of terminology around terminal / console
/ shell. Please, correct me if I'm wrong, but it looks like you are
printing that on MS Windows, right? MS Windows doesn't have or use
terminals (that's more of a Unix-related concept). And, by "terminal"
I mean terminal emulator (i.e. a program that emulates the behavior of
a physical terminal). You can, of course, find some terminal programs
for Windows (e.g. mintty), but I doubt that that's what you are dealing
with.

What MS Windows users usually end up using is the console. If you
run, e.g., cmd.exe, it will create a process that displays a graphical
console. The console uses an encoding scheme to represent the text
output. I believe that the default on MS Windows is to use some
single-byte encoding. This answer from an SE family site tells you how to
set the console encoding to UTF-8 permanently:
https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
, which, I believe, will solve your problem with how the text is
displayed.
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-01 07:52 +0100 |
| Message-ID | <87v7x7o37z.fsf@zedat.fu-berlin.de> |
| In reply to | #196922 |
Left Right <olegsivokon@gmail.com> writes:

> There's quite a lot of misuse of terminology around terminal / console
> / shell. Please, correct me if I'm wrong, but it looks like you are
> printing that on MS Windows, right? MS Windows doesn't have or use
> terminals (that's more of a Unix-related concept). And, by "terminal"
> I mean terminal emulator (i.e. a program that emulates the behavior of
> a physical terminal). You can, of course, find some terminal programs
> for Windows (e.g. mintty), but I doubt that that's what you are dealing
> with.
>
> What MS Windows users usually end up using is the console. If you
> run, e.g., cmd.exe, it will create a process that displays a graphical
> console. The console uses an encoding scheme to represent the text
> output. I believe that the default on MS Windows is to use some
> single-byte encoding. This answer from an SE family site tells you how to
> set the console encoding to UTF-8 permanently:
> https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
> , which, I believe, will solve your problem with how the text is
> displayed.

I'm not using MS Windows. I am using a GNOME terminal on Debian 12
locally and connecting via SSH to an AlmaLinux 8 server, where I start a
tmux session.

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
| From | Inada Naoki <songofacandy@gmail.com> |
|---|---|
| Date | 2024-11-03 12:08 +0900 |
| Message-ID | <mailman.75.1730603335.4695.python-list@python.org> |
| In reply to | #196928 |
Try the PYTHONUTF8=1 environment variable.
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-04 11:48 +0100 |
| Message-ID | <87a5efmg0g.fsf@zedat.fu-berlin.de> |
| In reply to | #196948 |
Inada Naoki <songofacandy@gmail.com> writes:

> Try PYTHONUTF8=1 envvar.

This does not seem to affect the way the email body is printed.

Cheers,
Loris
--
This signature is currently under constuction.
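One plausible reason the variable makes no difference: UTF-8 mode changes the default encodings Python uses for I/O, not the way the `email` package serialises a message. A minimal sketch for checking what the interpreter is actually using; the helper name `describe_io_encoding` is made up for illustration:

```python
import sys


def describe_io_encoding() -> dict:
    """Collect the interpreter settings that PYTHONUTF8 influences."""
    return {
        # True when Python runs in UTF-8 mode (e.g. started with PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used when print() writes to standard output.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_io_encoding().items():
        print(f"{name}: {value}")
```

If `stdout_encoding` already reports UTF-8, the terminal side is probably fine and the `=C3=9C` sequences come from the message serialisation itself.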
| From | Lawrence D'Oliveiro <ldo@nz.invalid> |
|---|---|
| Date | 2024-10-31 19:35 +0000 |
| Subject | Re: Printing UTF-8 mail to terminal (Posting On Python-List Prohibited) |
| Message-ID | <vg0m6l$2qq89$2@dont-email.me> |
| In reply to | #196919 |
On Thu, 31 Oct 2024 16:33:41 +0100, Loris Bennett wrote:

> Dies ist eine =C3=9Cbung.
>
> What do I need to do to prevent the body from getting mangled?

I don’t think that’s actually getting mangled; that is how the actual
message body looks. What you have there is called “quoted-printable”
encoding, and it’s a standard way to ensure the message body consists only
of 7-bit ASCII.

If you look at the source of the message, you should see a header line like
“Content-Transfer-Encoding: quoted-printable”. This is how your email
client knows how to display the text properly.
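To see where `=C3=9C` comes from, here is a small sketch using the standard-library `quopri` module, assuming the body text is UTF-8 as the charset in the thread suggests. The variable names are illustrative only:

```python
import quopri

text = "Dies ist eine Übung."

# In UTF-8, 'Ü' is stored as the two bytes C3 9C.
utf8_bytes = text.encode("utf-8")

# Quoted-printable writes each non-ASCII byte as '=' plus two hex digits,
# so the result contains only 7-bit ASCII characters.
encoded = quopri.encodestring(utf8_bytes)

print(utf8_bytes)
print(encoded)
```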
| From | Cameron Simpson <cs@cskk.id.au> |
|---|---|
| Date | 2024-11-01 07:50 +1100 |
| Message-ID | <mailman.63.1730408232.4695.python-list@python.org> |
| In reply to | #196919 |
On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>I have a command-line program which creates an email containing German
>umlauts. On receiving the mail, my mail client displays the subject and
>body correctly:
[...]
>So far, so good. However, when I use the --verbose option to print
>the mail to the terminal via
>
> if args.verbose:
>     print(mail)
>
>I get:
>
> Subject: Übungsbetreff
>
> Sehr geehrter Herr Dr. Bennett,
>
> Dies ist eine =C3=9Cbung.
>
>What do I need to do to prevent the body from getting mangled?

That looks to me like quoted-printable. This is an encoding for binary
transport of text to make it robust against transports that are not
8-bit clean. So your Unicode text is encoded as UTF-8, and then that
is encoded in quoted-printable for transport through the email system.

Your terminal probably accepts UTF-8 - I imagine other German text
renders correctly?

You need to get the text and undo the quoted-printable encoding.

If you're using the Python email module to parse (or construct) the
message as a `Message` object I'd expect that to happen automatically.

If you're just dealing with this directly, use the `quopri` stdlib module:
https://docs.python.org/3/library/quopri.html

Cheers,
Cameron Simpson <cs@cskk.id.au>
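Undoing the encoding by hand with `quopri`, as suggested above, might look like the following sketch. It assumes you already hold the raw body as bytes and know that its charset is UTF-8; `raw_body` is an illustrative name:

```python
import quopri

# The body line exactly as it appears in the printed message.
raw_body = b"Dies ist eine =C3=9Cbung."

# Step 1: undo the quoted-printable transfer encoding (bytes in, bytes out).
# Step 2: decode the resulting UTF-8 bytes into a normal Python string.
decoded = quopri.decodestring(raw_body).decode("utf-8")

print(decoded)
```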
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-01 08:11 +0100 |
| Message-ID | <87msijo2cd.fsf@zedat.fu-berlin.de> |
| In reply to | #196925 |
Cameron Simpson <cs@cskk.id.au> writes:

> On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>I have a command-line program which creates an email containing German
>>umlauts. On receiving the mail, my mail client displays the subject and
>>body correctly:
> [...]
>>So far, so good. However, when I use the --verbose option to print
>>the mail to the terminal via
>>
>> if args.verbose:
>>     print(mail)
>>
>>I get:
>>
>> Subject: Übungsbetreff
>>
>> Sehr geehrter Herr Dr. Bennett,
>>
>> Dies ist eine =C3=9Cbung.
>>
>>What do I need to do to prevent the body from getting mangled?
>
> That looks to me like quoted-printable. This is an encoding for binary
> transport of text to make it robust against transports that are not
> 8-bit clean. So your Unicode text is encoded as UTF-8, and then that
> is encoded in quoted-printable for transport through the email system.

As I mentioned, I think the problem is to do with the way the salutation
text provided by the "salutation server" and the mail body from a file
are encoded. This seems to be different.

> Your terminal probably accepts UTF-8 - I imagine other German text
> renders correctly?

Yes, it does.

> You need to get the text and undo the quoted-printable encoding.
>
> If you're using the Python email module to parse (or construct) the
> message as a `Message` object I'd expect that to happen automatically.

I am using

email.message.EmailMessage

as, from the Python documentation

https://docs.python.org/3/library/email.examples.html

I gathered that that is the standard approach.

And you are right that encoding for the actual mail which is received is
automatically sorted out. If I display the raw email in my client I get
the following:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
...
Dies ist eine =C3=9Cbung.

I would interpret that as meaning that the subject and body are encoded
in the same way.

The problem just occurs with the unsent string representation printed to
the terminal.

Cheers,
Loris
--
This signature is currently under constuction.
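The raw `Subject:` line uses the encoded-word form (`=?charset?q?...?=`), which is related to, but not the same as, the quoted-printable body encoding. A small sketch of decoding such a header value with the standard `email.header` helpers; `raw_subject` is simply the value copied from the raw mail above:

```python
from email.header import decode_header, make_header

raw_subject = "=?utf-8?q?=C3=9Cbungsbetreff?="

# decode_header() splits the value into (bytes, charset) chunks;
# make_header() reassembles them, and str() yields the Unicode text.
subject = str(make_header(decode_header(raw_subject)))

print(subject)
```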
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-01 10:10 +0100 |
| Message-ID | <875xp7nwus.fsf@zedat.fu-berlin.de> |
| In reply to | #196929 |
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> Cameron Simpson <cs@cskk.id.au> writes:
>
>> On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>>I have a command-line program which creates an email containing German
>>>umlauts. On receiving the mail, my mail client displays the subject and
>>>body correctly:
>> [...]
>>>So far, so good. However, when I use the --verbose option to print
>>>the mail to the terminal via
>>>
>>> if args.verbose:
>>> print(mail)
>>>
>>>I get:
>>>
>>> Subject: Übungsbetreff
>>>
>>> Sehr geehrter Herr Dr. Bennett,
>>>
>>> Dies ist eine =C3=9Cbung.
>>>
>>>What do I need to do to prevent the body from getting mangled?
>>
>> That looks to me like quoted-printable. This is an encoding for binary
>> transport of text to make it robust against transports that are not
>> 8-bit clean. So your Unicode text is encoded as UTF-8, and then that
>> is encoded in quoted-printable for transport through the email system.
>
> As I mentioned, I think the problem is to do with the way the salutation
> text provided by the "salutation server" and the mail body from a file
> are encoded. This seems to be different.
>
>> Your terminal probably accepts UTF-8 - I imagine other German text
>> renders correctly?
>
> Yes, it does.
>
>> You need to get the text and undo the quoted-printable encoding.
>>
>> If you're using the Python email module to parse (or construct) the
>> message as a `Message` object I'd expect that to happen automatically.
>
> I am using
>
> email.message.EmailMessage
>
> as, from the Python documentation
>
> https://docs.python.org/3/library/email.examples.html
>
> I gathered that that is the standard approach.
>
> And you are right that encoding for the actual mail which is received is
> automatically sorted out. If I display the raw email in my client I get
> the following:
>
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
> ...
> Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
> ...
> Dies ist eine =C3=9Cbung.
>
> I would interpret that as meaning that the subject and body are encoded
> in the same way.
>
> The problem just occurs with the unsent string representation printed to
> the terminal.
If I log the body like this
body = f"{salutation},\n\n{text}\n{signature}"
logger.debug("body: " + body)
and look at the log file in my terminal I see
2024-11-01 09:59:12,318 - DEBUG - mailer:create_body - body: Sehr geehrter Herr Dr. Bennett,
Dies ist eine Übung.
...
as expected. The non-UTF-8 text occurs when I do
mail = EmailMessage()
mail.set_content(body, cte="quoted-printable")
...
if args.verbose:
    print(mail)
which is presumably also correct.
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
Cheers,
Loris
--
This signature is currently under constuction.
| From | dieter.maurer@online.de |
|---|---|
| Date | 2024-11-01 17:38 +0100 |
| Message-ID | <mailman.67.1730480556.4695.python-list@python.org> |
| In reply to | #196930 |
Loris Bennett wrote at 2024-11-1 10:10 +0100:
> ...
> mail.set_content(body, cte="quoted-printable")

In the line above, you request the content to use the "cte"
(= "Content-Transfer-Encoding") "quoted-printable" and consequently,
the content is encoded with `quoted-printable`.

Maybe, you do not need to pass `cte`?
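The effect of the `cte` argument can be checked directly on the message object. This is a minimal sketch, assuming a plain-text `EmailMessage` built the same way as in the thread:

```python
from email.message import EmailMessage

mail = EmailMessage()
mail.set_content("Dies ist eine Übung", cte="quoted-printable")

# The requested transfer encoding is recorded in the message headers.
cte_header = str(mail["Content-Transfer-Encoding"])

# get_content() reverses both the transfer encoding and the charset
# encoding and hands back an ordinary str.
decoded = mail.get_content()

print(cte_header)
print(decoded)
```

Whatever transfer encoding ends up in the header, `get_content()` is the call that returns the readable text.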
| From | Cameron Simpson <cs@cskk.id.au> |
|---|---|
| Date | 2024-11-02 08:47 +1100 |
| Message-ID | <mailman.69.1730497664.4695.python-list@python.org> |
| In reply to | #196930 |
On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>as expected. The non-UTF-8 text occurs when I do
>
> mail = EmailMessage()
> mail.set_content(body, cte="quoted-printable")
> ...
>
> if args.verbose:
> print(mail)
>
>which is presumably also correct.
>
>The question is: What conversion is necessary in order to print the
>EmailMessage object to the terminal, such that the quoted-printable
>parts are turned (back) into UTF-8?
Do you still have access to `body` ? That would be the original message
text? Otherwise maybe:
print(mail.get_content())
The objective is to obtain the message body Unicode text (i.e. a regular
Python string with the original text, unencoded). And to print that.
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-04 11:44 +0100 |
| Message-ID | <87ed3rmg7g.fsf@zedat.fu-berlin.de> |
| In reply to | #196939 |
Cameron Simpson <cs@cskk.id.au> writes:
> On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>as expected. The non-UTF-8 text occurs when I do
>>
>> mail = EmailMessage()
>> mail.set_content(body, cte="quoted-printable")
>> ...
>>
>> if args.verbose:
>> print(mail)
>>
>>which is presumably also correct.
>>
>>The question is: What conversion is necessary in order to print the
>>EmailMessage object to the terminal, such that the quoted-printable
>>parts are turned (back) into UTF-8?
>
> Do you still have access to `body` ? That would be the original
> message text? Otherwise maybe:
>
> print(mail.get_content())
>
> The objective is to obtain the message body Unicode text (i.e. a
> regular Python string with the original text, unencoded). And to print
> that.
With the following:
######################################################################
import email.message
m = email.message.EmailMessage()
m['Subject'] = 'Übung'
m.set_content('Dies ist eine Übung')
print('== cte: default == \n')
print(m)
print('-- full mail ---')
print(m)
print('-- just content--')
print(m.get_content())
m.set_content('Dies ist eine Übung', cte='quoted-printable')
print('== cte: quoted-printable ==\n')
print('-- full mail --')
print(m)
print('-- just content --')
print(m.get_content())
######################################################################
I get the following output:
######################################################################
== cte: default ==
Subject: Übung
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
RGllcyBpc3QgZWluZSDDnGJ1bmcK
-- full mail ---
Subject: Übung
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
RGllcyBpc3QgZWluZSDDnGJ1bmcK
-- just content--
Dies ist eine Übung
== cte: quoted-printable ==
-- full mail --
Subject: Übung
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Dies ist eine =C3=9Cbung
-- just content --
Dies ist eine Übung
######################################################################
So in both cases the subject is fine, but it is unclear to me how to
print the body. Or rather, I know how to print the body OK, but I don't
know how to print the headers separately - there seems to be nothing
like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
headers, but that seems a little clunky.
Cheers,
Loris
--
This signature is currently under constuction.
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-04 11:57 +0100 |
| Message-ID | <875xp3mfku.fsf@zedat.fu-berlin.de> |
| In reply to | #196950 |
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> Cameron Simpson <cs@cskk.id.au> writes:
>
>> On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>>as expected. The non-UTF-8 text occurs when I do
>>>
>>> mail = EmailMessage()
>>> mail.set_content(body, cte="quoted-printable")
>>> ...
>>>
>>> if args.verbose:
>>> print(mail)
>>>
>>>which is presumably also correct.
>>>
>>>The question is: What conversion is necessary in order to print the
>>>EmailMessage object to the terminal, such that the quoted-printable
>>>parts are turned (back) into UTF-8?
>>
>> Do you still have access to `body` ? That would be the original
>> message text? Otherwise maybe:
>>
>> print(mail.get_content())
>>
>> The objective is to obtain the message body Unicode text (i.e. a
>> regular Python string with the original text, unencoded). And to print
>> that.
>
> With the following:
>
> ######################################################################
>
> import email.message
>
> m = email.message.EmailMessage()
>
> m['Subject'] = 'Übung'
>
> m.set_content('Dies ist eine Übung')
> print('== cte: default == \n')
> print(m)
>
> print('-- full mail ---')
> print(m)
> print('-- just content--')
> print(m.get_content())
>
> m.set_content('Dies ist eine Übung', cte='quoted-printable')
> print('== cte: quoted-printable ==\n')
> print('-- full mail --')
> print(m)
> print('-- just content --')
> print(m.get_content())
>
> ######################################################################
>
> I get the following output:
>
> ######################################################################
>
> == cte: default ==
>
> Subject: Übung
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64
> MIME-Version: 1.0
>
> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>
> -- full mail ---
> Subject: Übung
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64
> MIME-Version: 1.0
>
> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>
> -- just content--
> Dies ist eine Übung
>
> == cte: quoted-printable ==
>
> -- full mail --
> Subject: Übung
> MIME-Version: 1.0
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
>
> Dies ist eine =C3=9Cbung
>
> -- just content --
> Dies ist eine Übung
>
> ######################################################################
>
> So in both cases the subject is fine, but it is unclear to me how to
> print the body. Or rather, I know how to print the body OK, but I don't
> know how to print the headers separately - there seems to be nothing
> like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
> headers, but that seems a little clunky.
Sorry, I am confusing the terminology here. The 'body' seems to be the
headers plus the 'content'. So I can print the *content* without the
headers OK, but I can't easily print all the headers separately. If
I just print the body, i.e. headers plus content, the umlauts in the
content are not resolved.
--
This signature is currently under constuction.
| From | "Loris Bennett" <loris.bennett@fu-berlin.de> |
|---|---|
| Date | 2024-11-04 13:02 +0100 |
| Message-ID | <871pzrmcky.fsf@zedat.fu-berlin.de> |
| In reply to | #196952 |
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
>
>> Cameron Simpson <cs@cskk.id.au> writes:
>>
>>> On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>>>as expected. The non-UTF-8 text occurs when I do
>>>>
>>>> mail = EmailMessage()
>>>> mail.set_content(body, cte="quoted-printable")
>>>> ...
>>>>
>>>> if args.verbose:
>>>> print(mail)
>>>>
>>>>which is presumably also correct.
>>>>
>>>>The question is: What conversion is necessary in order to print the
>>>>EmailMessage object to the terminal, such that the quoted-printable
>>>>parts are turned (back) into UTF-8?
>>>
>>> Do you still have access to `body` ? That would be the original
>>> message text? Otherwise maybe:
>>>
>>> print(mail.get_content())
>>>
>>> The objective is to obtain the message body Unicode text (i.e. a
>>> regular Python string with the original text, unencoded). And to print
>>> that.
>>
>> With the following:
>>
>> ######################################################################
>>
>> import email.message
>>
>> m = email.message.EmailMessage()
>>
>> m['Subject'] = 'Übung'
>>
>> m.set_content('Dies ist eine Übung')
>> print('== cte: default == \n')
>> print(m)
>>
>> print('-- full mail ---')
>> print(m)
>> print('-- just content--')
>> print(m.get_content())
>>
>> m.set_content('Dies ist eine Übung', cte='quoted-printable')
>> print('== cte: quoted-printable ==\n')
>> print('-- full mail --')
>> print(m)
>> print('-- just content --')
>> print(m.get_content())
>>
>> ######################################################################
>>
>> I get the following output:
>>
>> ######################################################################
>>
>> == cte: default ==
>>
>> Subject: Übung
>> Content-Type: text/plain; charset="utf-8"
>> Content-Transfer-Encoding: base64
>> MIME-Version: 1.0
>>
>> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>>
>> -- full mail ---
>> Subject: Übung
>> Content-Type: text/plain; charset="utf-8"
>> Content-Transfer-Encoding: base64
>> MIME-Version: 1.0
>>
>> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>>
>> -- just content--
>> Dies ist eine Übung
>>
>> == cte: quoted-printable ==
>>
>> -- full mail --
>> Subject: Übung
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset="utf-8"
>> Content-Transfer-Encoding: quoted-printable
>>
>> Dies ist eine =C3=9Cbung
>>
>> -- just content --
>> Dies ist eine Übung
>>
>> ######################################################################
>>
>> So in both cases the subject is fine, but it is unclear to me how to
>> print the body. Or rather, I know how to print the body OK, but I don't
>> know how to print the headers separately - there seems to be nothing
>> like 'get_headers()'. I can use 'get('Subject) etc. and reconstruct the
>> headers, but that seems a little clunky.
>
> Sorry, I am confusing the terminology here. The 'body' seems to be the
> headers plus the 'content'. So I can print the *content* without the
> headers OK, but I can't easily print all the headers separately. If
> just print the body, i.e. headers plus content, the umlauts in the
> content are not resolved.
OK, so I can do:
######################################################################
if args.verbose:
    for k in mail.keys():
        print(f"{k}: {mail.get(k)}")
    print('')
    print(mail.get_content())
######################################################################
prints what I want and is not wildly clunky, but I am a little surprised
that I can't get a string representation of the whole email in one go.
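A minimal sketch of the same idea wrapped in a helper, assuming a simple, non-multipart text message. The name `readable` is hypothetical (it is not part of the `email` package); it only combines `items()` and `get_content()`.

```python
from email.message import EmailMessage


def readable(mail: EmailMessage) -> str:
    """Return the headers plus the decoded text content of a simple message."""
    # items() yields (name, value) pairs; with the default EmailMessage
    # policy the values come back as ordinary decoded strings.
    headers = "\n".join(f"{name}: {value}" for name, value in mail.items())
    # get_content() undoes the Content-Transfer-Encoding and the charset.
    return f"{headers}\n\n{mail.get_content()}"


mail = EmailMessage()
mail["Subject"] = "Übung"
mail.set_content("Dies ist eine Übung", cte="quoted-printable")
print(readable(mail))
```

This is for display only: the result is not a valid message for transport, and `get_content()` on a multipart message would need different handling.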
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
| From | "Peter J. Holzer" <hjp-python@hjp.at> |
|---|---|
| Date | 2024-11-05 21:39 +0100 |
| Message-ID | <mailman.81.1730839621.4695.python-list@python.org> |
| In reply to | #196953 |
On 2024-11-04 13:02:21 +0100, Loris Bennett via Python-list wrote:
> "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> > "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> >> Cameron Simpson <cs@cskk.id.au> writes:
> >>> On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
> >>>>as expected. The non-UTF-8 text occurs when I do
> >>>>
> >>>> mail = EmailMessage()
> >>>> mail.set_content(body, cte="quoted-printable")
> >>>> ...
> >>>>
> >>>> if args.verbose:
> >>>> print(mail)
> >>>>
> >>>>which is presumably also correct.
> >>>>
> >>>>The question is: What conversion is necessary in order to print the
> >>>>EmailMessage object to the terminal, such that the quoted-printable
> >>>>parts are turned (back) into UTF-8?
[...]
> OK, so I can do:
>
> ######################################################################
> if args.verbose:
>     for k in mail.keys():
>         print(f"{k}: {mail.get(k)}")
>     print('')
>     print(mail.get_content())
> ######################################################################
>
> prints what I want and is not wildly clunky, but I am a little surprised
> that I can't get a string representation of the whole email in one go.
Mails can contain lots of stuff, so there is in general no suitable
human-readable string representation of a whole email. You have to go
through it part by part and decide what you want to do with each. For
example, if you have a multipart/alternative with a text/plain and a
text/html part what should the "string representation" be? For some uses
the text/plain part might be sufficient. For some you might want the
HTML part or some rendering of it. Or what would you do with an image?
Omit it completely? Just use the filename (if any)? Try to convert it to
ASCII-Art? Use an AI to describe it?
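A small sketch of what "going through it part by part" can look like, assuming a `multipart/alternative` message built with `EmailMessage`. Preferring the plain-text part is just one possible choice, and the sample message is made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Übung"
msg.set_content("Dies ist eine Übung")  # becomes the text/plain part
# Adding an alternative turns the message into multipart/alternative.
msg.add_alternative("<p>Dies ist eine <b>Übung</b></p>", subtype="html")

# Pick the part to show: plain text first, HTML as a fallback.
part = msg.get_body(preferencelist=("plain", "html"))
text = part.get_content() if part is not None else ""
print(text)

# walk() visits the container and every part, so each can be handled separately.
for part in msg.walk():
    print(part.get_content_type())
```

Images and other attachments would show up in `walk()` as well, and what to print for them is exactly the decision described above.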
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
| From | Cameron Simpson <cs@cskk.id.au> |
|---|---|
| Date | 2024-11-06 08:20 +1100 |
| Message-ID | <mailman.84.1730841650.4695.python-list@python.org> |
| In reply to | #196953 |
On 04Nov2024 13:02, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>OK, so I can do:
>
>######################################################################
>if args.verbose:
>    for k in mail.keys():
>        print(f"{k}: {mail.get(k)}")
>    print('')
>    print(mail.get_content())
>######################################################################
>
>prints what I want and is not wildly clunky, but I am a little surprised
>that I can't get a string representation of the whole email in one go.
A string representation of the whole message needs to be correctly
encoded so that its components can be identified mechanically. So it
needs to be a syntactically valid RFC5322 message. Thus the encoding.
As an example (slightly contrived) of why this is important, multipart
messages are delimited with distinct lines, and their content may not
present such a line (even if it's in the "raw" original data).
So printing a whole message transcribes it in the encoded form so that
it can be decoded mechanically. And conservatively, this is usually an
ASCII-compatible encoding so that it can traverse various systems
undamaged. This means the text requiring UTF-8 encoding gets further
encoded as quoted-printable to avoid ambiguity about the meaning of
bytes/octets which have their high bit set.
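To illustrate the point about high-bit bytes, here is a short sketch using the same sample text as earlier in the thread. It only checks properties of the serialized form; the variable name `wire` is made up for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Dies ist eine Übung", cte="quoted-printable")

wire = msg.as_bytes()         # the form that would be sent or stored
print(wire.isascii())         # checks that no byte has its high bit set
print(b"=C3=9Cbung" in wire)  # the Ü travels as its two UTF-8 bytes, QP-escaped
print(msg.get_content())      # decoding recovers the original text
```

So the encoded form is what a mail system sees, and `get_content()` is the step that turns it back into a regular Python string.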
BTW, doesn't this:
    for k in mail.keys():
        print(f"{k}: {mail.get(k)}")
print the quoted printable (i.e. not decoded) form of subject lines?
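A quick way to check this, as a sketch: as far as I can tell, the answer depends on the policy. An `EmailMessage` built with the default policy hands back decoded header values, while a message parsed with the legacy `compat32` policy (the default for `message_from_bytes`) returns the raw encoded word. The names `legacy` and `modern` are just for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Übung"
msg.set_content("Dies ist eine Übung")

print(msg["Subject"])  # decoded str under the default policy

wire = msg.as_bytes()  # here the header is an RFC 2047 encoded word

legacy = message_from_bytes(wire)                         # compat32 policy
modern = message_from_bytes(wire, policy=policy.default)  # modern policy
print(legacy["Subject"])  # raw header value, still encoded
print(modern["Subject"])  # decoded header value
```

So with `EmailMessage` the loop above should print readable subjects, but the same loop on a legacy `Message` object would show the encoded form.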
Cheers,
Cameron Simpson <cs@cskk.id.au>
| From | Cameron Simpson <cs@cskk.id.au> |
|---|---|
| Date | 2024-11-02 08:44 +1100 |
| Message-ID | <mailman.68.1730497471.4695.python-list@python.org> |
| In reply to | #196929 |
On 01Nov2024 08:11, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>Cameron Simpson <cs@cskk.id.au> writes:
>> If you're using the Python email module to parse (or construct) the
>> message as a `Message` object I'd expect that to happen automatically.
>
>I am using
> email.message.EmailMessage
Noted. That seems like the correct approach to me.
>And you are right that encoding for the actual mail which is received is
>automatically sorted out. If I display the raw email in my client I get
>the following:
>
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
> ...
> Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
> ...
> Dies ist eine =C3=9Cbung.
Right. Quoted-printable encoding for the transport.
>I would interpret that as meaning that the subject and body are encoded
>in the same way.
Yes.
>The problem just occurs with the unsent string representation printed to
>the terminal.
Yes, and I was thinking about this yesterday. I suspect that
`print(some_message_object)` is intended to transcribe it for transport.
For example, one could write to an mbox file and just print() the
message into it and get correct transport/storage formatting, which
includes the qp encoding.
Can you show the code (or example code) which leads to the qp output? I
suspect there's a straightforward way to get the decoded Unicode, but
I'd need to see how what you've got was obtained.