stmplib MIMEText charset weirdness

Started by: "Adam W." <AWasilenko@gmail.com>
First post: 2013-02-25 20:00 -0800
Last post: 2013-02-26 14:46 -0500
Articles: 4 (3 participants)


Contents

  stmplib MIMEText charset weirdness "Adam W." <AWasilenko@gmail.com> - 2013-02-25 20:00 -0800
    Re: stmplib MIMEText charset weirdness Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-26 07:10 +0000
      Re: stmplib MIMEText charset weirdness "Adam W." <AWasilenko@gmail.com> - 2013-02-26 07:29 -0800
    Re: stmplib MIMEText charset weirdness Terry Reedy <tjreedy@udel.edu> - 2013-02-26 14:46 -0500

#39929 — stmplib MIMEText charset weirdness

From: "Adam W." <AWasilenko@gmail.com>
Date: 2013-02-25 20:00 -0800
Subject: stmplib MIMEText charset weirdness
Message-ID: <fc332ca1-e77e-4cab-a96f-53b49e734407@googlegroups.com>

Can someone explain to me why I can't set the charset after the fact and still have it work?

For example:
>>> text = MIMEText('❤¥'.encode('utf-8'), 'html')
>>> text.set_charset('utf-8')
>>> text.as_string()
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    text.as_string()
  File "C:\Python32\lib\email\message.py", line 168, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "C:\Python32\lib\email\generator.py", line 91, in flatten
    self._write(msg)
  File "C:\Python32\lib\email\generator.py", line 137, in _write
    self._dispatch(msg)
  File "C:\Python32\lib\email\generator.py", line 163, in _dispatch
    meth(msg)
  File "C:\Python32\lib\email\generator.py", line 192, in _handle_text
    raise TypeError('string payload expected: %s' % type(payload))
TypeError: string payload expected: <class 'bytes'>

As opposed to:
>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
>>> text.as_string()
'Content-Type: text/html; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


Side question:
>>> text = MIMEText('❤¥', 'html')
>>> text.set_charset('utf-8')
>>> text.as_string()
'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/html; charset="utf-8"\n\n❤¥'

Why is it now 8-bit encoding?



#39935

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-02-26 07:10 +0000
Message-ID: <512c5fe3$0$30001$c3e8da3$5496439d@news.astraweb.com>
In reply to: #39929

On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:

> Can someone explain to me why I can't set the charset after the fact and
> still have it work.
> 
> For example:
>>>> text = MIMEText('❤¥'.encode('utf-8'), 'html')


It would help if you tell us where this MIMEText function came from. 
Based on the error messages you provide later, I'm going to assume it is 
the one in the Python 3.2 email package:

from email.mime.text import MIMEText

The documentation for MIMEText is rather terse, but it implies that the 
parameter given should be a string, not bytes:

http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText

If I provide a string, it seems to work fine:


py> msg = '❤¥'
py> blob = MIMEText(msg, _charset='utf-8')
py> blob.as_string()
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

4p2kwqU=



But if I provide bytes, as you do, I get the same error you do:


py> msg_as_bytes = msg.encode('utf-8')
py> print(msg_as_bytes)
b'\xe2\x9d\xa4\xc2\xa5'
py> blob = MIMEText(msg_as_bytes)
py> blob.as_string()
Traceback (most recent call last):
  [...]
TypeError: string payload expected: <class 'bytes'>


So it pays to read the error message. It tells you that the payload was 
expected to be a string, but was bytes instead.
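
A minimal sketch of the working pattern, assuming a recent Python 3 rather than the 3.2 discussed in this thread: if the text arrives as bytes, decode it to `str` first and pass the charset to the constructor. The variable names are illustrative only.

```python
from email.mime.text import MIMEText

# Assumption: the body arrives as UTF-8 encoded bytes (illustrative data).
raw_body = '❤¥'.encode('utf-8')

# Decode to str before handing it to MIMEText, and name the charset up front.
msg = MIMEText(raw_body.decode('utf-8'), 'html', 'utf-8')

# With a utf-8 charset the email package base64-encodes the body;
# get_payload(decode=True) reverses that and returns the original bytes.
decoded = msg.get_payload(decode=True)
print(msg['Content-Transfer-Encoding'])
print(decoded == raw_body)
```

Checking the decoded payload against the original bytes avoids depending on the exact header order of `as_string()`, which has varied between versions.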


> As opposed to:
>
>>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
>>>> text.as_string()
> 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
> 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


My wild guess is that it is an accident (possibly a bug) that the above 
works at all. I think it shouldn't; MIMEText is expecting a string, and 
you provide a bytes object. The documentation for the email package 
states:


[quote]
Here are the major differences between email version 5.0 and version 4:

    All operations are on unicode strings. Text inputs must be strings, 
text outputs are strings. Outputs are limited to the ASCII character set 
and so can be encoded to ASCII for transmission. Inputs are also limited 
to ASCII; this is an acknowledged limitation of email 5.0 and means it 
can only be used to parse email that is 7bit clean.
[end quote]

http://docs.python.org/3.2/library/email.html



but frankly, I'm not an expert on the email package. It may be that the 
behaviour you describe is deliberate.



-- 
Steven



#39963

From: "Adam W." <AWasilenko@gmail.com>
Date: 2013-02-26 07:29 -0800
Message-ID: <c70fdc7f-9c4c-44e1-8ac4-74bb08c345dd@googlegroups.com>
In reply to: #39935

On Tuesday, February 26, 2013 2:10:28 AM UTC-5, Steven D'Aprano wrote:
> On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:
> 
> The documentation for MIMEText is rather terse, but it implies that the 
> parameter given should be a string, not bytes:
> 
> http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText
> 
> If I provide a string, it seems to work fine:


OK, working under the assumption that you need to provide a string, that still leaves the question of why setting the charset after the fact (on a string input) does not produce the same result as declaring the charset inline.

 
> > As opposed to:
> >
> >>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
> >>>> text.as_string()
> > 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
> > 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'
> 
> My wild guess is that it is an accident (possibly a bug) that the above 
> works at all. I think it shouldn't; MIMEText is expecting a string, and 
> you provide a bytes object. The documentation for the email package 
> states:
> 
> [quote]
> Here are the major differences between email version 5.0 and version 4:
> 
>     All operations are on unicode strings. Text inputs must be strings, 
> text outputs are strings. Outputs are limited to the ASCII character set 
> and so can be encoded to ASCII for transmission. Inputs are also limited 
> to ASCII; this is an acknowledged limitation of email 5.0 and means it 
> can only be used to parse email that is 7bit clean.
> [end quote]
> 
> http://docs.python.org/3.2/library/email.html

I find this limitation hard to believe. Why bother with encoding flags if it can only ever accept ASCII anyway?

This issue came up because I was setting the charset afterwards, as in my examples, and it wasn't working. So I Googled around and found this Stack Overflow question: http://stackoverflow.com/questions/10295530/how-to-set-a-charset-in-email-using-smtplib-in-python-2-7

It seemed to be doing exactly what I wanted; the only difference was the inline declaration of utf-8. With that change it started working as desired...



#40000

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-02-26 14:46 -0500
Message-ID: <mailman.2571.1361907991.2939.python-list@python.org>
In reply to: #39929

On 2/25/2013 11:00 PM, Adam W. wrote:
> Can someone explain to me why I can't set the charset after the fact.

Email was revised to v.6 for 3.3, so the immediate answer to both your 
why questions is 'because email was not revised yet'.

> text = MIMEText('❤¥'.encode('utf-8'), 'html')

In 3.3 this fails immediately with
AttributeError: 'bytes' object has no attribute 'encode'
because when _charset is not given, MIMEText.__init__ test-encodes the 
text to discover what the charset should be:
         if _charset is None:
             try:
                 _text.encode('us-ascii')
                 _charset = 'us-ascii'
             except UnicodeEncodeError:
                 _charset = 'utf-8'
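
The test-encode step above can be sketched as a standalone function. `guess_charset` is a hypothetical name used only for illustration; it is not part of the email package.

```python
def guess_charset(text):
    """Mirror the excerpt: try ASCII first, fall back to UTF-8."""
    try:
        text.encode('us-ascii')
        return 'us-ascii'
    except UnicodeEncodeError:
        return 'utf-8'


print(guess_charset('hello'))
print(guess_charset('❤¥'))

# bytes objects have no .encode() method, so the probe itself raises
# AttributeError, which the except clause above does not catch.
try:
    guess_charset('❤¥'.encode('utf-8'))
except AttributeError as exc:
    print('AttributeError:', exc)
```

The last call shows why passing bytes without a charset fails before any message is built: the probe assumes a `str`.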

> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')

If one provides bytes, one must provide the charset and MIMEText assumes 
you are not lying.

> text.as_string()
> Content-Type: text/html; charset="utf-8"
> MIME-Version: 1.0
> Content-Transfer-Encoding: base64
>
> 4p2kwqU=

> Side question:
> text = MIMEText('❤¥', 'html')
> text.set_charset('utf-8')

This is redundant here. This method is inherited from Message and 
appears pretty useless for the subclass.

> text.as_string()
> 'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type:
> text/html;charset="utf-8"\n\n❤¥'
>
> Why is it now 8-bit encoding?

Bug fixed in 3.3. Output now same as above. Use 3.3 for email unless you 
cannot due to other dependencies not yet being available.
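
A quick way to check the behaviour described above, assuming a recent Python 3: a non-ASCII `str` passed with no charset argument should serialize the same way as one with an explicit `'utf-8'`. The variable names are illustrative.

```python
from email.mime.text import MIMEText

implicit = MIMEText('❤¥', 'html')           # charset detected from the text
explicit = MIMEText('❤¥', 'html', 'utf-8')  # charset declared inline

print(implicit.get_content_charset())
print(implicit['Content-Transfer-Encoding'])
print(implicit.as_string() == explicit.as_string())
```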

-- 
Terry Jan Reedy
