Date: Thu, 24 Jan 2013 02:05:33 -0500
From: "W. Trevor King"
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Flatten an email Message with a non-ASCII body using 8bit CTE

Hello list!

I'm trying to figure out how to flatten a MIMEText message to bytes
using an 8bit Content-Transfer-Encoding in Python 3.3.
Here's what I've tried so far:

  # -*- encoding: utf-8 -*-
  import email.encoders
  from email.charset import Charset
  from email.generator import BytesGenerator
  from email.mime.text import MIMEText
  import sys

  body = 'Ζεύς'
  encoding = 'utf-8'
  charset = Charset(encoding)
  charset.body_encoding = email.encoders.encode_7or8bit

  message = MIMEText(body, 'plain', encoding)
  del message['Content-Transfer-Encoding']
  message.set_payload(body, charset)
  try:
      BytesGenerator(sys.stdout.buffer).flatten(message)
  except UnicodeEncodeError as e:
      print('error with string input:')
      print(e)

  message = MIMEText(body, 'plain', encoding)
  del message['Content-Transfer-Encoding']
  message.set_payload(body.encode(encoding), charset)
  try:
      BytesGenerator(sys.stdout.buffer).flatten(message)
  except TypeError as e:
      print('error with byte input:')
      print(e)

The `del m[…]; m.set_payload()` bits work around #16324 [1] and should
be orthogonal to the encoding issues.  It's possible that #12553 is
trying to address this issue [2,3], but that issue's comments are a
bit vague, so I'm not sure.

The problem with the string payload is that
email.generator.BytesGenerator.write is getting the Unicode string
payload unencoded and trying to encode it as ASCII.  It may be
possible to work around this by encoding the payload so that anything
that doesn't encode (using the body charset) to a 7bit value is
replaced with a surrogate escape, but I'm not sure how to do that.

The problem with the byte payload is that _has_surrogates (used in
email.generator.Generator._handle_text and
BytesGenerator._handle_text) chokes on byte input:

  TypeError: can't use a string pattern on a bytes-like object

For UTF-8, you can get away with:

  message.as_string().encode(message.get_charset().get_output_charset())

because the headers are encoded into 7 bits, so re-encoding them with
UTF-8 is a no-op.
However, if the body charset is UTF-16-LE or any other encoding that
remaps 7bit characters, this hack breaks down.

Thoughts?
Trevor

[1]: http://bugs.python.org/issue16324
[2]: http://bugs.python.org/issue12553
[3]: http://bugs.python.org/issue12552#msg140294
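
For what it's worth, the surrogate-escape workaround floated above
might be sketched roughly as follows.  This is only a sketch under
some assumptions: flatten_8bit is a hypothetical helper name, the
message is built from a bare email.message.Message rather than
MIMEText (which sidesteps the `del m[…]` dance), and the body charset
is assumed to be ASCII-compatible like UTF-8.  I have not checked it
against UTF-16-LE, where embedded NUL and newline bytes would likely
cause trouble.

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from email.message import Message
from io import BytesIO


def flatten_8bit(body, encoding='utf-8'):
    """Flatten `body` as a text/plain message with an 8bit CTE.

    Hypothetical helper, not part of the email package.
    """
    message = Message()
    message['MIME-Version'] = '1.0'
    message['Content-Type'] = 'text/plain; charset="{}"'.format(encoding)
    message['Content-Transfer-Encoding'] = '8bit'
    # Encode with the body charset, then decode as ASCII with
    # surrogateescape: every non-ASCII byte becomes a lone surrogate
    # in the str payload.
    payload = body.encode(encoding).decode('ascii', 'surrogateescape')
    # With no charset argument, set_payload stores the str untouched.
    message.set_payload(payload)
    buffer = BytesIO()
    # BytesGenerator re-encodes with ('ascii', 'surrogateescape'),
    # which turns the surrogates back into the original bytes.
    BytesGenerator(buffer).flatten(message)
    return buffer.getvalue()


flattened = flatten_8bit('Ζεύς\n')
# Parse the bytes back to check that the body survived the round trip.
roundtrip = message_from_bytes(flattened).get_payload(decode=True)
print(roundtrip)
```

The idea is that the payload stays a str all the way through the
generator (so _has_surrogates is happy), while the surrogates carry
the raw 8bit bytes through to the output.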