Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2

Started by	nilsbunger@gmail.com
First post	2013-09-25 09:38 -0700
Last post	2013-09-26 09:35 -0700
Articles	8 — 5 participants

  Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 nilsbunger@gmail.com - 2013-09-25 09:38 -0700
    Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Chris Angelico <rosuav@gmail.com> - 2013-09-26 14:11 +1000
      Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Nils Bunger <nilsbunger@gmail.com> - 2013-09-25 21:23 -0700
        Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Chris Angelico <rosuav@gmail.com> - 2013-09-26 14:32 +1000
          Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Neil Cerutti <neilc@norwich.edu> - 2013-09-26 13:41 +0000
            Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Nils Bunger <nilsbunger@gmail.com> - 2013-09-26 08:56 -0700
              Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Piet van Oostrum <piet@vanoostrum.org> - 2013-09-26 14:44 -0400
    Re: Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2 Nils Bunger <nilsbunger@gmail.com> - 2013-09-26 09:35 -0700

#54752 — Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2

From	nilsbunger@gmail.com
Date	2013-09-25 09:38 -0700
Subject	Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2
Message-ID	<14063249-6159-48ff-bfe2-8e8d6e3cd7a4@googlegroups.com>

Hi, 

I'm having trouble encoding a MIME message with a binary file. Newline characters are being interpreted even though the content is supposed to be binary. This is using Python 3.3.2.

Small test case:

app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)
b = io.BytesIO()
g = BytesGenerator(b)
g.flatten(app)
for i in b.getvalue()[-3:]:
    print("%02x " % i, end="")
print()

This prints 51 0a 51, meaning the 0x0d (CR) byte was rewritten as 0x0a (LF), as if it were a text line ending.

I've tried setting the email policy to the HTTP policy, but that goes even further, converting \r to \r\n.

This is for HTTP transport, so binary encoding is normal.

Any thoughts how I can do this properly?

#54780

From	Chris Angelico <rosuav@gmail.com>
Date	2013-09-26 14:11 +1000
Message-ID	<mailman.335.1380168701.18130.python-list@python.org>
In reply to	#54752

On Thu, Sep 26, 2013 at 2:38 AM,  <nilsbunger@gmail.com> wrote:
> app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)

What is MIMEApplication? It's not a builtin, so your test case is
missing an import, at least. Is this email.mime.MIMEApplication?

ChrisA

#54781

From	Nils Bunger <nilsbunger@gmail.com>
Date	2013-09-25 21:23 -0700
Message-ID	<870d6a97-613d-401f-ad03-0bbd3b088538@googlegroups.com>
In reply to	#54780

Chris, 

Thanks for answering. 

Yes, it's email.mime.application.MIMEApplication. I've pasted a snippet with the imports below.

I'm trying to use this to build a multi-part MIME message, with this as one part.

I really can't figure out any way to attach a binary part like this to a multi-part MIME message without the encoding issue... any help would be greatly appreciated!

Nils

---------

import io
from email.mime.application import MIMEApplication
from email.generator import BytesGenerator
from email.encoders import  encode_noop

app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)
b = io.BytesIO()
g = BytesGenerator(b)
g.flatten(app)
for i in b.getvalue()[-3:]:
    print("%02x " % i, end="")
print()


#54782

From	Chris Angelico <rosuav@gmail.com>
Date	2013-09-26 14:32 +1000
Message-ID	<mailman.336.1380170351.18130.python-list@python.org>
In reply to	#54781

On Thu, Sep 26, 2013 at 2:23 PM, Nils Bunger <nilsbunger@gmail.com> wrote:
> Yes, it's email.mime.application.MIMEApplication. I've pasted a snippet with the imports below.
>
> I'm trying to use this to build a multi-part MIME message, with this as one part.
>
> I really can't figure out any way to attach a binary part like this to a multi-part MIME message without the encoding issue... any help would be greatly appreciated!

I partly responded just to ping your thread, as I'm not particularly
familiar with the email.mime module. But a glance at the docs suggests
that MIMEApplication is a "subclass of MIMENonMultipart", so might it
be a problem to use that for multipart??

It's designed to handle text, so you may want to use an encoder (like
the default base64 one) rather than trying to push binary data through
it.

Random ideas, hopefully someone who actually knows the module can respond.

ChrisA

#54821

From	Neil Cerutti <neilc@norwich.edu>
Date	2013-09-26 13:41 +0000
Message-ID	<bairt0FcltgU1@mid.individual.net>
In reply to	#54782

On 2013-09-26, Chris Angelico <rosuav@gmail.com> wrote:
> On Thu, Sep 26, 2013 at 2:23 PM, Nils Bunger <nilsbunger@gmail.com> wrote:
>> Yes, it's email.mime.application.MIMEApplication. I've pasted
>> a snippet with the imports below.
>>
>> I'm trying to use this to build a multi-part MIME message,
>> with this as one part.
>>
>> I really can't figure out any way to attach a binary part like
>> this to a multi-part MIME message without the encoding
>> issue... any help would be greatly appreciated!
>
> I partly responded just to ping your thread, as I'm not
> particularly familiar with the email.mime module. But a glance
> at the docs suggests that MIMEApplication is a "subclass of
> MIMENonMultipart", so might it be a problem to use that for
> multipart??
>
> It's designed to handle text, so you may want to use an encoder
> (like the default base64 one) rather than trying to push binary
> data through it.
>
> Random ideas, hopefully someone who actually knows the module
> can respond.

I got interested in it since I have never used any of the
modules. So I played with it enough to discover that the part of
the code above that converts the \r to \n is the flatten call.
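
A minimal sketch of one way to check where the bytes change, assuming the same imports as the test case above. The helper name flattened_tail is made up for illustration; it is not part of the email package. The sketch compares the payload stored on the message object with what comes out of flatten(), once with the default policy and once with policy.HTTP, and only reports whether the tail of the output still matches the input.

```python
import io
from email import policy
from email.encoders import encode_noop
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication

RAW = b'Q\x0dQ'
app = MIMEApplication(RAW, _encoder=encode_noop)

# Payload as stored on the message object, before any generator runs.
stored = app.get_payload(decode=True)
print("stored payload:", stored)


def flattened_tail(msg, pol=None):
    """Flatten msg (optionally under policy pol) and return the last 4 bytes.

    Hypothetical helper for this experiment only.
    """
    buf = io.BytesIO()
    BytesGenerator(buf, policy=pol).flatten(msg)
    return buf.getvalue()[-4:]


for label, pol in (("default policy", None), ("policy.HTTP", policy.HTTP)):
    tail = flattened_tail(app, pol)
    status = "unchanged" if tail.endswith(RAW) else "changed"
    print(label, " ".join("%02x" % byte for byte in tail), status)
```

If the stored payload matches the input while the flattened tail does not, the rewrite is happening in the generator rather than in MIMEApplication or the encoder.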

I got as far as this passage in RFC 2049 and gave up.

   The following guidelines may be useful to anyone devising a data
   format (media type) that is supposed to survive the widest range of
   networking technologies and known broken MTAs unscathed.  Note that
   anything encoded in the base64 encoding will satisfy these rules, but
   that some well-known mechanisms, notably the UNIX uuencode facility,
   will not.  Note also that anything encoded in the Quoted-Printable
   encoding will survive most gateways intact, but possibly not some
   gateways to systems that use the EBCDIC character set.

    (1)   Under some circumstances the encoding used for data may
          change as part of normal gateway or user agent
          operation.  In particular, conversion from base64 to
          quoted-printable and vice versa may be necessary.  This
          may result in the confusion of CRLF sequences with line
          breaks in text bodies.  As such, the persistence of
          CRLF as something other than a line break must not be
          relied on.

    (2)   Many systems may elect to represent and store text data
          using local newline conventions.  Local newline
          conventions may not match the RFC822 CRLF convention --
          systems are known that use plain CR, plain LF, CRLF, or
          counted records.  The result is that isolated CR and LF
          characters are not well tolerated in general; they may
          be lost or converted to delimiters on some systems, and
          hence must not be relied on.

So putting a raw CR in a binary chunk may be intolerable, and
you need to use a different encoder. But I'm out of my element.

-- 
Neil Cerutti

#54836

From	Nils Bunger <nilsbunger@gmail.com>
Date	2013-09-26 08:56 -0700
Message-ID	<defb0c5f-705a-427c-84c9-83e3bab573e3@googlegroups.com>
In reply to	#54821

Hi Neil, 

Thanks for looking at this.  

I'm trying to create a multipart MIME for an HTTP POST request, not an email.  This is for a third-party API that requires a multipart POST with a binary file, so I don't have the option to just use a different encoding.

Multipart HTTP is standardized in HTTP 1.0 and supports binary parts. Also, no one will re-interpret contents of HTTP on the wire, as binary is quite normal in HTTP.

The issue seems to be some parts of the python MIME encoder still assume it's for email only, where everything would be b64 encoded.

Maybe I have to roll my own to create a multipart msg with a binary file? I was hoping to avoid that.
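
For anyone reading later, here is a rough sketch of what rolling your own could look like, using only bytes operations so nothing gets a chance to reinterpret the content. The function name encode_multipart_formdata, the field names, and the fixed boundary in the usage lines are all made up for illustration, and the sketch assumes field names and filenames contain no quotes or newlines and that the boundary does not occur inside the file content.

```python
import uuid


def encode_multipart_formdata(fields, files, boundary=None):
    """Return (content_type, body) for a multipart/form-data POST.

    fields: dict mapping field name -> str value
    files:  dict mapping field name -> (filename, bytes content, MIME type)
    """
    if boundary is None:
        boundary = uuid.uuid4().hex  # random ASCII boundary
    sep = b'--' + boundary.encode('ascii')
    chunks = []
    for name, value in fields.items():
        chunks += [
            sep,
            ('Content-Disposition: form-data; name="%s"' % name).encode('utf-8'),
            b'',
            value.encode('utf-8'),
        ]
    for name, (filename, content, mimetype) in files.items():
        chunks += [
            sep,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode('utf-8'),
            ('Content-Type: %s' % mimetype).encode('ascii'),
            b'',
            content,  # raw bytes, copied through untouched
        ]
    chunks += [sep + b'--', b'']
    # Every part header and boundary line ends in CRLF.
    body = b'\r\n'.join(chunks)
    return 'multipart/form-data; boundary=%s' % boundary, body


content_type, body = encode_multipart_formdata(
    {'title': 'demo'},
    {'file': ('data.bin', b'Q\x0dQ', 'application/octet-stream')},
    boundary='XBOUNDARYX',
)
print(content_type)
print(body)
```

The returned body can be sent as the request data with content_type as the Content-Type header, for example through urllib.request.Request.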

Nils

P.S. You probably know this, but in case anyone else reads this thread: HTTP requires all headers to end in CRLF, not native line endings. The Python MIME modules can do that properly as of Python 3.2 (fixed in this changeset: http://hg.python.org/cpython/rev/ebf6741a8d6e/).




#54848

From	Piet van Oostrum <piet@vanoostrum.org>
Date	2013-09-26 14:44 -0400
Message-ID	<m2eh8bifsh.fsf@cochabamba.vanoostrum.org>
In reply to	#54836

Nils Bunger <nilsbunger@gmail.com> writes:

> Hi Neil, 
>
> Thanks for looking at this.  
>
> I'm trying to create a multipart MIME for an HTTP POST request, not an
> email. This is for a third-party API that requires a multipart POST
> with a binary file, so I don't have the option to just use a different
> encoding.
>
> Multipart HTTP is standardized in HTTP 1.0 and supports binary parts.
> Also, no one will re-interpret contents of HTTP on the wire, as binary
> is quite normal in HTTP.
>
> The issue seems to be some parts of the python MIME encoder still
> assume it's for email only, where everything would be b64 encoded.
>
> Maybe I have to roll my own to create a multipart msg with a binary
> file? I was hoping to avoid that.

The email MIME code is not really adapted for HTTP. I would advise using
the Requests package (http://docs.python-requests.org/en/latest/) or
the Uploading Files part of Doug Hellmann's page
(http://doughellmann.com/2009/07/pymotw-urllib2-library-for-opening-urls.html).
The latter is for Python 2; I can send you a Python 3 version if you want.
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

#54838

From	Nils Bunger <nilsbunger@gmail.com>
Date	2013-09-26 09:35 -0700
Message-ID	<a675d865-8961-49a5-b70f-0d795b353aa2@googlegroups.com>
In reply to	#54752

Hi all, 

I was able to work around this problem by encoding a unique 'marker' in the binary part, then replacing the marker with the actual binary content after generating the MIME message.

See my answer on Stack Overflow http://stackoverflow.com/a/19033750/526098 for the code.
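
A rough sketch of the idea follows. It is written for this post and is not the code from the linked answer; the function name flatten_with_marker and the field name and filename defaults are placeholders. The generator only ever sees an ASCII marker, and the real bytes are swapped in after flattening.

```python
import io
import uuid
from email.encoders import encode_noop
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def flatten_with_marker(binary, field_name="file", filename="data.bin"):
    """Build a one-part multipart/form-data message around `binary`.

    Placeholder names throughout; adjust to whatever the target API expects.
    """
    # Random 32-character hex string: ASCII only, no CR or LF.
    marker = uuid.uuid4().hex.encode("ascii")

    msg = MIMEMultipart("form-data")
    part = MIMEApplication(marker, _encoder=encode_noop)
    part.add_header("Content-Disposition", "form-data",
                    name=field_name, filename=filename)
    msg.attach(part)

    buf = io.BytesIO()
    # linesep="\r\n" asks for CRLF line endings on headers and boundaries.
    BytesGenerator(buf, mangle_from_=False).flatten(msg, linesep="\r\n")
    flattened = buf.getvalue()

    # The marker must appear exactly once, or the swap would be ambiguous.
    if flattened.count(marker) != 1:
        raise ValueError("marker not found exactly once in flattened message")
    return flattened.replace(marker, binary)


payload = b"Q\x0dQ\x00\xff"
message_bytes = flatten_with_marker(payload)
print(payload in message_bytes)
```

Note that the result still starts with the outer message headers (Content-Type with the boundary, MIME-Version), so for an HTTP request those need to be split off and sent as request headers. The boundary is also chosen before the real content is known, so in principle it could collide with bytes in the file.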

Thanks, your suggestions helped me think of this.

Nils

