
HeaderParseError

Started by: Thomas Guettler <hv@tbz-pariv.de>
First post: 2011-07-04 10:31 +0200
Last post: 2011-07-05 14:02 +0200
Articles: 5 (2 participants)


Contents

  HeaderParseError Thomas Guettler <hv@tbz-pariv.de> - 2011-07-04 10:31 +0200
    Re: HeaderParseError Peter Otten <__peter__@web.de> - 2011-07-04 11:51 +0200
      Re: HeaderParseError Thomas Guettler <hv@tbz-pariv.de> - 2011-07-04 12:38 +0200
        Re: HeaderParseError Peter Otten <__peter__@web.de> - 2011-07-04 13:20 +0200
          Re: HeaderParseError Thomas Guettler <hv@tbz-pariv.de> - 2011-07-05 14:02 +0200

#8759 — HeaderParseError

From: Thomas Guettler <hv@tbz-pariv.de>
Date: 2011-07-04 10:31 +0200
Subject: HeaderParseError
Message-ID: <97dc37F7gaU1@mid.individual.net>

Hi,

I get a HeaderParseError during decode_header(), but Thunderbird can
display the name.

>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError


How can I parse this in Python?

  Thomas

Same question on Stackoverflow:
http://stackoverflow.com/questions/6568596/headerparseerror-in-python

-- 
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de



#8765

From: Peter Otten <__peter__@web.de>
Date: 2011-07-04 11:51 +0200
Message-ID: <ius2f9$e1$1@solani.org>
In reply to: #8759

Thomas Guettler wrote:

> I get a HeaderParseError during decode_header(), but Thunderbird can
> display the name.
> 
> >>> from email.header import decode_header
> >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
>     raise HeaderParseError
> email.errors.HeaderParseError
> 
> 
> How can I parse this in Python?

Trying to decode as much as possible:

>>> s = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
>>> for n in range(len(s), 0, -1):
...     try: t = s[:n].decode("base64")
...     except: pass
...     else: break
...
>>> n, t
(49, 'Anmeldung Netzanschluss S\x19\x1c\x9a[\x99\xcc\xdc\x0b\x9a\x9c\x19')
>>> print t.decode("iso-8859-1")
Anmeldung Netzanschluss S[ÌÜ

>>> s[n:]
'w==?='

The characters after "...Netzanschluss " look like garbage. What does 
Thunderbird display?



#8766

From: Thomas Guettler <hv@tbz-pariv.de>
Date: 2011-07-04 12:38 +0200
Message-ID: <97djhfF10kU1@mid.individual.net>
In reply to: #8765

On 04.07.2011 11:51, Peter Otten wrote:
> Thomas Guettler wrote:
> 
>> I get a HeaderParseError during decode_header(), but Thunderbird can
>> display the name.
>>
>> >>> from email.header import decode_header
>> >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
>>     raise HeaderParseError
>> email.errors.HeaderParseError
>>
>>
>> How can I parse this in Python?
> 
> Trying to decode as much as possible:
> 
> >>> s = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
> >>> for n in range(len(s), 0, -1):
> ...     try: t = s[:n].decode("base64")
> ...     except: pass
> ...     else: break
> ...
> >>> n, t
> (49, 'Anmeldung Netzanschluss S\x19\x1c\x9a[\x99\xcc\xdc\x0b\x9a\x9c\x19')
> >>> print t.decode("iso-8859-1")
> Anmeldung Netzanschluss S[ÌÜ
> 
> >>> s[n:]
> 'w==?='
> 
> The characters after "...Netzanschluss " look like garbage. What does 
> Thunderbird display?

Hi Peter, Thunderbird shows this:

Anmeldung Netzanschluss Südring3p.jpg

  Thomas

-- 
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de



#8767

From: Peter Otten <__peter__@web.de>
Date: 2011-07-04 13:20 +0200
Message-ID: <mailman.595.1309778458.1164.python-list@python.org>
In reply to: #8766

Thomas Guettler wrote:

> On 04.07.2011 11:51, Peter Otten wrote:
>> Thomas Guettler wrote:
>> 
>>> I get a HeaderParseError during decode_header(), but Thunderbird can
>>> display the name.
>>>
>>> >>> from email.header import decode_header
>>> >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
>>>     raise HeaderParseError
>>> email.errors.HeaderParseError

>> The characters after "...Netzanschluss " look like garbage. What does
>> Thunderbird display?
> 
> Hi Peter, Thunderbird shows this:
> 
> Anmeldung Netzanschluss Südring3p.jpg

>>> a = u"Anmeldung Netzanschluss Südring3p.jpg".encode("iso-8859-1").encode("base64")

>>> b = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
>>> for i, (x, y) in enumerate(zip(a, b)):
...     if x != y: print i, x, y
...
33 / _
52
?
>>> b.decode("base64")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/base64_codec.py", line 42, in base64_decode
    output = base64.decodestring(input)
  File "/usr/lib/python2.6/base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
>>> b.replace("_", "/").decode("base64")
'Anmeldung Netzanschluss S\xfcdring3p.jpg'
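
For readers on Python 3, where str no longer has a "base64" codec, the comparison above can be sketched roughly as follows. The variable names (shown, expected, observed, diffs) are only illustrative, and the trailing "?=" is left off the observed payload so that only the alphabet difference shows up:

```python
import base64

# File name that Thunderbird displays, per the thread ("\u00fc" is "ü").
shown = "Anmeldung Netzanschluss S\u00fcdring3p.jpg"
expected = base64.b64encode(shown.encode("iso-8859-1")).decode("ascii")

# Payload as it appears in the header, without the trailing "?=".
observed = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# Collect every position where the two encodings disagree.
diffs = [(i, x, y) for i, (x, y) in enumerate(zip(expected, observed)) if x != y]
print(diffs)  # [(33, '/', '_')]
```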

Looks like you encountered a variant of base64 that uses "_" instead of "/" 
for the symbol with value 63 (the last entry of the alphabet). The Wikipedia
page http://en.wikipedia.org/wiki/Base64 calls that base64url.
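
If that diagnosis is right, the standard library's URL-safe decoder should handle the payload directly. A minimal Python 3 sketch, assuming the payload is the part between "?B?" and "?=" and that the declared charset (iso-8859-1) is correct:

```python
import base64

# Payload of the encoded word from the thread (between "?B?" and "?=").
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# urlsafe_b64decode() expects "-" and "_" in place of "+" and "/".
raw = base64.urlsafe_b64decode(payload)

# The encoded word declares iso-8859-1 as its charset.
name = raw.decode("iso-8859-1")
print(name)  # Anmeldung Netzanschluss Südring3p.jpg
```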

You could try and make the email package accept that with a monkey patch 
like the following:

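# untested
import binascii

def a2b_base64(s):
    # Map the base64url "_" back to "/" before handing off to binascii.
    return binascii.a2b_base64(s.replace("_", "/"))

# email.base64mime imports a2b_base64 by name, so rebind it in that module.
from email import base64mime
base64mime.a2b_base64 = a2b_base64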

Alternatively monkey-patch the binascii module before you import the email 
package.
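
If monkey-patching feels too invasive, another option is to decode the offending word by hand. The following is only a sketch for Python 3, not part of the email package: the regular expression ENCODED_WORD and the helper decode_b_word_lenient() are hypothetical names, and the helper handles a single B-encoded word only (no Q encoding, no multi-word headers).

```python
import base64
import re

# Matches one RFC 2047 style encoded word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")


def decode_b_word_lenient(word):
    """Decode one B-encoded word, accepting "-" and "_" as well as "+" and "/"."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None or match.group(2).lower() != "b":
        raise ValueError("not a B-encoded word: %r" % (word,))
    charset, _, payload = match.groups()
    # Map the base64url symbols back onto the standard alphabet.
    payload = payload.translate(str.maketrans("-_", "+/"))
    # Add any missing padding so the length is a multiple of four.
    payload += "=" * (-len(payload) % 4)
    return base64.b64decode(payload).decode(charset)


header = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
print(decode_b_word_lenient(header))  # Anmeldung Netzanschluss Südring3p.jpg
```

This leaves the email package untouched, so other headers keep going through decode_header() as usual, and the lenient path can be reserved for words that raise HeaderParseError.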



#8821

From: Thomas Guettler <hv@tbz-pariv.de>
Date: 2011-07-05 14:02 +0200
Message-ID: <97gcqoFfd1U1@mid.individual.net>
In reply to: #8767

On 04.07.2011 13:20, Peter Otten wrote:
> Thomas Guettler wrote:
> 
>> On 04.07.2011 11:51, Peter Otten wrote:
>>> Thomas Guettler wrote:
>>>
>>>> I get a HeaderParseError during decode_header(), but Thunderbird can
>>>> display the name.
>>>>
>>>> >>> from email.header import decode_header

Hi,

I created a ticket: http://bugs.python.org/issue12489

  Thomas Güttler


-- 
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de


