| Date | 2013-10-08 14:20 +0200 |
|---|---|
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
| Subject | parsing email from stdin |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.851.1381234806.18130.python-list@python.org> |
I want to do some postprocessing on messages from a particular mailbox.
So I use getmail, which fetches the messages and feeds them to the stdin
of my program.
As I don't know what encoding these messages will be in, I thought it
would be prudent to read stdin as binary data.
Using Python 3.3 on a Debian box, I have the following code:
#!/usr/bin/python3
import sys
from email import message_from_file
sys.stdin = sys.stdin.detach()
msg = message_from_file(sys.stdin)
which gives me the following traceback:
  File "/home/apardon/.getmail/verdeler", line 7, in <module>
    msg = message_from_file(sys.stdin)
  File "/usr/lib/python3.3/email/__init__.py", line 56, in message_from_file
    return Parser(*args, **kws).parse(fp)
  File "/usr/lib/python3.3/email/parser.py", line 58, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.3/email/feedparser.py", line 167, in feed
    self._input.push(data)
  File "/usr/lib/python3.3/email/feedparser.py", line 100, in push
    data, self._partial = self._partial + data, ''
TypeError: Can't convert 'bytes' object to str implicitly
which seems rather odd. The following headers are in the message:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
So why doesn't the email parser look up the charset and use that
to convert the data to a string?
What is the canonical way to parse an email message from stdin?
--
Antoon Pardon
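
For reference, a minimal sketch of one way this could be done, not necessarily the canonical one. message_from_file expects a text stream, so handing it the binary object returned by sys.stdin.detach() leads to the TypeError above. The email package also provides message_from_binary_file (Python 3.2 and later), which accepts a binary stream directly. The helper names parse_message and text_body are hypothetical, chosen only for illustration, and text_body assumes a non-multipart message like the text/plain one described here.

```python
from email import message_from_binary_file


def parse_message(stream):
    """Parse one message from a binary file-like object.

    `stream` must yield bytes, for example sys.stdin.buffer.
    (Hypothetical helper name, not part of the email package.)
    """
    return message_from_binary_file(stream)


def text_body(msg, fallback="ascii"):
    """Return the body of a non-multipart message as str.

    The Content-Transfer-Encoding is undone by get_payload(decode=True),
    which returns bytes; those bytes are then decoded with the charset
    declared in Content-Type, or `fallback` if none is declared.
    (Hypothetical helper name; multipart messages are not handled.)
    """
    raw = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")
```

In a script fed by getmail this would presumably be called as msg = parse_message(sys.stdin.buffer), leaving sys.stdin itself untouched instead of detaching it.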