| From | Florian Lindner <mailinglists@xgm.de> |
|---|---|
| Subject | Flattening an email message |
| Date | 2013-11-26 11:51 +0100 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.3225.1385463445.18130.python-list@python.org> |
Hello,
I want to use some machine learning stuff on mail messages. The first step is
to get some flattened text from a mail message, but Python's email package does
not work as automatically as I would like. Right now I have:
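> import email
>
> def mail_preprocessor(raw_message):
>     # Parse the raw RFC 2822 text into a message object.
>     msg = email.message_from_string(raw_message)
>     msg_body = ""
>
>     for part in msg.walk():
>         if part.get_content_type() == "text/plain":
>             # On Python 3, get_payload(decode=True) returns bytes,
>             # so decode them with the part's declared charset.
>             payload = part.get_payload(decode=True)
>             charset = part.get_content_charset() or "utf-8"
>             msg_body += payload.decode(charset, errors="replace")
>
>     msg_body = msg_body.lower()
>     msg_body = msg_body.replace("\n", " ")
>     msg_body = msg_body.replace("\t", " ")
>     return msg_body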
To get text from HTML I could use BeautifulSoup. Right now I'm still a
layer down (encapsulation etc.), at the RFC 2822 level.
Does anybody know of a package or some code I can throw an email message at
and get some kind of text back? Attachments can be discarded, and HTML I can
take care of...
Thanks!
Florian
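One possible starting point, sketched under assumptions that are not in the post above: on Python 3.6 or later, parsing with `email.policy.default` yields an `EmailMessage`, whose `get_body()` picks the best body part and skips attachments. The names `flatten_message` and `_TextExtractor` are hypothetical, and the standard-library `HTMLParser` is only a rough stand-in for BeautifulSoup.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def flatten_message(raw_message):
    """Return the lower-cased, whitespace-normalised body text of a raw message."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # Prefer a text/plain body and fall back to text/html; attachments are skipped.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    text = body.get_content()
    if body.get_content_type() == "text/html":
        extractor = _TextExtractor()
        extractor.feed(text)
        extractor.close()
        text = " ".join(extractor.chunks)
    # split() with no arguments collapses newlines, tabs and runs of spaces.
    return " ".join(text.lower().split())
```

Unlike the `walk()` loop, this returns a single body part rather than concatenating every `text/plain` part, so it may drop text from messages that carry several inline text parts.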