Group: comp.lang.python

Flattening an email message

Started by: Florian Lindner <mailinglists@xgm.de>
First post: 2013-11-26 11:51 +0100
Last post: 2013-11-26 11:51 +0100
Articles: 1 (1 participant)


#60498 — Flattening an email message

From: Florian Lindner <mailinglists@xgm.de>
Date: 2013-11-26 11:51 +0100
Subject: Flattening an email message
Message-ID: <mailman.3225.1385463445.18130.python-list@python.org>

Hello,

I want to use some machine learning stuff on mail messages. The first step is
to get some flattened text from a mail message, but Python's email package does
not work as automatically as I would like. Right now I have:

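> import email
>
> def mail_preprocessor(raw_message):
>     # Parse the raw RFC 2822 text into a Message object.
>     msg = email.message_from_string(raw_message)
>     parts = []
>
>     # Collect the decoded text of every text/plain part.
>     for part in msg.walk():
>         if part.get_content_type() == "text/plain":
>             payload = part.get_payload(decode=True)  # bytes after transfer decoding
>             charset = part.get_content_charset() or "utf-8"
>             parts.append(payload.decode(charset, errors="replace"))
>
>     # Lowercase, then replace newlines and tabs with spaces.
>     msg_body = "".join(parts).lower()
>     msg_body = msg_body.replace("\n", " ")
>     msg_body = msg_body.replace("\t", " ")
>     return msg_body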

To get text from HTML I could use BeautifulSoup. Right now I'm still one layer
down, dealing with the RFC 2822 structure (encapsulation etc.).
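
For the HTML step, a minimal sketch of the idea, using the standard library's
html.parser as a stand-in for BeautifulSoup so that it runs without extra
packages. The names TextExtractor and html_to_text are hypothetical, and
joining text nodes with spaces is an assumption that suits bag-of-words style
features but can split a word that spans inline tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as one line (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling html_to_text on the decoded payload of a text/html part would give a
single line of text that can go through the same lowercasing step as the
plain-text parts.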

Does anybody know of a package or some code that I can throw an email message
at and get some kind of text back? Attachments should be discarded; HTML I can
take care of myself...
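
To make the request concrete, here is a rough sketch of the kind of helper
meant above. It assumes a Python version in which the email package's
policy-based API (policy.default, get_body, get_content) is available, which
is the case in Python 3.6 and later. The name flatten_message is hypothetical,
and this is only one possible approach, not an established recipe.

```python
import email
from email import policy


def flatten_message(raw_message):
    """Return the lowercased plain-text body on one line, or '' if there is none."""
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_body() and get_content().
    msg = email.message_from_string(raw_message, policy=policy.default)

    # get_body() looks for the best candidate for the message body and
    # skips parts that are marked as attachments.
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return ""

    # get_content() returns the part as str, already decoded from its
    # transfer encoding and charset.
    text = body.get_content()

    # Lowercase and collapse newlines, tabs and repeated spaces.
    return " ".join(text.lower().split())
```

With this preference list an HTML-only message yields an empty string. Passing
preferencelist=("plain", "html") instead would fall back to the HTML part,
whose markup would then still need to be stripped.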

Thanks!

Florian
