Re: Best way to filter parts of a email.message.Message

Date Thu, 4 Sep 2014 05:45:03 -0500
From Tim Chase <python.list@tim.thechases.com>
To Chris Angelico <rosuav@gmail.com>
Subject Re: Best way to filter parts of a email.message.Message
In-Reply-To <CAPTjJmqUU8BTZeGaUnGLqSt_QbW0pN70=pDRGL4vOAYj+2VPhA@mail.gmail.com>
References <20140903205958.119351c8@bigbox.christie.dr> <20140904035240.GA75238@cskk.homeip.net> <CAPTjJmqUU8BTZeGaUnGLqSt_QbW0pN70=pDRGL4vOAYj+2VPhA@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.13763.1409827608.18130.python-list@python.org>


On 2014-09-04 14:08, Chris Angelico wrote:
> On Thu, Sep 4, 2014 at 1:52 PM, Cameron Simpson wrote:
>> On 03Sep2014 20:59, Tim Chase wrote:
>>> - mime-parts can be nested, so I need to recursively handle them
>>
>> Just to this. IIRC, the MIME part delimiter is supposed to be
>> absolute. That is, it will not occur in the nested subparts, if
>> any.
>
> I think the point here is that the validity check of a mime-part may
> involve checking sub-parts - eg the message is a mailing list digest
> (with one part per actual message), and some of those have
> attachments, which are to be stripped off if over 1MB.

Indeed, ChrisA divined my intent--an attachment remover based on
various types of mime-type and size tests.  A cursory search didn't
turn up any pre-existing utilities that did quite what I wanted.

Perhaps my previous message about detecting mbox-vs-maildir-vs-mh
format mailboxes hinted at this.  Thanks to all who answered there
(though I encountered a possible bug: Claws Mail doesn't put a
.mh_sequences file in what it claims are MH local folders, which
causes issues with mailbox.MH(); see http://bugs.python.org/issue22319 ).

So the goal is to keep the original message with highest fidelity
while still stripping out the undesirable attachments.
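
For illustration only, here is a minimal sketch of the recursive
stripping described above, using the stdlib email package.  The names
is_unwanted and strip_parts and the 1MB default are hypothetical (the
threshold is borrowed from the quoted digest example), and this is one
possible approach rather than the utility that was eventually written.
It edits the message in place so the headers and the surviving parts
are left alone; re-serializing the result may still not reproduce the
original byte for byte.

```python
from email.message import Message

MAX_BYTES = 1024 * 1024  # assumed threshold, taken from the "over 1MB" example


def is_unwanted(part: Message, max_bytes: int = MAX_BYTES) -> bool:
    """Hypothetical test: a leaf part marked as an attachment that is too big."""
    if part.is_multipart():
        return False  # containers are never dropped directly, only searched
    if part.get_content_disposition() != "attachment":
        return False
    # decode=True undoes base64/quoted-printable so we measure the real size
    payload = part.get_payload(decode=True) or b""
    return len(payload) > max_bytes


def strip_parts(msg: Message, unwanted=is_unwanted) -> Message:
    """Recursively drop unwanted sub-parts of msg in place and return msg."""
    if msg.is_multipart():
        kept = []
        for part in msg.get_payload():
            if unwanted(part):
                continue  # skip this attachment entirely
            # nested multipart/* and message/rfc822 parts are handled here
            kept.append(strip_parts(part, unwanted))
        msg.set_payload(kept)
    return msg
```

A different test (by mime-type, for instance) can be passed as the
unwanted argument without touching the recursion.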

Thanks for your thoughts,

-tkc
