Re: Distinguishing between maildir, mbox, and MH files/directories?

References <20140831134525.25a4321e@bigbox.christie.dr>
Date 2014-08-31 13:59 -0700
Subject Re: Distinguishing between maildir, mbox, and MH files/directories?
From Dan Stromberg <drsalists@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.13664.1409518750.18130.python-list@python.org> (permalink)

On Sun, Aug 31, 2014 at 11:45 AM, Tim Chase
<python.list@tim.thechases.com> wrote:
> Tinkering around with a little script, I found myself with the need
> to walk a directory tree and process mail messages found within.
> Sometimes these end up being mbox files (with multiple messages
> within), sometimes it's a Maildir structure with messages in each
> individual file and extra holding directories, and sometimes it's a
> MH directory.  To complicate matters, there's also the possibility of
> non-{mbox,maildir,mh} files such as binary MUA caches appearing
> alongside these messages.
>
> Python knows how to handle each just fine as long as I tell it what
> type of file to expect.  But is there a straight-forward way to
> distinguish them?  (FWIW, the *nix "file" utility is just reporting
> "ASCII text", sometimes "with very long lines", and sometimes
> erroneously flags them as C or C++ files‽).
>
> All I need is "is it maildir, mbox, mh, or something else" (I don't
> have to get more complex for the "something else") inside an os.walk
> loop.

If you find a directory full of numbered files (and optionally,
numbered filenames preceded by commas), that's probably an MH folder.
I don't like regexes that much, but I'd probably use one for this.
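
A rough sketch of that MH check, in case it helps. The names
looks_like_mh, MH_NAME, and the 0.9 threshold are made up for
illustration, and skipping dotfiles (such as .mh_sequences) is my own
assumption about what else tends to live in an MH folder:

```python
import os
import re

# Assumption: MH messages are files named with digits only, optionally
# preceded by a comma (as described above).
MH_NAME = re.compile(r',?[0-9]+\Z')


def looks_like_mh(path, threshold=0.9):
    """Guess whether the directory at `path` is an MH folder.

    Returns True when at least `threshold` of the non-hidden regular
    files have MH-style names. The threshold is an arbitrary guess.
    """
    try:
        names = [
            n for n in os.listdir(path)
            if not n.startswith('.')
            and os.path.isfile(os.path.join(path, n))
        ]
    except OSError:
        return False
    if not names:
        return False
    hits = sum(1 for n in names if MH_NAME.match(n))
    return hits / len(names) >= threshold
```

Inside an os.walk loop this would be called on each directory path, so
a stray non-numbered file or two does not rule a folder out.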

If you find a directory full of Maildir-style files, that's probably
Maildir.  You could probably match this with a regex too.
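
Something along these lines might do for Maildir. The regex is only my
guess at what "Maildir-style" file names look like (a timestamp, a
unique part, a host part, and an optional ":2,FLAGS" suffix), and the
check for cur/new/tmp subdirectories comes from the usual Maildir
layout rather than from anything above. looks_like_maildir and
MAILDIR_NAME are hypothetical names:

```python
import os
import re

# Assumption: names look like "1409518750.M1P2.hostname" with an
# optional info suffix such as ":2,S". Real-world names vary.
MAILDIR_NAME = re.compile(r'\d+\.[^/:]+\.[^/:]+(?::[12],.*)?\Z')


def looks_like_maildir(path):
    """Guess whether the directory at `path` is a Maildir.

    Requires cur/, new/ and tmp/ subdirectories, and requires every
    non-hidden entry in cur/ and new/ to have a Maildir-style name.
    An empty Maildir (no messages yet) still counts.
    """
    subdirs = [os.path.join(path, d) for d in ('cur', 'new', 'tmp')]
    if not all(os.path.isdir(d) for d in subdirs):
        return False
    for d in subdirs[:2]:  # tmp/ holds partial deliveries; skip it
        for name in os.listdir(d):
            if name.startswith('.'):
                continue
            if not MAILDIR_NAME.match(name):
                return False
    return True
```

If this returns True for a directory, the os.walk loop can drop cur,
new, and tmp from its list of subdirectories to avoid visiting the
messages a second time.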

If you find a file with lots of '^From ' in it, that's probably an
mbox file.  However, you could have an mbox file with only one
'^From ', so watch out.
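
A minimal sketch of the mbox test. It assumes an mbox file starts with
a "From " separator line, which also covers the single-message case;
count_from_lines is there only if you want the "lots of" signal as
well. Both function names and the max_bytes cutoff are invented for
this example:

```python
def looks_like_mbox(path):
    """Guess whether the file at `path` is an mbox.

    Assumption: the very first line of an mbox is a "From " separator.
    Read as bytes so binary MUA caches do not raise decode errors.
    """
    try:
        with open(path, 'rb') as f:
            first = f.readline()
    except OSError:
        return False
    return first.startswith(b'From ')


def count_from_lines(path, max_bytes=1_000_000):
    """Count lines starting with "From " in roughly the first
    max_bytes of the file (an arbitrary cap to keep large files cheap).
    """
    count = 0
    seen = 0
    with open(path, 'rb') as f:
        for line in f:
            if line.startswith(b'From '):
                count += 1
            seen += len(line)
            if seen >= max_bytes:
                break
    return count
```

A single MH or Maildir message normally starts with a header line such
as "Return-Path:" or "Received:" rather than "From ", so the
first-line check usually tells those apart from an mbox.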

This will probably give some false positives and/or false negatives,
depending on your data, but perhaps it's easier than classifying
things manually.
