Re: Distinguishing between maildir, mbox, and MH files/directories?

Date 2014-09-01 12:07 +1000
From Cameron Simpson <cs@zip.com.au>
Subject Re: Distinguishing between maildir, mbox, and MH files/directories?
References <20140831134525.25a4321e@bigbox.christie.dr>
Newsgroups comp.lang.python
Message-ID <mailman.13675.1409537281.18130.python-list@python.org>


On 31Aug2014 13:45, Tim Chase <python.list@tim.thechases.com> wrote:
>Tinkering around with a little script, I found myself with the need
>to walk a directory tree and process mail messages found within.
>Sometimes these end up being mbox files (with multiple messages
>within), sometimes it's a Maildir structure with messages in each
>individual file and extra holding directories, and sometimes it's a
>MH directory.  To complicate matters, there's also the possibility of
>non-{mbox,maildir,mh} files such as binary MUA caches appearing
>alongside these messages.
>
>Python knows how to handle each just fine as long as I tell it what
>type of file to expect.  But is there a straight-forward way to
>distinguish them?  (FWIW, the *nix "file" utility is just reporting
>"ASCII text", sometimes "with very long lines", and sometimes
>erroneously flags them as C or C++ files‽).
>
>All I need is "is it maildir, mbox, mh, or something else" (I don't
>have to get more complex for the "something else") inside an os.walk
>loop.

Here is my code for these tests:

     import os

     def ismhdir(path):
       ''' Test if `path` points at an MH directory.
       '''
       return os.path.isfile(os.path.join(path, '.mh_sequences'))

     def ismaildir(path):
       ''' Test if `path` points at a Maildir directory.
       '''
       for subdir in ('new', 'cur', 'tmp'):
         if not os.path.isdir(os.path.join(path,subdir)):
           return False
       return True

     def ismbox(path):
       ''' Open `path` and check that its first line begins with "From ".
       '''
       try:
         # Read bytes so that binary files cannot raise a decoding error.
         with open(path, 'rb') as fp:
           return fp.read(5) == b'From '
       except (IOError, OSError):
         # Unreadable, missing, or a directory: not an mbox.
         return False

I would use these in code somewhat like this (imagining your use case):

   if ismaildir(path):
     ...
   elif ismhdir(path):
     ...
   elif ismbox(path):
     ...
   else:
     # reject other known special files here;
     # continue traversing downward otherwise
     ...
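
For the os.walk loop you mention, something along these lines might do. It is only a sketch: find_mailboxes and the 'maildir'/'mh'/'mbox' labels are names made up for illustration, the three tests are restated so the block runs on its own, and it assumes that files inside a Maildir or MH folder should never be tested as mboxes.

```python
import os

def ismhdir(path):
    # An MH folder is assumed to carry a .mh_sequences file.
    return os.path.isfile(os.path.join(path, '.mh_sequences'))

def ismaildir(path):
    # A Maildir has all three of new/, cur/ and tmp/.
    return all(os.path.isdir(os.path.join(path, sub))
               for sub in ('new', 'cur', 'tmp'))

def ismbox(path):
    # Compare bytes so that binary cache files cannot raise a decoding error.
    try:
        with open(path, 'rb') as fp:
            return fp.read(5) == b'From '
    except (IOError, OSError):
        return False

def find_mailboxes(top):
    ''' Yield (kind, path) pairs for mailboxes found under `top`.
        Hypothetical helper; `kind` is 'maildir', 'mh' or 'mbox'.
    '''
    for dirpath, dirnames, filenames in os.walk(top):
        if ismaildir(dirpath):
            # Do not descend into the holding directories.
            dirnames[:] = [d for d in dirnames
                           if d not in ('new', 'cur', 'tmp')]
            yield 'maildir', dirpath
            continue
        if ismhdir(dirpath):
            # Skip the message files here, but still walk any subfolders.
            yield 'mh', dirpath
            continue
        for name in filenames:
            filepath = os.path.join(dirpath, name)
            if ismbox(filepath):
                yield 'mbox', filepath

if __name__ == '__main__':
    for kind, path in find_mailboxes('.'):
        print(kind, path)
```

Assigning to dirnames[:] in place is what lets os.walk (in its default top-down mode) skip the pruned directories. Anything that is not reported simply falls into your "something else" bucket.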

Cheers,
Cameron Simpson <cs@zip.com.au>

Gabriel Genellina: See PEP 234 http://www.python.org/dev/peps/pep-0234/
Angus Rodgers:
   You've got to love a language whose documentation contains sentences
   beginning like this:
     "Among its chief virtues are the following four -- no, five -- no,
     six -- points: [...]"
from python-list@python.org
