
Re: Distinguishing between maildir, mbox, and MH files/directories?

To python-list@python.org
From Terry Reedy <tjreedy@udel.edu>
Subject Re: Distinguishing between maildir, mbox, and MH files/directories?
Date Sun, 31 Aug 2014 17:02:32 -0400
Newsgroups comp.lang.python
Message-ID <mailman.13665.1409518971.18130.python-list@python.org>



On 8/31/2014 2:45 PM, Tim Chase wrote:
> Tinkering around with a little script, I found myself with the need
> to walk a directory tree and process mail messages found within.
> Sometimes these end up being mbox files (with multiple messages
> within), sometimes it's a Maildir structure with messages in each
> individual file and extra holding directories, and sometimes it's a
> MH directory.  To complicate matters, there's also the possibility of
> non-{mbox,maildir,mh} files such as binary MUA caches appearing
> alongside these messages.

I know nothing about the formats within such files, but I will make a
couple of assumptions.

> Python knows how to handle each just fine as long as I tell it what
> type of file to expect.

By instantiating mailbox.mbox or mailbox.Maildir

>  But is there a straightforward way to
> distinguish them?  (FWIW, the *nix "file" utility is just reporting
> "ASCII text", sometimes "with very long lines", and sometimes
> erroneously flags them as C or C++ files‽).

> All I need is "is it maildir, mbox, mh, or something else" (I don't
> have to get more complex for the "something else") inside an os.walk
> loop.

Simple method: try to parse with mbox and then with Maildir, and if
either succeeds, assume that the file is in the corresponding format.

import mailbox

try:
    <process as mbox>
except mailbox.FormatError:
    # Not an mbox; fall back to Maildir.
    try:
        <process as Maildir>
    except mailbox.FormatError:
        pass  # Neither format matched.

If a format is detectable in the first line or two, you could try to
write a detect(path) function that returns the corresponding class.
That would perhaps be a good addition to the mailbox module.  On the
other hand, if you are interested in just those two classes, and not
any of the Maildir subclasses, the above might be good enough.  I am
assuming that FormatError is raised without reading the whole file or
doing much other work before the mismatch is detected.
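
A detect(path) along those lines might look like the sketch below.  It
is only a heuristic, not part of the mailbox module, and it does not
depend on FormatError being raised.  The names detect and
find_mailboxes are made up for illustration.  The checks assume that a
Maildir is a directory with tmp, new and cur subdirectories, that an MH
folder has a .mh_sequences file or all-digit message file names, and
that an mbox file begins with a "From " line.

```python
import mailbox
import os

MAILDIR_SUBDIRS = ("tmp", "new", "cur")


def detect(path):
    """Guess the mailbox class for path, or return None (heuristic only)."""
    if os.path.isdir(path):
        # Assumption: a Maildir holds tmp, new and cur subdirectories.
        if all(os.path.isdir(os.path.join(path, sub)) for sub in MAILDIR_SUBDIRS):
            return mailbox.Maildir
        names = os.listdir(path)
        # Assumption: an MH folder has .mh_sequences or all-digit file names.
        if ".mh_sequences" in names or any(name.isdigit() for name in names):
            return mailbox.MH
        return None
    if os.path.isfile(path):
        # Assumption: an mbox file starts with a "From " separator line.
        with open(path, "rb") as f:
            if f.read(5) == b"From ":
                return mailbox.mbox
    return None


def find_mailboxes(top):
    """Yield (path, mailbox class) pairs found under top via os.walk."""
    for dirpath, dirnames, filenames in os.walk(top):
        cls = detect(dirpath)
        if cls is mailbox.Maildir:
            # Do not descend into the Maildir's own holding directories.
            dirnames[:] = [d for d in dirnames if d not in MAILDIR_SUBDIRS]
        if cls is not None:
            # The directory itself is a mailbox; its files are messages.
            yield dirpath, cls
            continue
        for name in filenames:
            full = os.path.join(dirpath, name)
            cls = detect(full)
            if cls is not None:
                yield full, cls
```

Each result can then be opened with cls(path, create=False).  Anything
for which detect returns None falls into the "something else" bucket,
and a plain directory that happens to contain an all-digit file name
would be misreported as MH, so treat the answer as a guess.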

-- 
Terry Jan Reedy
