
Reading Huge UnixMailbox Files

Date 2011-04-26 15:39 -0400
From Brandon McGinty <brandon.mcginty@gmail.com>
Subject Reading Huge UnixMailbox Files
Newsgroups comp.lang.python
Message-ID <mailman.866.1303846801.9059.python-list@python.org>



List,
I'm trying to import hundreds of thousands of e-mail messages into a
database with Python.
However, some of these mailboxes are so large that reading them with the
standard mailbox module raises errors.
I created a buffered reader that reads chunks of the mailbox, splits
them using the re.split function with a compiled regexp, and imports
each chunk as a message.
Based on timings, the regular expression work appears to be the
bottleneck.
I'm wondering if there is a faster way to do this, or some other method
that you all would recommend.
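
For reference, here is a minimal sketch of one regex-free alternative, not the reader described above. It assumes the files are in the usual mbox layout, where each message begins with a line starting with "From ", and that such lines inside message bodies have been escaped (for example as ">From "). The name iter_mbox_messages is hypothetical. It walks the file one line at a time, so only the current message is held in memory.

```python
import email
import io


def iter_mbox_messages(fileobj):
    """Yield the raw bytes of each message from a binary mbox-style stream.

    Assumption: every message starts with a separator line beginning with
    b"From ", and body lines that begin with "From " have been escaped.
    The separator line itself is not included in the yielded bytes.
    """
    lines = []
    started = False
    for line in fileobj:
        if line.startswith(b"From "):
            # A separator closes the previous message, if there was one.
            if started:
                yield b"".join(lines)
            lines = []
            started = True
        elif started:
            lines.append(line)
    # Emit the final message once the stream is exhausted.
    if started:
        yield b"".join(lines)


if __name__ == "__main__":
    sample = io.BytesIO(
        b"From a@example.com Tue Apr 26 15:39:00 2011\n"
        b"Subject: one\n\nbody one\n\n"
        b"From b@example.com Tue Apr 26 15:40:00 2011\n"
        b"Subject: two\n\nbody two\n"
    )
    for raw in iter_mbox_messages(sample):
        print(email.message_from_bytes(raw)["Subject"])
```

Each yielded chunk can be handed to email.message_from_bytes and inserted into the database before the next one is read. Whether this is actually faster than re.split on buffered chunks would need to be timed on the real files.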

Brandon McGinty



Thread

Reading Huge UnixMailbox Files Brandon McGinty <brandon.mcginty@gmail.com> - 2011-04-26 15:39 -0400
  Re: Reading Huge UnixMailbox Files Nobody <nobody@nowhere.com> - 2011-04-26 21:23 +0100
    Re: Reading Huge UnixMailbox Files Dan Stromberg <drsalists@gmail.com> - 2011-04-26 14:02 -0700
      Re: Reading Huge UnixMailbox Files Nobody <nobody@nowhere.com> - 2011-04-27 13:52 +0100
