Groups > comp.lang.python > #100169 > unrolled thread

Getting data out of Mozilla Thunderbird with Python?

Started by: Anthony Papillion <anthony@cajuntechie.org>
First post: 2015-12-08 12:21 -0600
Last post: 2015-12-10 14:26 +0100
Articles 11 — 10 participants



Contents

  Getting data out of Mozilla Thunderbird with Python? Anthony Papillion <anthony@cajuntechie.org> - 2015-12-08 12:21 -0600
    Re: Getting data out of Mozilla Thunderbird with Python? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-12-08 19:42 +0100
      Re: Getting data out of Mozilla Thunderbird with Python? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-08 20:31 +0000
      META email [was Re: Getting data out of Mozilla Thunderbird with Python?] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-12-09 18:43 +1100
    Re: Getting data out of Mozilla Thunderbird with Python? Christian Gollwitzer <auriocus@gmx.de> - 2015-12-09 09:03 +0100
      Re: Getting data out of Mozilla Thunderbird with Python? Steven D'Aprano <steve@pearwood.info> - 2015-12-09 22:11 +1100
        Re: Getting data out of Mozilla Thunderbird with Python? srinivas devaki <mr.eightnoteight@gmail.com> - 2015-12-09 19:36 +0530
        Re: Getting data out of Mozilla Thunderbird with Python? Chris Angelico <rosuav@gmail.com> - 2015-12-10 01:15 +1100
        Re: Getting data out of Mozilla Thunderbird with Python? Grant Edwards <invalid@invalid.invalid> - 2015-12-09 17:25 +0000
        Re: Getting data out of Mozilla Thunderbird with Python? Michael Torrie <torriem@gmail.com> - 2015-12-09 23:23 -0700
          Re: Getting data out of Mozilla Thunderbird with Python? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-12-10 14:26 +0100

#100169 — Getting data out of Mozilla Thunderbird with Python?

From: Anthony Papillion <anthony@cajuntechie.org>
Date: 2015-12-08 12:21 -0600
Subject: Getting data out of Mozilla Thunderbird with Python?
Message-ID: <mailman.74.1449598912.12405.python-list@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Everyone,

I have a TON of email (years) stored in my Thunderbird. My backup
strategy for the last few years has been to periodically dump it all
in a tar file, encrypt that tar file, and move it up to the cloud.
That way, if my machine ever crashes, I don't lose years of email.

But I've been thinking about bringing Python into the mix to build a
bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
all mail would be backed up to a database where I could run analytics
against it and search it more effectively.

I'm looking for a way to get at the mail stored in Thunderbird using
Python and, so far, I can't find anything. I did find the mozmail
package but it seems to be geared more towards testing and not really
the kind of use I need.

Can anyone suggest anything?

Many Thanks,
Anthony Papillion

- -- 
Phone:          1.845.666.1114
Skype:          cajuntechie
PGP Key:        0x028ADF7453B04B15
Fingerprint:    C5CE E687 DDC2 D12B 9063  56EA 028A DF74 53B0 4B15

-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJWZx+3AAoJEAKK33RTsEsVVa8QAKf1AmFdJsi4/b08vpkfwP3c
akGV98EuZzEva29jr8nnfXGgqw7xD/nDjMyLzuO0/q4Kn7eKpEnxkcGDLSbDgxaW
O8kD5eALHCVlUp9p/h7RMBBAyZ4mH8YC6qwvd5SWtH0TIMR7ClcWmDYwPF1Ahk7n
NAFvTsMl8PSnhcIoWHE4vebN4wHR8gZAxOLI8WVPA2BbER64EXiL00nWBav6UDN5
NUosAAVa549rrH0ibEf7Lada63DRTHCYnESxNIkAAHIO0z69WjnfZQ8gmmGFhuaW
AZzqYV5pIhdRnvrwjCQ06LtUNtz/qPqLbLSWF0hA6lwPKqzNum9EdvS4c1xjcXsU
KpOCTmJXy40x1Oi8h+yT6PGiDxt5VCHCdN8ppToI3HY5pYmoiPgWszJzrqYMz7hz
ruhNFAksKNUSI9QQupYcPw6oKQdnoGWmBH1yvGlZqeZuIxhGEv87oqRISE4NRQLe
yL4aDebwXdDgBzIZvFOFy2W4L43jdravg2/LliSC18iCUKBnIpWhazy7NZHw6h55
h3QP84DeuB/9tPLQUZF+BEJm3I+V8WfSKVVnsSbk/n/chHgYpWnu+h/wpD6lx43x
y0lPJm0ni5LeQM1bK4TsIXVEAOzl8UaOwn/VUG7P6Jnt6VEqvQutWZ0/WEeP1nIX
M7+e9hLlQWtlEbl6ud1K
=Dz7N
-----END PGP SIGNATURE-----



#100170

From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date: 2015-12-08 19:42 +0100
Message-ID: <4306369.OX057VMHEH@PointedEars.de>
In reply to: #100169

Anthony Papillion wrote:

> -----BEGIN PGP SIGNED MESSAGE-----

Please don’t do that again.

> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
> 
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
> 
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.
> 
> Can anyone suggest anything?

Yes.

(Please never ask that question again:
<http://www.catb.org/~esr/faqs/smart-questions.html>)

Thunderbird uses the mbox format to store both e-mails and news messages.
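
For what it's worth, Python's standard library can read mbox files through the `mailbox` module. A minimal sketch, assuming you point it at a copy of one of Thunderbird's folder files while Thunderbird is not running; the function name is only illustrative:

```python
import mailbox


def list_subjects(mbox_path):
    """Return (sender, subject) pairs for every message in an mbox file."""
    # create=False makes a wrong path raise instead of creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (str(msg.get("From", "")), str(msg.get("Subject", "")))
            for msg in box
        ]
    finally:
        box.close()
```

Calling `list_subjects()` with the path to a folder file (the extensionless file that sits next to the matching `.msf`) should be enough to confirm that the data is readable before building anything bigger.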

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.



#100174

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-12-08 20:31 +0000
Message-ID: <mailman.77.1449606906.12405.python-list@python.org>
In reply to: #100170

On 08/12/2015 18:42, Thomas 'PointedEars' Lahn wrote:
> Anthony Papillion wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>
> Please don’t do that again.
>

Says who?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#100186 — META email [was Re: Getting data out of Mozilla Thunderbird with Python?]

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-12-09 18:43 +1100
Subject: META email [was Re: Getting data out of Mozilla Thunderbird with Python?]
Message-ID: <5667dba7$0$14486$c3e8da3@news.astraweb.com>
In reply to: #100170

On Wednesday 09 December 2015 05:42, Thomas 'PointedEars' Lahn wrote:

[snip]

Thomas, your sig says:

    Please do not cc me. / Bitte keine Kopien per E-Mail.

but you have a Reply-To set. That implies that you want replies to be sent 
directly to you by email, not to the list or newsgroup. Is that really what 
you want? That seems incompatible with your signature. Which is correct?



-- 
Steve



#100187

From: Christian Gollwitzer <auriocus@gmx.de>
Date: 2015-12-09 09:03 +0100
Message-ID: <n48n4e$v96$1@dont-email.me>
In reply to: #100169

On 08.12.15 at 19:21, Anthony Papillion wrote:
> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
>
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
>
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.

You have several options.

1) As noted before, Thunderbird usually stores mail in mbox format, 
which you can read and parse. However it keeps an extra index file 
(.msf) to track deleted messages etc. Until you "compact" the folders, 
the messages are not deleted in the mbox file

2) You can configure it to use maildir instead. Maildir is a directory 
where every mail is stored in a single file. That might be easier to 
parse and much faster to access.

3) Are you sure that you want to solve the problem using Python? 
Thunderbird has excellent filters and global full text search (stored in 
sqlite, btw). You can instruct it to archive mails, which means it 
creates a folder for each year - once created for a past year, that 
folder will never change. This is how I do my mail backup, and these 
folders are backed up by my regular backup (TimeMachine). You could also 
try to open the full text index with sqlite and run some query on it.

4) Yet another option using Thunderbird alone is IMAP. If you can either 
use a commercial IMAP server, have your own server in the cloud or even 
write an IMAP server using Python, then Thunderbird can 
access/manipulate the mail there as a usual folder.

5) There are converters like Hypermail or MHonArc to create HTML 
archives of mbox email files for viewing in a browser
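
For option 1, the import into SQLite that the original question asks about can be sketched with the standard library alone. This is only a sketch under stated assumptions: the table layout and the function names are made up for illustration, and treating bit 0x0008 of the `X-Mozilla-Status` header as "deleted but not yet compacted" is my understanding of Thunderbird's flags, so verify it against your own folders first.

```python
import mailbox
import sqlite3

# Assumption: Thunderbird sets this bit in X-Mozilla-Status for messages
# that were deleted but are still in the file until the folder is compacted.
EXPUNGED = 0x0008


def header(msg, name):
    """Return a header as str, or None if the header is missing."""
    value = msg.get(name)
    return None if value is None else str(value)


def is_expunged(msg):
    """True if X-Mozilla-Status marks the message as deleted (assumption)."""
    try:
        return bool(int(str(msg.get("X-Mozilla-Status", "0")), 16) & EXPUNGED)
    except ValueError:
        return False


def import_mbox(mbox_path, db_path):
    """Copy live messages from an mbox file into SQLite; return rows added."""
    box = mailbox.mbox(mbox_path, create=False)
    con = sqlite3.connect(db_path)
    try:
        # Hypothetical schema: a few searchable headers plus the raw message.
        con.execute(
            "CREATE TABLE IF NOT EXISTS mail ("
            "message_id TEXT PRIMARY KEY, sender TEXT, subject TEXT, "
            "date TEXT, raw BLOB)"
        )
        added = 0
        for msg in box:
            if is_expunged(msg):
                continue
            cur = con.execute(
                "INSERT OR IGNORE INTO mail VALUES (?, ?, ?, ?, ?)",
                (
                    header(msg, "Message-ID"),
                    header(msg, "From"),
                    header(msg, "Subject"),
                    header(msg, "Date"),
                    msg.as_bytes(),
                ),
            )
            added += cur.rowcount  # 0 when the Message-ID is already stored
        con.commit()
        return added
    finally:
        con.close()
        box.close()
```

Because `Message-ID` is the primary key and the insert uses `OR IGNORE`, running the import again over a newer copy of the same folder only adds messages that are not in the database yet. Work on a copy of the mbox file, or with Thunderbird closed, since this sketch does no locking.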

	Christian



#100194

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-12-09 22:11 +1100
Message-ID: <56680c59$0$1591$c3e8da3$5496439d@news.astraweb.com>
In reply to: #100187

On Wed, 9 Dec 2015 07:03 pm, Christian Gollwitzer wrote:

> 1) As noted before, Thunderbird usually stores mail in mbox format,
> which you can read and parse. However it keeps an extra index file
> (.msf) to track deleted messages etc. Until you "compact" the folders,
> the messages are not deleted in the mbox file
> 
> 2) You can configure it to use maildir instead. Maildir is a directory
> where every mail is stored in a single file. That might be easier to
> parse and much faster to access.

Maildir is also *much* safer. With mbox, a single error when writing
email to the mailbox will likely corrupt *all* emails from that point on,
so potentially every email in the mailbox. With maildir, a single error
when writing will, at worst, corrupt one email.
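
The one-file-per-message layout is easy to see from Python's `mailbox` module. A minimal sketch, assuming a classic maildir with `tmp`, `new` and `cur` subdirectories; I have not checked that Thunderbird's maildir folders match that layout exactly, and the function names are only illustrative:

```python
import mailbox


def store_messages(maildir_path, raw_messages):
    """Add each raw message to a maildir and return the keys assigned."""
    box = mailbox.Maildir(maildir_path, create=True)
    # Each add() writes one file under tmp/ and then moves it into new/,
    # so a failed write can only ever damage that single message.
    return [box.add(raw) for raw in raw_messages]


def maildir_subjects(maildir_path):
    """Return the sorted Subject headers of all messages in a maildir."""
    box = mailbox.Maildir(maildir_path, create=False)
    return sorted(str(msg.get("Subject", "")) for msg in box)
```

After `store_messages()` runs, listing the `new` subdirectory shows exactly one file per message, which is also why incremental backups of a maildir only need to copy the files that are new.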

Thanks Mozilla, for picking the *less* efficient and *more* risky format as
the default. Good choice!


> 3) Are you sure that you want to solve the problem using Python?
> Thunderbird has excellent filters and global full text search (stored in
> sqlite, btw).

SQLite is unsafe on Linux systems if you are using NTFS. I have had no end
of database corruption with Firefox and Thunderbird due to this, although
in fairness I haven't had any problems for a year or so now.



-- 
Steven



#100196

From: srinivas devaki <mr.eightnoteight@gmail.com>
Date: 2015-12-09 19:36 +0530
Message-ID: <mailman.92.1449670019.12405.python-list@python.org>
In reply to: #100194

On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:
>
> Maildir is also *much* safer. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>

Maybe frequent backups of the mbox file, plus storing a checksum for each
email, would be faster and safe too.
I wonder if they already do that.
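
A rough sketch of what such per-message checksums could look like, using only `mailbox` and `hashlib`. This is not something Thunderbird is known to do; the function names are hypothetical, and the keys are just positions in the file, so they only stay comparable as long as the folder has not been compacted or reordered between the two runs:

```python
import hashlib
import mailbox


def message_digests(mbox_path):
    """Map each message's position in the mbox to a SHA-256 of its bytes."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {
            key: hashlib.sha256(box.get_bytes(key)).hexdigest()
            for key in box.iterkeys()
        }
    finally:
        box.close()


def changed_keys(old, new):
    """Return the keys whose digest differs from, or is missing in, new."""
    return sorted(key for key in old if new.get(key) != old[key])
```

Storing the result of `message_digests()` next to each backup and comparing it with `changed_keys()` on the next run would at least detect corruption, although it does nothing to prevent it.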



#100197

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-12-10 01:15 +1100
Message-ID: <mailman.93.1449670518.12405.python-list@python.org>
In reply to: #100194

On Thu, Dec 10, 2015 at 1:06 AM, srinivas devaki
<mr.eightnoteight@gmail.com> wrote:
> On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:
>>
>> Maildir is also *much* safer. With mbox, a single error when writing
>> email to the mailbox will likely corrupt *all* emails from that point on,
>> so potentially every email in the mailbox. With maildir, a single error
>> when writing will, at worst, corrupt one email.
>>
>
> Maybe frequent backups of the mbox file, plus storing a checksum for each
> email, would be faster and safe too.
> I wonder if they already do that.

Yes, because we all know that frequent checking is better than
prevention. That's why MySQL's myisamchk command makes it so much
better than PostgreSQL's transactional DDL.

ChrisA



#100202

From: Grant Edwards <invalid@invalid.invalid>
Date: 2015-12-09 17:25 +0000
Message-ID: <n49o5p$6f1$1@reader1.panix.com>
In reply to: #100194

On 2015-12-09, Steven D'Aprano <steve@pearwood.info> wrote:

> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!

At least they picked a standard format as the default and gave you the
option to use a different standard format (cf. Microsoft and Outlook).

-- 
Grant Edwards               grant.b.edwards        Yow! Are you the
                                  at               self-frying president?
                              gmail.com            



#100220

From: Michael Torrie <torriem@gmail.com>
Date: 2015-12-09 23:23 -0700
Message-ID: <mailman.106.1449728626.12405.python-list@python.org>
In reply to: #100194

On 12/09/2015 04:11 AM, Steven D'Aprano wrote:
> Maildir is also *much* safer. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
> 
> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!

Not so long ago, many filesystems were very poor at storing lots of
small files. For disk efficiency, storing them in one big file,
periodically compacting the file, was seen as a better way to go. After
all, the mbox format has been around for a very long time, for reasons
that no longer apply today. Maildir came later.  Back when hard
drives were smaller, it was also not uncommon to run out of inodes in a
file system on a server that had many small files.

Neither of these issues is much of a problem these days.  Ext4 added the
ability to store small files right in the inode, so internal
fragmentation (and wasting of space) isn't a big issue anymore.

It's good to know I can configure Thunderbird to use maildir for local
storage.  I'll have to make the change here.  Will make my backups a lot
easier and faster.



#100233

From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date: 2015-12-10 14:26 +0100
Message-ID: <11986323.g8sR6m0TfB@PointedEars.de>
In reply to: #100220

Michael Torrie wrote:

> It's good to know I can configure Thunderbird to use maildir for local
> storage.  I'll have to make the change here.  Will make my backups a lot
> easier and faster.

But see also <https://wiki.mozilla.org/Thunderbird/Maildir>.  Not all of 
those bugs have been resolved/fixed.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


