| Started by | Anthony Papillion <anthony@cajuntechie.org> |
|---|---|
| First post | 2015-12-08 12:21 -0600 |
| Last post | 2015-12-10 14:26 +0100 |
| Articles | 11 — 10 participants |
Getting data out of Mozilla Thunderbird with Python? Anthony Papillion <anthony@cajuntechie.org> - 2015-12-08 12:21 -0600
Re: Getting data out of Mozilla Thunderbird with Python? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-12-08 19:42 +0100
Re: Getting data out of Mozilla Thunderbird with Python? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-08 20:31 +0000
META email [was Re: Getting data out of Mozilla Thunderbird with Python?] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-12-09 18:43 +1100
Re: Getting data out of Mozilla Thunderbird with Python? Christian Gollwitzer <auriocus@gmx.de> - 2015-12-09 09:03 +0100
Re: Getting data out of Mozilla Thunderbird with Python? Steven D'Aprano <steve@pearwood.info> - 2015-12-09 22:11 +1100
Re: Getting data out of Mozilla Thunderbird with Python? srinivas devaki <mr.eightnoteight@gmail.com> - 2015-12-09 19:36 +0530
Re: Getting data out of Mozilla Thunderbird with Python? Chris Angelico <rosuav@gmail.com> - 2015-12-10 01:15 +1100
Re: Getting data out of Mozilla Thunderbird with Python? Grant Edwards <invalid@invalid.invalid> - 2015-12-09 17:25 +0000
Re: Getting data out of Mozilla Thunderbird with Python? Michael Torrie <torriem@gmail.com> - 2015-12-09 23:23 -0700
Re: Getting data out of Mozilla Thunderbird with Python? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-12-10 14:26 +0100
| From | Anthony Papillion <anthony@cajuntechie.org> |
|---|---|
| Date | 2015-12-08 12:21 -0600 |
| Subject | Getting data out of Mozilla Thunderbird with Python? |
| Message-ID | <mailman.74.1449598912.12405.python-list@python.org> |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Everyone,

I have a TON of email (years) stored in my Thunderbird. My backup strategy for the last few years has been to periodically dump it all in a tar file, encrypt that tar file, and move it up to the cloud. That way, if my machine ever crashes, I don't lose years of email.

But I've been thinking about bringing Python into the mix to build a bridge between Thunderbird and SQLite or MySQL (probably sqlite) where all mail would be backed up to a database where I could run analytics against it and search it more effectively.

I'm looking for a way to get at the mail stored in Thunderbird using Python and, so far, I can't find anything. I did find the mozmail package but it seems to be geared more towards testing and not really the kind of use I need.

Can anyone suggest anything?

Many Thanks,
Anthony Papillion

-- 
Phone: 1.845.666.1114
Skype: cajuntechie
PGP Key: 0x028ADF7453B04B15
Fingerprint: C5CE E687 DDC2 D12B 9063 56EA 028A DF74 53B0 4B15
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-12-08 19:42 +0100 |
| Message-ID | <4306369.OX057VMHEH@PointedEars.de> |
| In reply to | #100169 |
Anthony Papillion wrote:

> -----BEGIN PGP SIGNED MESSAGE-----

Please don’t do that again.

> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
>
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
>
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.
>
> Can anyone suggest anything?

Yes. (Please never ask that question again: <http://www.catb.org/~esr/faqs/smart-questions.html>)

Thunderbird uses the mbox format to store both e-mails and news messages.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
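Since the folders are mbox files, Python's standard `mailbox` module is probably the shortest route to the messages. The following is a minimal sketch, not a tested Thunderbird integration: `list_subjects` and `build_demo_mbox` are names made up for this example, and the demo writes a throwaway mbox to a temporary directory instead of touching a real profile. For real data you would pass the path of a folder file inside your Thunderbird profile (the extensionless file next to the `.msf` index); that path is an assumption you need to fill in yourself.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def list_subjects(mbox_path):
    """Return (sender, subject) pairs for every message in an mbox file."""
    # create=False: fail loudly instead of creating an empty file on a typo.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(str(msg.get("From", "")), str(msg.get("Subject", "")))
                for msg in box]
    finally:
        box.close()


def build_demo_mbox(mbox_path):
    """Write two throwaway messages so the sketch runs without a real profile."""
    box = mailbox.mbox(mbox_path, create=True)
    try:
        for sender, subject in [("alice@example.org", "First"),
                                ("bob@example.org", "Second")]:
            msg = EmailMessage()
            msg["From"] = sender
            msg["To"] = "me@example.org"
            msg["Subject"] = subject
            msg.set_content("demo body")
            box.add(msg)
        box.flush()
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "Inbox")  # hypothetical folder name
        build_demo_mbox(demo_path)
        for sender, subject in list_subjects(demo_path):
            print(sender, "|", subject)
```

It is safest to run this against a copy of the folder file while Thunderbird is closed, so the two programs never write to the same mbox.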
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-12-08 20:31 +0000 |
| Message-ID | <mailman.77.1449606906.12405.python-list@python.org> |
| In reply to | #100170 |
On 08/12/2015 18:42, Thomas 'PointedEars' Lahn wrote:
> Anthony Papillion wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>
> Please don’t do that again.
>

Says who?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-12-09 18:43 +1100 |
| Subject | META email [was Re: Getting data out of Mozilla Thunderbird with Python?] |
| Message-ID | <5667dba7$0$14486$c3e8da3@news.astraweb.com> |
| In reply to | #100170 |
On Wednesday 09 December 2015 05:42, Thomas 'PointedEars' Lahn wrote:
[snip]
Thomas, your sig says:
Please do not cc me. / Bitte keine Kopien per E-Mail.
but you have a Reply-To set. That implies that you want replies to be sent
directly to you by email, not to the list or newsgroup. Is that really what
you want? That seems incompatible with your signature. Which is correct?
--
Steve
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-12-09 09:03 +0100 |
| Message-ID | <n48n4e$v96$1@dont-email.me> |
| In reply to | #100169 |
On 08.12.15 at 19:21, Anthony Papillion wrote:
> I have a TON of email (years) stored in my Thunderbird. My backup
> strategy for the last few years has been to periodically dump it all
> in a tar file, encrypt that tar file, and move it up to the cloud.
> That way, if my machine ever crashes, I don't lose years of email.
>
> But I've been thinking about bringing Python into the mix to build a
> bridge between Thunderbird and SQLite or MySQL (probably sqlite) where
> all mail would be backed up to a database where I could run analytics
> against it and search it more effectively.
>
> I'm looking for a way to get at the mail stored in Thunderbird using
> Python and, so far, I can't find anything. I did find the mozmail
> package but it seems to be geared more towards testing and not really
> the kind of use I need.

You have several options.

1) As noted before, Thunderbird usually stores mail in mbox format, which you can read and parse. However, it keeps an extra index file (.msf) to track deleted messages etc. Until you "compact" the folders, deleted messages are not removed from the mbox file.

2) You can configure it to use maildir instead. Maildir is a directory in which every mail is stored in its own file. That might be easier to parse and much faster to access.

3) Are you sure that you want to solve the problem using Python? Thunderbird has excellent filters and global full text search (stored in sqlite, btw). You can instruct it to archive mails, which means it creates a folder for each year; once created for a past year, that folder will never change. This is how I do my mail backup, and these folders are backed up by my regular backup (TimeMachine). You could also try to open the full text index with sqlite and run some query on it.

4) Yet another option using Thunderbird alone is IMAP. If you can use a commercial IMAP server, run your own server in the cloud, or even write an IMAP server in Python, then Thunderbird can access and manipulate the mail there as a usual folder.

5) There are converters like Hypermail or MHonArc that create HTML archives of mbox email files for viewing in a browser.

Christian
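Option 1 combined with the original goal (mail in SQLite) can be sketched roughly as follows. Everything here is illustrative: the function names and the `mail` table layout are invented for the example, and the check for deleted-but-not-yet-compacted messages assumes that Thunderbird sets the 0x0008 bit in the `X-Mozilla-Status` header for them, which you should verify against your own folders before relying on it.

```python
import mailbox
import sqlite3

# Assumption: bit 0x0008 of X-Mozilla-Status marks a message as deleted
# (still present in the file until the folder is compacted).
EXPUNGED_FLAG = 0x0008


def is_marked_deleted(msg):
    """Best-effort check of the (assumed) Thunderbird deleted flag."""
    raw = msg.get("X-Mozilla-Status")
    if raw is None:
        return False
    try:
        return bool(int(str(raw).strip(), 16) & EXPUNGED_FLAG)
    except ValueError:
        return False


def mbox_to_sqlite(mbox_path, db_path):
    """Copy messages from an mbox file into SQLite; return total rows stored."""
    box = mailbox.mbox(mbox_path, create=False)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS mail ("
                " message_id TEXT PRIMARY KEY, sender TEXT,"
                " subject TEXT, date TEXT, raw BLOB)"
            )
            for key, msg in box.iteritems():
                if is_marked_deleted(msg):
                    continue
                # Fall back to the mbox key if a message has no Message-ID.
                message_id = str(msg.get("Message-ID") or f"mbox-key-{key}")
                conn.execute(
                    "INSERT OR IGNORE INTO mail VALUES (?, ?, ?, ?, ?)",
                    (message_id,
                     str(msg.get("From", "")),
                     str(msg.get("Subject", "")),
                     str(msg.get("Date", "")),
                     box.get_bytes(key)),
                )
        return conn.execute("SELECT COUNT(*) FROM mail").fetchone()[0]
    finally:
        conn.close()
        box.close()
```

Keying rows on Message-ID with `INSERT OR IGNORE` means the function can be re-run after each backup without duplicating mail, and queries such as `SELECT sender, COUNT(*) FROM mail GROUP BY sender` then cover the "analytics" part. Storing the raw bytes keeps the option of re-parsing a message later with the `email` package.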
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-12-09 22:11 +1100 |
| Message-ID | <56680c59$0$1591$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #100187 |
On Wed, 9 Dec 2015 07:03 pm, Christian Gollwitzer wrote:

> 1) As noted before, Thunderbird usually stores mail in mbox format,
> which you can read and parse. However it keeps an extra index file
> (.msf) to track deleted messages etc. Until you "compact" the folders,
> the messages are not deleted in the mbox file
>
> 2) You can configure it to use maildir instead. Maildir is a directory
> where every mail is stored in a single file. That might be easier to
> parse and much faster to access.

Maildir is also *much* safer. With mbox, a single error when writing email to the mailbox will likely corrupt *all* emails from that point on, so potentially every email in the mailbox. With maildir, a single error when writing will, at worst, corrupt one email.

Thanks Mozilla, for picking the *less* efficient and *more* risky format as the default. Good choice!

> 3) Are you sure that you want to solve the problem using Python?
> Thunderbird has excellent filters and global full text search (stored in
> sqlite, btw).

Sqlite is unsafe on Linux systems if you are using ntfs. I have had no end of database corruption with Firefox and Thunderbird due to this, although in fairness I haven't had any problems for a year or so now.

-- 
Steven
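To make the one-file-per-message point concrete, here is a small sketch that copies an mbox into a Maildir using only the standard library. It uses the generic Maildir layout from Python's `mailbox` module and says nothing about how Thunderbird's own maildir storage behaves; `mbox_to_maildir` is a name chosen for this example.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir, one file each."""
    source = mailbox.mbox(mbox_path, create=False)
    # create=True builds the tmp/, new/ and cur/ subdirectories if needed.
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in source:
            # MaildirMessage converts mbox status flags where it can.
            target.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        target.close()
        source.close()
    return copied
```

Each `add` writes the message to its own file under `tmp/` and then renames it into place, so a failed write should affect at most that one message, which is the safety property described above. It also means an incremental backup only has to copy the files that are new.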
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2015-12-09 19:36 +0530 |
| Message-ID | <mailman.92.1449670019.12405.python-list@python.org> |
| In reply to | #100194 |
On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:
>
> Maildir is also *much* safer too. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>

Maybe frequent backups of the mbox file, combined with storing a checksum for each email, would be both faster and safe. I wonder if they already do that.
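A per-message checksum of the kind suggested here could look roughly like the sketch below. Nothing in this thread says Thunderbird does anything like it; the function names are invented, and the approach only *detects* that a message changed between two snapshots, it does not prevent or repair the damage.

```python
import hashlib
import mailbox


def message_checksums(mbox_path):
    """Map each mbox key to the SHA-256 hex digest of that message's bytes."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {key: hashlib.sha256(box.get_bytes(key)).hexdigest()
                for key in box.iterkeys()}
    finally:
        box.close()


def changed_messages(old, new):
    """Return keys whose digest differs or that exist in only one snapshot."""
    return sorted(k for k in old.keys() | new.keys()
                  if old.get(k) != new.get(k))
```

You would store the result of `message_checksums` alongside each backup and compare it with a fresh snapshot before the next one. Note that mbox keys are just positions in the file, so compacting a folder renumbers messages and will show up as changes; keying on Message-ID instead would be more robust.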
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-12-10 01:15 +1100 |
| Message-ID | <mailman.93.1449670518.12405.python-list@python.org> |
| In reply to | #100194 |
On Thu, Dec 10, 2015 at 1:06 AM, srinivas devaki <mr.eightnoteight@gmail.com> wrote:
> On Dec 9, 2015 4:45 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:
>>
>> Maildir is also *much* safer too. With mbox, a single error when writing
>> email to the mailbox will likely corrupt *all* emails from that point on,
>> so potentially every email in the mailbox. With maildir, a single error
>> when writing will, at worst, corrupt one email.
>>
>
> may be with frequent backup of mbox file and storing checksum to each email
> will be faster and safe too.
> I wonder if they already do that.

Yes, because we all know that frequent checking is better than prevention. That's why MySQL's myisamchk command makes it so much better than PostgreSQL's transactional DDL.

ChrisA
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2015-12-09 17:25 +0000 |
| Message-ID | <n49o5p$6f1$1@reader1.panix.com> |
| In reply to | #100194 |
On 2015-12-09, Steven D'Aprano <steve@pearwood.info> wrote:
> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!
At least they picked a standard format as the default and gave you the
option to use a different standard format (cf. Microsoft and Outlook).
--
Grant Edwards               grant.b.edwards        Yow! Are you the
                                  at               self-frying president?
                              gmail.com
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-12-09 23:23 -0700 |
| Message-ID | <mailman.106.1449728626.12405.python-list@python.org> |
| In reply to | #100194 |
On 12/09/2015 04:11 AM, Steven D'Aprano wrote:
> Maildir is also *much* safer too. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>
> Thanks Mozilla, for picking the *less* efficient and *more* risky format as
> the default. Good choice!

Not so long ago, many filesystems were very poor at storing lots of small files. For disk efficiency, storing them in one big file and periodically compacting that file was seen as a better way to go. After all, the mbox format has been around for a very long time, for reasons that no longer apply today. Maildir came later. Back when hard drives were smaller, it was also not uncommon to run out of inodes in a file system on a server that had many small files.

Neither of these issues is much of a problem these days. Ext4 added the ability to store small files right in the inode, so internal fragmentation (and wasted space) isn't a big issue anymore.

It's good to know I can configure Thunderbird to use maildir for local storage. I'll have to make the change here. It will make my backups a lot easier and faster.
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2015-12-10 14:26 +0100 |
| Message-ID | <11986323.g8sR6m0TfB@PointedEars.de> |
| In reply to | #100220 |
Michael Torrie wrote:

> It's good to know I can configure Thunderbird to use maildir for local
> storage. I'll have to make the change here. Will make my backups a lot
> easier and faster.

But see also <https://wiki.mozilla.org/Thunderbird/Maildir>. Not all of those bugs have been resolved/fixed.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.