| From | Dave Royal <dave@dave123royal.com> |
|---|---|
| Newsgroups | alt.comp.software.thunderbird |
| Subject | Re: Copying the text in subject of multiple messages? |
| Date | 2026-02-16 15:06 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <10mvbpg$uhho$1@dont-email.me> |
| References | (5 earlier) <10mqdme$3cavb$1@dont-email.me> <10mqfqe$3cuuc$2@dont-email.me> <10mu8t0$jrn5$2@toylet.eternal-september.org> <10muk3q$n4fm$1@dont-email.me> <10muqb4$p5cb$1@dont-email.me> |
On Mon, 16 Feb 2026 05:08:33 -0500, Paul wrote:

> On Mon, 2/16/2026 3:22 AM, Dave Royal wrote:
>>
>> My first thought was to use sed to convert the mail file into csv
>> - date, sender, subject - and then sort and select in a spreadsheet.
>> The rfc2047 encoding complicates it.
>>
>> I wonder if awk could do the conversion - between sed and the
>> spreadsheet? People have programmed some remarkable things in awk.
>>
>> Any script or language with an rfc2047 decoder could be used to
>> write a little utility:
>> stdin > decode > stdout
>>
>> I never learned C but I recently played with Rust:
>> https://crates.io/crates/rfc2047-decoder/1.1.0
>
> The mbox file is a mix of character sets.
> This is not particularly amenable to hobby programming (I tried and
> failed). You really need a parser that reads the headers on each
> message, if you expect to access a body in an intelligent way.
>
> The header lines at one time, would have been guaranteed to be ASCII.
> But gradually certain of the lines now also require flexible parsing to
> render the UTF-8 or whatever, properly. The Subject line could have
> emojis in it.
> And again, not all the Linux tools could handle that. Just as PERL may
> not be prepared to handle UTF-8 properly, or GAWK or SED for that
> matter.
>
> If the Subject line has an escape sequence in it,
> your tools have to handle that.
>
> Years ago, this would have been a doddle. Today,
> it takes a lot of thinking to do this right and make a result that is
> presentable.
>
> A Subject: line during the ThaiSpam attack.
>
> Subject:
> =?UTF-8?B?4Lid4Liy4LiBIDEwIOC4o+C4seC4miAxMDAg4LiX4Liz4Lii4Lit4LiUIDIwMCDguJY=?=
>
> How it looks in Thunderbird.
>
> [Picture]
>
> https://i.postimg.cc/HxzKH3qV/Spam-Attack-Subject-Lines.gif
>
> Normal tools aren't really ready for that.
>
> While some days, the Subject lines look "normal", your crafted software
> solution has to work with the worst case behavior.

The header lines are still all ASCII.
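Paul's quoted Subject: line bears this out. On disk it is plain ASCII; only a decoding step turns it into Thai text. Below is a minimal sketch of the `stdin > decode > stdout` utility mentioned above, assuming Python 3's standard `email.header` module in place of the Rust crate (the names `decode_rfc2047` and `decode_stream` are hypothetical). The encoded word is treated as one unbroken token, as RFC 2047 requires.

```python
from email.header import decode_header, make_header
from typing import TextIO


def decode_rfc2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to Unicode.

    decode_header() splits the value into (text, charset) chunks, e.g. a
    plain "Subject: " chunk plus a UTF-8 chunk; make_header() reassembles
    them and str() yields the readable text. Plain ASCII passes through.
    """
    return str(make_header(decode_header(value)))


def decode_stream(src: TextIO, dst: TextIO) -> None:
    """stdin > decode > stdout: decode one header line per input line."""
    for line in src:
        dst.write(decode_rfc2047(line.rstrip("\r\n")) + "\n")
```

Wrapped in a small script as `decode_stream(sys.stdin, sys.stdout)`, it can sit after `grep '^Subject:'` in a pipeline, bearing in mind that grep sees only the first line of a folded header.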
Non-ASCII characters, emoji included, are encoded like the example you quoted. sed or grep can easily extract just the (encoded) subject lines, as long as a long header hasn't been folded onto a continuation line. Extracting the Date, From and Subject lines onto one csv line can be done with sed; you have to use the hold space to merge the 3 lines.

I do something similar with the bodies of emails from a supermarket (Sainsbury's, UK) confirming deliveries. I extract the item, quantity and price from a table in the email and turn it into a spreadsheet for ease of checking. The sequence is:

email (eml) > sed (csv) > awk* (csv) > sort (csv) > LibreOffice Calc

*awk adds a sort category column, e.g. dairy.

The email bodies are HTML-only UTF-8 (so no decoding is required). LibreOffice Calc displays the UTF-8 text fine, but when I moved it to Windows and Excel it didn't, so I had to put it through an extra stage (another sed) to add a Byte Order Mark (BOM).

(I then automated it with a Thunderbird add-on: display the email and it automatically generates a csv file.)

--
(Remove any numerics from my email address.)
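The date/sender/subject extraction and the BOM step described in this post can also be done in one pass. A minimal sketch, assuming Python 3 with the standard `mailbox`, `email` and `csv` modules standing in for the sed/awk pipeline; the function names and the ISO 8601 date column are hypothetical choices, not Dave's add-on.

```python
import csv
import email
import mailbox
from email import policy


def read_message(fp):
    """mbox factory: parse each message with the modern email policy.

    policy.default unfolds continuation lines and decodes RFC 2047
    encoded-words whenever a header is read, so no separate decoder is needed.
    """
    return email.message_from_binary_file(fp, policy=policy.default)


def iso_date(header):
    """Return the Date header as ISO 8601 text so it sorts chronologically."""
    if header is None:
        return ""
    if header.datetime is None:  # unparseable date: keep the original text
        return str(header)
    return header.datetime.isoformat()


def header_rows(mbox_path):
    """Yield one (date, sender, subject) tuple per message in an mbox file."""
    box = mailbox.mbox(mbox_path, factory=read_message, create=False)
    try:
        for msg in box:
            yield (
                iso_date(msg["Date"]),
                str(msg["From"] or ""),
                str(msg["Subject"] or ""),
            )
    finally:
        box.close()


def mbox_to_csv(mbox_path, csv_path):
    """Write a date,sender,subject CSV that starts with a UTF-8 byte order mark.

    The "utf-8-sig" codec writes the BOM (EF BB BF) before the first row;
    Excel uses it to recognise the file as UTF-8.
    """
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "sender", "subject"])
        writer.writerows(header_rows(mbox_path))
```

Letting the email parser read the headers sidesteps both the hold-space merging and the folded-header problem, at the cost of parsing every message in full; on a very large mbox a line-oriented sed pass is likely to be lighter.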
Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-14 01:10 +0800
Re: Copying the text in subject of multiple messages? Paul <nospam@needed.invalid> - 2026-02-13 16:10 -0500
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-14 19:03 +0800
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-14 19:11 +0800
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-14 19:28 +0800
Re: Copying the text in subject of multiple messages? "J. P. Gilliver" <G6JPG@255soft.uk> - 2026-02-14 02:48 +0000
Re: Copying the text in subject of multiple messages? "Carlos E. R." <robin_listas@es.invalid> - 2026-02-14 10:23 +0100
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-14 17:38 +0800
Re: Copying the text in subject of multiple messages? "J. P. Gilliver" <G6JPG@255soft.uk> - 2026-02-14 15:13 +0000
Re: Copying the text in subject of multiple messages? Paul <nospam@needed.invalid> - 2026-02-14 10:49 -0500
Re: Copying the text in subject of multiple messages? Dave Royal <dave@dave123royal.com> - 2026-02-14 18:08 +0000
Re: Copying the text in subject of multiple messages? Paul <nospam@needed.invalid> - 2026-02-14 13:44 -0500
Re: Copying the text in subject of multiple messages? "Carlos E. R." <robin_listas@es.invalid> - 2026-02-15 00:55 +0100
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-16 13:10 +0800
Re: Copying the text in subject of multiple messages? Dave Royal <dave@dave123royal.com> - 2026-02-16 08:22 +0000
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-16 17:58 +0800
Re: Copying the text in subject of multiple messages? Paul <nospam@needed.invalid> - 2026-02-16 05:08 -0500
Re: Copying the text in subject of multiple messages? "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-02-16 18:38 +0800
Re: Copying the text in subject of multiple messages? Dave Royal <dave@dave123royal.com> - 2026-02-16 15:06 +0000