Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #71192
| Newsgroups | comp.lang.python |
|---|---|
| Date | 2014-05-09 13:49 -0700 |
| References | <ea305e19-be61-469b-8a15-0753406f8476@googlegroups.com> <mailman.9830.1399666223.18130.python-list@python.org> |
| Message-ID | <e253b9fe-c65f-4df7-b9b2-aaccb14b2e64@googlegroups.com> (permalink) |
| Subject | Re: Why isn't my re.sub replacing the contents of my MS Word file? |
| From | scottcabit@gmail.com |
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote: > A Word doc (as your subject mentions) is a binary format. There's > the older .doc and the newer .docx (which is actually a .zip file > with a particular content-structure renamed to .docx). > I am using .doc files only...... > > For the older .doc file, it's a binary format, so even if you can > successfully find & swap out sequences of 7 chars for a single char, > it might screw up the internal offsets, breaking your file. I do not save the file out again, only try to change all en-dash and em-dash to dashes, then search and print things to another file, closing the searched file without writing it. > > Additionally, I vaguely remember sparring with them using some 16-bit > wide characters in .doc files so you might have to search for > atrocious things like b"\x00&\x00#\x00x\x002\x000\x001\x002" (each > character being prefixed with "\x00". Hmmm..thought that was what I was doing. Can anyone figure out why the syntax is wrong for Word 2007 document binary file data?
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-09 12:51 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? MRAB <python@mrabarnett.plus.com> - 2014-05-09 21:03 +0100
Re: Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-09 13:46 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Chris Angelico <rosuav@gmail.com> - 2014-05-10 06:08 +1000
Re: Why isn't my re.sub replacing the contents of my MS Word file? Tim Chase <python.list@tim.thechases.com> - 2014-05-09 15:09 -0500
Re: Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-09 13:49 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-10 00:31 +0000
Re: Why isn't my re.sub replacing the contents of my MS Word file? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-10 00:12 +0000
Re: Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-12 10:35 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Rustom Mody <rustompmody@gmail.com> - 2014-05-12 20:00 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Dave Angel <davea@davea.name> - 2014-05-12 17:15 -0400
Re: Why isn't my re.sub replacing the contents of my MS Word file? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:49 +0000
Re: Why isn't my re.sub replacing the contents of my MS Word file? Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:55 +1000
Re: Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-13 12:01 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? MRAB <python@mrabarnett.plus.com> - 2014-05-13 21:26 +0100
Re: Why isn't my re.sub replacing the contents of my MS Word file? wxjmfauth@gmail.com - 2014-05-13 23:12 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 13:21 +0000
Re: Why isn't my re.sub replacing the contents of my MS Word file? scottcabit@gmail.com - 2014-05-14 07:40 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Rustom Mody <rustompmody@gmail.com> - 2014-05-09 21:22 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? wxjmfauth@gmail.com - 2014-05-10 00:11 -0700
Re: Why isn't my re.sub replacing the contents of my MS Word file? Tim Golden <mail@timgolden.me.uk> - 2014-05-10 09:49 +0100
csiph-web