
Re: Why isn't my re.sub replacing the contents of my MS Word file?

Newsgroups comp.lang.python
Date 2014-05-09 13:49 -0700
References <ea305e19-be61-469b-8a15-0753406f8476@googlegroups.com> <mailman.9830.1399666223.18130.python-list@python.org>
Message-ID <e253b9fe-c65f-4df7-b9b2-aaccb14b2e64@googlegroups.com>
Subject Re: Why isn't my re.sub replacing the contents of my MS Word file?
From scottcabit@gmail.com



On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:

> A Word doc (as your subject mentions) is a binary format.  There's
> the older .doc and the newer .docx (which is actually a .zip file
> with a particular content-structure renamed to .docx).
> 
   I am using .doc files only.

> 
> For the older .doc file, it's a binary format, so even if you can
> successfully find & swap out sequences of 7 chars for a single char,
> it might screw up the internal offsets, breaking your file.

   I do not save the file out again. I only try to change every en-dash and em-dash to a plain hyphen, then search the result and print the matches to another file. The searched file is closed without being written.
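
   A minimal sketch of that read-only normalization, assuming the document text has already been decoded to a Python 3 str (getting decoded text out of a binary .doc is a separate problem). The name normalize_dashes is hypothetical and used only for illustration:

```python
import re

# Matches the en dash (U+2013) and the em dash (U+2014).
DASHES = re.compile("[\u2013\u2014]")

def normalize_dashes(text):
    """Return a copy of text with en/em dashes replaced by a plain hyphen."""
    return DASHES.sub("-", text)

sample = "pages 10\u201320 \u2014 see appendix"
print(normalize_dashes(sample))  # prints: pages 10-20 - see appendix
```

   Nothing is written back to the source file here; only the in-memory copy changes.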

> 
> Additionally, I vaguely remember sparring with them using some 16-bit
> wide characters in .doc files so you might have to search for
> atrocious things like b"\x00&\x00#\x00x\x002\x000\x001\x002" (each
> character being prefixed with "\x00").

  Hmmm, I thought that was what I was doing. Can anyone figure out why my syntax is wrong for the binary data of a Word 2007 .doc file?
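
  For reference, a hedged sketch of searching raw bytes for 16-bit-wide text. It assumes the text of interest is stored as UTF-16; the byte order is a guess that has to be checked against the actual file, and a byte-level match can in principle land on an odd offset. The helper name replace_wide is hypothetical:

```python
import re

def replace_wide(data, old, new, codec="utf-16-le"):
    """Replace every occurrence of old with new in raw bytes,
    treating both strings as 16-bit-wide text in the given codec."""
    pattern = re.compile(re.escape(old.encode(codec)))
    replacement = new.encode(codec)
    # A function is used as the replacement so the bytes are inserted
    # literally rather than parsed as a substitution template.
    return pattern.sub(lambda match: replacement, data)

# The literal quoted above is "&#x2012" encoded as big-endian UTF-16.
assert "&#x2012".encode("utf-16-be") == b"\x00&\x00#\x00x\x002\x000\x001\x002"

# Little-endian example: swap an en dash for a hyphen in wide text.
raw = "a\u2013b".encode("utf-16-le")
fixed = replace_wide(raw, "\u2013", "-")
print(fixed.decode("utf-16-le"))  # prints: a-b
```

  Both the pattern and the data must be bytes here; mixing a str pattern with bytes data raises a TypeError in Python 3.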



