


Re: Fastest method to copy/process a range of bytes?

From Egan Ford <datajerk@gmail.com>
Newsgroups comp.sys.apple2.programmer
Subject Re: Fastest method to copy/process a range of bytes?
Date 2012-08-11 15:58 -0600
Organization XMission http://xmission.com/
Message-ID <k06khd$p8t$1@news.xmission.com>
References <jvs69d$qa1$1@news.xmission.com>



On 8/7/12 4:53 PM, Egan Ford wrote:
> This is the fastest I could come up with.  It aligns with what I have
> read online as well as in books.  The draw back is that I have to use x
> and y, it's long, and the (copy in this case) code has to be declared
> twice.
>
> Any suggestions or tricks on doing this faster?

Gents,

Thank you all for all the pointers.  My consolidated replies below.


On 8/7/12 7:57 PM, Antoine Vignau wrote:
 > - use absolute addressing

My problem with absolute addressing is that it is, well, absolute.  I 
probably should have stated more clearly what my program does and what 
my goals are.

I am writing multi-precision arithmetic code, and the sizes of my arrays
are not known until run time.  The use of pointers (indirect
addressing) seems a bit more natural.  To use absolute addressing I'd
need self-modifying code.  I am not strictly opposed to that (I do use
it for fast absolute table lookups); however, an objective is to
illustrate conventional practices while also trying to optimize for
speed.
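
A minimal sketch of the self-modifying alternative mentioned above, for
one full page.  The labels copy_page, load and store and the zero-page
pointers src and dst are hypothetical names used only for illustration.
On a 6502, lda abs,x / sta abs,x cost 4 / 5 cycles against 5 / 6 for
the (zp),y forms (page-crossing penalties aside), which is where the
gain would come from:

copy_page:
         lda     src             ; patch the source operand (lo, hi)
         sta     load+1
         lda     src+1
         sta     load+2
         lda     dst             ; patch the destination operand (lo, hi)
         sta     store+1
         lda     dst+1
         sta     store+2
         ldx     #0
load:    lda     $FFFF,x         ; placeholder operand, overwritten above
store:   sta     $FFFF,x         ; placeholder operand, overwritten above
         inx
         bne     load            ; 256 iterations, then X wraps to 0
         rts

This only works when the code runs from RAM, and the patching adds
about 28 cycles of setup per call, so it would pay off only on longer
runs.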


On 8/8/12 11:49 AM, Michael J. Mahon wrote:
 > (I've changed the nomenclature: "page" has the usual meaning
 > and I use "block" to refer to the entire memory range being
 > copied.)

For the readability of my comments I have changed all instances of block 
to page.  I used the term block because it is how I visualized the 
memory.  Thanks for the tip.


On 8/8/12 11:49 AM, Michael J. Mahon wrote:
 > This modification copies the final partial page in a downward
 > direction, which could be an issue if the source and destination
 > blocks overlap.

It is also an issue with my mp math code.  I have to process the bytes
in order (LSB to MSB for add, sub, asl, mult; reverse for div).  I
should not have used copy as the example for a speed-up, since copy
lacks the ordering restrictions that my other processing code has.

However, I will try to leverage inc ptr+1 to free up x or y.  The
problem with my mult and div code is that I have another loop that needs
to run very fast inside the x/y loop.  Right now I have to back up x/y
or try to find a way to merge the loops.
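
One way this might look, as a sketch only: keep the page count in a
memory byte and step the pointer with inc ptr+1, so that only Y is
tied up by the outer loop.  The name pages is a hypothetical byte
holding the number of full pages (assumed to be at least 1); the
processing step is left as a placeholder:

         ldy     #0              ; offset within the current page
:        lda     (ptr),y         ; fetch next byte, ascending addresses
         ; ... process A here; X is free for an inner loop ...
         sta     (ptr),y
         iny
         bne     :-              ; until Y wraps: 256 bytes done
         inc     ptr+1           ; step the pointer to the next page
         dec     pages           ; hypothetical page counter in memory
         bne     :-              ; more full pages to go

Since iny, inc and dec leave the carry flag alone, a carry chain in
the processing step would survive the page step.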


On 8/8/12 9:28 AM, Daniel Kruszyna wrote:
 > Another idea is to unroll the first inner loop (lda sta iny).

This simple idea just shaved off 0.5 sec (out of 6.5).  I call copy a 
lot (931 instances).  I unrolled 4x.
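
For illustration, a 4x unrolled inner loop for one full page might look
like the sketch below.  src and dst are hypothetical zero-page
pointers, and the byte count per pass must be a multiple of 4 (256 is):

         ldy     #0
:        lda     (src),y         ; copy 1 of 4
         sta     (dst),y
         iny
         lda     (src),y         ; copy 2 of 4
         sta     (dst),y
         iny
         lda     (src),y         ; copy 3 of 4
         sta     (dst),y
         iny
         lda     (src),y         ; copy 4 of 4
         sta     (dst),y
         iny
         bne     :-              ; one branch per 4 bytes instead of per byte

Each removed bne saves 3 cycles when taken, so the cost per byte drops
from roughly 16 cycles to roughly 13.75, ignoring page-crossing
penalties.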


On 8/8/12 8:27 PM, Anton Treuenfels wrote:
 > Here's an approach that uses pointer adjustments so as to have only one
 > main loop. Also that loop is unrolled a bit. Setup takes around 50
 > cycles worst case but the last two high byte pointer increments are
 > avoided, saving 10 cycles. So net 40 cycles. The unrolled loop saves
 > three cycles each time through ("bne" not executed). So a net gain if
 > moving 14 or more bytes in a partial page. At least I think that's right.

Anton, this is brilliant.  My old (forward processing) code unrolled 4x
and your code unrolled only 2x perform almost the same.  I'll be
experimenting with this further.  I also have to see if I can do this
in reverse as well.  Current example of my reverse (n .. 0) array
processing code:

add_mp:
         sta     ptr             ; store ptr lo from A
         tya                     ; ptr hi arrives in Y
         clc
         adc     arrayend+1      ; add number of pages since we have to
         sta     ptr+1           ;   go backwards for add/sub/asl

         lda     ptr_mp+1
         clc
         adc     arrayend+1      ; add number of pages since we have to
         sta     ptr_mp+1        ;   go backwards for add/sub/asl

         ldx     arrayend+1      ; full pages
         ldy     arrayend        ; partial
         clc                     ; clear carry for the first adc
         bcc     :+++            ; always taken: enter the loop at lda
:        dex
         dec     ptr+1           ; previous page of 256
         dec     ptr_mp+1        ; previous page of 256
:        dey
:        lda     (ptr),y
         adc     (ptr_mp),y      ; carry chains from byte to byte
         sta     (ptr),y
         tya                     ; test Y without touching carry
         bne     :--             ; more bytes in this page
         txa
         bne     :---            ; more pages below this one
         rts

The above code is just a bit slower than two loops (full pages and
partial), but it is shorter, and since I do not call add, sub, etc. as
often as div, any gain from optimizing it further would be minimal.

Thanks again.




