Re: Fastest method to copy/process a range of bytes?

From "Anton Treuenfels" <teamtempest@yahoo.com>
Newsgroups comp.sys.apple2.programmer
References <jvs69d$qa1$1@news.xmission.com>
Subject Re: Fastest method to copy/process a range of bytes?
Date 2012-08-08 21:27 -0500
Message-ID <mdGdnWkVEqOuvb7NnZ2dnUVZ_vednZ2d@earthlink.com>

"Egan Ford" <datajerk@gmail.com> wrote in message 
news:jvs69d$qa1$1@news.xmission.com...
> This is the fastest I could come up with.  It aligns with what I have read 
> online as well as in books.  The drawback is that I have to use x and y, 
> it's long, and the (copy in this case) code has to be declared twice.
>
> Any suggestions or tricks on doing this faster?
>
> Thanks.
>
>
> Assume length, ptr, and ptr_mp have already been set.
>
> copy_mp:
>         ldy     #0              ; init block counter
>         ldx     length+1        ; do full blocks first,
> ; ldx with number of full blocks
>         beq     copy_mp1        ; if 0, no full blocks
> :
>         lda     (ptr),y         ; load it
>         sta     (ptr_mp),y      ; store it
>         iny                     ; next y
>         bne     :-              ; still in block? go again
>         inc     ptr+1           ; next block of 256
>         inc     ptr_mp+1        ; next block of 256
>         dex                     ; if more blocks
>         bne     :-              ; back to :
> copy_mp1:
>         ldx     length          ; remainder of array
>         beq     copy_mp2        ; no remainder (full blocks only)
> :                               ; same as before, Y should be 0
>         lda     (ptr),y         ; load it
>         sta     (ptr_mp),y      ; store it
>         iny                     ; count up address
>         dex                     ; count down length of array
>         bne     :-              ; back to :
> copy_mp2:
>         rts

Here's an approach that adjusts the pointers so there is only one main 
loop. That loop is also unrolled a bit. Setup takes around 50 cycles worst 
case, but the last two high byte pointer increments are avoided, saving 10 
cycles, so the net overhead is about 40 cycles. The unrolled loop saves 
three cycles each time through (one "bne" not executed). So it's a net gain 
if moving 14 or more bytes in a partial page. At least I think that's right.

     ldx length+1    ; full pages
     ldy length      ; partial page ?
     beq :7          ; b: no - pointers need no adjustment
     inx             ; count the partial page as one more pass
     clc             ; adjust to point below source start
     tya             ; - ex: ptr = $xx00, length = 1
     adc ptr         ; we want ptr = $(xx-1)01, y = $ff (255)
     sta ptr
     bcs :1
     dec ptr+1
:1   clc             ; adjust to point below destination start
     tya             ; - ex: ptr_mp = $xxff, length = 2
     adc ptr_mp      ; we want ptr_mp = $xx01, y = $fe (254)
     sta ptr_mp
     bcs :2
     dec ptr_mp+1
:2   tya             ; adjust y index value
     eor #$ff        ; one's complement
     tay
     iny             ; two's complement
     lsr             ; odd or even count ? (a form of Duff's Device, BTW)
     bcs :4          ; b: even
     bcc :5          ; b: odd (forced)

:3   inc ptr+1       ; next page
     inc ptr_mp+1
:4   lda (ptr),y     ; move even byte (0, 2, ..., 254)
     sta (ptr_mp),y
     iny
:5   lda (ptr),y     ; move odd byte (1, 3, ..., 255)
     sta (ptr_mp),y
     iny             ; page done ?
     bne :4          ; b: no
:6   dex             ; full page left ?
     bne :3          ; b: yes
     rts

:7   txa             ; no partial page: any full pages ?
     bne :4          ; b: yes - copy the first one in place (y = 0)
     rts
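
For anyone who wants to check the setup arithmetic away from the machine, here is a small host-side model in Python. It is only a sketch: "adjust" and "copy_partial" are made-up names, memory is a plain bytearray, and it covers just the partial-page case (low byte of length from 1 to 255), not the whole routine.

```python
def adjust(ptr, len_lo):
    """Model the setup for a partial page of len_lo bytes (1..255).

    Returns (adjusted pointer, starting Y) such that adjusted + Y == ptr
    (mod 65536) and Y wraps to 0 after exactly len_lo increments.
    """
    lo = (ptr & 0xFF) + len_lo          # clc / tya / adc ptr
    hi = ptr >> 8
    if lo < 0x100:                      # carry clear: drop back one page
        hi = (hi - 1) & 0xFF
    y = (-len_lo) & 0xFF                # eor #$ff then iny = two's complement
    return (hi << 8) | (lo & 0xFF), y


def copy_partial(mem, src, dst, len_lo):
    """Copy len_lo bytes the way the two-slot unrolled loop would."""
    s, y = adjust(src, len_lo)
    d, _ = adjust(dst, len_lo)
    slot = y & 1                        # even Y enters at slot 0, odd Y at slot 1
    while True:
        mem[(d + y) & 0xFFFF] = mem[(s + y) & 0xFFFF]
        y = (y + 1) & 0xFF
        if slot == 1 and y == 0:        # the only exit test sits after slot 1
            break
        slot ^= 1
```

Taking xx = $20 and $40 for the two examples in the comments, adjust(0x2000, 1) gives (0x1F01, 0xFF) and adjust(0x40FF, 2) gives (0x4001, 0xFE).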

The loop could be unrolled to four, and the corresponding "Duff's Device" 
section expanded to branch correctly into it. That'd be overall faster, but 
not by as much as the first unrolling on a percentage basis.
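
To put rough numbers on that, a quick back-of-envelope in Python. The cycle counts are the usual 6502 ones with no page-crossing penalty assumed (lda (zp),y = 5, sta (zp),y = 6, iny = 2, taken bne = 3). "cycles_per_byte" and "percent_saved" are made-up helpers, and they ignore setup and the per-page pointer increments.

```python
LDA_IND_Y = 5   # lda (zp),y, assuming no page crossing
STA_IND_Y = 6   # sta (zp),y
INY = 2
BNE_TAKEN = 3


def cycles_per_byte(unroll):
    """Average inner-loop cycles per byte when one bne serves `unroll` bytes."""
    per_byte = LDA_IND_Y + STA_IND_Y + INY
    return per_byte + BNE_TAKEN / unroll


def percent_saved(before, after):
    """Relative saving, in percent, going from one unroll factor to another."""
    a, b = cycles_per_byte(before), cycles_per_byte(after)
    return 100.0 * (a - b) / a
```

Under those assumptions the loop costs 16, 14.5 and 13.75 cycles per byte for no unrolling, two and four. That is about 9.4% saved by the first step and about 5.2% by the second.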

I've wondered about code like this:

    ldy ptr
    lda #$00
    sta ptr

so there's a guarantee of no page crossing by at least one of the 
participants (the other pointer would have to be adjusted downward by the 
old low byte of "ptr", of course). Haven't got around to coding it to see 
what it would be like, though.
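
A host-side sketch of how that bookkeeping might go, in Python rather than 6502 and not tried on real hardware. It assumes the idea is to move the low byte of ptr into Y, zero it, and pull ptr_mp down by the same amount so that (ptr),y and (ptr_mp),y still reach the original bytes. "align_source" is a made-up name.

```python
def align_source(ptr, ptr_mp):
    """Return (ptr, ptr_mp, y) with ptr page-aligned and both
    ptr + y and ptr_mp + y unchanged (16-bit wraparound assumed)."""
    y = ptr & 0xFF                      # ldy ptr
    ptr &= 0xFF00                       # lda #$00 / sta ptr
    ptr_mp = (ptr_mp - y) & 0xFFFF      # adjust the other pointer downward
    return ptr, ptr_mp, y
```

With the low byte of ptr at zero, ptr + y stays inside one page for every y, so the lda never pays the extra page-crossing cycle. The sta (zp),y takes six cycles either way.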

- Anton Treuenfels
 
