Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.sys.apple2.programmer > #378
| From | Egan Ford <datajerk@gmail.com> |
|---|---|
| Newsgroups | comp.sys.apple2.programmer |
| Subject | Re: Fastest method to copy/process a range of bytes? |
| Date | 2012-08-13 11:18 -0600 |
| Organization | XMission http://xmission.com/ |
| Message-ID | <k0bcue$8ei$1@news.xmission.com> (permalink) |
| References | <jvs69d$qa1$1@news.xmission.com> <mdGdnWkVEqOuvb7NnZ2dnUVZ_vednZ2d@earthlink.com> <k08g78$tkd$1@news.xmission.com> <9sudnYzk1OuAH7XNnZ2dnUVZ_q2dnZ2d@earthlink.com> |
On 8/12/12 10:27 PM, Anton Treuenfels wrote:
> Ah, so. What comes of coding off the top of my head. Alas I don't
> believe your fix works in all cases. Consider the case where length is
> zero; the INX is meant to account for this. Failing to do that will give
> you a whole lot of moves you don't want. Although I suppose if you can
> guarantee length will never be zero it wouldn't be a problem and your
> fix will work.
Length will never be zero. The smallest it can be with this program is $2C.
On 8/12/12 10:27 PM, Anton Treuenfels wrote:
> As mentioned earlier by MJM, you can also dispense with the X register
> altogether if you're willing to simply INC and DEC 'length+1' directly.
> It's only a 3-cycle penalty once per page
Yep, that is exactly what I did in cases where I needed x in an inner
loop (see div16_mp below).
Here is an example of a reverse loop starting at arraylen-1 (arrayend is
precomputed) and ending at ptr, unrolled 4x with Duff:
add_mp:
sta ptr ; store ptr lo from A
tya
clc
adc arrayend+1 ; add number of pages since we have to
sta ptr+1 ; go backwards for add/sub/asl
lda ptr_mp+1
clc
adc arrayend+1 ; add number of pages since we have to
sta ptr_mp+1 ; go backwards for add/sub/asl
ldx arrayend+1 ; full pages
ldy arrayend ; partial
clc
tya
and #3
cmp #3
beq @3
cmp #2
beq @2
cmp #1
beq @1
bne @0
: dex
dec ptr+1 ; previous page of 256
dec ptr_mp+1 ; previous page of 256
: dey
@3: lda (ptr),y
adc (ptr_mp),y
sta (ptr),y
dey
@2: lda (ptr),y
adc (ptr_mp),y
sta (ptr),y
dey
@1: lda (ptr),y
adc (ptr_mp),y
sta (ptr),y
dey
@0: lda (ptr),y
adc (ptr_mp),y
sta (ptr),y
tya
bne :-
txa
bne :--
rts
My last challenge is the tya/bne code. txa/bne can be fixed with inx
before the loop. tya is costing my 2 cycles. The size of the array is
~420 in my testing, so 840 cycles, but I call this ~1000x. That is
almost a second! :-) But the 4x unroll reduces this to less than .25 sec.
I need to move ptr back one and y up one. But the code is unpleasant to
look at. I could really use a "branch if #$FF".
However, my real challenge is the time taken in mp_div and mp_div16:
calls avg.cyl/call total time(s)
----- ------------ -------------
counter 19: 2 4063 0.007963 mp_set
counter 20: 931 6284 5.733878 mp_copy
counter 21: 4 8501 0.033321 mp_asl
counter 22: 463 8361 3.793781 mp_add
counter 23: 465 8467 3.858206 mp_sub
counter 24: 1396 39871 54.543258 mp_div
counter 25: 675 70268 46.479285 mp_div16
Anton, thanks again for the pointers.
(the aforementioned div16_mp):
.macro div16_fast
ldy #16 ;index for 16 bits
: asl dividend+3 ;dividend/2, clear quotient bit
rol dividend+2
rol dividend+1
rol dividend+0
sec
lda dividend+1 ;try subtracting divisor
sbc divisor+1
tax
lda dividend+0
sbc divisor+0
bcc :+ ;too small, qbit=0
stx dividend+1 ;okay, store remainder
sta dividend+0
inc dividend+3 ;set quotient bit = 1
: dey ;next step
bne :--
.endmacro
div16_mp:
;;setup and skip leading zeros
sta ptr ; store ptr lo from A
sty ptr+1 ; store ptr hi from Y
ldx arraylen+1 ; full pages
ldy arraylen ; partial pages
beq @0 ; if not process full pages only
inx ; bump up x +1 for the partial page
clc ; adjust to point below source start
tya ; - ex: ptr = $xx00, arraylen = 1
adc ptr ; we want ptr = $(xx-1)01, y = $ff (255)
sta ptr
bcs :+
dec ptr+1
: tya ; adjust y index value, y = -y
eor #$ff ; one's complement
tay
iny ; two's complement
lsr ; odd or even count, this is a form of
; Duff's Device
bcs @0 ; b: even (checking A, not Y here)
bcc @1 ; b: odd (forced)
@2: inc ptr+1 ; next page
@0: lda (ptr),y ; check even byte (0, 2, ..., 254)
bne @4
iny
@1: lda (ptr),y ; check odd byte (1, 3, ..., 255)
bne @4
iny ; page done ?
bne @0 ; b: no
dex ; full page left ?
bne @2 ; b:yes
sec ; got here? all zeros
rts
@4: ;; done skipping zeros, y and x set to continue
tya ; y must be even
lsr
bcc :+ ; y even
dey
cpy #$FF
bne :+ ; y didn't roll back to FF
dex
: lda #0
sta dividend ; MSB
sta dividend+1 ; MSB-1
stx xreg ; div16_fast/safe needs x reg
beq @6
@7: inc ptr+1 ; next page
@6: lda (ptr),y ;
sta dividend+2
sty yreg
iny
lda (ptr),y ;
sta dividend+3
div16_fast ; need x and y reg
ldy yreg
lda quotient
sta (ptr),y
iny
lda quotient+1
sta (ptr),y
iny ; page done ?
bne @6
dec xreg ; (dex) full page left ?
bne @7
clc ; non zero result
rts
Back to comp.sys.apple2.programmer | Previous | Next — Previous in thread | Next in thread | Find similar
Fastest method to copy/process a range of bytes? Egan Ford <datajerk@gmail.com> - 2012-08-07 16:53 -0600
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-07 18:57 -0700
Re: Fastest method to copy/process a range of bytes? Daniel Kruszyna <dan@krue.net> - 2012-08-08 15:28 +0000
Re: Fastest method to copy/process a range of bytes? aiiadict@gmail.com - 2012-08-07 22:07 -0700
Re: Fastest method to copy/process a range of bytes? "Michael J. Mahon" <mjmahon@aol.com> - 2012-08-08 10:49 -0700
Re: Fastest method to copy/process a range of bytes? "Michael J. Mahon" <mjmahon@aol.com> - 2012-08-08 10:59 -0700
Re: Fastest method to copy/process a range of bytes? "Anton Treuenfels" <teamtempest@yahoo.com> - 2012-08-08 21:27 -0500
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-08 22:12 -0700
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-08 22:23 -0700
Re: Fastest method to copy/process a range of bytes? "Anton Treuenfels" <teamtempest@yahoo.com> - 2012-08-09 18:35 -0500
Re: Fastest method to copy/process a range of bytes? Jerry <awanderin@yahoo.ca> - 2012-08-11 01:03 -0600
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-11 11:33 -0700
Re: Fastest method to copy/process a range of bytes? mmphosis <mmphosis@macgui.com> - 2012-08-09 06:09 +0000
Re: Fastest method to copy/process a range of bytes? mmphosis <mmphosis@macgui.com> - 2012-08-09 09:40 +0000
Re: Fastest method to copy/process a range of bytes? "Anton Treuenfels" <teamtempest@yahoo.com> - 2012-08-09 18:54 -0500
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-09 17:48 -0700
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-09 17:46 -0700
Re: Fastest method to copy/process a range of bytes? Michael J. Mahon <mjmahon@aol.com> - 2012-08-10 15:25 -0500
Re: Fastest method to copy/process a range of bytes? Antoine Vignau <antoine.vignau@laposte.net> - 2012-08-10 14:23 -0700
Re: Fastest method to copy/process a range of bytes? Egan Ford <datajerk@gmail.com> - 2012-08-12 08:56 -0600
Re: Fastest method to copy/process a range of bytes? "Anton Treuenfels" <teamtempest@yahoo.com> - 2012-08-12 23:27 -0500
Re: Fastest method to copy/process a range of bytes? Egan Ford <datajerk@gmail.com> - 2012-08-13 11:18 -0600
Re: Fastest method to copy/process a range of bytes? Egan Ford <datajerk@gmail.com> - 2012-08-11 15:58 -0600
Re: Fastest method to copy/process a range of bytes? Egan Ford <datajerk@gmail.com> - 2012-08-11 16:16 -0600
csiph-web