


Re: Challenge For The "Expert" Tyrone

From Physfitfreak <physfitfreak@gmail.com>
Newsgroups comp.os.linux.advocacy
Subject Re: Challenge For The "Expert" Tyrone
Date 2025-01-28 10:43 -0600
Organization individual
Message-ID <vnb1f4$1tgcb$1@dont-email.me>
References <pan$5f575$bd36f4a3$95d6d52e$b096d18c@linux.rocks>



On 1/28/25 10:14 AM, Farley Flud wrote:
> Poor tired, exhausted Tyrone.  He must have spent days of futile
> searching in an attempt to find a copy somewhere of my absolutely
> perfect AVX-512 assembly code.
> 
> (Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!)
> 
> Of course, all of his efforts were in total vain, because no such
> copy exists anywhere, except right here on C.O.L.A.
> 
> Poor tired, exhausted Tyrone (not to mention poor, dumb bastard).
> 
> (Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!)
> 
> Well, I have a challenge for the "expert" Tyrone.
> 
> I have ever so slightly modified my absolutely perfect AVX-512 code
> so that it no longer will execute.  Instead it will crash horribly.
> 
> The ever-so-slightly modified code follows.
> 
> Let's allow the "expert" Tyrone to discover and clearly report
> the fault.
> 
> Anyone want to take bets?
> 
> Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!
> 
> I recommend that Tyrone invest his extensive and exhaustive search
> time in a search for his own stupidity.
> 
> Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!
> 
> 
> ============================================
> Begin AVX-512 NASM Assembly (Modified)
> ============================================
> 
> BITS 64
> 
> segment .text
> 	global _start
> 
> _start:
> 	mov r8, data_in
> 	mov r9, data_out
> 	mov rbx, qword [stride]
> 	xor rdx, rdx
> 	mov rax, qword [N]
> 	div rbx 	; rax = quotient, rdx = remainder
> load:
> 	vmovdqa32 zmm1, zword [r8]
> 	vmovdqa32 zword [r9], zmm1
> 	add r8, 64 ; increment data pointers
> 	add r9, 64
> 	dec rax
> 	jnz load
> 	xor r11, r11 	; load mask, i.e. only rdx left over to load
> 	mov r10, -1
> 	mov rcx, rdx
> 	shld r11, r10, cl
> 	kmovq k1, r11;
> 	vmovdqa32 zmm1{k1}{z}, zword [r8]
> 	vmovdqa32 zword [r9], zmm1
> exit:	
> 	xor edi,edi
> 	mov eax,60
> 	syscall
> 
> segment .data
> align 64
> N:		dq 37 	;set length of block and stride
> stride:		dq 16
> data_in:	dd 16 dup (0xefbeadde) ;dummy data
> 		dd 16 dup (0xfecaafde)
> 		dd 5 dup (0xefbeadde)
> 
> segment .bss
> alignb 64
> data_out:	resd 37
> 
> ========================================
> End AVX-512 NASM Assembly (Modified)
> ========================================
> 
> 
> 
> 


No, Farley, not after the advent of DeepSeek.

If I, who don't know a _hoot_ about assembly language, can get an
answer within about 10 seconds of your post, then any "Tyrone" can too.

Output from DeepSeek:


1. Incorrect Handling of Remainder:

     The code calculates the quotient and remainder of N / stride using
     the div instruction. The quotient (rax) is used to determine how
     many full 64-byte blocks to copy, and the remainder (rdx) is used
     to handle the remaining elements.

     However, the code assumes that the remainder (rdx) will always be
     less than or equal to 16 (the stride). If N is not a multiple of
     stride, the remainder could be larger than 16, which would lead to
     incorrect behavior when creating the mask for the final partial
     load.

2. Mask Calculation:

     The mask calculation for the final partial load is incorrect. The
     shld instruction is used to create a mask, but it doesn't correctly
     handle the case where the remainder is zero. If the remainder is
     zero, the mask should be zero, but the current code will still
     attempt to load and store data, which could lead to accessing
     out-of-bounds memory.

3. Alignment Issues:

     The data_in and data_out arrays are aligned to 64 bytes, which is
     correct for AVX-512 operations. However, the code does not ensure
     that the final partial load is also aligned, which could lead to
     performance issues or crashes on some architectures.
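
For anyone who wants to check the three points above against the posted
numbers, here is a minimal Python sketch. It only models the integer
arithmetic of div and shld and the byte offsets in .data; it does not
execute any AVX-512 instruction. The helper names (div_u64, shld64,
data_in_offset) are made up for this illustration, and the layout helper
assumes the assembler places items back to back after "align 64".

```python
MASK64 = (1 << 64) - 1  # all ones, same bit pattern as "mov r10, -1"


def div_u64(n, stride):
    """Model "div rbx" with rdx = 0: returns (rax, rdx) = (quotient, remainder)."""
    return n // stride, n % stride


def shld64(dest, src, count):
    """Model 64-bit "shld dest, src, cl": shift dest left, filling from src's top bits."""
    count &= 63  # the count is masked to 6 bits for 64-bit operands
    if count == 0:
        return dest  # a zero count leaves dest unchanged
    return ((dest << count) | (src >> (64 - count))) & MASK64


def data_in_offset(sizes_before, alignment=64):
    """Offset of data_in past the aligned boundary (assumes no padding between items)."""
    return sum(sizes_before) % alignment


# Values taken from the post: N = 37, stride = 16.
quotient, remainder = div_u64(37, 16)
mask = shld64(0, MASK64, remainder)  # r11 = 0, r10 = -1, cl = remainder
print(quotient, remainder, bin(mask))

# In .data, "align 64" comes first, then N (dq, 8 bytes) and stride (dq, 8 bytes).
print(data_in_offset([8, 8]))
```

With the posted values this gives a quotient of 2, a remainder of 5, a
mask with the low five bits set, and data_in starting 16 bytes past the
64-byte boundary. A remainder from div is always smaller than the
divisor, and a zero remainder yields a zero mask. The offset matters
because vmovdqa32 on a zword operand expects a 64-byte-aligned address.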



