Re: ICAT 'XCPT_BAD_ACCESS' - what does it mean?

From "Andi B." <andi.b@gmx.net>
Newsgroups comp.os.os2.programmer.misc
Subject Re: ICAT 'XCPT_BAD_ACCESS' - what does it mean?
Date 2017-12-31 12:02 +0100
Organization A noiseless patient Spider
Message-ID <p2ag4l$sck$3@dont-email.me> (permalink)
References <p1vsob$l1p$1@dont-email.me> <11p86vVJT4Oe-pn2-bkgdMpp7SV6j@slamain> <p252th$6o6$1@dont-email.me> <11p86vVJT4Oe-pn2-yeSdUENdor3Y@slamain>



Steven Levine wrote:
> On Fri, 29 Dec 2017 09:46:30 UTC, "Andi B." <andi.b@gmx.net> wrote:
>
> Hi Andi,
>
>> > To verify this you need to look at the content of the
>> > ExceptionReportRecord.
>
>> Have to learn about.
>
> The structures are well documented, with the exception of the FP
> specific data.  A pointer to the Exception Report Record is passed to
> the handler.  If you dump the data as dwords, it's readable if you
> work out the field offsets.
>
>> Yes. All in the runtime and I do not find a way back to the caller.
>
> Most likely because some of the code is not using standard stack
> frames.  It's also possible the stack is corrupted.  What you need to
> do in this case is dump the stack as dwords and walk the stack by
> hand.
>
>> I can narrow down the problem to code like this -
>> typedef struct _DIM {
>>    HMODULE        hmod;
>>    ULONG          ulModuleId;
>>    CHAR           szModuleBaseName[32];
>>    ULONG          ulDriverCount;
>>    PFNIDS         pfnids;
>> <SNIP>
>> } DIM, *PDIM;
>>
>> static   PDIM  padim = NULL;
>>
>>     padim = malloc( ulDimTableSize);  // actually about 215 bytes in my case
>>     if (!padim)
>>        {
>>        rc = ERROR_NOT_ENOUGH_MEMORY;
>>        break;
>>        }
>>     memset( padim, 0xAA, ulDimTableSize);
>>     strcpy( padim->szModuleBaseName, "TestStringAB_TEST");
>>
>> The strcpy triggers the exception in ICAT.
>
> Can I assume that this is the code near src\lib\drvapi\drvaccess.c:86?

Yes.

>
> FWIW, I've implemented xwlan fixes in the past so I am somewhat
> familiar with the code.
>
> The buffer size is defined by:
>
>     ulDimTableSize = ulModuleCount * sizeof( DIM);
>
> Did you check ulModuleCount?  If WtkLoadModules returns 0 modules, the
> memset will succeed, but the strcpy will trap.

Yes. Moreover, I added the strcpy above myself as a sanity check. Note my comment about 
the 215 bytes that malloc allocated successfully a few lines earlier in the code I posted 
here (slightly changed from drvaccess.c). malloc succeeds, memset fills the block 
correctly, and my added strcpy line triggers the exception. But when I let the exception 
handler run, strcpy works as expected.
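
For reference, the zero-count case Steven describes could be guarded roughly like this. It is only a sketch: DIM_SKETCH is a cut-down stand-in for the real DIM (the snipped fields are left out), and alloc_dim_table is a made-up helper, not a function from the xwlan sources.

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for the DIM structure quoted above (hypothetical). */
typedef struct {
    unsigned long hmod;
    unsigned long ulModuleId;
    char          szModuleBaseName[32];
    unsigned long ulDriverCount;
} DIM_SKETCH;

/* Returns a zero-initialised table of ulModuleCount entries, or NULL if the
   count is 0, the size would overflow, or the allocation fails. */
static DIM_SKETCH *alloc_dim_table(unsigned long ulModuleCount)
{
    if (ulModuleCount == 0)
        return NULL;                 /* no modules: entry 0 would not exist */
    if (ulModuleCount > (size_t)-1 / sizeof(DIM_SKETCH))
        return NULL;                 /* count * sizeof would overflow */
    return calloc(ulModuleCount, sizeof(DIM_SKETCH));
}
```

With a guard like this, a count of 0 is reported as an error before anything writes to the first table entry.
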

My code now is -

    TraceAB("ulDimTableSize=%d\n", ulDimTableSize);
    _interrupt(3);	// ICAT stops here as expected
    padim = malloc( ulDimTableSize);	// ulDimTableSize is 216
    if (!padim)		// padim is 0x00494130 = valid
       {
       rc = ERROR_NOT_ENOUGH_MEMORY;
       break;
       }
    memset( padim, 0xAA, ulDimTableSize);  // padim, including the string region, is filled correctly
    TraceAB("padim=0x%08X\n", padim);  // additional trace messages, written to file and com1
    TraceAB("padim->szModuleBaseName=0x%08X\n", padim->szModuleBaseName);
    strcpy( padim->szModuleBaseName, "TestStringAB_TEST"); // <--- this triggers exception
    TraceAB("padim=0x%08X\n", padim);	

I've uploaded the PassThru window content to 
https://www.pic-upload.de/view-34567254/icat_xwlan_trap.png.html as the newsgroup does 
not allow attachments. This is what I got when I tried to 'Step over' the strcpy line, 
then chose 'Examine....' in the exception dialog and read the PassThru output. The 
register monitor and call stack window say the same.

The TraceAB function logs the printf-style message to a file and in parallel sends it out 
on com1. So while running ICAT I see the TraceAB messages in pmdf (or zoc) on the host at 
the same time. This is just to make sure it really is this particular strcpy that 
triggers the problem (with or without running ICAT).
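
A rough sketch of such a two-sink trace function, not the actual TraceAB source. trace_to is a made-up name, both sinks are plain FILE pointers (how the serial port gets opened is left out), and vsnprintf assumes a C99 library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats the message once, writes it to both sinks and flushes each, so
   the text is already out if the next line traps. Either sink may be NULL.
   Returns the formatted length, or a negative value on a formatting error. */
static int trace_to(FILE *logfile, FILE *serial, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);  /* long messages are truncated */
    va_end(ap);
    if (n < 0)
        return n;

    if (logfile) { fputs(buf, logfile); fflush(logfile); }
    if (serial)  { fputs(buf, serial);  fflush(serial);  }
    return n;
}
```

Flushing after every message costs time, but without it the last lines before a trap may never reach the file.
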

For completeness, here is what pmdf says when running the above code (to rule out ICAT), 
including my TraceAB messages -

wlanDriverAccessInitialize
Symbols linked (genmac)
Symbols linked (genprism)
TrcMsgV len=48 (XWLAN:    0: Loading Driver Modules, count 2    )
WtkLoadModules done
ulDimTableSize=216
eax=000b0a6b ebx=00000000 ecx=00485020 edx=000003f8 esi=00000000 edi=00000000
eip=0003fe91 esp=000f7300 ebp=000f756c iopl=0 -- -- -- up ei pl nz ac pe nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00
005b:0003fe91 cc             int     3
##g
padim=0x00494130
padim->szModuleBaseName=0x00494138
Trap 13 (0DH) - General Protection Fault 0000
eax=61767264 ebx=61767264 ecx=61767264 edx=00000008 esi=00000000 edi=00000008
eip=0006427c esp=000f71d0 ebp=000f71ec iopl=0 rf -- -- nv up ei pl nz na po nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00
005b:0006427c 8a19           mov       bl,byte ptr [ecx]    ds:61767264=invalid
##k
005b:000642ef 01010101 01010101 01010101 01010101 _get_stack_trace + 5f
005b:0005ff0e 00480000 004970e4 000f7244 000665c4 _int_uheap_verify + 6de
005b:0005fc1d 00480000 00000000 00081f3c 000001a9 _int_uheap_verify + 3ed
005b:00061fbb 00480000 00081f3c 000001a9 00000001 _chk_if_heap + bb
005b:0005a537 00494138 000f760c 00081f3c 000001a9 _debug_strcpy + 47
005b:73656363 00632e73 6d75645f 6e6f4370 7463656e
##
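
One way to read a dump like this is to check whether suspicious dwords are really text. A small sketch (dword_to_ascii is a made-up helper) that prints the four bytes of a little-endian dword in memory order:

```c
#include <stdint.h>

/* Writes the four bytes of a little-endian dword, lowest address first,
   into out[0..3] and NUL-terminates. Non-printable bytes become '.'. */
static void dword_to_ascii(uint32_t v, char out[5])
{
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(v >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Applied to the values above: 61767264 (eax/ebx/ecx at the trap) reads "drva", cr2=414c5758 reads "XWLA", and the last stack line 73656363 00632e73 6d75645f 6e6f4370 7463656e reads "cces", "s.c.", "_dum", "pCon", "nect". That looks like pieces of strings such as "drvaccess.c" rather than pointers, which would fit a stack walk that ran off the valid frames into string data. It is only a reading of the dump, not a diagnosis.
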

>
> If that's not it, I would switch icat to assembly mode and step though
> the strcpy code.

I've done that before and went down _chk_if_heap(dbgstr) / _int_uheap_verify(rdbg) / 
_add_item 1a3(rdbg) / _get_stack_trace(memport) / _validate_ptr(memport). I then decided 
this all is more a 'debug kernel' or debugging (ICAT - pmdf) problem/behavior than a real 
application problem.
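
For reference, walking the stack by hand as Steven suggested comes down to following the saved-EBP chain, assuming the code uses standard stack frames: [EBP] holds the caller's EBP and [EBP+4] the return address. A sketch on a dumped dword array, with walk_ebp_chain and its parameters being made-up names:

```c
#include <stddef.h>
#include <stdint.h>

/* Walks a saved-EBP chain inside a dumped stack.
   dump[]  : stack contents as dwords
   base    : linear address of dump[0]
   ndwords : number of dwords in the dump
   ebp     : frame pointer to start from
   ret[]   : receives up to max return addresses
   Stops when a frame lies outside the dump, is misaligned, or the chain
   does not move towards higher addresses. Returns the frames found. */
static size_t walk_ebp_chain(const uint32_t *dump, uint32_t base, size_t ndwords,
                             uint32_t ebp, uint32_t *ret, size_t max)
{
    size_t n = 0;
    while (n < max) {
        size_t idx;
        uint32_t next;
        if (ebp < base || (ebp - base) % 4 != 0)
            break;
        idx = (ebp - base) / 4;
        if (idx + 1 >= ndwords)
            break;                 /* need [ebp] and [ebp+4] inside the dump */
        next = dump[idx];          /* caller's saved EBP */
        ret[n++] = dump[idx + 1];  /* return address into the caller */
        if (next <= ebp)
            break;                 /* chain must grow upwards */
        ebp = next;
    }
    return n;
}
```

A saved EBP that points outside the stack ends the walk, which is the point where a debugger's call stack window stops making sense as well.
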

> When you get to the movs instruction, look at ESI
> and EDI, the source and destination addresses respectively.  ECX will
> be the copy count.

IIRC these all worked fine (memory display proves the string is copied correctly) but 
afterwards the _validate_ptr thinks there is something wrong (while I think it isn't).

We can go down this road again if you like. But I think we need realtime IRC chat in 
parallel.

>
>> I do not see why this should trash the stack so my above assumption is probably wrong.
>
> This looks much more like a heap issue, than a stack issue.

ok.

>
>> The good thing is - while being there I found out that http://trac.netlabs.org/wpstk
>> does not compile with VAC anymore and needs attention too.
>> So no problem finding new tasks.
>
> :-)
>
>> I've 'set CAT_KDB_INIT="vsf *"'. At the first sight I did not even find any reference to
>> vsf and vc except on your page.
>
> The V command is pretty much fully documented in the OS/2 Debugging
> Handbook.
>
>>And not much info in ICAT files.
>
> I would not expect the ICAT docs to cover this in much detail, since
> it is covered elsewhere.  icatfaq.html does show how to use  SET
> CAT_KDB_INIT to do what is typically done in kdb.ini
>
> What you want to use is:
>
> CAT_KDB_INIT="vsf *;vce"
>
> to let the kernel handle page faults normally.

I found this on your page and set it that way, although I still haven't read the debugging 
handbook and don't really understand what the v* commands do. But I still hope I can live 
without knowing the deeper details ;-). Idebug has a list box where the various exceptions 
can be selected. I didn't find anything like that in ICAT.

>
>>If I ever would find the
>> time to learn more about these basics in debugging....
>
> Necessity is the mother of invention, as they say. :-)
>
>> Maybe I should add exceptq to wlanstat and let your trap tool decode what's going wrong
>> rather than playing endless hours with ICAT and trying to decode it myself.
>
> It's likely to be better in the long run, especially if an issue comes
> up on someone else's system.

Done, although not yet tested. The problem is that what I see here only happens with the 
debug kernel, and with that I can never run wlanstat to the point where exceptq does its 
thing. The same wlanstat app (without the int3) runs on the retail kernel without problems.

>
>>Moreover the starting
>> problem seems to be unrelated to what I'm looking here anyway.
>
> Yes, I tend to agree.  If I knew EDI at the time of the trap, I would
> probably know for sure.  Since the code continues normally after
> the exception this is likely.  You can see the registers if you do
>
>    r
>
> in the PassThru window.  This will also tell you exactly what the
> kernel debugger thinks the trap is.
>
>>Maybe you want to have a
>> look at - http://trac.netlabs.org/xwlan/ticket/46 which is the reason why I started to
>> play with ICAT.
>
> This one is definitely stack corruption and exceptq will
> help because if you have symbols installed, it will give you a name
> for the EIP address.

To my eyes this looks very similar to what I see here.

Andreas

>
> Steven
>



