comp.compilers > #2525 (unrolled thread)
| Started by | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> |
|---|---|
| First post | 2020-06-19 23:22 +0800 |
| Last post | 2020-06-25 22:52 +0200 |
| Articles | 10 — 4 participants |
The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> - 2020-06-19 23:22 +0800
Re: The different groups and segments of object files gah4@u.washington.edu - 2020-06-19 14:12 -0700
Re: The different groups and segments of object files gah4@u.washington.edu - 2020-06-21 01:50 -0700
Re: The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 22:41 +0800
Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-25 22:40 +0200
Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-26 00:09 +0200
Re: The different groups and segments of object files gah4@u.washington.edu - 2020-07-02 00:36 -0700
Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-20 16:01 +0200
Re: The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 23:13 +0800
Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-25 22:52 +0200
| From | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> |
|---|---|
| Date | 2020-06-19 23:22 +0800 |
| Subject | The different groups and segments of object files |
| Message-ID | <20-06-004@comp.compilers> |
Dear c.compilers,

The short story is that I've been playing with the OpenWatcom code generator. I can use it to make a custom object file. The platform is 32bit x86, though I'm happy to know how things work in 16bit and 64bit modes as well.

The code generator has options I do not fully understand, and I do not know where I can find information on this subject. To further complicate matters, I'm unsure what questions I should be asking.

Here, I'm /exclusively/ asking about object files, as I'm at the moment generating 32bit code without considerations for physical segments [1].

What are segment groups for? The Watcom compiler interface seems to be based on the Microsoft documentation, so I've been reading old MS manuals as well. There we have the following paragraph:

> All segments with the same group name must fit into a single physical
> segment, which is up to 64K long. This allows all segments in a group
> to be accessed through the same segment register. The Microsoft C
> Compiler defines one group named DGROUP.

As far as I can tell, Watcom also defines this DGROUP in 32bit mode, but I still don't know if it has any purpose anymore.

When I am defining my own compiler, do the traditional names of object file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the MS manual), have any meaning? When I place my code in a _TEXT segment, the disassembler will show a list of the instructions; this does not happen if I name the segment FOO. Is there any particular reason I want my code in a _TEXT segment rather than FOO?

If this subject is covered in the Linkers & Loaders book in sufficient detail to answer my questions, I have not come across it yet. Please refer to a specific chapter if there's something I should read there.

[1] Or if I am, I'm not aware of it.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[Segment groups were useful on the '286 but are pointless on later machines since a single segment can be the full size of the address space. My dim recollection is that some software treated segment names like _TEXT and _DATA specially, so your life will likely be easier if you put the code in _TEXT, the static data in _DATA, and the uninitialized or zero-initialized data in _BSS. I didn't say anything about this in Linkers and Loaders since by that time segmented addressing was just an historical curiosity. -John]
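A minimal sketch of how a hobby code generator might pick conventional segment names, assuming the Microsoft small-model conventions: _TEXT in class 'CODE' and outside any group, with _DATA, CONST, _BSS and STACK (classes 'DATA', 'CONST', 'BSS', 'STACK') all placed in DGROUP. The enum, struct and function names are hypothetical; nothing here is OpenWatcom's actual interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of output a code generator emits. */
enum section_kind { SEC_CODE, SEC_DATA, SEC_CONST, SEC_BSS, SEC_STACK };

/* Conventional (Microsoft small-model style) naming for each kind. */
struct seg_convention {
    const char *segment; /* segment name written to the object file */
    const char *cls;     /* class name the linker uses to order segments */
    const char *group;   /* group name, or NULL if not in a group */
};

static const struct seg_convention conventions[] = {
    [SEC_CODE]  = { "_TEXT", "CODE",  NULL     },
    [SEC_DATA]  = { "_DATA", "DATA",  "DGROUP" },
    [SEC_CONST] = { "CONST", "CONST", "DGROUP" },
    [SEC_BSS]   = { "_BSS",  "BSS",   "DGROUP" },
    [SEC_STACK] = { "STACK", "STACK", "DGROUP" },
};

/* Look up the conventional naming for a section kind. */
const struct seg_convention *segment_for(enum section_kind kind)
{
    if ((size_t)kind >= sizeof conventions / sizeof conventions[0])
        return NULL;
    return &conventions[kind];
}

/* True if the named segment belongs to DGROUP under these conventions. */
int in_dgroup(const char *segment)
{
    for (size_t i = 0; i < sizeof conventions / sizeof conventions[0]; i++)
        if (strcmp(conventions[i].segment, segment) == 0)
            return conventions[i].group != NULL
                && strcmp(conventions[i].group, "DGROUP") == 0;
    return 0; /* unknown names such as FOO are in no group */
}
```

The table only records names; any special meaning comes from the tools that read them (the linker ordering segments by class, a disassembler recognising _TEXT), which fits John's advice to stick with the traditional names.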
| From | gah4@u.washington.edu |
|---|---|
| Date | 2020-06-19 14:12 -0700 |
| Message-ID | <20-06-005@comp.compilers> |
| In reply to | #2525 |
On Friday, June 19, 2020 at 8:33:34 AM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:

> The short story is that I've been playing with the OpenWatcom code
> generator. I can use it to make a custom object file. The platform
> is 32bit x86, though I'm happy to know how things work in 16bit and
> 64bit modes as well.

> The code generator has options I do not fully understand, and I do
> not know where I can find information on this subject. To further
> complicate matters I'm unsure what questions I should be asking.

(snip)

The Watcom (at least before OpenWatcom) compilers were the only ones I knew to generate large model (more than one segment) 32 bit code. It might be that OS/2 even knows how to load and run it. (I am pretty sure that Windows and Unix don't.)

In that model, pointers are 48 bit, with a 16 bit segment selector and 32 bit offset. This would have been another way to get past the 32 bit barrier without going to 64 bit.

Well, there is a reason not to do it, which is that processors don't have a segment descriptor cache. Every load of a segment selector requires a side load of the descriptor, so too much overhead.

In any case, code that you expected to be only in the 16 bit section, related to segments, should be in there, also. As I noted, it is OS/2 where this could have been used, so it might only be in the OS/2 object program format. I don't remember now how the OS/2 and Win32 formats differ.

In 16 bit modes, you can have huge data, which allows for a multiple segment data object, with segment selectors spaced to allow for appropriate calculation. I believe this is only for dynamically allocated data, but it might also allow for static allocation in the object program. I tried to avoid that.

[On the 386, all of the segments in a process were mapped into a single 32 bit paged linear address space, so the total size of all segments in a program was limited to 4G unless you did pseudo-swapping of segments in and out of the address space, which I don't think anyone did. So not only were they slow, they didn't help. -John]
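To make the 16:32 model and John's point concrete, here is a small sketch: a far pointer is a 16-bit selector plus a 32-bit offset, and the CPU adds the segment base taken from the descriptor to the offset to form a 32-bit linear address. The base-address table and all names are hypothetical, and limit and privilege checks are left out. Because the sum is truncated to 32 bits, no choice of selectors reaches beyond the 4G linear space.

```c
#include <stdint.h>

/* A 16:32 far pointer. Field order follows the CPU's m16:32 memory
   operand (32-bit offset first, then the 16-bit selector); this struct
   is not packed, so it is 8 bytes rather than the 6 the CPU uses. */
struct far_ptr32 {
    uint32_t offset;
    uint16_t selector;
};

/* Hypothetical table of segment base addresses, indexed by the
   selector's 13-bit index field (selector >> 3). Limit and privilege
   checks are deliberately omitted from this sketch. */
static uint32_t segment_base[8192];

/* Segment base + offset, truncated to the 32-bit linear address space. */
uint32_t linear_address(struct far_ptr32 p)
{
    uint32_t base = segment_base[p.selector >> 3];
    return base + p.offset; /* unsigned arithmetic wraps modulo 2^32 */
}
```

For example, a segment based at 0xFFFF0000 accessed at offset 0x20000 lands at linear 0x00010000, not above 4G.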
| From | gah4@u.washington.edu |
|---|---|
| Date | 2020-06-21 01:50 -0700 |
| Message-ID | <20-06-007@comp.compilers> |
| In reply to | #2526 |
(I wrote)

> The Watcom (at least before OpenWatcom) compilers were the only ones
> I knew to generate large model (more than one segment) 32 bit code.
> It might be that OS/2 even knows how to load and run it.
> (I am pretty sure that Windows and Unix don't.)

(and our great moderator replied)

> [On the 386, all of the segments in a process were mapped into a
> single 32 bit paged linear address space, so the total size of all
> segments in a program was limited to 4G unless you did pseudo-swapping
> of segments in and out of the address space, which I don't think
> anyone did.
> So not only were they slow, they didn't help. -John]

I believe OS/2 could, but I never tried it to find out. And yes, putting a 32 bit paging unit in between doesn't help.

A program could use multiple segments totaling less than 4GB, or, with pseudo-swapping (I hadn't thought of the name, but it sounds about right), use more than 4GB. With a little locality, it should be able to get enough done before pseudo-swapping. But as far as I know, there is still the overhead for loading segment descriptors.

Well, the extra two segment registers (FS and GS) could help. But a little segment descriptor cache would have helped more. Since OS/2 could run both 16 bit and 32 bit code, it might not have been hard to add.

In any case, the compilers can generate the code, whether or not any OS can run it.

There was also PAE:

https://en.wikipedia.org/wiki/Physical_Address_Extension

which I thought would help, but I never tried to figure out what it could do, or how to actually do it.

[PAE let you attach more physical memory to an x86 but didn't increase the 4G process address space. For that you need x86-64 or Itanium. -John]
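John's note can be seen in how PAE splits an address. With 4K pages, a 32-bit linear address is divided 2/9/9/12 bits: one of 4 page-directory-pointer entries, then a page-directory entry, a page-table entry, and the byte offset. The sketch below (struct and function names hypothetical) shows that every index still comes from the same 32-bit value; PAE widens the table entries so they can hold larger physical frame addresses, but the per-process linear space stays 4G.

```c
#include <stdint.h>

/* Indices produced when PAE paging translates a 32-bit linear address
   with 4K pages. */
struct pae_indices {
    unsigned pdpt;   /* bits 31..30: one of 4 PDPT entries */
    unsigned pd;     /* bits 29..21: one of 512 page-directory entries */
    unsigned pt;     /* bits 20..12: one of 512 page-table entries */
    unsigned offset; /* bits 11..0: byte within the 4K page */
};

struct pae_indices pae_split(uint32_t linear)
{
    struct pae_indices ix;
    ix.pdpt   = (linear >> 30) & 0x3;
    ix.pd     = (linear >> 21) & 0x1FF;
    ix.pt     = (linear >> 12) & 0x1FF;
    ix.offset = linear & 0xFFF;
    return ix;
}
```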
| From | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> |
|---|---|
| Date | 2020-06-25 22:41 +0800 |
| Message-ID | <20-06-020@comp.compilers> |
| In reply to | #2526 |
On 20/06/2020 5:12 am, gah4@u.washington.edu wrote:
> On Friday, June 19, 2020 at 8:33:34 AM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
>
>> The short story is that I've been playing with the OpenWatcom code
>> generator. I can use it to make a custom object file. The platform
>> is 32bit x86, though I'm happy to know how things work in 16bit and
>> 64bit modes as well.
>
>> The code generator has options I do not fully understand, and I do
>> not know where I can find information on this subject. To further
>> complicate matters I'm unsure what questions I should be asking.
>
> (snip)
>
> The Watcom (at least before OpenWatcom) compilers were the only ones
> I knew to generate large model (more than one segment) 32 bit code.

I'm told OpenWatcom still does this. One hobbyist operating system effort is based on OW for this reason.

> It might be that OS/2 even knows how to load and run it.
> (I am pretty sure that Windows and Unix don't.)
>
> In that model, pointers are 48 bit, with a 16 bit segment selector
> and 32 bit offset. This would have been another way to get
> past the 32 bit barrier without going to 64 bit.
>
> Well, there is a reason not to do it, which is that processors don't
> have a segment descriptor cache. Every load of a segment selector
> requires a side load of the descriptor, so too much overhead.
>
> In any case, code that you expected to be only in the 16 bit section,
> related to segments, should be in there, also. As I noted, it is
> OS/2 where this could have been used, so it might only be in the OS/2
> object program format. I don't remember now how the OS/2 and Win32
> formats differ.

OS/2 still exists. It's being sold as a distro called ArcaOS. I haven't tried this on ArcaOS either (though I have it).

> In 16 bit modes, you can have huge data, which allows for a multiple
> segment data object, with segment selectors spaced to allow for
> appropriate calculation. I believe this is only for dynamically
> allocated data, but it might also allow for static allocation
> in the object program. I tried to avoid that.

Can you please elaborate on what you tried to avoid? Or point me to some reading material?

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[On the 286 huge data was extremely slow since it required several shifts and masks for every address reference. Re segments and segment groups, the ones that survive are code (text), static R/W data, static R/O data, and uninitialized data (bss). There are others for dynamic linking and runtime symbol tables but they don't seem relevant here. -John]
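As an illustration of the "shifts and masks" John mentions, here is a sketch of real-mode huge-pointer arithmetic. It assumes one common scheme in which huge pointers are kept normalized (offset 0..15) after every adjustment; the type and function names are hypothetical, 16-bit protected mode used a different selector-stepping scheme, and the 8086's 1MB address wraparound is ignored.

```c
#include <stdint.h>

/* A real-mode segment:offset pair. */
struct seg_off {
    uint16_t seg;
    uint16_t off;
};

/* Real-mode physical address: segment * 16 + offset. */
uint32_t real_linear(struct seg_off p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Add a byte displacement to a huge pointer and renormalize so the
   offset is 0..15. Every pointer adjustment costs a shift, an add and
   a mask instead of a single 16-bit add. Results above 1MB are not
   handled in this sketch. */
struct seg_off huge_add(struct seg_off p, int32_t delta)
{
    uint32_t lin = real_linear(p) + (uint32_t)delta; /* wraps for delta < 0 */
    struct seg_off r;
    r.seg = (uint16_t)(lin >> 4);
    r.off = (uint16_t)(lin & 0xF);
    return r;
}
```

For example, adding 0x20 to 1000:FFF0 crosses a 64K boundary and yields the normalized pointer 2001:0000.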
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-06-25 22:40 +0200 |
| Message-ID | <20-06-023@comp.compilers> |
| In reply to | #2526 |
On 19.06.2020 at 23:12, gah4@u.washington.edu wrote:

> In 16 bit modes, you can have huge data, which allows for a multiple
> segment data object, with segment selectors spaced to allow for
> appropriate calculation. I believe this is only for dynamically
> allocated data, but it might also allow for static allocation
> in the object program. I tried to avoid that.

No need to avoid that. There exist huge segment descriptors that allow for 32 bit offsets (EAX...) on a 16 bit system without special address calculations. Microsoft documented that feature in parallel to the traditional modulo calculations for segment/offset. AFAIR the starting segment selector allowed access to at least 512 MB (8 consecutive segments). This feature did not allow breaking the 4GB total limit, but it allowed for contiguous data areas bigger than 64KB.

DoDi

[Nobody I knew used huge model because it was so very very slow. There was no way on 16-bit x86 to access more than 64K at a time without runtime segment calculations. -John]
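In 16-bit protected mode the "runtime segment calculations" looked different from real mode: a huge object was given consecutive descriptors, and the high 16 bits of a 32-bit offset picked the selector. A sketch, assuming consecutive descriptor-table entries so that selectors step by 8 (a selector holds a 13-bit index above the table-indicator bit and the 2-bit RPL); the names are hypothetical.

```c
#include <stdint.h>

/* Selector format: bits 15..3 index, bit 2 table indicator (LDT = 1),
   bits 1..0 requested privilege level. Consecutive descriptors therefore
   differ by 8 in selector value. */
#define SELECTOR_STEP 8u

struct sel_off16 {
    uint16_t sel;
    uint16_t off;
};

/* Map a 32-bit byte offset into a huge object, whose first 64K is
   addressed by base_sel, onto a selector:offset pair, assuming each
   following 64K is covered by the next descriptor. */
struct sel_off16 huge_pm_address(uint16_t base_sel, uint32_t offset)
{
    struct sel_off16 r;
    r.sel = (uint16_t)(base_sel + (offset >> 16) * SELECTOR_STEP);
    r.off = (uint16_t)(offset & 0xFFFF);
    return r;
}
```

Every access that might cross a 64K boundary has to redo this split and reload a segment register, which is where the slowness came from.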
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-06-26 00:09 +0200 |
| Message-ID | <20-06-024@comp.compilers> |
| In reply to | #2544 |
On 25.06.2020 at 22:40, Hans-Peter Diettrich wrote:

> AFAIR the starting
> segment selector allowed access to at least 512 MB (8 consecutive
> segments). This feature did not allow breaking the 4GB total limit, but
> it allowed for contiguous data areas bigger than 64KB.
>
> DoDi
> [Nobody I knew used huge model because it was so very very slow. There
> was no way on 16-bit x86 to access more than 64K at a time without runtime
> segment calculations. -John]

The 16/32 bit address size prefix inverts the default address size, i.e. it allows using 32 bit addresses and offset registers with 16 bit code. I used 32 bit offsets in 16 bit code with the Borland compilers and disassembled the generated machine code. I never used the HUGE model.

DoDi

[Oh, running on a 386, sure, that would have worked. -John]
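The effect of the 0x67 address-size prefix shows up directly in the encoding: the same ModRM byte names different registers under 16-bit and 32-bit addressing. In a 16-bit code segment, 8B 03 is mov ax,[bp+di], while 67 8B 03 is mov ax,[ebx]. The sketch below decodes only the mod=00 memory operand of a one-byte-opcode instruction; it ignores all other prefixes and the SIB and displacement forms, and the function name is hypothetical.

```c
#include <stddef.h>

/* mod=00 memory operands, indexed by the ModRM r/m field. */
static const char *const rm16[8] = {
    "[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
    "[si]", "[di]", "disp16", "[bx]"
};
static const char *const rm32[8] = {
    "[eax]", "[ecx]", "[edx]", "[ebx]",
    "sib", "disp32", "[esi]", "[edi]"
};

/* Return the memory operand of a mod=00 ModRM for an instruction with a
   one-byte opcode, or NULL if the bytes do not fit that simple shape.
   default_addr32 is 0 for a 16-bit code segment and 1 for a 32-bit one;
   a leading 0x67 prefix selects the other address size. */
const char *mod00_operand(const unsigned char *insn, size_t len,
                          int default_addr32)
{
    size_t i = 0;
    int addr32 = default_addr32;
    if (i < len && insn[i] == 0x67) {   /* address-size override */
        addr32 = !addr32;
        i++;
    }
    if (i + 1 >= len)                   /* need opcode and ModRM */
        return NULL;
    unsigned char modrm = insn[i + 1];  /* insn[i] is the opcode */
    if ((modrm >> 6) != 0)              /* only mod == 00 is handled */
        return NULL;
    return addr32 ? rm32[modrm & 7] : rm16[modrm & 7];
}
```

Disassembling the compiler output and looking for 0x67 bytes is a quick way to check whether 16-bit code is using 32-bit offsets.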
| From | gah4@u.washington.edu |
|---|---|
| Date | 2020-07-02 00:36 -0700 |
| Message-ID | <20-07-001@comp.compilers> |
| In reply to | #2545 |
On Thursday, June 25, 2020 at 5:15:06 PM UTC-7, Hans-Peter Diettrich wrote:

(snip)

> The 16/32 bit address size prefix inverts the default address size, i.e.
> it allows using 32 bit addresses and offset registers with 16 bit code.
> I used 32 bit offsets in 16 bit code with the Borland compilers
> and disassembled the generated machine code. I never used the HUGE model.
> DoDi
> [Oh, running on a 386, sure, that would have worked. -John]

I suppose that works in real mode, but not in protected mode.

Well, I think the thing that makes OS/2 2.0 and up unusual is that they know about both 16 bit segment descriptors and 32 bit descriptors. That is, they can run both 16 bit protected mode code and 32 bit protected mode code. Using both in the same program would be a little strange, though.

In any case, Watcom has a separate set of compilers for 16 bit and 32 bit code generation. Some can be cross compilers, running on systems other than the one they generate code for.

I had forgotten, until I just looked it up, how the differences work. Both use 64 bit descriptors, with origin, length, and some other bits. For 16 bit, the length is 16 bits, with a granularity of one byte. In 286 days, I found this very useful for tracking down subscript bounds errors by allocating a segment of the exact length. On the 386, the length field is 20 bits, but 32 bit segments have a 4K byte granularity for length so they can describe 4GB segments.
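The descriptor arithmetic above can be sketched as follows, assuming the standard 8-byte x86 descriptor layout: a 20-bit limit counted in bytes, or in 4K units when the G bit is set, so that 2^20 × 4K = 4GB. The struct and function names are hypothetical. The access_ok helper mirrors the check the hardware applies to an expand-up segment, which is what made an exact-length, byte-granular segment useful for catching subscript errors.

```c
#include <stdint.h>

/* Fields of an 8-byte x86 segment descriptor that matter here. */
struct seg_desc {
    uint32_t base;
    uint32_t limit;     /* effective limit in bytes, after granularity */
    int      gran_4k;   /* G bit: raw limit counted in 4K units */
    int      default32; /* D/B bit: 32-bit default size */
};

/* Decode a descriptor stored as 8 little-endian bytes.
   Bytes 0-1: limit 15..0; 2-3: base 15..0; 4: base 23..16;
   5: access byte; 6: flags (high nibble) + limit 19..16 (low nibble);
   7: base 31..24. On a 286, bytes 6-7 were reserved as zero, which
   leaves a 16-bit, byte-granular limit. */
struct seg_desc decode_descriptor(const uint8_t d[8])
{
    struct seg_desc s;
    uint32_t raw_limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
                       | ((uint32_t)(d[6] & 0x0F) << 16);
    s.base = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
           | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.gran_4k   = (d[6] >> 7) & 1;
    s.default32 = (d[6] >> 6) & 1;
    /* With G set, the limit is in 4K units and the low 12 bits are all ones. */
    s.limit = s.gran_4k ? (raw_limit << 12) | 0xFFF : raw_limit;
    return s;
}

/* Expand-up bounds check: an access of `size` bytes at `offset` must
   end at or below the limit. Written to avoid 32-bit overflow. */
int access_ok(const struct seg_desc *s, uint32_t offset, uint32_t size)
{
    return size != 0 && offset <= s->limit && size - 1 <= s->limit - offset;
}
```

A flat 4GB code descriptor (bytes FF FF 00 00 00 9A CF 00) decodes to base 0 and limit 0xFFFFFFFF, while a byte-granular segment with limit 0x1233 traps any access that ends past byte 0x1233.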
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-06-20 16:01 +0200 |
| Message-ID | <20-06-006@comp.compilers> |
| In reply to | #2525 |
On 19.06.2020 at 17:22, Johann 'Myrkraverk' Oskarsson wrote:

> When I am defining my own compiler, do the traditional names of object
> file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the
> MS manual), have any meaning? When I place my code in a _TEXT segment, the
> disassembler will show a list of the instructions; this does not happen
> if I name the segment FOO. Is there any particular reason I want my
> code in a _TEXT segment rather than FOO?

I think that it's not the segment name that enforces special handling but the segment attributes. Possibly these attributes are defined only in the various segment groups, not at the segment level?

It depends on the target OS and hardware whether detailed hardware protection is applied to memory segments. Code segments can be read-only or execute-only, all others non-execute. Const segments also can/should be read-only. The same protection schemes can apply to paged memory. A compiler can only assign attributes to memory sections; the use of these attributes depends on the linker and loader for the target platform.

DoDi
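The split DoDi describes, where the compiler only records attributes and the loader decides what protection to apply, can be sketched as a mapping from per-segment attribute bits to read/write/execute permissions. The SEG_EXEC and SEG_ROM names echo attributes Johann mentions in the next message, but the bit values, the PERM_ names and the policy itself are assumptions for illustration, not OpenWatcom's or any particular loader's behaviour.

```c
/* Hypothetical per-segment attribute bits recorded by a compiler. */
enum seg_attr {
    SEG_EXEC = 1 << 0,  /* segment holds code */
    SEG_ROM  = 1 << 1,  /* segment holds read-only data (constants) */
};

/* Hypothetical page permissions a loader might apply. */
enum perm { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* One plausible policy, following the post: code is readable and
   executable but not writable, constants are read-only, and all other
   data is read/write but never executable. A platform without such
   protection would simply ignore the result. */
unsigned protection_for(unsigned attrs)
{
    if (attrs & SEG_EXEC)
        return PERM_R | PERM_X;
    if (attrs & SEG_ROM)
        return PERM_R;
    return PERM_R | PERM_W;
}
```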
| From | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> |
|---|---|
| Date | 2020-06-25 23:13 +0800 |
| Message-ID | <20-06-021@comp.compilers> |
| In reply to | #2527 |
On 20/06/2020 10:01 pm, Hans-Peter Diettrich wrote:
> On 19.06.2020 at 17:22, Johann 'Myrkraverk' Oskarsson wrote:
>
>> When I am defining my own compiler, do the traditional names of object
>> file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the
>> MS manual), have any meaning? When I place my code in a _TEXT segment, the
>> disassembler will show a list of the instructions; this does not happen
>> if I name the segment FOO. Is there any particular reason I want my
>> code in a _TEXT segment rather than FOO?
>
> I think that it's not the segment name that enforces special handling
> but the segment attributes. Possibly these attributes are defined only
> in the various segment groups, not at the segment level?
>
> It depends on the target OS and hardware whether detailed hardware
> protection is applied to memory segments. Code segments can be read-only
> or execute-only, all others non-execute. Const segments also can/should
> be read-only. The same protection schemes can apply to paged memory. A
> compiler can only assign attributes to memory sections; the use of these
> attributes depends on the linker and loader for the target platform.

For this exercise, I'm using the OpenWatcom code generator. This generator comes in several flavours, though I'm using only one for now, the 386 [others are i86 (16bit), Alpha, PPC, MIPS, and a nonfunctional S/370], and it supports several operating systems.

The code generator allows me to specify attributes per object file segment: EXEC, GLOBAL, ROM, INIT, COMMON, PRIVATE, to name the few that seem to have meaning outside OW itself. So they seem to be handled per segment, rather than per segment group.

The group name is then queried by the code generator using a callback mechanism. It doesn't seem to be documented under what circumstances (how often) the code generator queries this, and I haven't browsed the code nor done enough experiments to see what the granularity is.

Do you know if group names have any meaning in the modern world? DGROUP seems like a traditional name. The OW disassembler tells me this when I apply it to output from the C compiler, GROUP: 'DGROUP' CONST, CONST2, _DATA. This group does not seem to list the _TEXT segment, which is where the code lives. [Note: the code generator does not make assembly files; it generates object code directly, unlike some other compilers.]

What I'm getting at is that it's not so important what this code generator does or doesn't do, but why I should choose one group name over others; or the segment names [as John has mentioned, choosing the traditional names seems to make it much easier to interact with existing linkers and object files].

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[Oops, see comment on previous message. -John]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-06-25 22:52 +0200 |
| Message-ID | <20-06-022@comp.compilers> |
| In reply to | #2542 |
On 25.06.2020 at 17:13, Johann 'Myrkraverk' Oskarsson wrote:

> Do you know if group names have any meaning in the modern world? DGROUP
> seems like a traditional name. The OW disassembler tells me this when I
> apply it to output from the C compiler, GROUP: 'DGROUP' CONST, CONST2,
> _DATA.

The segment groups allowed a single common base (selector) to be used for accessing data segments from various object modules.

> This group does not seem to list the _TEXT segment, which is
> where the code lives.

Right, DGROUP names the default DATA group. One could have multiple data groups, equivalent to FORTRAN COMMON areas.

I was wondering about the use of the segment registers of the 8086 until I came across the segment group model. Only then did those composite addresses make sense, without reloading a segment register with every single data access.

DoDi
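DoDi's description of a group as one shared base can be made concrete with a small layout sketch: the linker places the group's segments one after another, each at its own alignment, and every datum is then addressed as an offset from the single group base, so one segment register (DS loaded with DGROUP) reaches all of them. The 64K ceiling is the rule from the MS manual quoted at the start of the thread. The struct, function, sizes and alignments are hypothetical, and a real linker also orders segments by class before laying them out.

```c
#include <stddef.h>
#include <stdint.h>

struct group_member {
    const char *name;
    uint32_t size;    /* bytes */
    uint32_t align;   /* power of two, e.g. 2 for WORD, 16 for PARA */
    uint32_t offset;  /* filled in: offset from the group base */
};

/* Lay out the members in order from the group base. Returns 1 and
   stores the total size if the group fits in one 64K 16-bit segment,
   0 otherwise. */
int layout_group(struct group_member *m, size_t n, uint32_t *total)
{
    uint32_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = m[i].align ? m[i].align : 1;
        pos = (pos + a - 1) & ~(a - 1);       /* round up to alignment */
        if (pos > 0x10000u || m[i].size > 0x10000u - pos)
            return 0;                         /* group exceeds 64K */
        m[i].offset = pos;
        pos += m[i].size;
    }
    *total = pos;
    return 1;
}
```

With a 0x101-byte _DATA, a 0x20-byte CONST and a 0x400-byte _BSS (all word aligned), the members land at group offsets 0, 0x102 and 0x122, and any of them can be reached as DS:offset without reloading DS.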