
The different groups and segments of object files

Started by	Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com>
First post	2020-06-19 23:22 +0800
Last post	2020-06-25 22:52 +0200
Articles	10 — 4 participants


  The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> - 2020-06-19 23:22 +0800
    Re: The different groups and segments of object files gah4@u.washington.edu - 2020-06-19 14:12 -0700
      Re: The different groups and segments of object files gah4@u.washington.edu - 2020-06-21 01:50 -0700
      Re: The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 22:41 +0800
      Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-25 22:40 +0200
        Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-26 00:09 +0200
          Re: The different groups and segments of object files gah4@u.washington.edu - 2020-07-02 00:36 -0700
    Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-20 16:01 +0200
      Re: The different groups and segments of object files Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 23:13 +0800
        Re: The different groups and segments of object files Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-06-25 22:52 +0200

#2525 — The different groups and segments of object files

From	Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com>
Date	2020-06-19 23:22 +0800
Subject	The different groups and segments of object files
Message-ID	<20-06-004@comp.compilers>

Dear c.compilers,

The short story is that I've been playing with the OpenWatcom code
generator.  I can use it to make a custom object file.  The platform
is 32bit x86, though I'm happy to know how things work in 16bit and
64bit modes as well.

The code generator has options I do not fully understand, and I do
not know where I can find information on this subject.  To further
complicate matters I'm unsure what questions I should be asking.

Here, I'm /exclusively/ asking about object files, as I'm at the
moment generating 32bit code without considerations for physical
segments [1].

What are segment groups for?  The Watcom compiler interface seems to
be based on the Microsoft documentation, so I've been reading old MS
manuals as well.  There we have the following paragraph:

 > All segments with the same group name must fit into a single physical
 > segment, which is up to 64K long. This allows all segments in a group
 > to be accessed through the same segment register. The Microsoft C
 > Compiler defines one group named DGROUP.

As far as I can tell, Watcom also defines this DGROUP in 32bit mode, but
I still don't know if it has any purpose anymore.
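
To make the quoted rule concrete, here is a minimal sketch of what placing
several segments into one group amounts to.  It is not taken from any real
linker: the function name, the 16-byte alignment, and the segment sizes are
all assumptions chosen for illustration.  Every segment gets an offset from
a shared group base, and the whole group has to fit in 64K.

```python
# Hypothetical sketch: laying out the segments of one group.
# Segment sizes and the 16-byte alignment are made up for illustration.

GROUP_LIMIT = 0x10000  # 64K, the largest 16-bit physical segment


def layout_group(segments, align=16, limit=GROUP_LIMIT):
    """Assign each (name, size) segment an offset from the group base.

    Returns (offsets, total_size); raises ValueError if the group would
    not fit in a single physical segment.
    """
    offsets = {}
    cursor = 0
    for name, size in segments:
        # Round the start of each segment up to the alignment boundary.
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += size
    if cursor > limit:
        raise ValueError(f"group needs {cursor:#x} bytes, limit is {limit:#x}")
    return offsets, cursor


dgroup = [("CONST", 0x120), ("CONST2", 0x35), ("_DATA", 0x800), ("_BSS", 0x400)]
offsets, total = layout_group(dgroup)
for name, off in offsets.items():
    print(f"{name:8s} at DGROUP:{off:#06x}")
```

Because all offsets are relative to the same base, one segment register
value is enough to reach every segment in the group.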

When I am defining my own compiler, do the traditional names of object
file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the
MS manual) have any meaning?  When I place my code in a _TEXT segment, the
disassembler will show a list of the instructions; this does not happen
if I name the segment FOO.  Is there any particular reason I want my
code in a _TEXT segment rather than FOO?

If this subject is covered in the Linkers & Loaders book in sufficient
detail to answer my questions, I have not come across it yet.  Please
refer me to a specific chapter if there's something I should read there.

[1] Or if I am, I'm not aware of it.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[Segment groups were useful on the '286 but are pointless on later
machines since a single segment can be the full size of the address
space. My dim recollection is that some software treated segment names
like _TEXT and _DATA specially so your life will likely be easier if
you put the code in _TEXT, the static data in _DATA, and the
uninitialized or zero initialized data in _BSS. I didn't say anything
about this in Linkers and Loaders since by that time segmented
addressing was just an historical curiosity. -John]


#2526

From	gah4@u.washington.edu
Date	2020-06-19 14:12 -0700
Message-ID	<20-06-005@comp.compilers>
In reply to	#2525

On Friday, June 19, 2020 at 8:33:34 AM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:

> The short story is that I've been playing with the OpenWatcom code
> generator.  I can use it to make a custom object file.  The platform
> is 32bit x86, though I'm happy to know how things work in 16bit and
> 64bit modes as well.

> The code generator has options I do not fully understand, and I do
> not know where I can find information on this subject.  To further
> complicate matters I'm unsure what questions I should be asking.

(snip)

The Watcom (at least before OpenWatcom) compilers were the only ones
I knew to generate large model (more than one segment) 32 bit code.

It might be that OS/2 even knows how to load and run it.
(I am pretty sure that Windows and Unix don't.)

In that model, pointers are 48 bit, with a 16 bit segment selector
and 32 bit offset. This would have been another way to get
past the 32 bit barrier without going to 64 bit.
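
As a rough illustration of such a pointer, the sketch below packs a 16 bit
selector and a 32 bit offset into one 48 bit value and splits a selector
into its index, table, and privilege fields.  The class and function names
are made up for this example; only the bit layout is the standard x86
protected-mode one.

```python
# Sketch of a 48-bit far pointer: 16-bit selector plus 32-bit offset.
from typing import NamedTuple


class FarPtr(NamedTuple):
    selector: int  # 16 bits
    offset: int    # 32 bits

    def pack(self) -> int:
        """Combine into one 48-bit integer, selector in the high 16 bits."""
        return (self.selector & 0xFFFF) << 32 | (self.offset & 0xFFFFFFFF)

    @classmethod
    def unpack(cls, value: int) -> "FarPtr":
        """Split a 48-bit integer back into selector and offset."""
        return cls((value >> 32) & 0xFFFF, value & 0xFFFFFFFF)


def selector_fields(selector: int):
    """Split a protected-mode selector into (index, table, rpl)."""
    index = selector >> 3                       # descriptor table index
    table = "LDT" if selector & 0x4 else "GDT"  # table indicator bit
    rpl = selector & 0x3                        # requested privilege level
    return index, table, rpl


p = FarPtr(0x0017, 0x12345678)
print(hex(p.pack()), selector_fields(p.selector))
```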

Well, there is a reason not to do it, which is that processors don't
have a segment descriptor cache.  Every load of a segment selector
requires a side load of the descriptor, so too much overhead.

In any case, code that you expected to be only in the 16 bit section,
related to segments, should be in there, also.  As I noted, it is
OS/2 where this could have been used, so it might only be in the OS/2
object program format.  I don't remember now how the OS/2 and Win32
formats differ.

In 16 bit modes, you can have huge data, which allows for a multiple
segment data object, with segment selectors spaced to allow for
appropriate calculation. I believe this is only for dynamically
allocated data, but it might also allow for static allocation
in the object program.  I tried to avoid that.

[On the 386, all of the segments in a process were mapped into a
single 32 bit paged linear address space, so the total size of all
segments in a program was limited to 4G unless you did pseudo-swapping
of segments in and out of the address space, which I don't think
anyone did.

So not only were they slow, they didn't help. -John]


#2528

From	gah4@u.washington.edu
Date	2020-06-21 01:50 -0700
Message-ID	<20-06-007@comp.compilers>
In reply to	#2526

(I wrote)

> The Watcom (at least before OpenWatcom) compilers were the only ones
> I knew to generate large model (more than one segment) 32 bit code.

> It might be that OS/2 even knows how to load and run it.
> (I am pretty sure that Windows and Unix don't.)

(and our great moderator replied)

> [On the 386, all of the segments in a process were mapped into a
> single 32 bit paged linear address space, so the total size of all
> segments in a program was limited to 4G unless you did pseudo-swapping
> of segments in and out of the address space, which I don't think
> anyone did.

> So not only were they slow, they didn't help. -John]

I believe OS/2 could, but I never tried it to find out.

And yes, putting a 32 bit paging unit in between doesn't help.

A program could use multiple segments totaling less than 4GB, or use
more than 4GB with what I hadn't thought of a name for, but
pseudo-swapping sounds about right.  With a little locality, it should
be able to get enough done before the next pseudo-swap.

But as far as I know, there is still the overhead for loading
segment descriptors.  Well, the extra two (FS and GS) could help.
But a little segment descriptor cache would have helped more.

Since OS/2 could run both 16 bit and 32 bit code, it might not have
been hard to add.

In any case, the compilers can generate the code, whether or not
any OS can run it.

There was also PAE:

https://en.wikipedia.org/wiki/Physical_Address_Extension

which I thought would help, but never tried to figure out what
it could do, or how to actually do it.
[PAE let you attach more physical memory to an x86 but didn't increase the 4G
process address space.  For that you need x86-64 or Itanium. -John]


#2541

From	Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Date	2020-06-25 22:41 +0800
Message-ID	<20-06-020@comp.compilers>
In reply to	#2526

On 20/06/2020 5:12 am, gah4@u.washington.edu wrote:
> On Friday, June 19, 2020 at 8:33:34 AM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
>
>> The short story is that I've been playing with the OpenWatcom code
>> generator.  I can use it to make a custom object file.  The platform
>> is 32bit x86, though I'm happy to know how things work in 16bit and
>> 64bit modes as well.
>
>> The code generator has options I do not fully understand, and I do
>> not know where I can find information on this subject.  To further
>> complicate matters I'm unsure what questions I should be asking.
>
> (snip)
>
> The Watcom (at least before OpenWatcom) compilers were the only ones
> I knew to generate large model (more than one segment) 32 bit code.

I'm told OpenWatcom still does this.  One hobbyist operating system
effort is based on OW for this reason.

> It might be that OS/2 even knows how to load and run it.
> (I am pretty sure that Windows and Unix don't.)
>
> In that model, pointers are 48 bit, with a 16 bit segment selector
> and 32 bit offset. This would have been another way to get
> past the 32 bit barrier without going to 64 bit.
>
> Well, there is a reason not to do it, which is that processors don't
> have a segment descriptor cache.  Every load of a segment selector
> requires a side load of the descriptor, so too much overhead.
>
> In any case, code that you expected to be only in the 16 bit section,
> related to segments, should be in there, also.  As I noted, it is
> OS/2 where this could have been used, so it might only be in the OS/2
> object program format.  I don't remember now how the OS/2 and Win32
> formats differ.

OS/2 still exists.  It's being sold as a distro called ArcaOS.  I
haven't tried this on ArcaOS either (though I have it).

> In 16 bit modes, you can have huge data, which allows for a multiple
> segment data object, with segment selectors spaced to allow for
> appropriate calculation. I believe this is only for dynamically
> allocated data, but it might also allow for static allocation
> in the object program.  I tried to avoid that.

Can you please elaborate on what you tried to avoid?  Or point me to
some reading material?

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[On the 286 huge data was extremely slow since it required several
shifts and masks for every address reference. Re segments and segment
groups, the ones that survive are code (text), static R/W data, static
R/O data, and uninitialized data (bss). There are others for dynamic
linking and runtime symbol tables but they don't seem relevant here.
-John]
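
The per-reference arithmetic behind huge data can be sketched as follows.
This is only an illustration under stated assumptions: the function names
are invented, and the spacing between consecutive 64K segments (seg_step)
is shown with its real-mode value of 0x1000 paragraphs; protected-mode
systems use a different spacing, which is left as a parameter here.

```python
# Sketch of 16-bit "huge" pointer arithmetic (names are hypothetical).

def real_mode_linear(seg: int, off: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return ((seg << 4) + off) & 0xFFFFF  # 20-bit wrap as on the 8086


def huge_add(seg: int, off: int, delta: int, seg_step: int = 0x1000):
    """Advance a huge pointer by delta bytes, carrying into the segment."""
    total = off + delta
    # Each 64K crossed moves the segment value by seg_step.
    seg = (seg + (total >> 16) * seg_step) & 0xFFFF
    return seg, total & 0xFFFF


seg, off = huge_add(0x2000, 0xFFF0, 0x20)
print(f"{seg:04x}:{off:04x} -> linear {real_mode_linear(seg, off):#x}")
```

Every pointer increment or index that might cross a 64K boundary needs this
carry step, which is where the extra cost per address reference comes from.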


#2544

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2020-06-25 22:40 +0200
Message-ID	<20-06-023@comp.compilers>
In reply to	#2526

Am 19.06.2020 um 23:12 schrieb gah4@u.washington.edu:

> In 16 bit modes, you can have huge data, which allows for a multiple
> segment data object, with segment selectors spaced to allow for
> appropriate calculation. I believe this is only for dynamically
> allocated data, but it might also allow for static allocation
> in the object program.  I tried to avoid that.

No need to avoid that. There exist huge segment descriptors that allow
for 32 bit offsets (EAX...) on a 16 bit system without special address
calculations. Microsoft documented that feature in parallel to the
traditional modulo calculations for segment/offset. AFAIR the starting
segment selector allowed access to at least 512 MB (8 consecutive
segments). This feature did not allow breaking the 4GB total limit, but
it allowed for contiguous data areas bigger than 64KB.

DoDi
[Nobody I knew used huge model because it was so very very slow. There
was no way on 16-bit x86 to access more than 64K at a time without runtime
segment calculations. -John]


#2545

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2020-06-26 00:09 +0200
Message-ID	<20-06-024@comp.compilers>
In reply to	#2544

Am 25.06.2020 um 22:40 schrieb Hans-Peter Diettrich:

> AFAIR the starting
> segment selector allowed to access at least 512 MB (8 consecutive
> segments). This feature did not allow to break the 4GB total limit, but
> it allowed for contiguous data areas bigger than 64KB.
>
> DoDi
> [Nobody I knew used huge model because it was so very very slow. There
> was no way on 16-bit x86 to access more than 64K at a time without runtime
> segment calculations. -John]

The 16/32 bit address size prefix inverts the default address size, i.e.
it allows the use of 32 bit addresses and offset registers in 16 bit code.

I used 32 bit offsets in 16 bit code with the Borland compilers, and
disassembled the generated machine code. I never used the HUGE model.

DoDi
[Oh, running on a 386, sure, that would have worked. -John]
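
The effect of the prefix on instruction encoding can be sketched with a toy
decoder.  It is deliberately tiny and hypothetical: it handles only opcode
8B (MOV r16, r/m16) with mod=00, as seen from a 16 bit code segment, and
shows how the same ModRM byte names a different address form once the 0x67
address-size prefix is present.

```python
# Sketch: the 0x67 address-size prefix changes how a ModRM byte is read
# in 16-bit code. Only mod=00 forms of opcode 8B are handled.

RM16 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
        "[si]", "[di]", "[disp16]", "[bx]"]
RM32 = ["[eax]", "[ecx]", "[edx]", "[ebx]",
        "[sib]", "[disp32]", "[esi]", "[edi]"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_mov_r16_rm16(code: bytes) -> str:
    """Decode 'MOV r16, r/m16' as seen by a 16-bit code segment."""
    addr32 = code[0] == 0x67          # prefix flips the default address size
    body = code[1:] if addr32 else code
    if body[0] != 0x8B:
        raise ValueError("only opcode 8B is handled in this sketch")
    modrm = body[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0:
        raise ValueError("only mod=00 is handled in this sketch")
    table = RM32 if addr32 else RM16
    return f"mov {REG16[reg]}, {table[rm]}"


print(decode_mov_r16_rm16(bytes([0x8B, 0x03])))        # 16-bit addressing
print(decode_mov_r16_rm16(bytes([0x67, 0x8B, 0x03])))  # 32-bit addressing
```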


#2547

From	gah4@u.washington.edu
Date	2020-07-02 00:36 -0700
Message-ID	<20-07-001@comp.compilers>
In reply to	#2545

On Thursday, June 25, 2020 at 5:15:06 PM UTC-7, Hans-Peter Diettrich wrote:

(snip)

> The 16/32 bit address size prefix inverts the default address size, i.e.
> allows to use 32 bit addresses and offset registers with 16 bit code.

> I used 32 bit offsets in 16 bit code with the Borland compilers,
> disassembled the generated machine code. I never used the HUGE model.

> DoDi
> [Oh, running on a 386, sure, that would have worked. -John]

I suppose that works in real mode, but not in protected mode.

Well, I think the thing that makes OS/2 2.0 and up unusual is that
they know about both 16 bit segment descriptors and 32 bit descriptors.
That is, can run both 16 bit protected mode code and 32 bit protected
mode code.

Using both in the same program would be a little strange, though.

In any case, Watcom has a separate set of compilers for 16 bit
and 32 bit code generation. Some can be cross compilers, running on
systems other than the ones they generate code for.

I had forgotten until I just looked it up, how the differences work.
Both use 64 bit descriptors, with origin, length, and some other bits.
For 16 bit, the length is 16 bit, with granularity of one byte.

In 286 days, I found this very useful for tracking down subscript
bounds errors by allocating a segment of the exact length.

The length field is 20 bits, but 32 bit segments have a 4K byte
granularity for length so they can describe 4GB segments.
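
The arithmetic for the length field can be written out as a small sketch.
The function names are invented for this example; the rule it encodes is
that the 20 bit limit counts units of either one byte or 4K bytes, depending
on the granularity bit, and holds the last valid unit rather than the size.

```python
# Sketch of how a descriptor's 20-bit limit and granularity bit combine.

def segment_size(limit_field: int, granularity_4k: bool) -> int:
    """Bytes addressable through a descriptor with the given limit field."""
    if not 0 <= limit_field <= 0xFFFFF:
        raise ValueError("limit field is 20 bits")
    unit = 4096 if granularity_4k else 1
    return (limit_field + 1) * unit


def limit_for_exact_length(length: int) -> int:
    """Byte-granular limit for a segment of exactly `length` bytes."""
    if not 1 <= length <= 0x100000:
        raise ValueError("byte-granular segments hold 1 byte to 1 MiB")
    return length - 1


print(segment_size(0xFFFFF, True))    # largest page-granular segment
print(segment_size(0xFFFF, False))    # a classic 64K segment
print(limit_for_exact_length(1000))   # limit for a 1000-byte array
```

The last helper corresponds to the bounds-checking trick mentioned above:
a byte-granular segment sized exactly to an array faults on the first
access past its end.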


#2527

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2020-06-20 16:01 +0200
Message-ID	<20-06-006@comp.compilers>
In reply to	#2525

Am 19.06.2020 um 17:22 schrieb Johann 'Myrkraverk' Oskarsson:

> When I am defining my own compiler, do the traditional names of object
> file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the
> MS manual) have any meaning?  When I place my code in a _TEXT segment, the
> disassembler will show a list of the instructions; this does not happen
> if I name the segment FOO.  Is there any particular reason I want my
> code in a _TEXT segment rather than FOO?

I think that it's not the segment name that enforces special handling
but the segment attributes. Possibly these attributes are defined in the
various segment groups only, not at segment level?

It depends on the target OS and hardware whether detailed hardware
protection is applied to memory segments. Code segments can be read-only
or execute-only, all others non-execute. Const segments also can/should
be read-only. The same protection schemes can apply to paged memory. A
compiler can only assign attributes to memory sections; the use of these
attributes depends on the linker and loader for the target platform.

DoDi


#2542

From	Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Date	2020-06-25 23:13 +0800
Message-ID	<20-06-021@comp.compilers>
In reply to	#2527

On 20/06/2020 10:01 pm, Hans-Peter Diettrich wrote:
> Am 19.06.2020 um 17:22 schrieb Johann 'Myrkraverk' Oskarsson:
>
>> When I am defining my own compiler, do the traditional names of object
>> file segments, _TEXT, _DATA, STACK, _BSS, CONST (to name a few from the
>> MS manual) have any meaning?  When I place my code in a _TEXT segment, the
>> disassembler will show a list of the instructions; this does not happen
>> if I name the segment FOO.  Is there any particular reason I want my
>> code in a _TEXT segment rather than FOO?
>
> I think that it's not the segment name that enforces special handling
> but segment attributes. Eventually these attributes are defined in the
> various segment groups only, not at segment level?
>
> It depends on the target OS and hardware whether detailed hardware
> protection is applied to memory segments. Code segments can be read-only
> or execute-only, all others non-execute. Const segments also can/should
> be read-only. The same protection schemes can apply to paged memory. A
> compiler only can assign attributes to memory sections, the use of these
> attributes depends on the linker and loader for the target platform.

For this exercise, I'm using the OpenWatcom code generator.  This
generator comes in several flavours, though I'm using only one for now,
the 386 [others are i86 (16bit), Alpha, PPC, MIPS, and a nonfunctional
S/370] and supports several operating systems.

The code generator allows me to specify an attribute per object file
segment, EXEC, GLOBAL, ROM, INIT, COMMON, PRIVATE; to name the few that
seem to have meaning outside OW itself.  So they seem to be handled per
segment, rather than segment groups.

The group name is then queried by the code generator using a callback
mechanism.  It doesn't seem documented under what circumstances (how
often) the code generator queries this, and I haven't browsed the code
nor done enough experiments to see what the granularity is.

Do you know if group names have any meaning in the modern world?  DGROUP
seems like a traditional name.  The OW disassembler tells me this when I
apply it to output from the C compiler, GROUP: 'DGROUP' CONST, CONST2,
_DATA.  This group does not seem to list the _TEXT segment, which is
where the code lives.  [Note, the code generator does not make assembly
files, it generates object code directly; unlike some other compilers.]

What I'm getting at is that it's not so important what this code
generator does or doesn't do, but why I should choose one group name
over another.  The same goes for the segment names, although, as John
has mentioned, choosing the traditional names seems to make it much
easier to interact with existing linkers and object files.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
[Oops, see comment on previous message. -John]


#2543

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2020-06-25 22:52 +0200
Message-ID	<20-06-022@comp.compilers>
In reply to	#2542

Am 25.06.2020 um 17:13 schrieb Johann 'Myrkraverk' Oskarsson:

> Do you know if group names have any meaning in the modern world?  DGROUP
> seems like a traditional name.  The OW disassembler tells me this when I
> apply it to output from the C compiler, GROUP: 'DGROUP' CONST, CONST2,
> _DATA.

The segment groups allowed for a single common base (selector) to access
data segments from various object modules.

> This group does not seem to list the _TEXT segment, which is
> where the code lives.

Right, DGROUP names the default DATA group. One could have multiple data
groups, equivalent to FORTRAN COMMON areas.

I was wondering about the use of the segment registers of the 8086 until
I came across the segment group model. Only then did those composite
addresses make sense, without reloading a segment register with every
single data access.

DoDi

