Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.c > #123009 > unrolled thread
| Started by | James Harris <james.harris.1@gmail.com> |
|---|---|
| First post | 2017-11-19 21:11 +0000 |
| Last post | 2018-02-10 05:46 +0000 |
| Articles | 20 on this page of 42 — 11 participants |
Back to article view | Back to comp.lang.c
Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-19 21:11 +0000
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-23 00:50 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-26 15:55 +0000
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-26 23:55 +0000
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-11-27 06:43 +0000
Re: Recasting data as long ints and chars "Pascal J. Bourguignon" <pjb@informatimago.com> - 2017-11-27 09:26 +0100
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-11-27 08:50 +0000
Re: Recasting data as long ints and chars "James R. Kuyper" <jameskuyper@verizon.net> - 2017-11-27 10:35 -0500
Re: Recasting data as long ints and chars supercat@casperkitty.com - 2017-11-27 08:00 -0800
Re: Recasting data as long ints and chars "Pascal J. Bourguignon" <pjb@informatimago.com> - 2017-11-27 23:49 +0100
Re: Recasting data as long ints and chars jameskuyper@verizon.net - 2017-11-27 15:56 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-15 12:45 -0800
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-27 15:08 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-27 14:55 +0000
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-27 15:51 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-28 17:13 +0000
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-29 01:07 +0000
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-11-29 18:59 +0000
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-30 00:53 +0000
Re: Recasting data as long ints and chars Keith Thompson <kst-u@mib.org> - 2017-11-29 17:01 -0800
Re: Recasting data as long ints and chars Ben Bacarisse <ben.usenet@bsb.me.uk> - 2017-11-30 11:40 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-30 19:53 +0000
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-11-29 18:37 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-11-30 22:38 +0000
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-12-07 05:06 +0000
Re: Recasting data as long ints and chars supercat@casperkitty.com - 2017-12-07 07:51 -0800
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2017-12-08 23:14 +0000
Re: Recasting data as long ints and chars supercat@casperkitty.com - 2017-12-12 16:19 -0800
Re: Recasting data as long ints and chars Keith Thompson <kst-u@mib.org> - 2017-12-12 17:10 -0800
Re: Recasting data as long ints and chars supercat@casperkitty.com - 2017-12-13 09:02 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-15 12:39 -0800
Re: Recasting data as long ints and chars Robert Wessel <robertwessel2@yahoo.com> - 2017-12-15 17:39 -0600
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-15 15:52 -0800
Re: Recasting data as long ints and chars Robert Wessel <robertwessel2@yahoo.com> - 2017-12-16 00:35 -0600
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-17 14:02 -0800
Re: Recasting data as long ints and chars "Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid> - 2017-12-17 15:15 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-17 18:47 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-17 14:09 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-17 14:19 -0800
Re: Recasting data as long ints and chars herrmannsfeldt@gmail.com - 2017-12-17 14:45 -0800
Re: Recasting data as long ints and chars Spiros Bousbouras <spibou@gmail.com> - 2017-12-13 19:13 +0000
Re: Recasting data as long ints and chars James Harris <james.harris.1@gmail.com> - 2018-02-10 05:46 +0000
Page 1 of 3 [1] 2 3 Next page →
| From | James Harris <james.harris.1@gmail.com> |
|---|---|
| Date | 2017-11-19 21:11 +0000 |
| Subject | Recasting data as long ints and chars |
| Message-ID | <ouss2m$sfm$1@dont-email.me> |
IME it's unusual to need to view C variables in their "raw" form but I seem to have come across a requirement to do just that. A basic solution would be easy enough but I wondered if you guys had any thoughts on how to do it efficiently. This is not the kind of optimisation that can be left to the compiler, as you will see from the description below. I've written a bespoke sort routine. It works well enough on ints but I am looking to extend it to handle arbitrary data types. The keys on which the sort would operate would be a whole number of bytes in length (i.e. no partial bytes), and all keys would be of the same length. The relatively simple solution would be to pass the custom sort routine an array which specified the offset of each byte position in order. That has the first issue: coping with endianness. For example, if the sort routine were to sort 4-byte ints by working from MSB to LSB it would need to be passed an array of offsets: 3,2,1,0 on a little-endian machine, and 0,1,2,3 on a big-endian machine. Sorting byte-by-byte would not always be ideal, however. It would be faster to pick up and manipulate parts of a long key in multi-byte chunks, where possible. As an example, if the keys were character strings the same size as unsigned longs and were stored in big-endian order and the routine were to be running on a big-endian machine, and the keys were sufficiently aligned ... (phew) ... it would be better to treat the keys as unsigned longs. As another example, if the keys were structs which included embedded ints it would be worth picking those parts up as ints even on a little-endian machine. I can think of two basic ways to do the mechanics: 1. Call the sort routine with a data structure saying how keys are laid out in memory: custom_sort(array_of_uints, keyspec_for_uints); custom_sort(array_of_longs, keyspec_for_longs); The second parameter would point to an data structure which specified the layout of sort keys. The data structure would include an array of the parts of the key. 2. Provide a callback which would return the requested part of the key spec: custom_sort(array_of_uints, callback_for_uints); The sort routine would call callback_for_uints with a parameter saying where it had got to, and callback_for_uints would return a value which included offset, alignment and datatype of the next part of the key. All of this is, of course, very low level compared with much C programming. Hence the question: Have you any thoughts on how to write it in C and make the keyspec portable? -- James Harris
[toc] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-23 00:50 +0000 |
| Message-ID | <87efopstrv.fsf@bsb.me.uk> |
| In reply to | #123009 |
James Harris <james.harris.1@gmail.com> writes: > IME it's unusual to need to view C variables in their "raw" form but I > seem to have come across a requirement to do just that. <snip> > I've written a bespoke sort routine. <snip> > That has the first issue: coping with endianness. For example, if the > sort routine were to sort 4-byte ints by working from MSB to LSB it > would need to be passed an array of offsets: 3,2,1,0 on a > little-endian machine, and 0,1,2,3 on a big-endian machine. <snip lots more> I'm just going to say why (from my point of view) you've had no answers yet. There's quite a lot of text but I did not find enough detail to get my teeth into. You have a solution, but I have no idea how it might be made better, partly because I don't really know what solution you have right now. And at the detailed level there are some confusing things. Endianness is very rarely an issue but I suspect it (or something like it), is for you for reasons not yet explained. The bytes from LSB to MSB can easily be obtained by shifting and masking (and I am sure you know that) so I suspect that is not available to you for some of the data you are considering. <snip> -- Ben.
[toc] | [prev] | [next] | [standalone]
| From | James Harris <james.harris.1@gmail.com> |
|---|---|
| Date | 2017-11-26 15:55 +0000 |
| Message-ID | <oveo59$aa0$1@dont-email.me> |
| In reply to | #123348 |
On 23/11/2017 00:50, Ben Bacarisse wrote:
> James Harris <james.harris.1@gmail.com> writes:
>
>> IME it's unusual to need to view C variables in their "raw" form but I
>> seem to have come across a requirement to do just that.
> <snip>
>> I've written a bespoke sort routine.
> <snip>
>> That has the first issue: coping with endianness. For example, if the
>> sort routine were to sort 4-byte ints by working from MSB to LSB it
>> would need to be passed an array of offsets: 3,2,1,0 on a
>> little-endian machine, and 0,1,2,3 on a big-endian machine.
>
> <snip lots more>
Thanks for the reply, Ben. Your snip comment made me smile. The post was
only 57 lines long but maybe I included too much in one post - and it
just felt like "lots"...! :-)
The basic query is difficult to describe due to the inherent underlying
complexity but I have been trying to think of a better way to put it.
Perhaps it's best to use an example and to speak about just one part of
the problem at a time - which I'll do, below. I think I can explain the
main points by analogy with radix sorting.
>
> I'm just going to say why (from my point of view) you've had no answers
> yet. There's quite a lot of text but I did not find enough detail to
> get my teeth into. You have a solution, but I have no idea how it might
> be made better, partly because I don't really know what solution you
> have right now.
>
> And at the detailed level there are some confusing things. Endianness
> is very rarely an issue but I suspect it (or something like it), is for
> you for reasons not yet explained. The bytes from LSB to MSB can easily
> be obtained by shifting and masking (and I am sure you know that) so I
> suspect that is not available to you for some of the data you are
> considering.
Imagine a function to carry out a radix sort. A programmer could write
one such function for each type of record to be sorted. The solution I
have now effectively does that: it only sorts integers and was intended
as a proof of concept.
But it would, of course, be more flexible to have one routine which was
able to sort arbitrary records. Then it could be called to sort
different types of data. That, though, would require a way to specify
the parts of the record which have to be used as sort keys. In turn,
that raises the question of how to tell the sort function about the
structure of the keys. E.g. imagine records like the following.
struct rec {
uint32_t key_part_1;
char[20] name;
char[3] key_part_2;
};
Assuming the radix sort works one byte at a time, to sort records of
struct rec type the sort function would have to be told that the key was
made up of bytes 0 to 3 and 24 to 26. And, what's more, the order of the
bytes of key_part_1 would depend on the architecture. So a list of bytes
could be supplied to an MSD radix sort as either of these two.
3, 2, 1, 0, 24, 25, 26 (little-endian machine)
0, 1, 2, 3, 24, 25, 26 (big-endian machine)
Given this scenario, the first part of the original query was about a
good way to tell the sort routine such a list of byte positions, and to
do so portably.
In summary, if calling a general purpose MSD radix sort which sorted
byte-by-byte (i.e. where each 'digit' was a byte) what would be a good
and portable way to specify the offsets of the bytes of the sort key
fields in descending order of significance?
--
James Harris
[toc] | [prev] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-26 23:55 +0000 |
| Message-ID | <87fu90mw6q.fsf@bsb.me.uk> |
| In reply to | #123521 |
James Harris <james.harris.1@gmail.com> writes:
<snip>
> Imagine a function to carry out a radix sort. A programmer could write
> one such function for each type of record to be sorted. The solution I
> have now effectively does that: it only sorts integers and was
> intended as a proof of concept.
>
> But it would, of course, be more flexible to have one routine which
> was able to sort arbitrary records. Then it could be called to sort
> different types of data. That, though, would require a way to specify
> the parts of the record which have to be used as sort keys. In turn,
> that raises the question of how to tell the sort function about the
> structure of the keys. E.g. imagine records like the following.
>
> struct rec {
> uint32_t key_part_1;
> char[20] name;
> char[3] key_part_2;
> };
>
> Assuming the radix sort works one byte at a time, to sort records of
> struct rec type the sort function would have to be told that the key
> was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
> of the bytes of key_part_1 would depend on the architecture. So a list
> of bytes could be supplied to an MSD radix sort as either of these
> two.
>
> 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
> 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>
> Given this scenario, the first part of the original query was about a
> good way to tell the sort routine such a list of byte positions, and
> to do so portably.
The last bit is confusing me in that I would have to work hard to come
up with a non-portable way to tell the sort routine the list of byte
positions. That makes me think you mean something else. Do you want a
portable way to *determine* the list of positions in some portable and
quasi-automatic way? If so, how automatic do you want it to be?
> In summary, if calling a general purpose MSD radix sort which sorted
> byte-by-byte (i.e. where each 'digit' was a byte) what would be a good
> and portable way to specify the offsets of the bytes of the sort key
> fields in descending order of significance?
An array of unsigned int? The fact that the answer seems so clear to me
again makes me think I'm missing something.
--
Ben.
[toc] | [prev] | [next] | [standalone]
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2017-11-27 06:43 +0000 |
| Message-ID | <g2V32ppLhPDKcENafQyUTwdjN1mwQ@bongo-ra.co> |
| In reply to | #123526 |
On Sun, 26 Nov 2017 23:55:57 +0000
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> James Harris <james.harris.1@gmail.com> writes:
> <snip>
> > Imagine a function to carry out a radix sort. A programmer could write
> > one such function for each type of record to be sorted. The solution I
> > have now effectively does that: it only sorts integers and was
> > intended as a proof of concept.
> >
> > But it would, of course, be more flexible to have one routine which
> > was able to sort arbitrary records. Then it could be called to sort
> > different types of data. That, though, would require a way to specify
> > the parts of the record which have to be used as sort keys. In turn,
> > that raises the question of how to tell the sort function about the
> > structure of the keys. E.g. imagine records like the following.
> >
> > struct rec {
> > uint32_t key_part_1;
> > char[20] name;
> > char[3] key_part_2;
> > };
> >
> > Assuming the radix sort works one byte at a time, to sort records of
> > struct rec type the sort function would have to be told that the key
> > was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
> > of the bytes of key_part_1 would depend on the architecture. So a list
> > of bytes could be supplied to an MSD radix sort as either of these
> > two.
> >
> > 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
> > 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
Do you know that the only possibilities are little-endian and big-endian ?
> > Given this scenario, the first part of the original query was about a
> > good way to tell the sort routine such a list of byte positions, and
> > to do so portably.
>
> The last bit is confusing me in that I would have to work hard to come
> up with a non-portable way to tell the sort routine the list of byte
> positions. That makes me think you mean something else. Do you want a
> portable way to *determine* the list of positions in some portable and
> quasi-automatic way? If so, how automatic do you want it to be?
>
> > In summary, if calling a general purpose MSD radix sort which sorted
> > byte-by-byte (i.e. where each 'digit' was a byte) what would be a good
> > and portable way to specify the offsets of the bytes of the sort key
> > fields in descending order of significance?
>
> An array of unsigned int?
size_t seems to me like a better choice of type than unsigned int .
> The fact that the answer seems so clear to me
> again makes me think I'm missing something.
[toc] | [prev] | [next] | [standalone]
| From | "Pascal J. Bourguignon" <pjb@informatimago.com> |
|---|---|
| Date | 2017-11-27 09:26 +0100 |
| Message-ID | <m2efokcelh.fsf@informatimago.com> |
| In reply to | #123528 |
Spiros Bousbouras <spibou@gmail.com> writes:
>> > 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
>> > 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>
> Do you know that the only possibilities are little-endian and big-endian ?
You mean that little-endian and big-endian are NOT the only
possibilities?
In practice there are (old) processors where you'd have:
1, 0, 3, 2, 24, 25, 26
--
__Pascal J. Bourguignon
http://www.informatimago.com
[toc] | [prev] | [next] | [standalone]
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2017-11-27 08:50 +0000 |
| Message-ID | <g2V32ppLhPDKcENafQyUTwdjN1mwb@bongo-ra.co> |
| In reply to | #123530 |
On Mon, 27 Nov 2017 09:26:02 +0100 "Pascal J. Bourguignon" <pjb@informatimago.com> wrote: > Spiros Bousbouras <spibou@gmail.com> writes: > > >> > 3, 2, 1, 0, 24, 25, 26 (little-endian machine) > >> > 0, 1, 2, 3, 24, 25, 26 (big-endian machine) > > > > Do you know that the only possibilities are little-endian and big-endian ? > > You mean that little-endian and big-endian are NOT the only > possibilities? > > > In practice there are (old) processors where you'd have: > > 1, 0, 3, 2, 24, 25, 26 I don't know how old they are or what the full range of possibilities is but yes , I was referring to the fact that they exist(ed).
[toc] | [prev] | [next] | [standalone]
| From | "James R. Kuyper" <jameskuyper@verizon.net> |
|---|---|
| Date | 2017-11-27 10:35 -0500 |
| Message-ID | <4c779337-fbfa-6fd9-c727-834e074ee673@verizon.net> |
| In reply to | #123530 |
On 11/27/2017 03:26 AM, Pascal J. Bourguignon wrote: > Spiros Bousbouras <spibou@gmail.com> writes: > >>>> 3, 2, 1, 0, 24, 25, 26 (little-endian machine) >>>> 0, 1, 2, 3, 24, 25, 26 (big-endian machine) >> >> Do you know that the only possibilities are little-endian and big-endian ? > > You mean that little-endian and big-endian are NOT the only > possibilities? Yes, though the complete list of possibilities depends in part on what you mean by "possibilities". Some people, especially those who would describe themselves as "practical, common sense" people, only consider something possible if has actually been done; more accurately, they would consider those to be the only possibilities worth worrying about. Other people (myself included) believe that something should be considered "possible" if it could be done without violating any of the applicable constraints, even it if has never yet been done. Given an integer type that occupies N bytes, there's N! different possible (in the second sense) orderings of the bytes. For N==4, that's 24 orderings. A decade or two ago I found a web site listing the various orderings that had actually been used, and examples of devices where they were used. Unfortunately, I didn't appreciate the value of what I'd found, or I would have saved a copy of the web page, or at least a link to it. I've never been able to locate it again, so all of the following statements are based upon my fallible organic memory - feel free to discount them accordingly: Of those 24 possible orderings, 8 had been put into actual use for devices that the authors were familiar with (possible in the first sense described above). Given the difficulty of collecting such information, I think it's virtually guaranteed that there's a least a couple of orderings in actual use that they didn't know about. Overwhelmingly, the two most common orderings were big-endian and little-endian. The other possible orderings are generically called middle-endian. Of the middle-endian orderings, overwhelmingly the most common were 1 0 3 2 and 2 3 0 1, and some people use the term middle-endian exclusively for those two orderings - either because they are ignorant of the other orderings, or because they use a different term for those other orderings. I'd like to emphasize that not all 8 of those orderings were used by CPUs - many of them were used only by specialized devices. The web page was not about C, but about byte orderings, and therefore did not identify which of those orderings were in use by CPUs for which there was a conforming implementation of C. Note that not only does the standard not constrain the byte ordering in any way, it also does not constrain the bit ordering, either. I can't imagine any reason why anyone would do so, but it wouldn't violate any requirement imposed by the C standard to, for instance, store each of the four lowest-order bits in a different byte of a four-byte integer.
[toc] | [prev] | [next] | [standalone]
| From | supercat@casperkitty.com |
|---|---|
| Date | 2017-11-27 08:00 -0800 |
| Message-ID | <a5efc5bb-bbf8-4571-8a85-62a9fb9e201c@googlegroups.com> |
| In reply to | #123535 |
On Monday, November 27, 2017 at 9:36:17 AM UTC-6, James R. Kuyper wrote: Eight? Really? What devices used something other than 0123, 3210, or 2301? > Note that not only does the standard not constrain the byte ordering in > any way, it also does not constrain the bit ordering, either. I can't > imagine any reason why anyone would do so, but it wouldn't violate any > requirement imposed by the C standard to, for instance, store each of > the four lowest-order bits in a different byte of a four-byte integer. How much value is gained by having a Standard regard 2.6313e+35 arrangements of bits in a 32-bit word as equally likely, when in practice 99.9% of machines will use one of TWO, and 99.9999999%+ of the possible arrangements will NEVER be used by ANY implementation anywhere? I would favor having the Standard define a category of conformance that allows for things like weird bit orderings, padding bits, byte sizes other than eight, non-two's-complement representations, etc. but that doesn't mean that it shouldn't recognize traits that are common to 99.9%+ of implementations.
[toc] | [prev] | [next] | [standalone]
| From | "Pascal J. Bourguignon" <pjb@informatimago.com> |
|---|---|
| Date | 2017-11-27 23:49 +0100 |
| Message-ID | <m27eubcp68.fsf@informatimago.com> |
| In reply to | #123535 |
"James R. Kuyper" <jameskuyper@verizon.net> writes: > Note that not only does the standard not constrain the byte ordering > in any way, it also does not constrain the bit ordering, either. I > can't imagine any reason why anyone would do so, but it wouldn't > violate any requirement imposed by the C standard to, for instance, > store each of the four lowest-order bits in a different byte of a > four-byte integer. The notion of thing-ordering has a sense only if there's is a way to _address_ things. You cannot have bit-ordering if you can't have a pointer to a bit. In the case of the C programming language, pointers may address at most char (sizeof(char)=1). Therefore the notion of bit ordering is meaningless in C. It is also meaningless in hardware, in the sense that bits are not ordered logically (by an increasing address), but physically, spacio-temporally. You can have some bits laid out on the right or on the left, perhaps in front or behind, or even above or below other bits, some bits may be transmitted before or after some other bits when serialized. And this changes from chip to chip, from bus to bus. It doesn't matter, as long as the hardware is consistent (you read back the same bit you wrote, in the same order, when you read or write a byte). -- __Pascal J. Bourguignon http://www.informatimago.com
[toc] | [prev] | [next] | [standalone]
| From | jameskuyper@verizon.net |
|---|---|
| Date | 2017-11-27 15:56 -0800 |
| Message-ID | <8e037c7d-9582-4915-aa8c-3802dab7b1b5@googlegroups.com> |
| In reply to | #123541 |
On 11/27/2017 05:49 PM, Pascal J. Bourguignon wrote:
> "James R. Kuyper" <jameskuyper@verizon.net> writes:
>
>> Note that not only does the standard not constrain the byte ordering
>> in any way, it also does not constrain the bit ordering, either. I
>> can't imagine any reason why anyone would do so, but it wouldn't
>> violate any requirement imposed by the C standard to, for instance,
>> store each of the four lowest-order bits in a different byte of a
>> four-byte integer.
>
> The notion of thing-ordering has a sense only if there's is a way to
> _address_ things.
You're right - you can't determine absolute bit orderings, since there's no way
to address individual bits. However, in any context which allows you to access
memory using two different types, you can determine the relative bit order of
those types. Unions provide one such context, and any object can be accessed
either as an array of unsigned char or by using it's effective type.
Unless those two types are a signed type and the corresponding unsigned type,
the standard imposes no requirements that the order be consistent between them.
Consider the following code:
for(uint32_t i=1; i; i<<=1)
{
printf("%" PRIx32 "\t", i);
for(unsigned char *p=(unsigned char*)&i;
p < (unsigned char*)(1+&i); p++)
printf("%02x ", *p);
printf("\n");
}
If int32_t is a supported type, then CHAR_BIT must be 8, 16, or 32, and that
code should print exactly 32 lines.
From what the C standard does tell us about the representation of integer types,
including the fact that int32_t is not allowed to have any padding bits, it's
possible to deduce that every single line should show only one byte with a non-zero value, and that value should be a power of 2 that is <= UCHAR_MAX. For any
permitted value of CHAR_BIT, there are 32 possible sets of byte values meeting that requirement, and all of those sets will occur exactly once in the output.
People who pay too much attention to what's actually being done by real world
systems, and not enough attention to what the standard actually permits, would
claim that all of the lines where the non-zero bit is in a given byte must
appear consecutively, and in increasing order of the value of that bit,
interpreted as an unsigned char. And if they tested with virtually any real world implementation of C, they would get results consistent with that claim.
However, the standard imposes no such requirement. There are 32! different
possible orders in which those 32 distinct sets of byte values can occur, and
not one of those orders would violate any requirement imposed by the C standard.
[toc] | [prev] | [next] | [standalone]
| From | herrmannsfeldt@gmail.com |
|---|---|
| Date | 2017-12-15 12:45 -0800 |
| Message-ID | <bcb652f4-7e65-4476-bb27-2bf1748a7c58@googlegroups.com> |
| In reply to | #123530 |
On Monday, November 27, 2017 at 12:26:18 AM UTC-8, Pascal J. Bourguignon wrote: > Spiros Bousbouras <spi...@gmail.com> writes: > >> > 3, 2, 1, 0, 24, 25, 26 (little-endian machine) > >> > 0, 1, 2, 3, 24, 25, 26 (big-endian machine) > > > > Do you know that the only possibilities are little-endian and big-endian ? > You mean that little-endian and big-endian are NOT the only > possibilities? > In practice there are (old) processors where you'd have: > 1, 0, 3, 2, 24, 25, 26 VAX uses middle endian for floating point, but little endian for integers. There are additional problems with sorting floating point than bit (and byte) order. The PDP-10 uses two's complement for floating point, such that you can (and do) compare floating point values with fixed point compare instructions. Note that even for twos complement fixed point, you have to treat the byte with sign different from the other bytes, when comparing a byte at a time.
[toc] | [prev] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-27 15:08 +0000 |
| Message-ID | <871skj3gje.fsf@bsb.me.uk> |
| In reply to | #123528 |
Spiros Bousbouras <spibou@gmail.com> writes:
> On Sun, 26 Nov 2017 23:55:57 +0000
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>> James Harris <james.harris.1@gmail.com> writes:
>> <snip>
>> > Imagine a function to carry out a radix sort. A programmer could write
>> > one such function for each type of record to be sorted. The solution I
>> > have now effectively does that: it only sorts integers and was
>> > intended as a proof of concept.
>> >
>> > But it would, of course, be more flexible to have one routine which
>> > was able to sort arbitrary records. Then it could be called to sort
>> > different types of data. That, though, would require a way to specify
>> > the parts of the record which have to be used as sort keys. In turn,
>> > that raises the question of how to tell the sort function about the
>> > structure of the keys. E.g. imagine records like the following.
>> >
>> > struct rec {
>> > uint32_t key_part_1;
>> > char[20] name;
>> > char[3] key_part_2;
>> > };
>> >
>> > Assuming the radix sort works one byte at a time, to sort records of
>> > struct rec type the sort function would have to be told that the key
>> > was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
>> > of the bytes of key_part_1 would depend on the architecture. So a list
>> > of bytes could be supplied to an MSD radix sort as either of these
>> > two.
>> >
>> > 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
>> > 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>
> Do you know that the only possibilities are little-endian and
> big-endian ?
I do. The OP probably does too. In fact the idea of listing byut
offsets might be because of such possibilities.
>> > Given this scenario, the first part of the original query was about a
>> > good way to tell the sort routine such a list of byte positions, and
>> > to do so portably.
>>
>> The last bit is confusing me in that I would have to work hard to come
>> up with a non-portable way to tell the sort routine the list of byte
>> positions. That makes me think you mean something else. Do you want a
>> portable way to *determine* the list of positions in some portable and
>> quasi-automatic way? If so, how automatic do you want it to be?
>>
>> > In summary, if calling a general purpose MSD radix sort which sorted
>> > byte-by-byte (i.e. where each 'digit' was a byte) what would be a good
>> > and portable way to specify the offsets of the bytes of the sort key
>> > fields in descending order of significance?
>>
>> An array of unsigned int?
>
> size_t seems to me like a better choice of type than unsigned int .
I was not certain. It seems unlikely that you need to index bytes that
far into individual records, but the exact type is not really the point.
--
Ben.
[toc] | [prev] | [next] | [standalone]
| From | James Harris <james.harris.1@gmail.com> |
|---|---|
| Date | 2017-11-27 14:55 +0000 |
| Message-ID | <ovh906$kj4$1@dont-email.me> |
| In reply to | #123526 |
On 26/11/2017 23:55, Ben Bacarisse wrote:
> James Harris <james.harris.1@gmail.com> writes:
> <snip>
>> Imagine a function to carry out a radix sort. A programmer could write
>> one such function for each type of record to be sorted. The solution I
>> have now effectively does that: it only sorts integers and was
>> intended as a proof of concept.
>>
>> But it would, of course, be more flexible to have one routine which
>> was able to sort arbitrary records. Then it could be called to sort
>> different types of data. That, though, would require a way to specify
>> the parts of the record which have to be used as sort keys. In turn,
>> that raises the question of how to tell the sort function about the
>> structure of the keys. E.g. imagine records like the following.
>>
>> struct rec {
>> uint32_t key_part_1;
>> char[20] name;
>> char[3] key_part_2;
>> };
>>
>> Assuming the radix sort works one byte at a time, to sort records of
>> struct rec type the sort function would have to be told that the key
>> was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
>> of the bytes of key_part_1 would depend on the architecture. So a list
>> of bytes could be supplied to an MSD radix sort as either of these
>> two.
>>
>> 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
>> 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>>
>> Given this scenario, the first part of the original query was about a
>> good way to tell the sort routine such a list of byte positions, and
>> to do so portably.
>
> The last bit is confusing me in that I would have to work hard to come
> up with a non-portable way to tell the sort routine the list of byte
> positions. That makes me think you mean something else. Do you want a
> portable way to *determine* the list of positions in some portable and
> quasi-automatic way? If so, how automatic do you want it to be?
When I speak of "portable" I am thinking that the source would be the
same. The offsets would have to be determined at compile time (or
later). It's easy to do this non-portably. For example, a non-portable
way to carry out the sort on 32-bit integers might define an array of
offsets such as
int i32_offsets[] = {3, 2, 1, 0, -1};
The -1 indicates the end of the list. In any call to the sort function
the offsets would be given, as in
radm_sort(values_array, <other parameters>, i32_offsets, 0);
But in a number of respects this would not be portable. E.g. it would
not work on a big-endian machine.
>
>> In summary, if calling a general purpose MSD radix sort which sorted
>> byte-by-byte (i.e. where each 'digit' was a byte) what would be a good
>> and portable way to specify the offsets of the bytes of the sort key
>> fields in descending order of significance?
>
> An array of unsigned int? The fact that the answer seems so clear to me
> again makes me think I'm missing something.
Does it make more sense now?
--
James Harris
[toc] | [prev] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-27 15:51 +0000 |
| Message-ID | <87d143k9e1.fsf@bsb.me.uk> |
| In reply to | #123533 |
James Harris <james.harris.1@gmail.com> writes:
> On 26/11/2017 23:55, Ben Bacarisse wrote:
>> James Harris <james.harris.1@gmail.com> writes:
>> <snip>
>>> Imagine a function to carry out a radix sort. A programmer could write
>>> one such function for each type of record to be sorted. The solution I
>>> have now effectively does that: it only sorts integers and was
>>> intended as a proof of concept.
>>>
>>> But it would, of course, be more flexible to have one routine which
>>> was able to sort arbitrary records. Then it could be called to sort
>>> different types of data. That, though, would require a way to specify
>>> the parts of the record which have to be used as sort keys. In turn,
>>> that raises the question of how to tell the sort function about the
>>> structure of the keys. E.g. imagine records like the following.
>>>
>>> struct rec {
>>> uint32_t key_part_1;
>>> char[20] name;
>>> char[3] key_part_2;
>>> };
>>>
>>> Assuming the radix sort works one byte at a time, to sort records of
>>> struct rec type the sort function would have to be told that the key
>>> was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
>>> of the bytes of key_part_1 would depend on the architecture. So a list
>>> of bytes could be supplied to an MSD radix sort as either of these
>>> two.
>>>
>>> 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
>>> 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>>>
>>> Given this scenario, the first part of the original query was about a
>>> good way to tell the sort routine such a list of byte positions, and
>>> to do so portably.
>>
>> The last bit is confusing me in that I would have to work hard to come
>> up with a non-portable way to tell the sort routine the list of byte
>> positions. That makes me think you mean something else. Do you want a
>> portable way to *determine* the list of positions in some portable and
>> quasi-automatic way? If so, how automatic do you want it to be?
>
> When I speak of "portable" I am thinking that the source would be the
> same. The offsets would have to be determined at compile time (or
> later). It's easy to do this non-portably. For example, a non-portable
> way to carry out the sort on 32-bit integers might define an array of
> offsets such as
>
> int i32_offsets[] = {3, 2, 1, 0, -1};
>
> The -1 indicates the end of the list.
Ah, so the problem is generating the list at compile time? That's
tricky without a build-phase -- i.e. a program that runs "at compile
time".
With running code there are lots of options. The end result could be to
write a series of macros to a file that get #included into the source:
#define UINT64_BNUMS 7, 6, 5, 4, 3, 2, 1, 0
#define UINT32_BNUMS 3, 2, 1, 0
#define UINT16_BNUMS 1, 0
so you can write
int i32_offsets[] = { UINT32_BNUMS, -1 };
To handle integer types deeper in the struct you'd need a macro that can
add an offset to a list. I think that can be done with macro magic.
I'll have go if you think this is the way you will be going.
The other issue is padding between struct members but that can be done
with the standard offsetof macro.
<snip>
> Does it make more sense now?
You'll have to tell me if I've got what you mean!
--
Ben.
[toc] | [prev] | [next] | [standalone]
| From | James Harris <james.harris.1@gmail.com> |
|---|---|
| Date | 2017-11-28 17:13 +0000 |
| Message-ID | <ovk5ev$ee3$1@dont-email.me> |
| In reply to | #123537 |
On 27/11/2017 15:51, Ben Bacarisse wrote:
> James Harris <james.harris.1@gmail.com> writes:
>
>> On 26/11/2017 23:55, Ben Bacarisse wrote:
>>> James Harris <james.harris.1@gmail.com> writes:
>>> <snip>
>>>> Imagine a function to carry out a radix sort. A programmer could write
>>>> one such function for each type of record to be sorted. The solution I
>>>> have now effectively does that: it only sorts integers and was
>>>> intended as a proof of concept.
>>>>
>>>> But it would, of course, be more flexible to have one routine which
>>>> was able to sort arbitrary records. Then it could be called to sort
>>>> different types of data. That, though, would require a way to specify
>>>> the parts of the record which have to be used as sort keys. In turn,
>>>> that raises the question of how to tell the sort function about the
>>>> structure of the keys. E.g. imagine records like the following.
>>>>
>>>> struct rec {
>>>> uint32_t key_part_1;
>>>> char[20] name;
>>>> char[3] key_part_2;
>>>> };
>>>>
>>>> Assuming the radix sort works one byte at a time, to sort records of
>>>> struct rec type the sort function would have to be told that the key
>>>> was made up of bytes 0 to 3 and 24 to 26. And, what's more, the order
>>>> of the bytes of key_part_1 would depend on the architecture. So a list
>>>> of bytes could be supplied to an MSD radix sort as either of these
>>>> two.
>>>>
>>>> 3, 2, 1, 0, 24, 25, 26 (little-endian machine)
>>>> 0, 1, 2, 3, 24, 25, 26 (big-endian machine)
>>>>
>>>> Given this scenario, the first part of the original query was about a
>>>> good way to tell the sort routine such a list of byte positions, and
>>>> to do so portably.
>>>
>>> The last bit is confusing me in that I would have to work hard to come
>>> up with a non-portable way to tell the sort routine the list of byte
>>> positions. That makes me think you mean something else. Do you want a
>>> portable way to *determine* the list of positions in some portable and
>>> quasi-automatic way? If so, how automatic do you want it to be?
>>
>> When I speak of "portable" I am thinking that the source would be the
>> same. The offsets would have to be determined at compile time (or
>> later). It's easy to do this non-portably. For example, a non-portable
>> way to carry out the sort on 32-bit integers might define an array of
>> offsets such as
>>
>> int i32_offsets[] = {3, 2, 1, 0, -1};
>>
>> The -1 indicates the end of the list.
>
> Ah, so the problem is generating the list at compile time? That's
> tricky without a build-phase -- i.e. a program that runs "at compile
> time".
Compile time would be ideal, if feasible.
>
> With running code there are lots of options. The end result could be to
> write a series of macros to a file that get #included into the source:
>
> #define UINT64_BNUMS 7, 6, 5, 4, 3, 2, 1, 0
> #define UINT32_BNUMS 3, 2, 1, 0
> #define UINT16_BNUMS 1, 0
>
> so you can write
>
> int i32_offsets[] = { UINT32_BNUMS, -1 };
>
> To handle integer types deeper in the struct you'd need a macro that can
> add an offset to a list. I think that can be done with macro magic.
> I'll have go if you think this is the way you will be going.
>
> The other issue is padding between struct members but that can be done
> with the standard offsetof macro.
>
> <snip>
>> Does it make more sense now?
>
> You'll have to tell me if I've got what you mean!
I think so. And I see that picking code in a #if block can be done if
suitable tests can be devised. But I am not sure that such an approach
would scale well and I wonder if there would be a more flexible way. To
illustrate, consider
unsigned U_ORDER = 0x00010203;
unsigned char *p = (unsigned char *) &U_ORDER;
The idea would be that p[0] through p[3] would give the offsets of the
bytes of an unsigned int in descending order of significance, and that
it would work irrespective of endianness - big, little or anything else.
But it has problems: it assumes sizeof(unsigned) is 4 and that CHAR_BIT
is 8. Hence it is not portable.
Signed ints would be harder. At least the byte with the sign would need
to be treated differently. That could be done with a simple mapping
function (possibly using xor or subtract operations) which would be
fine. But the code would need to adapt to how signed ints were laid out
in memory on different machines. And, again, that seems to be hard to do
portably.
I guess I am looking for a portable way for C datatypes to be treated as
a sequence of bytes, and hoping there would be some language support
(i.e. set of definitions in the C standards) for doing so. Although my
sort routine is not a radix sort it faces the same problems of
decomposing an arbitrary key into manageable chunks so I was thinking
that this kind of problem might have already been solved.
AIUI any C object apart from a bit field can be viewed as a sequence of
bytes but that only gets me part way to a portable approach to knowing
what is in each byte.
It seems like a simple requirement, but implementing it portably is
where it becomes difficult.
--
James Harris
[toc] | [prev] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-29 01:07 +0000 |
| Message-ID | <87efohsxin.fsf@bsb.me.uk> |
| In reply to | #123576 |
James Harris <james.harris.1@gmail.com> writes:
> On 27/11/2017 15:51, Ben Bacarisse wrote:
>> James Harris <james.harris.1@gmail.com> writes:
<snip>
>>> When I speak of "portable" I am thinking that the source would be the
>>> same. The offsets would have to be determined at compile time (or
>>> later). It's easy to do this non-portably. For example, a non-portable
>>> way to carry out the sort on 32-bit integers might define an array of
>>> offsets such as
>>>
>>> int i32_offsets[] = {3, 2, 1, 0, -1};
>>>
>>> The -1 indicates the end of the list.
>>
>> Ah, so the problem is generating the list at compile time? That's
>> tricky without a build-phase -- i.e. a program that runs "at compile
>> time".
>
> Compile time would be ideal, if feasible.
I don't think you can do it "at compile time" rather than "at
configuration time".
<snip>
>> You'll have to tell me if I've got what you mean!
>
> I think so. And I see that picking code in a #if block can be done if
> suitable tests can be devised.
That's was not what I was suggesting. But I'm not sure what I was
suggesting will be what you want anyway.
> But I am not sure that such an approach
> would scale well and I wonder if there would be a more flexible
> way. To illustrate, consider
>
> unsigned U_ORDER = 0x00010203;
> unsigned char *p = (unsigned char *) &U_ORDER;
>
> The idea would be that p[0] through p[3] would give the offsets of the
> bytes of an unsigned int in descending order of significance, and that
> it would work irrespective of endianness - big, little or anything
> else. But it has problems: it assumes sizeof(unsigned) is 4 and that
> CHAR_BIT is 8. Hence it is not portable.
This is run-time code, yes? If you are prepared to do it at run-time
then you don't need to assume the size of anything. You can start with
unsigned nb = 0;
unsigned U_ORDER = 0;
and loop setting
(U_ORDER << CHAR_BIT) | nb++
sizeof (unsigned) times. And everywhere you are tempted to use p[0] to
p[3] use p[0] to p[sizeof U_ORDER - 1] instead.
<snip>
--
Ben.
[toc] | [prev] | [next] | [standalone]
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2017-11-29 18:59 +0000 |
| Message-ID | <RrDMy1cnJtEcOsyJAqgdzUr1TUNmG@bongo-ra.co> |
| In reply to | #123593 |
On Wed, 29 Nov 2017 01:07:28 +0000 Ben Bacarisse <ben.usenet@bsb.me.uk> wrote: > unsigned nb = 0; > unsigned U_ORDER = 0; > > and loop setting > > (U_ORDER << CHAR_BIT) | nb++ If sizeof(unsigned) == 1 then U_ORDER << CHAR_BIT is undefined behaviour. > sizeof (unsigned) times. And everywhere you are tempted to use p[0] to > p[3] use p[0] to p[sizeof U_ORDER - 1] instead.
[toc] | [prev] | [next] | [standalone]
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2017-11-30 00:53 +0000 |
| Message-ID | <87bmjklh7u.fsf@bsb.me.uk> |
| In reply to | #123614 |
Spiros Bousbouras <spibou@gmail.com> writes: > On Wed, 29 Nov 2017 01:07:28 +0000 > Ben Bacarisse <ben.usenet@bsb.me.uk> wrote: >> unsigned nb = 0; >> unsigned U_ORDER = 0; >> >> and loop setting >> >> (U_ORDER << CHAR_BIT) | nb++ > > If sizeof(unsigned) == 1 then U_ORDER << CHAR_BIT > is undefined behaviour. Why? But that's a side issue. The key thing is "loop setting" -- i.e. rather than a fixed assignment the code should loop setting the byte numbers. When sizeof U_ORDER == 1, you won't loop. There is only one byte, it gets given number 0 and no others need to be set so the code in question is not run when the size is 1. >> sizeof (unsigned) times. And everywhere you are tempted to use p[0] to >> p[3] use p[0] to p[sizeof U_ORDER - 1] instead. As here -- everything is to be done from i = 0 to i < sizeof U_ORDER. -- Ben.
[toc] | [prev] | [next] | [standalone]
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2017-11-29 17:01 -0800 |
| Message-ID | <lnindsy3y6.fsf@kst-u.example.com> |
| In reply to | #123629 |
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Spiros Bousbouras <spibou@gmail.com> writes:
>> On Wed, 29 Nov 2017 01:07:28 +0000
>> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>>> unsigned nb = 0;
>>> unsigned U_ORDER = 0;
>>>
>>> and loop setting
>>>
>>> (U_ORDER << CHAR_BIT) | nb++
>>
>> If sizeof(unsigned) == 1 then U_ORDER << CHAR_BIT
>> is undefined behaviour.
>
> Why?
Because CHAR_BIT (which would have to be at least 16) would be equal to
the width of the left operand.
N1570 6.5.7p3:
The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If the
value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior
is undefined
> But that's a side issue.
[snip]
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
[toc] | [prev] | [next] | [standalone]
Page 1 of 3 [1] 2 3 Next page →
Back to top | Article view | comp.lang.c
csiph-web