Groups > comp.lang.c > #169256 > unrolled thread

FAQ list update: Q1.1: choosing types

Started by	scs@eskimo.com (Steve Summit)
First post	2023-02-13 15:18 +0000
Last post	2023-02-15 05:31 -0800
Articles	5 on this page of 25 — 14 participants


  FAQ list update: Q1.1: choosing types scs@eskimo.com (Steve Summit) - 2023-02-13 15:18 +0000
    Re: FAQ list update: Q1.1: choosing types Anton Shepelev <anton.txt@g{oogle}mail.com> - 2023-02-13 18:27 +0300
    Re: FAQ list update: Q1.1: choosing types scott@slp53.sl.home (Scott Lurndal) - 2023-02-13 16:27 +0000
      Re: FAQ list update: Q1.1: choosing types Oğuz <oguzismailuysal@gmail.com> - 2023-02-14 08:55 +0300
    Re: FAQ list update: Q1.1: choosing types Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-02-13 18:07 +0000
    Re: FAQ list update: Q1.1: choosing types Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-02-13 10:48 -0800
    Re: FAQ list update: Q1.1: choosing types Kaz Kylheku <864-117-4973@kylheku.com> - 2023-02-14 03:10 +0000
      Re: FAQ list update: Q1.1: choosing types Blue-Maned_Hawk <bluemanedhawk@gmail.com> - 2023-02-14 17:33 -0500
    Re: FAQ list update: Q1.1: choosing types Öö Tiib <ootiib@hot.ee> - 2023-02-13 23:24 -0800
    Re: FAQ list update: Q1.1: choosing types bart c <bart4858@gmail.com> - 2023-02-14 06:22 -0800
      Re: FAQ list update: Q1.1: choosing types Blue-Maned_Hawk <bluemanedhawk@gmail.com> - 2023-02-14 17:48 -0500
    Re: FAQ list update: Q1.1: choosing types John Dill <jadill33@gmail.com> - 2023-02-14 08:17 -0800
      Re: FAQ list update: Q1.1: choosing types Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-02-14 12:34 -0800
        Re: FAQ list update: Q1.1: choosing types John Dill <jadill33@gmail.com> - 2023-02-15 10:04 -0800
          Re: FAQ list update: Q1.1: choosing types Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-02-15 10:28 -0800
            Re: FAQ list update: Q1.1: choosing types Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-02-15 10:57 -0800
              Re: FAQ list update: Q1.1: choosing types Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-02-15 11:17 -0800
                Re: FAQ list update: Q1.1: choosing types Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-02-15 11:32 -0800
            Re: FAQ list update: Q1.1: choosing types John Dill <jadill33@gmail.com> - 2023-02-15 11:01 -0800
              Re: FAQ list update: Q1.1: choosing types Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-02-15 11:28 -0800
                Re: FAQ list update: Q1.1: choosing types John Dill <jadill33@gmail.com> - 2023-02-15 14:00 -0800
    Re: FAQ list update: Q1.1: choosing types Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-02-15 02:54 -0800
    Re: FAQ list update: Q1.1: choosing types Anton Shepelev <anton.txt@g{oogle}mail.com> - 2023-02-15 14:43 +0300
    Re: FAQ list update: Q1.1: choosing types David Brown <david.brown@hesbynett.no> - 2023-02-15 13:55 +0100
      Re: FAQ list update: Q1.1: choosing types Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-02-15 05:31 -0800


#169284

From	John Dill <jadill33@gmail.com>
Date	2023-02-15 14:00 -0800
Message-ID	<68f01228-2eb1-45ba-b6db-723442607b7an@googlegroups.com>
In reply to	#169282

On Wednesday, February 15, 2023 at 2:28:26 PM UTC-5, Keith Thompson wrote:
> John Dill <jadi...@gmail.com> writes: 
> > On Wednesday, February 15, 2023 at 1:29:07 PM UTC-5, Keith Thompson wrote: 
> >> John Dill <jadi...@gmail.com> writes:
> [...]
> >> > C_STATIC_ASSERT (sizeof (ssize_t) == sizeof (size_t), 
> >> > ssize_t_is_compatible_with_size_t); 
> >> > \endcode
> [...]
> >> The name "ssize_t_is_compatible_with_size_t" is a bit misleading. 
> >> C type compatibility is defined very narrowly. A signed type is 
> >> never *compatible* with an unsigned type. What is "compatible" 
> >> supposed to mean in this context? 
> > 
> > That's a good point. A better term is "size equivalent" perhaps?
> I misread the arguments to the C_STATIC_ASSERT() macro; I thought you 
> were testing two different conditions. I see that 
> sizeof (ssize_t) == sizeof (size_t) 
> is the condition being tested, and 
> ssize_t_is_compatible_with_size_t 
> is a name assigned to that condition (used in an error message?). 
> 
> How about "size_t_ssize_t_same_size"? (Yeah, it's a bit ugly.) 
> 
> But is that really a requirement? Note that POSIX doesn't guarantee 
> that size_t and ssize_t have the same size. I'd be surprised by an 
> implementation in which their sizes differ. But if size_t and ssize_t 
> are the same size, then there will be sizes representable as a size_t 
> that exceed SSIZE_MAX; if that's a problem for real code, an 
> implementation might make ssize_t bigger than size_t.

In my domain, there are additional rules imposed under the guise of
preventing "surprising" situations.  Anything that a static assert flags
would be one that would potentially cause "wild" behavior that could
be unaccounted for.  Wild behavior happens because programmers
make implicit assumptions and when we encounter an environment
that deviates from those assumptions, we want to know ASAP.

In the context of the above, having a size_t and ssize_t be different
physical storage sizes would be "surprising" in a purely arbitrary manner
specific to development by our local group in this domain that may be
immaterial to the C or POSIX standard at large.

These restrictions are domain specific and perhaps outside the scope
of the C FAQ but it is part of "real world" type rules that affect the type
choices that I make.
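
The compile-time check discussed above can be sketched with C11's
static_assert, which covers what a project-local macro such as
C_STATIC_ASSERT would do.  This is only a sketch under assumptions: it
presumes a POSIX system (ssize_t and SSIZE_MAX are POSIX, not ISO C), the
message string borrows the name suggested earlier in the thread, and the
helper fits_in_ssize is a hypothetical name used for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX names such as SSIZE_MAX */
#include <assert.h>     /* static_assert (C11) */
#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* SIZE_MAX */
#include <sys/types.h>  /* ssize_t (POSIX, not ISO C) */

/* Compilation stops here if the local assumption does not hold. */
static_assert(sizeof(ssize_t) == sizeof(size_t),
              "size_t_ssize_t_same_size");

/* Hypothetical helper: returns 1 if n can be stored in an ssize_t.
   When both types have the same size, the upper part of the size_t
   range is not representable as an ssize_t. */
int fits_in_ssize(size_t n)
{
    return n <= (size_t)SSIZE_MAX;
}
```

With the assertion in place, fits_in_ssize(SIZE_MAX) is expected to return
0, which is the trade-off Keith describes: equal sizes mean some size_t
values exceed SSIZE_MAX.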

Best regards,
John D.


#169273

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-02-15 02:54 -0800
Message-ID	<865yc3ci6a.fsf@linuxsc.com>
In reply to	#169256

scs@eskimo.com (Steve Summit) writes:

> In its very first question, ["How should I decide which integer
> type to use?"] the FAQ list currently says:
>
>   [long answer]
>
> Besides cleaning up the wording (this answer, like much of my
> writing, is afflicted with Too Many Parentheticals :-) ),

First, as you indirectly point out, this question is actually two
questions, namely, which representation should be chosen, and
which name should be used (e.g., using typedef).  It's important
IMO to identify this dichotomy right up front, and not just leave
it as an afterthought.

Second, for the representation part of the question, the easy and
safe answer is simply to use long long in cases where a signed
type is needed, and unsigned long long in cases where an unsigned
type is needed.  That choice works pretty well in most "normal"
environments or situations.

Third, in many cases the choice is determined by whatever outside
functions or libraries are being used.  For example, when using
stdio.h functions, characters are returned as 'int', and 'int' is
what should be used in conjunction with those functions.  (It may
be worth considering whether to combine this point with the next
one.)
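
As an illustration of the stdio.h point, here is a minimal sketch of a
read loop.  The function name count_bytes is hypothetical; getc and EOF
are the standard ones.  The variable receiving the result is an int
because getc returns either an unsigned char value converted to int, or
EOF, and a plain char cannot reliably distinguish the two.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the bytes remaining in a stream (hypothetical helper). */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;  /* int, not char: must hold every unsigned char value plus EOF */

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

If c were declared as char, a byte with value 0xFF could compare equal to
EOF where char is signed, and the loop might never end where char is
unsigned.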

Fourth, after that there are any number of various situations or
environments that deserve being addressed, which I think should
be done by spinning them off into subquestions.  (Note that doing
this should ameliorate the problem of Too Many Parentheticals.)

Fifth, there are several schools of thought about whether to
prefer signed types or unsigned types in various common code
patterns, as for example indexing, and that issue should be
mentioned and addressed explicitly.  Obviously this aspect can
and probably should be a separate question.
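
To make the indexing concern concrete, here is a small sketch of two
places where signed and unsigned types behave differently.  Both function
names are hypothetical, and the sketch does not take a side in the signed
versus unsigned debate; it only shows what each side has to watch for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the last element equal to key,
   or SIZE_MAX if there is none. */
size_t find_last(const int *a, size_t n, int key)
{
    /* "i-- > 0" tests before decrementing, so the loop stops at index 0.
       The tempting "for (i = n - 1; i >= 0; i--)" never terminates with
       an unsigned i, because i >= 0 is always true. */
    for (size_t i = n; i-- > 0; ) {
        assert(a != NULL);
        if (a[i] == key)
            return i;
    }
    return SIZE_MAX;
}

/* Returns 0: the int operand -1 is converted to unsigned int
   before the comparison, becoming UINT_MAX. */
int minus_one_less_than_one_u(void)
{
    int s = -1;
    unsigned u = 1;
    return s < u;
}
```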

> I am
> strongly considering making one or two additional, potentially
> controversial points:
>
>     If you want to make the considered assumption that your
>     code will only ever run on "modern" hardware, you can say
>     that plain int can hold up to +-2147483647 (that is, has
>     32 bits).

If you think this point is important to mention, it should be
brought up not at the top level but under one of the follow-on
questions under "Fourth" above.

>     Some coding guidelines, notably MISRA, deprecate the
>     "abstract" types short, int, and long, and mandate always
>     using exact-size types like int16_t and int32_t.

Clearly a separate question, and maybe even a separate category
of questions.  How to make good choices for integer types is
both an important area of discussion and a large area of
discussion.  Such important subtopics should be addressed
individually and not just mentioned in passing.  Of course,
it is perfectly reasonable (and IMO desirable) to give a
"see also" reference for these sorts of concerns.

> To the first point, believe me, I do know what the Standard
> guarantees about type int, and I remember well the pejorative
> phrase "All the world's a VAX" as promulgated in, among other
> things, Henry Spencer's 10 Commandments.  But I also know that
> the codebase I work on every day in my day job makes this
> assumption, and I'm pretty sure I'm not alone.

I concur, with the understanding that this point is a subpoint
in a separate question rather than being brought up in the
initial question.

> To the second point, while I personally disagree up, down, and
> sideways with the dictum "don't use plain int", I can't ignore the
> fact that many practicing C programmers do advocate and follow it,
> and many projects faithfully honor MISRA.

A question like "Is it okay just to use plain int?" is a good
question, but it should be a subsidiary question.

Regarding MISRA, does the FAQ have a separate section for style
guides and coding standards?  Presuming it does (and IMO it
should), any answers that refer to MISRA should be put there.

> Opinions (in support or denigration of either point) welcome.

I hope my comments have provided some value to your efforts.

You are welcome to use any of my writing above freely, with the
understanding that I reserve all original rights to myself, as
for example reuse, claim of original authorship, etc.


#169274

From	Anton Shepelev <anton.txt@g{oogle}mail.com>
Date	2023-02-15 14:43 +0300
Message-ID	<20230215144309.8b2e2afe0aa604745d08c065@g{oogle}mail.com>
In reply to	#169256

Steve Summit:

> Opinions (in support or denigration of either point)
> welcome.

Considering the quantity and quality of feedback, how about
maintaining this FAQ in Gitlab, as a set of Markdown files,
similar to Learn X in Y Minutes:

   Site  : https://learnxinyminutes.com/
   Source: https://github.com/adambard/learnxinyminutes-docs

That way, many people can contribute, whereas you as
maintainer will review the contributions using all the
niceties of Gitlab. If Markdown is too modern, the entries
can be stored in plain text files.

-- 
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


#169275

From	David Brown <david.brown@hesbynett.no>
Date	2023-02-15 13:55 +0100
Message-ID	<tsikn9$2t1qg$1@dont-email.me>
In reply to	#169256

On 13/02/2023 16:18, Steve Summit wrote:
> In its very first question, the FAQ list currently says:
> 
> 	If you might need large values (above 32,767 or
> 	below -32,767), use long.  Otherwise, if space is
> 	very important (i.e. if there are large arrays or
> 	many structures), use short.  Otherwise, use int.
> 	If well-defined overflow characteristics are important
> 	and negative values are not, or if you want to steer
> 	clear of sign-extension problems when manipulating bits
> 	or bytes, use one of the corresponding unsigned types.
> 	(Beware when mixing signed and unsigned values in
> 	expressions, though; see question 3.19.)
> 
> 	Although character types (especially unsigned char) can
> 	be used as ``tiny'' integers, doing so is sometimes more
> 	trouble than it's worth.  The compiler will have to emit
> 	extra code to convert between char and int (making the
> 	executable larger), and unexpected sign extension can be
> 	troublesome.  (Using unsigned char can help; see question
> 	12.1 for a related problem.)
> 
> 	A similar space/time tradeoff applies when deciding
> 	between float and double.  (Many compilers still convert
> 	all float values to double during expression evaluation.)
> 
> 	...If for some reason you need to declare something with
> 	an exact size (usually the only good reason for doing so
> 	is when attempting to conform to some externally-imposed
> 	storage layout, but see question 20.5), be sure to
> 	encapsulate the choice behind an appropriate typedef,
> 	but see question 1.3.
> 
> Besides cleaning up the wording (this answer, like much of my
> writing, is afflicted with Too Many Parentheticals :-) ), I am
> strongly considering making one or two additional, potentially
> controversial points:
> 
> 	If you want to make the considered assumption that your
> 	code will only ever run on "modern" hardware, you can say
> 	that plain int can hold up to +-2147483647 (that is, has
> 	32 bits).
> 
> 	Some coding guidelines, notably MISRA, deprecate the
> 	"abstract" types short, int, and long, and mandate always
> 	using exact-size types like int16_t and int32_t.
> 
> To the first point, believe me, I do know what the Standard
> guarantees about type int, and I remember well the pejorative
> phrase "All the world's a VAX" as promulgated in, among other
> things, Henry Spencer's 10 Commandments.  But I also know that
> the codebase I work on every day in my day job makes this
> assumption, and I'm pretty sure I'm not alone.
> 
> To the second point, while I personally disagree up, down, and
> sideways with the dictum "don't use plain int", I can't ignore the
> fact that many practicing C programmers do advocate and follow it,
> and many projects faithfully honor MISRA.
> 
> Opinions (in support or denigration of either point) welcome.
> 
> 					Steve Summit
> 					scs@eskimo.com

There is, IMHO, a lot outdated in that entry.

Unfortunately, I don't think you'll be able to give clear answers here 
that are not controversial.  So I think you should point out that there 
are different possibilities, with different pros and cons.

For my usage, there is no place for "short" or "long".  There is 
certainly no place for "char" as an integer type.  "char" is, in my 
book, for simple characters.  "int" is fine for local variables and 
parameters - it's convenient for small numbers.

I think these days it is acceptable to assume that "int" is 32-bit, for 
most programmers.  Even in the world of small microcontrollers, 8-bit 
and 16-bit systems are mostly outdated - if you are writing code for 
such a device, you /know/ you are writing code for it.  If you don't 
know you are writing code that must be usable on 16-bit int machines, 
you are not writing such code.
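
One way to make that assumption explicit, rather than silent, is a
compile-time check.  The following is a sketch only, assuming a C11
compiler; the function name uint_value_bits is hypothetical and is there
just to show the width being computed instead of assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* INT_MAX, UINT_MAX */

/* Refuse to compile where int cannot hold 2147483647. */
static_assert(INT_MAX >= 2147483647,
              "this code assumes int has at least 32 bits");

/* Number of value bits in unsigned int, counted from UINT_MAX. */
int uint_value_bits(void)
{
    int bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}
```

On a 16-bit int target the translation unit simply fails to build, so the
programmer finds out immediately that the code was not written for it.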

It used to be the case that "int" meant "the fastest type with at least 
16-bit range".  "long" meant "the fastest type with at least 32-bit 
range".  "short" meant "the smallest type with at least 16 bit range".

This is no longer the case in many modern C implementations - in 
particular, it is incorrect in pretty much every 64-bit system.

When working with existing code, consistency is important - programmers 
should follow the style of the rest of the code/project/team unless 
there is very good reason to do otherwise.

When choosing a type, the most important thing is correctness.  The type 
should be big enough to support the range needed, on all the platforms 
that might need to be supported.  As I said above, it is realistic for 
the majority of programmers to assume that their code is for 32-bit or 
greater.  (If you are using Windows APIs or POSIX functions, this is
guaranteed.)  Thus "int" is fine up to +/-2G, and beyond that, "long
long int" is the next option.  "long" is useless.

Secondary to correctness comes clarity.  And third in line is efficiency.

The trouble with "clarity" is that it depends on the circumstances. 
"int" is neat and simple, it gets syntax highlighting in most 
editors/IDEs, and has simple formatting conversion specifiers for
printf and friends.

The size-specific types (int32_t, uint8_t, etc.) are clear in another 
way - they say exactly what size they are.  But the exact size might not 
be the key feature you need, and their formatting specifiers are ugly. 
It should be noted, however, that most modern compiled languages use 
such types (possibly shortened to int32 or i32) rather than non-specific 
sizes.
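
For readers who have not met them, the formatting specifiers in question
are the PRI macros from <inttypes.h>.  A minimal sketch follows; the
function names format_i32 and format_u64 are hypothetical wrappers, while
PRId32, PRIu64 and snprintf are standard.

```c
#include <assert.h>
#include <inttypes.h>  /* PRId32, PRIu64; also provides <stdint.h> types */
#include <stdio.h>

/* Writes value in decimal into buf; returns what snprintf returns. */
int format_i32(char *buf, size_t size, int32_t value)
{
    assert(buf != NULL);
    /* The macro expands to a string literal that is pasted
       onto "%" by adjacent string literal concatenation. */
    return snprintf(buf, size, "%" PRId32, value);
}

int format_u64(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRIu64, value);
}
```

Compare this with plain int, where "%d" is all that is needed.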

Then there are the "fast" types, like "int_fast32_t" that are more 
explicit in their requirements and possibly faster (especially on 64-bit 
systems), but more verbose.

(I see no serious use of the "int_least32_t" types - int_leastN_t is 
always the same as intN_t except on very niche targets.)

For local variables and singular stored variables (where size doesn't 
matter), "int" is therefore typically a first choice.  If that is not 
big enough, I'd pick "int64_t" - preferring explicit sizing.  If code 
efficiency is vital, I'd pick "int_fast32_t" over "int" if the code 
needs to be portable to 32-bit and 64-bit targets.

For arrays where size matters, or for any data that will move outside 
the program (file formats, network protocols, etc.), size-specific 
intN_t types are the only sensible choice.

I see no place for "short" anywhere, nor can I think of a reason to 
prefer "long" or "long long" (except for consistency with existing 
code).  "char" is for characters - "signed char" and "unsigned char" are 
meaningless (use int8_t and uint8_t for small integers for storage).

In small systems embedded programming, you generally know the size of 
the target device, and using fixed sized types throughout keeps the code 
clearer.


#169276

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-02-15 05:31 -0800
Message-ID	<bc05c662-58fb-4533-b724-9575cd61842an@googlegroups.com>
In reply to	#169275

On Wednesday, 15 February 2023 at 12:55:19 UTC, David Brown wrote:
> 
> When choosing a type, the most important thing is correctness. The type 
> should be big enough to support the range needed, on all the platforms 
> that might need to be supported. As I said above, it is realistic for 
> the majority of programmers to assume that their code is for 32-bit or 
> greater. (If you are using Windows APIs or POSIX functions, this is 
> guaranteed.) Thus "int" is fine up to +/-2G, and beyond that, "long 
> long int" is the next option. "long" is useless. 
> 
> Secondary to correctness comes clarity. And third in line is efficiency. 
> 
Really what matters is interoperability of software parts. Projects seldom
fail because the processor isn't powerful enough to handle the calculations 
due to microefficiency failures. And they seldom fail because types chosen
are too narrow. They usually fail because the interconnections between the
various components become too complex to manage.

The more integer types you have in the program, the more potential for interoperability
problems. Whilst these can be coded round, that defeats the "clarity" objective,
for instance if temporary variables are created just to have an int32_t rather than an
int_fast32_t. With single variables, it's seldom much of a problem. The real issue is
when you have an array of int_fast32_t s. Then you have to write a loop to convert to
int32_t s to interface to the other component.
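
The conversion loop described above might look like the following sketch.
The function name copy_fast32_to_i32 is hypothetical, and the range check
is an added assumption about how a caller would want out-of-range values
handled.  An element-by-element copy is needed because int_fast32_t may
be wider than int32_t (on some 64-bit implementations it is a 64-bit
type), so the two arrays need not have the same layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies n elements from src to dst, narrowing each to exactly 32 bits.
   Assumes every value already lies within the int32_t range. */
void copy_fast32_to_i32(int32_t *dst, const int_fast32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(src[i] >= INT32_MIN && src[i] <= INT32_MAX);
        dst[i] = (int32_t)src[i];
    }
}
```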

