From: Bart
Newsgroups: comp.compilers
Subject: Re: Bounds checking, Optimization techniques and undefined behavior
Date: Mon, 6 May 2019 13:07:30 +0100
Message-ID: <19-05-036@comp.compilers>
In-Reply-To: <19-05-031@comp.compilers>
Keywords: standards, types, C

On 05/05/2019 22:38, George Neuner wrote:
> On Sun, 5 May 2019 11:14:51 +0100, Bart wrote:
>> You intend p to refer to the 4-element slice A[3..6], but how does the
>> language know that? How can it stop code from writing to p[5]?
>
> You declare 'p' as int (*p)[4] and then the compiler could check the
> use. Theoretically at least, I'm not sure it actually is done in many
> situations.

I declare pointers to arrays as T(*)[] when generating C code. But
you're right in that no one else does that when writing C.
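To make the two pointer-to-array forms concrete, here is a minimal sketch. The function names sum_fixed and sum_open are hypothetical, chosen only for illustration. The first form carries its bound in the type; the second leaves the bound open and passes the length separately. Whether any given compiler actually checks the first form is a separate question.

```c
#include <stddef.h>

/* Fixed bound: the parameter type says p refers to exactly 4 ints,
   so a compiler could, in principle, check every (*p)[i]. */
int sum_fixed(int (*p)[4]) {
    int total = 0;
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; i++)
        total += (*p)[i];
    return total;
}

/* Open bound: the type carries no length. n is an ordinary variable
   that the language does not connect with p in any way. */
int sum_open(int (*p)[], size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += (*p)[i];
    return total;
}
```

For an array int A[10], the slice A[3..6] would be passed as sum_fixed((int (*)[4])&A[3]) or as sum_open((int (*)[])&A[3], 4). In both cases the cast is written by hand, and nothing verifies that 4 is the right bound.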
Note that T(*)[] has an open bound; usually the bound will be dynamic,
and held in a separate variable, which the language does not know is
the bound.

C has something called VLAs, which are really types whose bounds are
defined by a runtime expression. If you had a loop which extracted
different slices on each iteration, you would be obliged to declare 'p'
within the loop, so that it has a slightly different type (with
different bounds) each time around.

But this is very restrictive (for example I don't like using local
block scopes). It is also a rather heavyweight feature just to allow
the possibility of bounds checking. (Also something I haven't
implemented in my own C compiler; I just don't know how to approach it.
And I don't like the feature.)

Proper slicing (since we are not restricted to C or other existing
languages) is simpler and better.

>> struct {int a,b,c,d;} S;
>>
>> p = &S.a;
>>
>> You intend p to be used to access a,b,c,d as an int[4] array, but p's
>> bounds will say it's only one element long.
>
> The larger problem is that C even permits that.

I was half-expecting someone to say it was undefined behaviour.

I suppose you will say the way to declare that pointer is as:

    int (*p)[4] = (int(*)[4])&S.a;

The problem is that if you want to make C a safer, checked language,
none of this stops people writing it the wrong way.

> If you want the
> struct elements also to be available as an array, you should have used
> a union.

Maybe the struct is defined elsewhere and is not your code to change.
Or maybe the struct is {int a,b,c[20];}, and you want to treat a, b,
c[0], c[1] as an array.

The fact is that this is a low level language. You need to be able to
do stuff like this.

> C has a lot of warts, no question ... but its biggest problem is that
> the routine (ab)use of pointers in, so-called, "idiomatic" C in a real
> sense is working against the compiler - making it's job much harder.

So hard that I wouldn't even attempt it.
Creating a more restrictive, safer (or easier to check) language would be easier (IMO).
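
As a rough illustration of what "proper slicing" could look like, here is a minimal sketch in C, assuming a hypothetical IntSlice type that keeps the pointer and its bound together. The names IntSlice, slice_of and slice_set are made up for this example; a language with built-in slices would do the same thing without the boilerplate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice type: the bound travels with the pointer, so
   checking code always knows which length belongs to which data. */
typedef struct {
    int   *data;
    size_t len;
} IntSlice;

/* View n elements of base starting at index lo. The caller must
   ensure that lo + n does not exceed the size of the array. */
IntSlice slice_of(int *base, size_t lo, size_t n) {
    IntSlice s = { base + lo, n };
    return s;
}

/* Checked store: refuses to write outside the slice and reports
   failure instead. */
bool slice_set(IntSlice s, size_t i, int value) {
    if (i >= s.len)
        return false;
    s.data[i] = value;
    return true;
}
```

With int A[10] and IntSlice p = slice_of(A, 3, 4), the call slice_set(p, 3, 7) stores into A[6], while slice_set(p, 5, 9) is rejected. This is the p[5] case from the start of the thread.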