From: George Neuner
Newsgroups: comp.compilers
Subject: Re: Bounds checking, Optimization techniques and undefined behavior
Date: Sun, 05 May 2019 17:38:21 -0400
Organization: A noiseless patient Spider
Message-ID: <19-05-031@comp.compilers>
Keywords: design, debug

On Sun, 5 May 2019 11:14:51 +0100, Bart wrote:

>On 04/05/2019 10:45, Andy Walker wrote:
>> On 03/05/2019 23:10, Bart wrote:
>
>>> C is just a mess; it has arrays of sorts, but people generally use raw
>>> pointers without associated bounds. Maybe that's one reason why your C
>>> didn't have it. Or did it somehow manage it if enabled?
>>
>>     This isn't really a problem with C, the language.  It's clear in
>> the reference manual right back to K&R C and in the various standards
>> that pointers always have associated bounds.
>
>But how do they get there? Take this:
>
>    int A[10], *p;
>    p = &A[3];
>
>You intend p to refer to the 4-element slice A[3..6], but how does the
>language know that?
>How can it stop code from writing to p[5]?

You declare 'p' as int (*p)[4] and then the compiler could check the
use. Theoretically, at least; I'm not sure it is actually done in many
situations.

But few programmers even take the trouble to declare their pointers
appropriately.

>Or you intend to index p[-2] to get at the preceding elements. Actually
>using negative indexing is quite common, but surely all array bounds in
>C are presumed to start from 0?
>
>Or this:
>
>    struct {int a,b,c,d;} S;
>
>    p = &S.a;
>
>You intend p to be used to access a,b,c,d as an int[4] array, but p's
>bounds will say it's only one element long.

The larger problem is that C even permits that. If you want the struct
elements also to be available as an array, you should have used a
union.

>Or this:
>
>    int *p = malloc(sizeof(int)*1000);
>
>    int *q = p+400;
>
>You are allocating one pool of memory then sub-allocating that into
>smaller objects, here into a 20-element array headed by q. But how does
>the language know that?

Again, 'q' is not declared in a way that would let the compiler check
it, even though it *could*.

>With language support, it need have no cost. For example, suppose that
>array A did carry its bounds with it (or are statically known), then in
>code like this:
>
>    for i in A do        # (iterate over bounds not values)
>       A[i] := 0
>    end
>
>the compiler knows it doesn't need to bounds-check each access. Or here:
>
>    forall x in A do     # (iterate over values)
>       print x
>    end

C has a lot of warts, no question ... but its biggest problem is that
the routine (ab)use of pointers in so-called "idiomatic" C is, in a
real sense, working against the compiler, making its job much harder.

George
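
For concreteness, here is a rough sketch of the two suggestions above:
a pointer declared as int (*p)[4], so that the bound is part of its
type, and a union that exposes the four struct fields as an array. The
names zero_slice, struct fields, union quad and sum_fields are made up
for illustration. Two assumptions to flag: pointing such a pointer at
&A[3] needs an explicit cast in C, and the union only lines up if the
struct has no padding between its int members (checked at compile time
below). Whether any given compiler actually diagnoses an out-of-range
index through such a pointer is a separate question.

```c
#include <assert.h>
#include <stddef.h>

#define SLICE_LEN 4

/* The parameter type carries the bound: p points at one int[4] object,
   so the loop limit comes from the type instead of a separate count. */
void zero_slice(int (*p)[SLICE_LEN])
{
    assert(p != NULL);
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; i++)
        (*p)[i] = 0;
}

/* Hypothetical struct with four int members, as in the quoted example. */
struct fields { int a, b, c, d; };

/* Assumption: no padding between the members, so v[i] overlays them. */
_Static_assert(sizeof(struct fields) == 4 * sizeof(int),
               "struct fields must have no padding");

/* The same storage, viewed either as named fields or as an array. */
union quad {
    struct fields s;
    int v[4];
};

/* Walks the array view; the bound again comes from the declared type. */
int sum_fields(const union quad *q)
{
    int total = 0;
    for (size_t i = 0; i < sizeof q->v / sizeof q->v[0]; i++)
        total += q->v[i];
    return total;
}
```

A caller would write zero_slice((int (*)[4])&A[3]) to clear A[3..6],
and would fill in q.s.a through q.s.d before calling sum_fields(&q).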