Path: csiph.com!weretis.net!feeder6.news.weretis.net!feeder.usenetexpress.com!feeder-in1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Christian Gollwitzer Newsgroups: comp.compilers Subject: Re: Optimization techniques and undefined behavior Date: Thu, 2 May 2019 22:22:29 +0200 Organization: A noiseless patient Spider Lines: 102 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <19-05-015@comp.compilers> References: <72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk> <19-04-021@comp.compilers> <19-04-023@comp.compilers> <19-04-037@comp.compilers> <19-04-039@comp.compilers> <19-04-042@comp.compilers> <19-04-044@comp.compilers> <19-04-047@comp.compilers> <19-05-004@comp.compilers> <19-05-008@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="53323"; mail-complaints-to="abuse@iecc.com" Keywords: errors, arithmetic Posted-Date: 02 May 2019 18:08:12 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2251 Am 02.05.19 um 16:51 schrieb David Brown: > On 01/05/2019 14:53, Bart wrote: >> On 30/04/2019 13:46, David Brown wrote: >>> It is really incredibly simple (at least in this case).  Do the >>> multiplications using types that won't overflow.  That might be an >>> unsigned type if its range is suitable (not because it has defined >>> overflow behaviour, but use it if it has enough range) or a bigger >>> signed integer type. >> >> That's just kicking the can further down the road. >> > > Yes (especially for the use of a larger signed type).  That's fine.  C > let's you easily kick the can a /long/ way down the road.  How often do > you need integers that will overflow a 64-bit type? > >> If you have two unknown values A and B, and need to multiply, you won't >> know if the result will overflow. >> > First off, how often do you actually have unknown values to deal with? > Usually you know something at least. Well exactly that was my point. In order to parse the file format, you get unknown values, and similar errors led to buffer overflows in the past: https://www.kb.cert.org/vuls/id/523889/ That one was really nasty, because libpng is contained in a lot of security critical software (browsers, web servers handling uploads...). So, this IS a real problem, maybe just not common in the code that you write. > bytes_per_pixel is not going to be more than, say, 16 - that's for > 32-bit per colour, including alpha channel.  No picture format is going > to have more.  Width and height will also be limited - the highest > resolution image sensors are in the range of 120 megapixels.  If you are > talking about a panorama picture sewn together from 5 billion such > camera pictures, you might have a point - and that's just using 64-bit > types. To get back to reality. I chose the PGM format just as an example, because it is so simple that we could actually post the source code of the parser in this message. However, in my previous job, I had to deal with multidimensional scientific data sets. The uncompressed file formats typically have a binary (or even ASCII) header and a block of memory (the so called "raw" format), and there may be up to 5 dimensions. A format close to PGm is NRRD: http://teem.sourceforge.net/nrrd/format.html Software that reads/writes this can be found here: http://fiji.sc/ The total data set size can be as large as your computer's main memory, we had machines with up to 1 TB RAM and data sets regularly with up to 100 GB in size. So let's say you have a 4D data set, then you get, as input x y z w datatype In some formats, if it's only 2 dimensions, then the 3rd and 4th are simply set to 1. Also, there may be oddly shaped "images" with 100,000 pixels in one dimension and only 2 or 3 in another dimension. So the "real" task is: * Read in x, y, z, w and the data type identifier. Each of x,y,z,w must be checked to be within the range of size_t. * Then compute the number of bytes by multiplying these numbers. If it is smaller then the number of bytes in main memory, alloc memory and load the file. The easiest way to do that is with unlimited integers, like in Python: memsize = x*y*z*w*sizeof(datatype) if memsize > mainmemory: error "Out of memory" In C++, you need to jump through strange hoops to detect the overflow. You simply can't limit it such that it will not overflow for 64 bit - because that would mean you restrict each dimension to 65535, which is definitely NOT enough. Instead, what would be needed is a function safe_multiply(x,y) which returns x*y if there is no overflow and throws an error otherwise, so you could write: try { size_t memsize = safe_multiply(x,y,z,w,sizeof(datatype); } catch (overflow_error & err) { std::cerr<<"Not enough memory"; } Even that safe_multiply() is extremely hard to write, unless your compiler supports 128 bit math as an extension. Christian