Groups > comp.lang.c > #123231 > unrolled thread
| Started by | jacobnavia <jacob@jacob.remcomp.fr> |
|---|---|
| First post | 2017-11-21 23:52 +0100 |
| Last post | 2017-12-15 09:18 -0800 |
| Articles | 20 on this page of 91 — 21 participants |
NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-21 23:52 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-11-21 15:16 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 00:38 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-11-21 16:02 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 01:13 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-11-21 16:52 -0800
Re: NULL as the empty string Robert Wessel <robertwessel2@yahoo.com> - 2017-11-21 18:09 -0600
Re: NULL as the empty string Siri Cruise <chine.bleu@yahoo.com> - 2017-11-21 16:34 -0800
Re: NULL as the empty string David Brown <david.brown@hesbynett.no> - 2017-11-22 12:12 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-21 15:57 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 01:06 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-22 15:42 -0800
Re: NULL as the empty string Melzzzzz <Melzzzzz@zzzzz.com> - 2017-11-22 23:49 +0000
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-22 15:56 -0800
Re: NULL as the empty string Melzzzzz <Melzzzzz@zzzzz.com> - 2017-11-23 00:06 +0000
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-23 17:31 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-24 09:42 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-24 13:47 -0800
Re: NULL as the empty string Jorgen Grahn <grahn+nntp@snipabacken.se> - 2017-11-22 06:46 +0000
Re: NULL as the empty string John Bode <jfbode1029@gmail.com> - 2017-12-08 10:27 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 11:11 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-08 21:39 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 13:03 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-08 22:50 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 15:19 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-09 00:35 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 16:05 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-09 01:22 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 17:39 -0800
Re: NULL as the empty string John Bode <jfbode1029@gmail.com> - 2017-12-11 12:22 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-09 01:29 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-08 17:47 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-09 07:05 +0100
Re: NULL as the empty string David Brown <david.brown@hesbynett.no> - 2017-12-09 18:37 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-09 11:53 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-12 10:49 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-12 13:39 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-12 16:05 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-13 03:43 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-13 08:45 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-13 09:12 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-13 13:27 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-13 14:02 -0800
Re: NULL as the empty string asetofsymbols@gmail.com - 2017-12-13 14:58 -0800
Re: NULL as the empty string asetofsymbols@gmail.com - 2017-12-13 15:11 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-14 03:49 -0800
Re: NULL as the empty string mark.bluemel@gmail.com - 2017-12-14 04:05 -0800
Re: NULL as the empty string David Brown <david.brown@hesbynett.no> - 2017-12-14 13:09 +0100
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-14 05:02 -0800
Re: NULL as the empty string David Brown <david.brown@hesbynett.no> - 2017-12-14 14:54 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-14 07:38 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-14 09:50 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-14 09:20 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-14 09:53 -0800
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-14 12:57 -0800
Re: NULL as the empty string herrmannsfeldt@gmail.com - 2017-12-14 17:22 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-14 17:26 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-08 21:23 +0100
Re: NULL as the empty string Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2017-12-08 13:41 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-08 22:54 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-21 15:17 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 00:26 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-11-21 16:03 -0800
Re: NULL as the empty string "Pascal J. Bourguignon" <pjb@informatimago.com> - 2017-11-22 00:27 +0100
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 00:42 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-11-21 16:05 -0800
Re: NULL as the empty string herrmannsfeldt@gmail.com - 2017-12-06 22:33 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-07 12:04 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-07 23:20 +0100
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-07 15:04 -0800
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-11-21 15:28 -0800
Re: NULL as the empty string Thiago Adams <thiago.adams@gmail.com> - 2017-11-21 16:04 -0800
Re: NULL as the empty string Siri Cruise <chine.bleu@yahoo.com> - 2017-11-21 16:25 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-11-22 01:34 +0100
Re: NULL as the empty string bartc <bc@freeuk.com> - 2017-11-22 00:36 +0000
Re: NULL as the empty string Öö Tiib <ootiib@hot.ee> - 2017-11-21 23:07 -0800
NULL as the empty string asetofsymbols@gmail.com - 2017-11-23 22:23 -0800
Re: NULL as the empty string Geoff <geoff@invalid.invalid> - 2017-12-09 09:05 -0800
Re: NULL as the empty string Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2017-12-09 12:40 -0500
Re: NULL as the empty string gordonb.yj0bc@burditt.org (Gordon Burditt) - 2017-12-09 13:50 -0600
Re: NULL as the empty string Ian Collins <ian-news@hotmail.com> - 2017-12-10 08:59 +1300
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-09 12:22 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-11 01:42 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-10 19:20 -0800
Re: NULL as the empty string jacobnavia <jacob@jacob.remcomp.fr> - 2017-12-11 18:56 +0100
Re: NULL as the empty string Keith Thompson <kst-u@mib.org> - 2017-12-11 11:19 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-15 09:29 -0800
Re: NULL as the empty string Thiago Adams <thiago.adams@gmail.com> - 2018-01-05 08:28 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2018-01-05 09:37 -0800
Re: NULL as the empty string Thiago Adams <thiago.adams@gmail.com> - 2018-01-05 17:08 -0800
Re: NULL as the empty string supercat@casperkitty.com - 2017-12-15 09:18 -0800
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2017-12-13 09:12 -0800 |
| Message-ID | <lnh8suo8kl.fsf@kst-u.example.com> |
| In reply to | #124277 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
> Assuming that NaN represents missing values, taking the mean makes sense.
> Simply ignore those values. Taking the sum is a slippier concept and
> you don't want it handled at hardware level or in the language definition.
> It belongs in the high-level code written by statistical people, who
> may not know much about optimising compilers.
The mean is by definition the sum divided by the count. *If* the
specification calls for NaNs to be ignored, then they should be
ignored whether you're computing a sum or a mean. (If all entries
are NaNs, then I suppose the sum would be 0 and the mean would
be undefined.)
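
A minimal C sketch of the "ignore NaNs" convention described above. The function names are hypothetical, and returning NaN for an undefined mean is an assumption about how "undefined" might be reported, not something the post specifies:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sum of the non-NaN entries; an all-NaN (or empty) input sums to 0. */
double sum_ignoring_nan(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(values[i]))
            sum += values[i];
    }
    return sum;
}

/* Mean of the non-NaN entries. When there are none, the mean is
   undefined; this sketch reports that by returning NaN (an assumption). */
double mean_ignoring_nan(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(values[i])) {
            sum += values[i];
            count++;
        }
    }
    return count ? sum / (double)count : NAN;
}
```

With the input (2, NaN, 4), both functions skip the NaN, so the sum and the mean are computed from the same two values.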
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-13 13:27 -0800 |
| Message-ID | <542d2a4c-15fb-4040-a766-4638b66c7ea4@googlegroups.com> |
| In reply to | #124293 |
On Wednesday, December 13, 2017 at 5:13:10 PM UTC, Keith Thompson wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> [...]
> > Assuming that NaN represents missing values, taking the mean makes sense.
> > Simply ignore those values. Taking the sum is a slippier concept and
> > you don't want it handled at hardware level or in the language definition.
> > It belongs in the high-level code written by statistical people, who
> > may not know much about optimising compilers.
>
> The mean is by definition the sum divided by the count. *If* the
> specification calls for NaNs to be ignored, then they should be
> ignored whether you're computing a sum or a mean. (If all entries
> are NaNs, then I suppose the sum would be 0 and the mean would
> be undefined.)
>
For "sum", it makes more sense to add the mean rather than 0 if NaNs
represent missing values.
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2017-12-13 14:02 -0800 |
| Message-ID | <ln8te6nv5z.fsf@kst-u.example.com> |
| In reply to | #124305 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> On Wednesday, December 13, 2017 at 5:13:10 PM UTC, Keith Thompson wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>> [...]
>> > Assuming that NaN represents missing values, taking the mean makes sense.
>> > Simply ignore those values. Taking the sum is a slippier concept and
>> > you don't want it handled at hardware level or in the language definition.
>> > It belongs in the high-level code written by statistical people, who
>> > may not know much about optimising compilers.
>>
>> The mean is by definition the sum divided by the count. *If* the
>> specification calls for NaNs to be ignored, then they should be
>> ignored whether you're computing a sum or a mean. (If all entries
>> are NaNs, then I suppose the sum would be 0 and the mean would
>> be undefined.)
>>
> For "sum", it makes more sense to add the mean rather than 0 if NaNs
> represent missing values.
So the sum of (2, NaN, 4) should be 9?
If that's what the specification calls for, fine, but I can't imagine any
reason to assume it. The whole idea of a NaN is that it propagates
through calculations. 2+NaN+4 should be NaN -- or *maybe* 6 if you
decide to ignore NaNs.
For that matter, the "+" operator computes the sum of (a sequence of)
two values. Should 2+NaN yield 2?
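
The propagation described above can be seen with a plain addition loop. This is a small sketch assuming IEEE 754 arithmetic (for example an implementation that follows Annex F of the C standard and is not compiled with options that relax NaN handling); the name `sum_naive` is hypothetical:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Plain addition loop with no special cases: under IEEE 754 rules,
   any NaN among the inputs makes the final result NaN. */
double sum_naive(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```

Under those assumptions, calling it on (2, NaN, 4) yields a value for which `isnan` is true, which matches the behaviour of the built-in `+` on 2 + NaN.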
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | asetofsymbols@gmail.com |
|---|---|
| Date | 2017-12-13 14:58 -0800 |
| Message-ID | <cdd2bd75-4ec4-4f88-af23-aecef44b77b8@googlegroups.com> |
| In reply to | #124306 |
Consider b = 2 + sqrt(c*d/e) - 3. If c*d/e is NaN or infinite (an error symbol), b has to be NaN: the error has to propagate until the result is printed, so that one sees there was an error when printing b, or any number built on b.
| From | asetofsymbols@gmail.com |
|---|---|
| Date | 2017-12-13 15:11 -0800 |
| Message-ID | <6c435236-529b-4faa-bf25-5c417b56f240@googlegroups.com> |
| In reply to | #124308 |
The same approach could work for detecting unsigned overflow, if one value of the unsigned range is reserved for errors (for example (unsigned)-1 == 0xFFFFFFFF). The error value 0xFFFFFFFF would then show up in the end result, because it propagates through formulas. For example, in x = b*c + d, if b*c overflows then x == (unsigned)-1. In other words, for unsigned, x == (unsigned)-1 plays the role of x = NaN.
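
A hedged sketch of that idea in C. The names `U_ERR`, `checked_add` and `checked_mul` are hypothetical; nothing like this is built into C's unsigned arithmetic, which simply wraps around on overflow:

```c
#include <limits.h>

/* Hypothetical convention: UINT_MAX is reserved as the error marker,
   so valid values lie in the range 0 .. UINT_MAX - 1. */
#define U_ERR UINT_MAX

unsigned checked_add(unsigned a, unsigned b)
{
    if (a == U_ERR || b == U_ERR)
        return U_ERR;              /* propagate an earlier error */
    if (a >= U_ERR - b)            /* a + b would reach or pass the marker */
        return U_ERR;
    return a + b;
}

unsigned checked_mul(unsigned a, unsigned b)
{
    if (a == U_ERR || b == U_ERR)
        return U_ERR;              /* propagate an earlier error */
    if (a != 0 && b > (U_ERR - 1) / a)  /* a * b would exceed U_ERR - 1 */
        return U_ERR;
    return a * b;
}
```

With these helpers, x = b*c + d would be written `checked_add(checked_mul(b, c), d)`, and an overflow in the multiplication shows up as `U_ERR` in x. The cost is one lost value of the range and a check on every operation.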
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-14 03:49 -0800 |
| Message-ID | <cf245e3f-5d19-468e-9941-39e28e0a8810@googlegroups.com> |
| In reply to | #124306 |
On Wednesday, December 13, 2017 at 10:02:52 PM UTC, Keith Thompson wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> > On Wednesday, December 13, 2017 at 5:13:10 PM UTC, Keith Thompson wrote:
> >> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> >> [...]
> >> > Assuming that NaN represents missing values, taking the mean makes sense.
> >> > Simply ignore those values. Taking the sum is a slippier concept and
> >> > you don't want it handled at hardware level or in the language definition.
> >> > It belongs in the high-level code written by statistical people, who
> >> > may not know much about optimising compilers.
> >>
> >> The mean is by definition the sum divided by the count. *If* the
> >> specification calls for NaNs to be ignored, then they should be
> >> ignored whether you're computing a sum or a mean. (If all entries
> >> are NaNs, then I suppose the sum would be 0 and the mean would
> >> be undefined.)
> >>
> > For "sum", it makes more sense to add the mean rather than 0 if NaNs
> > represent missing values.
>
> So the sum of (2, NaN, 4) should be 9?
>
> If that's what the specification calls for, fine, but I can't imagine any
> reason to assume it. The whole idea of a NaN is that it propagates
> through calculations. 2+NaN+4 should be NaN -- or *maybe* 6 if you
> decide to ignore NaNs.
>
There's an argument for NaN because NaN propagates. But that's not very
helpful for most real datasets. If NaN represents missing data, and the
missing data is drawn at random from the same population as the present
data, 9 is our best estimate of what the sum of three values will be.

> For that matter, the "+" operator computes the sum of (a sequence of)
> two values. Should 2+NaN yield 2?
>
It just depends. Sometimes yes, for example when we are counting things
and we want to know how many we have definitely identified.

If we're taxing apples, and we don't have any data for farmer Giles'
apple orchard, but we do for farmer Joe, we can send a tax demand for
farmer Joe, but farmer Giles will have to be flagged up as missing and
sent a demand later when the information comes in.

However if we know the area each farmer has under cultivation, and we
know that the harvest varies from year to year but is fairly constant
per tree per year, we can obtain a much better estimate than simply
assigning the missing farmers the mean. Then maybe farmers with big
farms take longer to count their apples than farmers with small farms,
so the missing data is not in fact a random sample of the population -
you have to be very alert to that sort of thing.

The point I'm making is that this shouldn't be handled at machine or
language level. It's a much higher-level consideration than that.
| From | mark.bluemel@gmail.com |
|---|---|
| Date | 2017-12-14 04:05 -0800 |
| Message-ID | <3408d343-aa5e-480f-a6f7-38f58cf2a57a@googlegroups.com> |
| In reply to | #124315 |
On Thursday, 14 December 2017 11:49:29 UTC, Malcolm McLean wrote:
> On Wednesday, December 13, 2017 at 10:02:52 PM UTC, Keith Thompson wrote:
> > So the sum of (2, NaN, 4) should be 9?
>
> If NaN represents missing data, and the
> missing data is drawn at random from the same population as the
> present data, 9 is our best estimate of what the sum of three values
> will be.

Heh! Not in my case, as the set should be 2, 2.82842712, 4 (there is a
reasonably coherent explanation, which I will leave the reader to derive).
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2017-12-14 13:09 +0100 |
| Message-ID | <p0tpkv$is3$1@dont-email.me> |
| In reply to | #124315 |
On 14/12/17 12:49, Malcolm McLean wrote:
> On Wednesday, December 13, 2017 at 10:02:52 PM UTC, Keith Thompson wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>> On Wednesday, December 13, 2017 at 5:13:10 PM UTC, Keith Thompson wrote:
>>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>> [...]
>>>>> Assuming that NaN represents missing values, taking the mean makes sense.
>>>>> Simply ignore those values. Taking the sum is a slippier concept and
>>>>> you don't want it handled at hardware level or in the language definition.
>>>>> It belongs in the high-level code written by statistical people, who
>>>>> may not know much about optimising compilers.
>>>>
>>>> The mean is by definition the sum divided by the count. *If* the
>>>> specification calls for NaNs to be ignored, then they should be
>>>> ignored whether you're computing a sum or a mean. (If all entries
>>>> are NaNs, then I suppose the sum would be 0 and the mean would
>>>> be undefined.)
>>>>
>>> For "sum", it makes more sense to add the mean rather than 0 if NaNs
>>> represent missing values.
>>
>> So the sum of (2, NaN, 4) should be 9?
>>
>> If that's what the specification calls for, fine, but I can't imagine any
>> reason to assume it. The whole idea of a NaN is that it propagates
>> through calculations. 2+NaN+4 should be NaN -- or *maybe* 6 if you
>> decide to ignore NaNs.
>>
> There's an argument for NaN because NaN propagates. But that's not very
> helpful for most real datasets. If NaN represents missing data, and the
> missing data is drawn at random from the same population as the
> present data, 9 is our best estimate of what the sum of three values
> will be.

NaN is used to indicate an error - that something has gone wrong in your
calculations, data was invalid, out of bounds, etc. There are only two
sensible options that I can see - propagate it, or use it to cause a
trap, exception, error message, early exit, etc.

It would be crazy to try to define behaviour for it like you are
suggesting - because there is /no/ correct behaviour to define. Are you
seriously trying to tell us that you want 2 + NaN + 4 to equal 9 when
you write it like that, because you want the NaN to be the average of 2
and 4, but in the expression 2 + NaN + 4 + 9 you would want the NaN to
be 5 as the average of 2, 4 and 9, and therefore 2 + NaN + 4 is now 11?

Sometimes you want to work with datasets with potentially missing, bad,
or inaccurate data. A single floating point number is not sufficient
there - nor is normal floating point arithmetic. You need more
sophisticated tracking of the metadata that depends on the task in hand.

>>
>> For that matter, the "+" operator computes the sum of (a sequence of)
>> two values. Should 2+NaN yield 2?
>>
> The point I'm making is that this shouldn't be handled at machine
> or language level. It's a much higher-level consideration than that.
>

Exactly. Thus it makes no sense whatsoever to give a definition of how
to treat NaNs (other than as signalling an error). It has to be handled
at a higher level - not by using the mean, or a random sample, or 0, or
whatever.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-14 05:02 -0800 |
| Message-ID | <ba6098b6-98ed-404e-81e2-60e83a2b2be3@googlegroups.com> |
| In reply to | #124318 |
On Thursday, December 14, 2017 at 12:09:11 PM UTC, David Brown wrote:
> On 14/12/17 12:49, Malcolm McLean wrote:
> >
> Sometimes you want to work with datasets with potentially missing, bad,
> or inaccurate data. A single floating point number is not sufficient
> there - nor is normal floating point arithmetic. You need more
> sophisticated tracking of the metadata that depends on the task in hand.
>
For example I've got a csv file loader that contains a function

    double csv_readfield(CSV *csv, int column, int row);

it returns NaN if the data is missing, which is allowed in CSV files.
Whilst at some level that's an error, quite likely caller expects
missing values.

But NaN won't always mean "missing data", it can also mean data
which is invalid in some way, e.g. someone tried to take the square root
of a negative number

> > The point I'm making is that this shouldn't be handled at machine
> > or language level. It's a much higher-level consideration than that.
> >
>
> Exactly. Thus it makes no sense whatsoever to give a definition of how
> to treat NaNs (other than as signalling an error). It has to be handled
> at a higher level - not by using the mean, or a random sample, or 0, or
> whatever.
>
So "sum" is a poor name for our function. Really we want sum_finite
(the sum of all finite numbers in the list), sum_estimate (replacing
NaNs with the mean as best guess), sum_arithmetical (generating a
NaN if passed NaNs or both + and - infinity). Which one we choose
depends on what the data actually means, if we just look at it as
a list of context-free real numbers there is no answer.
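
The three variants named above might look roughly like this in C. This is only a sketch: the signatures, the treatment of infinities in sum_estimate, and the results for empty or all-NaN input are assumptions that the post does not specify.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sum of the finite entries only; NaNs and infinities are skipped. */
double sum_finite(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        if (isfinite(v[i]))
            sum += v[i];
    return sum;
}

/* Treats each NaN as if it were the mean of the non-NaN entries,
   which is the same as (mean of present values) * n.
   Assumption: empty input gives 0, all-NaN input gives NaN. */
double sum_estimate(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    size_t present = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(v[i])) {
            sum += v[i];
            present++;
        }
    }
    if (present == 0)
        return n == 0 ? 0.0 : NAN;
    return sum / (double)present * (double)n;
}

/* Ordinary IEEE 754 addition: a NaN input, or +infinity together
   with -infinity, produces NaN. */
double sum_arithmetical(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

For the list (2, NaN, 4) these give 6, 9 and NaN respectively, the three answers that have come up in this thread. In this sketch the estimate is obtained in a single loop, by scaling the partial sum by n divided by the number of present values.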
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2017-12-14 14:54 +0100 |
| Message-ID | <p0tvr6$r5$1@dont-email.me> |
| In reply to | #124319 |
On 14/12/17 14:02, Malcolm McLean wrote:
> On Thursday, December 14, 2017 at 12:09:11 PM UTC, David Brown wrote:
>> On 14/12/17 12:49, Malcolm McLean wrote:
>>>
>> Sometimes you want to work with datasets with potentially missing, bad,
>> or inaccurate data. A single floating point number is not sufficient
>> there - nor is normal floating point arithmetic. You need more
>> sophisticated tracking of the metadata that depends on the task in hand.
>>
> For example I've got a csv file loader that contains a function
>
> double csv_readfield(CSV *csv, int column, int row);
>
> it returns NaN if the data is missing, which is allowed in CSV files.
> Whilst at some level that's an error, quite likely caller expects
> missing values.

If the caller expects missing data, then the function is badly specified
- because it has no way to return that information. NaN is not the same
as missing data, just as missing data is not the same as 0.

> But NaN won't always mean "missing data", it can also mean data
> which is invalid in some way, e.g. someone tried to take the square root
> of a negative number

Exactly. You clearly understand the problem - you are simply failing to
appreciate the consequences of it, or ways to avoid it.

When you have a function declared like the one above, there is no way to
indicate these different possibilities for a lack of valid data. Thus
the higher level code that calls the function, can't make any decision
on it. If it receives a NaN from the function, all it can do is give up
with an error "bad data". In some circumstances, that's fine of course.
But what it /cannot/ sensibly do is decide on some arbitrary
interpretation. It does not know if the NaN is due to a missing number,
a syntax error, a real NaN in the CSV file, an invalid row or column, or
any one of a number of mistakes.

>>
>>> The point I'm making is that this shouldn't be handled at machine
>>> or language level. It's a much higher-level consideration than that.
>>>
>>
>> Exactly. Thus it makes no sense whatsoever to give a definition of how
>> to treat NaNs (other than as signalling an error). It has to be handled
>> at a higher level - not by using the mean, or a random sample, or 0, or
>> whatever.
>>
> So "sum" is a poor name for our function. Really we want sum_finite
> (the sum of all finite numbers in the list), sum_estimate (replacing
> NaNs with the mean as best guess), sum_arithmetical (generating a
> NaN if passed NaNs or both + and - infinity). Which one we choose
> depends on what the data actually means, if we just look at it as
> a list of context-free real numbers there is no answer.
>

That would be one way to handle things. You need to /specify/ your
functions appropriately. You have to decide if you are going to return
generic errors, or pass back more information (like distinguishing
between forms of missing or incorrect data), or treat the missing data
in some specific way. The one thing that never makes sense is to pick
an arbitrary method and fail to document it.

As an example of how to get things wrong, try putting a table like this
into a spreadsheet and drawing a graph:

    x    y
    =====|======
    1    1000
    2    1001
    3    999
    4    =1/0
    5    1002
    6    1001
    7    1000
    8    1001

Do that in LibreOffice Calc, and the graph will scale the y axis to
around 1000, and show a pair of broken lines in full detail. Do it in
MS Excel, and the program will treat the bad value as 0 and give you a
useless graph scaled 0 to around 1000 on the y axis.
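
One way to "pass back more information", as discussed earlier in this post, is to return a status code and deliver the number through a pointer. The sketch below is hypothetical: it is not the csv_readfield interface from the thread, and the enum and function names are invented for illustration. It parses the text of a single cell and distinguishes the cases listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical status codes, one per reason a value may be unavailable. */
enum field_status {
    FIELD_OK,           /* *out holds the parsed number */
    FIELD_MISSING,      /* the cell is empty or only whitespace */
    FIELD_SYNTAX,       /* the cell text is not a number */
    FIELD_NO_SUCH_CELL  /* invalid row or column (text is NULL) */
};

/* Parses one cell. *out is written only when FIELD_OK is returned. */
enum field_status parse_field(const char *text, double *out)
{
    assert(out != NULL);
    if (text == NULL)
        return FIELD_NO_SUCH_CELL;
    while (isspace((unsigned char)*text))
        text++;
    if (*text == '\0')
        return FIELD_MISSING;

    char *end;
    double value = strtod(text, &end);
    if (end == text)
        return FIELD_SYNTAX;        /* no digits were consumed */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return FIELD_SYNTAX;        /* trailing junk such as "12x" */

    *out = value;
    return FIELD_OK;
}
```

With an interface of this shape the caller can decide separately what to do about an empty cell, a cell such as "=1/0" that is not a number, and a request for a row or column that does not exist, instead of receiving the same NaN for all of them.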
| From | supercat@casperkitty.com |
|---|---|
| Date | 2017-12-14 07:38 -0800 |
| Message-ID | <2f3ef77f-7dcf-497b-8137-22e4e32838d2@googlegroups.com> |
| In reply to | #124318 |
On Thursday, December 14, 2017 at 6:09:11 AM UTC-6, David Brown wrote:
> On 14/12/17 12:49, Malcolm McLean wrote:
> > There's an argument for NaN because NaN propagates. But that's not very
> > helpful for most real datasets. If NaN represents missing data, and the
> > missing data is drawn at random from the same population as the
> > present data, 9 is our best estimate of what the sum of three values
> > will be.
>
> NaN is used to indicate an error - that something has gone wrong in your
> calculations, data was invalid, out of bounds, etc. There are only two
> sensible options that I can see - propagate it, or use it to cause a
> trap, exception, error message, early exit, etc.

If one is expecting that data will be complete, the fact that part of it
isn't shows a problem. If, however, one expects that data might not be
complete but wants to process what one can, being able to handle those
parts that are complete and effectively ignore the ones that aren't may be
useful.

If one has asked everyone to send in a report of how many apples they will
be able to supply, some people may report that they can't send in any, while
others might simply not answer. Distinguishing "replied zero" from "didn't
reply" will be useful for some purposes, but if one wants a lower bound on
the number of apples to expect, a total that assumes people who didn't reply
won't send any may be more useful than simply saying NaN unless or until all
replies are received.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-14 09:50 -0800 |
| Message-ID | <94bef1d8-c7ea-4068-a176-3955dddc0619@googlegroups.com> |
| In reply to | #124324 |
On Thursday, December 14, 2017 at 3:38:32 PM UTC, supe...@casperkitty.com wrote:
> On Thursday, December 14, 2017 at 6:09:11 AM UTC-6, David Brown wrote:
> > On 14/12/17 12:49, Malcolm McLean wrote:
> > > There's an argument for NaN because NaN propagates. But that's not very
> > > helpful for most real datasets. If NaN represents missing data, and the
> > > missing data is drawn at random from the same population as the
> > > present data, 9 is our best estimate of what the sum of three values
> > > will be.
> >
> > NaN is used to indicate an error - that something has gone wrong in your
> > calculations, data was invalid, out of bounds, etc. There are only two
> > sensible options that I can see - propagate it, or use it to cause a
> > trap, exception, error message, early exit, etc.
>
> If one is expecting that data will be complete, the fact that part of it
> isn't shows a problem. If, however, one expects that data might not be
> complete but wants to process what one can, being able to handle those
> parts that are complete and effectively ignore the ones that aren't may be
> useful.
>
> If one has asked everyone to send in a report of how many apples they will
> be able to supply, some people may report that they can't send in any, while
> others might simply not answer. Distinguishing "replied zero" from "didn't
> reply" will be useful for some purposes, but if one wants a lower bound on
> the number of apples to expect, a total that assumes people who didn't reply
> won't send any may be more useful than simply saying NaN unless or until all
> replies are received.
>
Yes, "we didn't harvest any apples at all this year due to a fungal disease"
is quite different from "I don't know how many apples we will harvest
because the season is late this year".
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2017-12-14 09:20 -0800 |
| Message-ID | <ln4lotns58.fsf@kst-u.example.com> |
| In reply to | #124315 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> On Wednesday, December 13, 2017 at 10:02:52 PM UTC, Keith Thompson wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
>> > For "sum", it makes more sense to add the mean rather than 0 if NaNs
>> > represent missing values.
>>
>> So the sum of (2, NaN, 4) should be 9?
>>
>> If that's what the specification calls for, fine, but I can't imagine any
>> reason to assume it. The whole idea of a NaN is that it propagates
>> through calculations. 2+NaN+4 should be NaN -- or *maybe* 6 if you
>> decide to ignore NaNs.
>>
> There's an argument for NaN because NaN propagates. But that's not very
> helpful for most real datasets. If NaN represents missing data, and the
> missing data is drawn at random from the same population as the
> present data, 9 is our best estimate of what the sum of three values
> will be.
As I said, that's fine if that's what the specification calls for.
I'd never consider returning a "sum" of 9 (which is a guess relying
on speculation about what the NaN entry means) without an explicit
specification.
[...]
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
| From | supercat@casperkitty.com |
|---|---|
| Date | 2017-12-14 09:53 -0800 |
| Message-ID | <1ba8a8bf-208e-4fb6-9638-6d450a4681e3@googlegroups.com> |
| In reply to | #124328 |
On Thursday, December 14, 2017 at 11:20:17 AM UTC-6, Keith Thompson wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> > On Wednesday, December 13, 2017 at 10:02:52 PM UTC, Keith Thompson wrote:
> >> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> [...]
> >> > For "sum", it makes more sense to add the mean rather than 0 if NaNs
> >> > represent missing values.
> >>
> >> So the sum of (2, NaN, 4) should be 9?
> >>
> >> If that's what the specification calls for, fine, but I can't imagine any
> >> reason to assume it. The whole idea of a NaN is that it propagates
> >> through calculations. 2+NaN+4 should be NaN -- or *maybe* 6 if you
> >> decide to ignore NaNs.
> >>
> > There's an argument for NaN because NaN propagates. But that's not very
> > helpful for most real datasets. If NaN represents missing data, and the
> > missing data is drawn at random from the same population as the
> > present data, 9 is our best estimate of what the sum of three values
> > will be.
>
> As I said, that's fine if that's what the specification calls for.
> I'd never consider returning a "sum" of 9 (which is a guess relying
> on speculation about what the NaN entry means) without an explicit
> specification.

Naturally, a function to compute a sum while ignoring NaN values should
document that behavior. The choice to use such a function rather than one
which would indicate the lack of complete data should be made by the
high-level application, but the choice of what functions a language or
framework should provide must be made at the language or framework level.

The decision to include or omit a function with given semantics should be
based on how often such semantics would be useful, and not be affected by
how many situations would exist where they would be inappropriate. If there
are many cases where other semantics would be more helpful, that would imply
that a function should also exist with those semantics. If two different
kinds of semantics would each be useful in many cases, a good language or
framework should provide both.
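
As an illustration of "provide both" (a sketch, not code from the thread), the two semantics under discussion can sit side by side as separately documented functions. The names sum_propagate and sum_skip_nan are hypothetical, and the sketch assumes NaN is the only marker for missing data.

```c
#include <math.h>
#include <stddef.h>

/* Propagating sum: any NaN in the input makes the result NaN,
 * which signals that the data was incomplete. */
double sum_propagate(const double *values, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* NaN-skipping sum: NaN entries are treated as absent.
 * Returns 0.0 when the input is empty or contains only NaNs. */
double sum_skip_nan(const double *values, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (!isnan(values[i]))
            total += values[i];
    }
    return total;
}
```

For the (2, NaN, 4) example quoted above, the first function yields NaN and the second yields 6; which one the caller wants is the application-level choice described in the post.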
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-14 12:57 -0800 |
| Message-ID | <a397bb27-36a7-4bc1-bb8a-70f4a131b74d@googlegroups.com> |
| In reply to | #124330 |
On Thursday, December 14, 2017 at 5:53:26 PM UTC, supe...@casperkitty.com wrote:
> Naturally, a function to compute a sum while ignoring NaN values should
> document that behavior. The choice to use such a function rather than one
> which would indicate the lack of complete data should be made by the
> high-level application, but the choice of what functions a language or
> framework should provide must be made at the language or framework level.
> The decision to include or omit a function with given semantics should be
> based on how often such semantics would be useful, and not be affected by
> how many situations would exist where they would be inappropriate. If there
> are many cases where other semantics would be more helpful, that would imply
> that a function should also exist with those semantics. If two different
> kinds of semantics would each be useful in many cases, a good language or
> framework should provide both.
>
A naive "sum" function is simply an addition loop. So what happens when
passed a NaN? To estimate from the mean implies that the function takes two
passes over the data, which is too hard for a compiler to implement. So the
choices are to treat NaN as zero, or to propagate NaNs. Neither is ideal but
propagation of NaNs at least signals an error condition.
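
For comparison, here is a rough sketch of the "estimate from the mean" semantics as a library function (not code from the thread; sum_impute_mean is a hypothetical name). It rests on one observation specific to a plain sum: if S is the sum of the k present values out of n entries, replacing each missing entry by the mean S/k gives S + (n - k) * S/k = S * n / k, so the sum and the count can be accumulated in the same loop.

```c
#include <math.h>
#include <stddef.h>

/* Estimated sum when NaN marks a missing value assumed to come from the
 * same population as the present values. Each missing entry is replaced
 * by the mean of the present ones, which reduces to S * n / k. */
double sum_impute_mean(const double *values, size_t count)
{
    double present_total = 0.0;
    size_t present = 0;

    for (size_t i = 0; i < count; i++) {
        if (!isnan(values[i])) {
            present_total += values[i];
            present++;
        }
    }
    if (present == count)
        return present_total;   /* nothing missing (or empty input) */
    if (present == 0)
        return NAN;             /* only NaNs: no mean to estimate from */
    return present_total * (double)count / (double)present;
}
```

For (2, NaN, 4) this returns 9, the value debated above. Whether that estimate is appropriate is still a specification question; the sketch only shows what the function would look like.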
| From | herrmannsfeldt@gmail.com |
|---|---|
| Date | 2017-12-14 17:22 -0800 |
| Message-ID | <c34a8ea0-661e-4834-996c-8b109d09b6c7@googlegroups.com> |
| In reply to | #124293 |
On Wednesday, December 13, 2017 at 9:13:10 AM UTC-8, Keith Thompson wrote:

(snip)

> The mean is by definition the sum divided by the count. *If* the
> specification calls for NaNs to be ignored, then they should be
> ignored whether you're computing a sum or a mean. (If all entries
> are NaNs, then I suppose the sum would be 0 and the mean would
> be undefined.)

And the mean of zero values is?
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2017-12-14 17:26 -0800 |
| Message-ID | <lnshccn5na.fsf@kst-u.example.com> |
| In reply to | #124343 |
herrmannsfeldt@gmail.com writes:
> On Wednesday, December 13, 2017 at 9:13:10 AM UTC-8, Keith Thompson wrote:
>
> (snip)
>
>> The mean is by definition the sum divided by the count. *If* the
>> specification calls for NaNs to be ignored, then they should be
>> ignored whether you're computing a sum or a mean. (If all entries
>> are NaNs, then I suppose the sum would be 0 and the mean would
>> be undefined.)
>
> And the mean of zero values is?
Undefined, unless you're working with a specification that says
otherwise. (Were you expecting something else?)
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
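
One way to make "undefined" explicit in an interface, as a sketch rather than anything proposed in the thread: a NaN-ignoring mean that reports failure when no values remain instead of dividing by zero. The name mean_skip_nan and the bool-plus-out-parameter shape are assumptions chosen for illustration.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Mean of the non-NaN entries. Returns true and stores the result in
 * *mean when at least one value is present; returns false and leaves
 * *mean untouched when the input is empty or contains only NaNs. */
bool mean_skip_nan(const double *values, size_t count, double *mean)
{
    double total = 0.0;
    size_t present = 0;

    for (size_t i = 0; i < count; i++) {
        if (!isnan(values[i])) {
            total += values[i];
            present++;
        }
    }
    if (present == 0)
        return false;           /* mean of zero values: undefined */
    *mean = total / (double)present;
    return true;
}
```

A specification could equally say the function returns NaN in that case; the point is only that the zero-value case has to be decided somewhere.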
| From | jacobnavia <jacob@jacob.remcomp.fr> |
|---|---|
| Date | 2017-12-08 21:23 +0100 |
| Message-ID | <p0esck$n2o$1@dont-email.me> |
| In reply to | #124016 |
On 08/12/2017 at 19:27, John Bode wrote:
> On Tuesday, November 21, 2017 at 5:16:29 PM UTC-6, Keith Thompson wrote:
>> jacobnavia <jacob@jacob.remcomp.fr> writes:
>>> What would happen if we decide to give meaning to NULL?
>>
>> NULL (more precisely a null pointer) has a meaning. It's a pointer
>> value that doesn't point to anything.
>>
>
> I think I get what Jacob is getting at - instead of NULL being a macro that expands
> to a 0, have it be a special keyword with special, context-dependent semantics. That is,
> a call against strlen or strcmp with NULL would be interpreted differently by the compiler,
> which would emit code that immediately returned a 0 or false result, without having to
> actually evaluate anything, or store an actual empty string or pointer to an empty
> string.
>
> If that's what Jacob actually means, well, it feels to me like a solution to a problem that
> isn't really a problem. It's a micro-optimization. Yes, if you have thousands of distinct
> pointers to thousands of distinct empty strings, it will add up. But, having thousands of
> distinct pointers to thousands of distinct empty strings sounds like a fairly esoteric use
> case to begin with.
>

Yes, it was rather a question of principle. NULL has many uses, and one
of them is "absence of data", i.e. an empty string. It is empty, no data
is there.

NULL is different from any other string since it is empty.

This is a micro-optimization obviously, and with gigabytes of RAM around,
it would take a gargantuan number of "\0" to make any difference.
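
In today's C, passing a null pointer to strlen or strcmp is undefined behavior, so the "NULL is the empty string" reading can only be approximated in user code. A minimal sketch with thin wrappers follows; strlen_or_empty and strcmp_or_empty are hypothetical names, not part of any library or of the proposal.

```c
#include <string.h>

/* Length of s, treating a null pointer as the empty string. */
size_t strlen_or_empty(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/* Comparison in which a null pointer compares equal to "". */
int strcmp_or_empty(const char *a, const char *b)
{
    return strcmp(a != NULL ? a : "", b != NULL ? b : "");
}
```

The compiler-level version discussed above would build this check into the library or the language; the wrappers only show the intended results for null arguments.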
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2017-12-08 13:41 -0800 |
| Message-ID | <a47d8a8f-b6e7-428c-8866-084e4c8f98d8@googlegroups.com> |
| In reply to | #124022 |
On Friday, December 8, 2017 at 8:24:06 PM UTC, jacobnavia wrote:
>
> Yes, it was rather a question of principle. NULL has many uses, and one
> of them is "absence of data". I.e. an empty string. It is empty, no data
> is there.
>
Another meaning is "invalid pointer". You lose that meaning for null char *s
if you say that a null char * is the same as the empty string. The callee is
of course still free to interpret an invalid pointer as the empty string if
it makes sense, but it needs a code patch.

Formations such as strcpy(NULL, NULL) should probably be allowed and
be defined as No-ops. But that's a strcpy interface question.
| From | jacobnavia <jacob@jacob.remcomp.fr> |
|---|---|
| Date | 2017-12-08 22:54 +0100 |
| Message-ID | <p0f1mt$ubd$1@dont-email.me> |
| In reply to | #124025 |
On 08/12/2017 at 22:41, Malcolm McLean wrote:
> Formations such as strcpy(NULL, NULL) should probably be allowed and
> be defined as No-ops. But that's a strcpy interface question.

Truth table:

strcpy(str,str1)  --> The same as now.
strcpy(NULL,str)  --> Crash. The rest of the code assumes it is working
                      on a copy but no space can be obtained by strcpy,
                      so a crash is needed.
strcpy(str,NULL)  --> Sets the first byte of str to zero.
strcpy(NULL,NULL) --> No-Op.
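
The truth table above can be written out as a wrapper around the standard strcpy. This is a sketch under assumptions: strcpy_nullable is a hypothetical name, "crash" is rendered as a call to abort(), and the no-op case returns NULL because there is no destination to return.

```c
#include <stdlib.h>
#include <string.h>

/* strcpy with the null-pointer cases from the truth table spelled out. */
char *strcpy_nullable(char *dst, const char *src)
{
    if (dst == NULL && src == NULL)
        return NULL;            /* strcpy(NULL,NULL): no-op */
    if (dst == NULL)
        abort();                /* strcpy(NULL,str): deliberate crash */
    if (src == NULL) {
        dst[0] = '\0';          /* strcpy(str,NULL): dst becomes "" */
        return dst;
    }
    return strcpy(dst, src);    /* strcpy(str,str1): the same as now */
}
```

As with the standard function, the caller must still guarantee that a non-null dst has room for the copied string.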