From: George Neuner
Newsgroups: comp.compilers
Subject: Re: Is it the job of a parser to validate the input data?
Date: Thu, 12 Aug 2021 09:34:00 -0400
Message-ID: <21-08-012@comp.compilers>
References: <21-08-010@comp.compilers>
Keywords: parse, syntax

On Wed, 11 Aug 2021 22:24:49 +0000, Roger L Costello wrote:

>There are many data formats which contain things like this:
>
>A number, N
>N occurrences of something
>
>For example, 3 followed by the names of three students:
>
>3
>John Doe
>Sally Smith
>Judy Jones
>
>I have a question about parsing such data. Is it the job of a parser to ensure
>that the number of student names matches the number? Or, is it the job of the
>parser to merely tokenize whatever is in the input and then create an abstract
>syntax tree containing the tokens?
>
>I imagine you will tell me, "it depends". But what is typically the case?

It's the job of a parser to ensure that the input's syntax is correct.
What that means exactly is up to the developer.

If you consider that in your 'language' a list consists of a number
followed by exactly that many strings ... well then you could argue
that the parser should enforce that. However, as John mentioned, often
it is difficult to generate really meaningful error messages during
parsing.

I would contend that in your example the /syntax/ of lists really is a
number followed by zero or more strings (number string*), and that
verifying the string count is semantics, not syntax.

I believe that, whenever possible, semantics are best left until after
parsing is finished.

YMMV,
George
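
A minimal sketch of that split, in Python, for the student-list example
from the question. Everything here is an assumption made for
illustration and not code from the thread: the names StudentList, parse
and check_count are hypothetical, and the format is taken to be one
item per line. The parser accepts "number string*" and nothing more;
the count is checked in a separate pass over the parse result.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class StudentList:
    """Parse result for the grammar: number string*"""
    declared_count: int
    names: List[str]


def parse(text: str) -> StudentList:
    """Syntax only: a number line followed by zero or more name lines."""
    # Blank lines are ignored (an assumption about the format).
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise SyntaxError("expected a number, found end of input")
    if re.fullmatch(r"[0-9]+", lines[0]) is None:
        raise SyntaxError(f"expected a number, found {lines[0]!r}")
    # The parser does NOT compare the count against the names.
    return StudentList(declared_count=int(lines[0]), names=lines[1:])


def check_count(parsed: StudentList) -> List[str]:
    """Semantics: report a mismatch between declared and actual count."""
    actual = len(parsed.names)
    if actual != parsed.declared_count:
        return [f"declared {parsed.declared_count} names but found {actual}"]
    return []


if __name__ == "__main__":
    tree = parse("3\nJohn Doe\nSally Smith\n")
    for message in check_count(tree):
        print(message)
```

With this arrangement an input that declares 3 names but lists 2 still
parses; the mismatch is reported afterwards by check_count, which has
the whole list in hand and can word the message accordingly.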