
Scanning versus Parsing

Date 2013-08-02 13:54 +0100
Subject Scanning versus Parsing
From Ian van Breda <igvb@btopenworld.com>
Newsgroups comp.lang.forth
Message-ID <CE216AB0.36B5%igvb@btopenworld.com> (permalink)



The terms 'parse' and 'parsing' are widely used in the documentation,
both in the original ANS Standard of 1994 [1] and in the draft
Proposal [2].
 
However, the term 'parsing', in both the treatment of natural language
and the theory of computer languages, implies analysis of the structure
of a sentence or language construct. In computing it refers to checking
that the tokens extracted from the source code of a program follow the
grammatical rules of the language [3, 4]. There are many examples
in the literature.
 
For example, parsing applies to the 'productions' that form the grammar
of a language, such as that for a while statement in Pascal:
 
   <while_statement>  ->  while <boolean_expression> do <statement> ;
 
which follows the rule that a while statement must begin with a 'while'
token, followed by a 'boolean expression'.  This in turn is followed by
a 'do' token, followed by a 'statement', which itself may be a compound
statement.
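
As a rough illustration of what checking such a production involves, here
is a minimal recursive-descent sketch in Python. It is not taken from any
of the references. The helper names (expect, operand and so on) are
hypothetical, and the stand-ins for <boolean_expression> (operand, relational
operator, operand) and <statement> (a single assignment, with no compound
statements) are simplifying assumptions. The input is assumed to be a list
of tokens that has already been scanned.

```python
class ParseError(Exception):
    """Raised when the token sequence does not follow the production."""


KEYWORDS = {"while", "do"}
RELOPS = {"<", "<=", "=", "<>", ">=", ">"}


def expect(tokens, pos, literal):
    """Require one specific token at pos; return the next position."""
    if pos >= len(tokens) or tokens[pos] != literal:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise ParseError(f"expected {literal!r}, found {found!r}")
    return pos + 1


def operand(tokens, pos):
    """Assumed stand-in: an operand is any alphanumeric non-keyword token."""
    if pos >= len(tokens):
        raise ParseError("expected an operand, found end of input")
    token = tokens[pos]
    if token in KEYWORDS or not token.isalnum():
        raise ParseError(f"expected an operand, found {token!r}")
    return pos + 1


def boolean_expression(tokens, pos):
    """Assumed stand-in: <operand> <relational operator> <operand>."""
    pos = operand(tokens, pos)
    if pos >= len(tokens) or tokens[pos] not in RELOPS:
        raise ParseError("expected a relational operator")
    return operand(tokens, pos + 1)


def statement(tokens, pos):
    """Assumed stand-in: a single assignment, <operand> := <operand>."""
    pos = operand(tokens, pos)
    pos = expect(tokens, pos, ":=")
    return operand(tokens, pos)


def parse_while(tokens):
    """Check: while <boolean_expression> do <statement> ;"""
    pos = expect(tokens, 0, "while")
    pos = boolean_expression(tokens, pos)
    pos = expect(tokens, pos, "do")
    pos = statement(tokens, pos)
    pos = expect(tokens, pos, ";")
    if pos != len(tokens):
        raise ParseError("unexpected tokens after ';'")
    return True


print(parse_while("while i < n do i := next ;".split()))
```

The point of the sketch is that every step tests the order and kind of
the tokens, not how they were separated from the source text.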
 
The universally accepted term for extracting language tokens from the
input source code, as elsewhere in computing, is 'scanning'.  In Forth
this is done by extracting words delimited by spaces, tabs and line
endings.
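
By contrast, scanning in this sense needs no grammar at all. The following
Python sketch (the function name scan_names is hypothetical, and Python is
used only for illustration) extracts tokens delimited by spaces, tabs and
line endings, which is all that the description above requires.

```python
DELIMITERS = " \t\r\n"


def scan_names(source):
    """Yield each delimiter-separated token of source in turn."""
    pos = 0
    end = len(source)
    while pos < end:
        # Skip leading delimiters: spaces, tabs and line endings.
        while pos < end and source[pos] in DELIMITERS:
            pos += 1
        start = pos
        # Collect characters up to the next delimiter.
        while pos < end and source[pos] not in DELIMITERS:
            pos += 1
        if pos > start:
            yield source[start:pos]


tokens = list(scan_names(": SQUARE  DUP *\t;\n5 SQUARE ."))
print(tokens)
```

Nothing here checks whether the tokens appear in a legal order; it only
separates them.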
 
This suggests that the vast majority of the descriptive text in the
proposed Standard needs to be changed from variants of 'parse' to their
'scan' equivalents.
 
We cannot change PARSE itself, as it is now cast in stone (the penalty
for inaccuracy in setting up the ANS Standard). However, it is possible
to adopt SCAN-NAME in place of PARSE-NAME. It would also be useful to have
a multi-line SCAN which bypasses comments.
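
To make the idea of a multi-line scan concrete, here is one possible
behaviour sketched in Python. This is only an assumption about what such a
word might do, not a proposed definition: the function name
scan_skipping_comments is hypothetical, a stand-alone \ token is taken to
discard the rest of its line, and a stand-alone ( token is taken to discard
everything up to the next ) character, even when that lies on a later line.

```python
def scan_skipping_comments(lines):
    """Yield whitespace-delimited tokens from lines, skipping comments."""
    in_paren = False  # True while inside an unfinished ( ... ) comment
    for line in lines:
        pos = 0
        while pos < len(line):
            if in_paren:
                close = line.find(")", pos)
                if close < 0:
                    break  # the comment continues on the next line
                pos = close + 1
                in_paren = False
                continue
            # Skip leading whitespace, then collect one token.
            while pos < len(line) and line[pos].isspace():
                pos += 1
            start = pos
            while pos < len(line) and not line[pos].isspace():
                pos += 1
            token = line[start:pos]
            if token == "" or token == "\\":
                break  # end of line, or a comment to end of line
            if token == "(":
                in_paren = True
                continue
            yield token


source_lines = [
    ": SQUARE ( n -- n*n )",
    "  DUP * ; \\ multiply by itself",
    "( a comment",
    "  spanning lines ) 5 SQUARE .",
]
print(list(scan_skipping_comments(source_lines)))
```

Even with comment skipping, this remains scanning in the sense used above:
it delivers tokens and makes no judgement about their arrangement.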
 
The argument that it has always been done this way is surely not valid
in the face of the fundamental usage of the term 'parsing' elsewhere:
Forth cannot ignore the outside world and, at least in this case, it is
irrefutably common practice to use 'scanning' for this type of process.
Forth seems to be unique in using the term 'parsing' instead of 'scanning'.
 
This problem is highlighted when we try to use Forth to generate parsers
for other languages, a task for which Forth works extremely well.
 
 
[1] ANS Forth (ANS X3.215-1994 Information Systems - Programming Languages
   FORTH), 1994.
[2] Forth Standards Committee. Forth 200x Draft 11.1. 29th February 2012.
[3] C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler.
   Pearson Education, Inc, 2010.
[4] I.G. van Breda. Building an LR parser for Pascal using Forth.
   EuroForth 2012.

Ian van Breda 
 


