Re: Scanning versus Parsing

Date	2013-08-15 11:07 +0100
Subject	Re: Scanning versus Parsing
From	Ian van Breda <igvb@btopenworld.com>
Newsgroups	comp.lang.forth
Message-ID	<CE3266E0.396C%igvb@btopenworld.com> (permalink)
References	<CE216AB0.36B5%igvb@btopenworld.com> <5207c6fb$0$15869$e4fe514c@news2.news.xs4all.nl>

Show all headers | View raw

Hans Bezemer wrote on 11/08/2013 18:20:

> Ian van Breda wrote:
>> The universally accepted term for extracting language tokens from the
>> input source code, as elsewhere in computing, is 'scanning'.  In Forth
>> this is done by extracting words delimited by spaces, tabs and line
>> endings.
> I don't agree: I know the process as "tokenizing" (see:
> http://en.wikipedia.org/wiki/Tokenization). I know that very well, because
> tokenizing is a separate process in my 4tH compiler (it not only
> eliminates "double" delimiters and the various whitespaces, it also takes
> care of strings and such) which also classifies tokens to type while doing
> it.
> 
> After that, if parses the whole bunch AND compiles it at the same time by
> simply executing its compile time behavior.
> 
> In Forth similar stuff is done: a whitespace is scanned and then words are
> executed - which IMHO is "analysing a string of symbols, either in natural
> language or in computer languages, according to the rules of a formal
> grammar". In as much Forth HAS a formal grammer.
> 
> E.g. IF is parsed by scanning a whitespace, executing its compile time
> behavior and compiling a "0-BRANCH" instruction. Many things are happening
> at the same time, which causes all this confusion.
> 
> Calling it "scanning" is as much right or wrong as calling it "parsing" (in
> its classic meaning), "execution" or "compilation".
> 
> Note we also say we "compile" a Forth source while true (wordwise)
> compilation is just a small part of the process.
> 
> So, in short: don't try to be a purist when the whole vocabulary concerning
> the process is blurred already. You won't fix anything, at best you're
> confusing things even more.
> 
> Hans Bezemer
> 
Wikipedia indeed refers to tokenisation but it is more commonly known as
scanning, see books by Gries, Fischer et al, Waite et al.  Appel refers to
Lexical Analysis which amounts to the same thing.

All of the above refer to parsing of the formal grammar.  See, for example,
Fischer et al, 'Crafting a Compiler', Chapter 3 'Theory and Practice of
Scanning' and Chapter 4 'Formal Grammars and Parsing', which separate the
scanning/tokenisation from the parsing process.

Forth does not have a formal grammar (e.g. BNF) in the sense that C and
Pascal do.  However, we still need to use, e.g., IF ... ELSE ... THEN, which
must be complete within a single definition.  Many Forth's include some form
of syntax checking, to ensure that the IF/ELSE/THEN is complete and also to
check that the return stack is balanced.  For example, we can use
IF  ... >R  ELSE  ... >R  THEN  ... R> and still balance the return stack.
But this follows both the scanning/tokenisation and parsing processes.

IF and '0-Brach' are something of a red herring in this context, in that
they come later the compiling process.  In particular, it comes under the
general heading of code generation.

It is this lack of a formal grammar (in the BNF sense) that gives Forth its
exceptional power.  For example, a Pascal LL/LR parser can be built on the
BNF formal grammar by simply INCLUDEing a file containing the 'productions'
of the grammar. e.g.
   <program_parameter_list>  ->  '(' <file_variable_list> ')' ;
where *all* of the words are simply Forth definitions held in the
dictionary, a unique feature of Forth.  This can be used to build a series
if parse tables for a variety of languages that are independent of the
platform on which the compiler is run.

In using the terms 'scanning' and, particularly, 'parsing', I am using the
terms as used elsewhere, both in compiler technology and in the theory of
natural languages.  The definition is also as described in The Oxford
Illustrated Dictionary and certainly suffers no 'blurring' or confusion in
its meaning.

This is nothing to do with being a 'purist'.  I am not 'trying to fix'
anything either.  Instead, I am seeking a unification and clarification
between Forth and other languages and for which the meaning is very clear
and unequivocal.  Of course, if you think that this is a waste of time, then
we shall just have to disagree.

Ian van Breda

Back to comp.lang.forth | Previous | Next — Previous in thread | Find similar

Thread

Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-02 13:54 +0100
  Re: Scanning versus Parsing Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-08-02 09:14 -0500
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-06 09:24 +0100
  Re: Scanning versus Parsing anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2013-08-02 15:25 +0000
  Re: Scanning versus Parsing albert@spenarnc.xs4all.nl (Albert van der Horst) - 2013-08-06 10:28 +0000
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-06 14:31 +0100
      Re: Scanning versus Parsing albert@spenarnc.xs4all.nl (Albert van der Horst) - 2013-08-06 17:47 +0000
  Re: Scanning versus Parsing Hans Bezemer <the.beez.speaks@gmail.com> - 2013-08-11 19:20 +0200
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-15 11:07 +0100

csiph-web