Re: Scanning versus Parsing

Date	2013-08-06 09:24 +0100
Subject	Re: Scanning versus Parsing
From	Ian van Breda <igvb@btopenworld.com>
Newsgroups	comp.lang.forth
Message-ID	<CE267166.3731%igvb@btopenworld.com>
References	<CE216AB0.36B5%igvb@btopenworld.com> <8IWdnZJHcowtI2bMnZ2dnUVZ_h2dnZ2d@supernews.com>

Andrew Haley wrote on 02/08/2013 15:14:

> Ian van Breda <igvb@btopenworld.com> wrote:
> 
>> The universally accepted term for extracting language tokens from
>> the input source code, as elsewhere in computing, is 'scanning'.  In
>> Forth this is done by extracting words delimited by spaces, tabs and
>> line endings.
>> 
>> This suggests that the vast majority of the descriptive text in the
>> proposed Standard needs to be changed from variants of 'parse' to
>> their 'scan' equivalents
> 
> No, it doesn't.  The term is defined in Section 2.1, Definitions of
> terms, and is consistent with historic Forth usage.  There are many
> terms used in Forth which are not consistent with the rest of computer
> science: "word" is one such.

Irrespective of what the standard may say, my dictionary describes parsing
as: 'Describe (word) grammatically, stating inflexion, relation to sentence,
etc.; resolve (sentence) into component parts of speech and describe them
grammatically'.

'Word' does indeed have many meanings in common use, so is not relevant in
this context.

>> The argument that it has always been done this way is surely not
>> valid in the face of the fundamental usage of the term, parsing,
>> elsewhere: Forth cannot ignore the outside world and, at least in
>> this case, it is irrefutably common practice to use scanning for
>> this type of process.
> 
> I take your point, but it's hardly the job of the committee to
> determine what language Forthers use.  We could change to match
> commonplace CS use, at the cost of being inconsistent with all the
> Forth literature.  That's not a good trade-off.
> 
>> Forth seems to be unique in using the term parsing instead of
>> scanning.
> 
> That's true, but Forth uses its own terminology for many things, and I
> don't think we should break with forty years of usage.  Vive la
> difference!
> 

Unfortunately, the term 'parsing', as used in Forth (no grammatical
context), conflicts with the term as used in other languages and also in the
theory of natural languages.  I have a number of books on compiling, all
starting with scanning followed by parsing of the grammar as a separate
activity: Gries, Fischer et al., Wait et al., Appel, Jensen et al.  They
typically use a BNF form of grammar, or something similar.  This is the only
conflict of this type that I can see in either the ANS Standard or the
proposed standard.

It was clear to me, in the context of large telescope systems and in the
laboratory, that it was necessary to be able to accommodate a variety of
languages in the Forth environment, particularly in respect of imaging
systems, where there is a variety of libraries and numerical algorithms
available.

One of the main complaints made by astronomers was that Forth cannot be
easily integrated with other languages.  However, the problem lies with the
*other* languages, in that they generally use a single stack for both data
and return addresses.  Forth has a considerable advantage in using separate
data and return stacks, particularly in allowing compound definitions to be
implemented.  There is no problem *if* the other language uses a separate
data stack, in which case we can integrate the two approaches.  The single
stack is a failing in the *implementation* of other languages.
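
As a rough illustration of why separate stacks make compound definitions
easy, here is a minimal Python sketch of a toy interpreter.  Everything in it
(the run function, the program table, the word names) is hypothetical and is
not taken from any real Forth system.  Operands live only on the data stack
and return positions only on the return stack, so a definition can call
another without building an argument frame.

```python
# Toy interpreter with separate data and return stacks.
# All names are hypothetical; this is not any real Forth's internals.

def run(program, entry):
    data = []        # data stack: operands and results only
    returns = []     # return stack: (definition, position) pairs only
    word, pos = entry, 0
    while True:
        body = program[word]
        if pos == len(body):          # end of definition: return to caller
            if not returns:
                return data
            word, pos = returns.pop()
            continue
        op = body[pos]
        pos += 1
        if isinstance(op, int):       # literal: push onto the data stack
            data.append(op)
        elif op == "dup":
            data.append(data[-1])
        elif op == "*":
            b, a = data.pop(), data.pop()
            data.append(a * b)
        else:                         # call of another definition
            returns.append((word, pos))
            word, pos = op, 0

program = {
    "square": ["dup", "*"],
    "fourth": ["square", "square"],   # a compound definition
    "main": [3, "fourth"],
}
result = run(program, "main")
```

Here "fourth" is built purely by juxtaposing two calls to "square"; the value
being worked on stays on the data stack across both calls.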

Another advantage of Forth over other operating systems is that multi-tasking
is in its very nature: the original Forth came with multi-tasking as part and
parcel of the system, for both terminal and background tasks.  In this
context, it was easy to implement time-slicing.  This allows the system to
respond to interrupts very efficiently for both low level and high level
events.  Using a task-specific user table means that the response to
interrupts is as good as it gets.
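
The idea of a task-specific user table can be sketched in Python as follows.
This is only an analogy under stated assumptions: it models cooperative
round-robin switching with generators, not interrupt-driven time-slicing, and
the names counter, round_robin and the table fields are all hypothetical.

```python
from collections import deque

def counter(user):
    # Each task touches only its own user table, so switching tasks
    # needs no saving or restoring of shared variables.
    for _ in range(user["limit"]):
        user["count"] += 1
        yield                      # hand control back to the scheduler

def round_robin(tasks):
    """Run each ready task in turn until all have finished."""
    ready = deque(tasks)
    order = []
    while ready:
        name, task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue               # task finished: drop it from the loop
        order.append(name)
        ready.append((name, task))
    return order

tables = {"a": {"count": 0, "limit": 2}, "b": {"count": 0, "limit": 1}}
order = round_robin([(name, counter(table)) for name, table in tables.items()])
```

Because each task's state sits in its own table, a switch amounts to changing
which table is current.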

The problem here was to build a compiler for other languages which could be
integrated into Forth.  The cornerstones in generating such a compiler are
scanning and parsing.

The first part, scanning, is particularly easy in Forth, where it is done
(mostly) by splitting the source text into 'words', but in languages
such as C or Pascal a rather more complicated process is needed, e.g.
   x=y+z;
will need to be separated into six tokens, all of which are juxtaposed in
this case.
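
The contrast can be sketched in Python.  This is a minimal illustration, not
the scanner described in this post: the function names and the very small
token set (identifiers, integers, a few single-character operators) are
assumptions made for the example.

```python
import re

# Forth-style scanning: tokens are simply runs of non-whitespace.
def scan_forth(source):
    return source.split()

# C-style scanning: tokens may be juxtaposed, so the scanner has to
# recognise each token class itself (hypothetical, minimal token set).
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=+\-*/;()])")

def scan_c(source):
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

c_tokens = scan_c("x=y+z;")            # six separate tokens
forth_tokens = scan_forth("x=y+z;")    # one whitespace-delimited word
```

A whitespace splitter sees the whole statement as a single word, whereas the
pattern-based scanner yields x, =, y, +, z and ; separately.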

The second part is to treat the grammar as a BNF file that can be simply
INCLUDEd.  Each 'production' is similar to a named colon-style definition,
one for each type of 'nonterminal' (left-hand side).  It is a bit more
complicated than a series of colon definitions in that there may be more
than one 'production' for a given definition, and two or more productions
may begin with the same phrase.
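
To make the two complications concrete, here is a small Python sketch.  The
toy grammar, the "::=" notation and the helper names are assumptions for
illustration only and are not the BNF file format described in this post.  It
groups productions under their nonterminal and reports those that begin with
the same symbol.

```python
from collections import defaultdict

# A hypothetical toy grammar, one production per line.
GRAMMAR_TEXT = """
stmt ::= ident = expr ;
expr ::= term + expr
expr ::= term
term ::= ident
"""

def load_grammar(text):
    """Group right-hand sides under their nonterminal (left-hand side)."""
    grammar = defaultdict(list)
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        grammar[lhs.strip()].append(rhs.split())
    return dict(grammar)

def shared_prefixes(grammar):
    """Find nonterminals whose productions start with the same symbol."""
    clashes = {}
    for lhs, productions in grammar.items():
        firsts = [rhs[0] for rhs in productions if rhs]
        repeated = sorted({s for s in firsts if firsts.count(s) > 1})
        if repeated:
            clashes[lhs] = repeated
    return clashes

grammar = load_grammar(GRAMMAR_TEXT)
clashes = shared_prefixes(grammar)
```

In this toy grammar 'expr' has two productions, both beginning with 'term',
so a parser cannot choose between them by looking at the first symbol alone.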

This can be used to generate parse tables in Forth for any given language
that uses a BNF-style of grammar.  The resulting tables can be used either in
Forth itself or on any other platform, and can be used to generate compilers
for different languages.

The result is both LL and LR compatible - usually a simplified version of LR
compatibility is used but Forth generates fully LR compatible tables.

The problem comes with the use of parsing instead of scanning where the
Forth standard meets the other languages head-on.

Of course, it sounds better if you use 'parsing' instead of the rather more
mundane 'scanning'.

Thread

Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-02 13:54 +0100
  Re: Scanning versus Parsing Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-08-02 09:14 -0500
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-06 09:24 +0100
  Re: Scanning versus Parsing anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2013-08-02 15:25 +0000
  Re: Scanning versus Parsing albert@spenarnc.xs4all.nl (Albert van der Horst) - 2013-08-06 10:28 +0000
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-06 14:31 +0100
      Re: Scanning versus Parsing albert@spenarnc.xs4all.nl (Albert van der Horst) - 2013-08-06 17:47 +0000
  Re: Scanning versus Parsing Hans Bezemer <the.beez.speaks@gmail.com> - 2013-08-11 19:20 +0200
    Re: Scanning versus Parsing Ian van Breda <igvb@btopenworld.com> - 2013-08-15 11:07 +0100
