Applesoft tokenization phases?

Started by	"Ev. Drikos" <drikosev@gmail.com>
First post	2020-03-12 17:46 +0200
Last post	2020-03-21 19:42 +0000
Articles	6 — 5 participants

  Applesoft tokenization phases? "Ev. Drikos" <drikosev@gmail.com> - 2020-03-12 17:46 +0200
    Re: Applesoft tokenization phases? George Neuner <gneuner2@comcast.net> - 2020-03-13 17:55 -0400
    Re: Applesoft tokenization phases? awanderin <awanderin@gmail.com> - 2020-03-16 00:07 -0600
      Re: Applesoft tokenization phases? "Ev. Drikos" <drikosev@gmail.com> - 2020-03-18 00:14 +0200
        Re: Applesoft tokenization phases? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-03-20 07:06 -0400
          Re: Applesoft tokenization phases? Martin Ward <martin@gkc.org.uk> - 2020-03-21 19:42 +0000

#2484 — Applesoft tokenization phases?

From	"Ev. Drikos" <drikosev@gmail.com>
Date	2020-03-12 17:46 +0200
Subject	Applesoft tokenization phases?
Message-ID	<20-03-013@comp.compilers>

Hello,

This question relates to the thread "Languages with Optional Spaces".

In an Applesoft II manual I found at "classiccmp.org" [1], page 7,
we read that in a variable name any alphanumeric characters after the
first two are ignored, unless they contain a reserved word. FEND, for
example, would be illegal because it contains END.

To implement such a rule, one could first recognize keywords and then
recognize names. On page 123 we see that statement I is tokenized as II:
    I.   stmt: 100 FOR A = LOFT OR CAT  TO 15
   II. tokens: 100 FOR A = LOF TO RC AT TO 15
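The keyword-first idea can be sketched as follows: spaces outside strings are dropped, and at each position the keyword list is tried in order, the first match winning; anything else is copied through as a plain character. This is a minimal Python sketch under stated assumptions, not Applesoft's ROM code: `KEYWORDS` is a hypothetical subset in an order chosen for the example, `crunch` and `list_line` are illustrative names, and DATA handling and any special cases in the real tokenizer are not modeled.

```python
# Hypothetical subset of Applesoft keywords, NOT the real ROM table. It is
# searched in order and the first match wins, so ATN must precede AT.
KEYWORDS = ["END", "FOR", "NEXT", "DATA", "DIM", "REM", "GOTO", "IF",
            "THEN", "PRINT", "TO", "STEP", "ATN", "AT", "NOT", "AND", "OR",
            "ABS", "INT", "+", "-", "*", "/", "^", ">", "=", "<"]


def match_keyword(src, i, kw):
    """Return the index just past kw if it matches src at i, else None.

    Spaces in the source are skipped while matching, so "T O" matches TO.
    """
    j = i
    for ch in kw:
        while j < len(src) and src[j] == " ":
            j += 1
        if j >= len(src) or src[j] != ch:
            return None
        j += 1
    return j


def crunch(src):
    """Keyword-first tokenization into ("kw", text) / ("ch", text) pairs."""
    tokens, i = [], 0
    while i < len(src):
        if src[i] == " ":                      # spaces outside strings vanish
            i += 1
        elif src[i] == '"':                    # string literal kept verbatim
            end = src.find('"', i + 1)
            end = len(src) if end == -1 else end + 1
            tokens.append(("ch", src[i:end]))
            i = end
        else:
            for kw in KEYWORDS:
                j = match_keyword(src, i, kw)
                if j is not None:
                    tokens.append(("kw", kw))
                    i = j
                    if kw == "REM":            # rest of line is the comment
                        tokens.append(("ch", src[i:]))
                        i = len(src)
                    break
            else:                              # no keyword: plain character
                tokens.append(("ch", src[i]))
                i += 1
    return tokens


def list_line(tokens):
    """Render tokens LIST-style, with a space on each side of every keyword."""
    return "".join(f" {t} " if k == "kw" else t for k, t in tokens)


print(list_line(crunch("DIFF = ABS(A(I)-N)")))  # D IF F =  ABS (A(I) - N)
```

With this ordering the sketch splits the manual's statement into the same pieces (LOF, TO, RC, AT, TO), treats FEND as F followed by END, and turns DIFF into D, IF, F, i.e. the behavior the manual describes rather than that of the online simulator.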

Yet I've found a program at "hoist-point.com" [2] that contains:
110 DIFF = ABS(A(I)-N)

Also, an online Applesoft simulator at calormen.com [3] accepts both
DIFF and FEND as valid variable names.

This issue seems to affect a design choice for the tokenization
phases of an Applesoft front end. Is the manual merely informative,
or does the online simulator not accept precisely this dialect?


Ev. Drikos


PS: The eight BASIC examples in the Compukit UK101 Simulator project
seem to be compatible with Applesoft, although I haven't seen the term
on the site [4] or in the computer manual, which contains a BASIC
reference. I'm also not sure whether the Superboard II accepts a
dialect compatible with Applesoft.

------------------------------------------------------------------------
[1] http://www.classiccmp.org/cini/pdf/Apple/
[2] http://www.hoist-point.com/applesoft_basic_tutorial.htm
[3] https://www.calormen.com/jsbasic/
[4] http://uk101.sourceforge.net/docs/index.html

#2486

From	George Neuner <gneuner2@comcast.net>
Date	2020-03-13 17:55 -0400
Message-ID	<20-03-015@comp.compilers>
In reply to	#2484

On Thu, 12 Mar 2020 17:46:00 +0200, "Ev. Drikos" <drikosev@gmail.com>
wrote:

>This question relates to the thread "Languages with Optional Spaces".
>
>In an Applesoft II manual I found at "classiccmp.org" [1], page 7,
>we read that in a variable name any alphanumeric characters after the
>first two are ignored, unless they contain a reserved word. FEND, for
>example, would be illegal because it contains END.
>
>To implement such a rule, one could first recognize keywords and then
>recognize names. On page 123 we see that statement I is tokenized as II:
>    I.   stmt: 100 FOR A = LOFT OR CAT  TO 15
>   II. tokens: 100 FOR A = LOF TO RC AT TO 15
>
>Yet I've found a program at "hoist-point.com" [2] that contains:
>110 DIFF = ABS(A(I)-N)
>
>Also, an online Applesoft simulator at calormen.com [3] accepts both
>DIFF and FEND as valid variable names.
>
>This issue seems to affect a design choice for the tokenization
>phases of an Applesoft front end. Is the manual merely informative,
>or does the online simulator not accept precisely this dialect?

I recall there being some minor differences between the disk-based
AppleSoft BASIC on the Apple][ and ][+ (which required the additional
language card to run) and the ROM AppleSoft BASIC on the //e, //c, and
//gs.  But I no longer recall exactly what those differences were.

[ISTM the Apple /// also had a different ROM BASIC.]

Unfortunately, I no longer have my //e or //gs, or any of my old
AppleSoft BASIC code to look at.  But my [perhaps faulty] recollection
is that they did allow variable names to contain keywords as long as
the name did not begin with the keyword.

FWIW, I remember the //e AppleSoft manual having a different cover
[the //gs did not come with a BASIC manual].

My suspicion is that the manual at classiccmp.org is for the
original disk-based version, but that the simulator is based on the
later ROM version.

YMMV,
George
[This is drifting into alt.folklore.computers territory. -John]

#2487

From	awanderin <awanderin@gmail.com>
Date	2020-03-16 00:07 -0600
Message-ID	<20-03-016@comp.compilers>
In reply to	#2484

"Ev. Drikos" <drikosev@gmail.com> writes:
> This question relates to the thread "Languages with Optional Spaces".
>
> In an Applesoft II manual I found at "classiccmp.org" [1], page 7,
> we read that in a variable name any alphanumeric characters after the
> first two are ignored, unless they contain a reserved word. FEND, for
> example, would be illegal because it contains END.
>
> To implement such a rule, one could first recognize keywords and then
> recognize names. On page 123 we see that statement I is tokenized as II:
>     I.   stmt: 100 FOR A = LOFT OR CAT  TO 15
>    II. tokens: 100 FOR A = LOF TO RC AT TO 15
>
> Yet I've found a program at "hoist-point.com" [2] that contains:
> 110 DIFF = ABS(A(I)-N)

If you type that into Applesoft BASIC, it parses it as:

   110 D IF F =  ABS (A(I) - N)

The spaces are how Applesoft lists it: it puts spaces around each token.
Variables are not parsed as tokens at entry time; they are only parsed
at run time.
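Since names are only parsed at run time, a run-time name reader is where the manual's "first two characters" rule would naturally apply. The function below is a hypothetical sketch, not Applesoft's actual routine: it reads a letter followed by letters and digits, keeps at most the first two characters as the significant key, and accepts an optional `$` (string) or `%` (integer) suffix. It works on plain text; in a crunched line the keywords would already be tokens, so a scan like this would stop at them (for example at the `=` following the F in D IF F).

```python
def read_variable(text, i):
    """Read a variable name starting at text[i], as a run-time parser might.

    Returns (key, suffix, next_index). Only the first two characters are
    kept in key; suffix is "$" (string), "%" (integer) or "" (real).
    """
    if i >= len(text) or not text[i].isalpha():
        raise ValueError(f"expected a variable name at position {i}")
    j = i + 1
    while j < len(text) and text[j].isalnum():   # letters and digits
        j += 1
    suffix = ""
    if j < len(text) and text[j] in "$%":
        suffix = text[j]
        j += 1
    return text[i:min(i + 2, j)], suffix, j
```

For example, `read_variable("SCORE$=1", 0)` returns `("SC", "$", 6)`, and a name such as SCALE would map to the same key SC.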


> Also, an online Applesoft simulator at calormen.com [3] accepts both
> DIFF and FEND as valid variable names.

It is doing things differently than actual Applesoft.

> This issue seems to affect a design choice for the tokenization
> phases of an Applesoft front end. Is the manual merely informative,
> or does the online simulator not accept precisely this dialect?

The latter; the simulator accepts a different dialect.

--
Jerry    awanderin at gmail dot com

#2488

From	"Ev. Drikos" <drikosev@gmail.com>
Date	2020-03-18 00:14 +0200
Message-ID	<20-03-017@comp.compilers>
In reply to	#2487

On 16/03/2020 08:07, awanderin wrote:
> "Ev. Drikos" <drikosev@gmail.com> writes:
>> ...
>> Yet I've found a program at "hoist-point.com" [2] that contains:
>> 110 DIFF = ABS(A(I)-N)
>
> If you type that into Applesoft BASIC, it parses it as:
>
>     110 D IF F =  ABS (A(I) - N)
>
> The spaces are how Applesoft lists it...
>

Another vague point, or simply one where I'm not sure that I am
interpreting the manual correctly, concerns the reserved keywords before
a certain delimiter. An Applesoft parser should likely reject this
valid UK101 code:

10 X=SHIMEM:
20 END

>> Also, an online Applesoft simulator at calormen.com [3] accepts both
>> DIFF and FEND as valid variable names.
>
> It is doing things differently than actual Applesoft.
>
>> This issue seems to affect a design choice for the tokenization
>> phases of an Applesoft front end. Is the manual merely informative,
>> or does the online simulator not accept precisely this dialect?
>
> The latter; the simulator accepts a different dialect.
> --

I've also read your comment about the spacing rules on the Commodore machines.

Thanks.

@everyone

IMHO, if spaces are significant, then a lexer can be simpler. One can
build one (L1) with a lexer generator that supports intersection and
negation, and just re-scan the DATA statements.

Because of Applesoft II's complex spacing rules, one could scan in advance:
1. DATA statements, preserving spaces in literals and strings, and
    1.1 strings and comments, along with the keyword AT
2. All other keywords, once the remaining spaces have been skipped

Thereafter one has to scan just a few more tokens, i.e. names and
delimiters, a task too simple to need a generated lexer (L2).

With a proprietary tool (Syntaxis) that supports cascaded scanners, I
could model a solution that supports both forms (spaces optional or not)
by combining phases 1 and 2, and then, for space efficiency, reusing L1
instead of L2 for the remaining tokens. The resulting tool parses several
examples, but the implementation is too fresh to be considered stable.

Obviously one could hand-code a lexer by following the above three phases.
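A hand-coded version of those phases might look like the Python sketch below. It is an illustration under several assumptions, not the Syntaxis-based implementation: phase 1 only recognizes DATA and REM at the start of a statement and does not treat AT specially, the keyword list is a hypothetical subset, and numbers are simplified to runs of digits and dots. Phase 1 carves out strings and DATA/REM bodies with their spaces intact, phase 2 drops the remaining spaces and matches keywords (first match wins), and phase 3 groups what is left into names, numbers and delimiters.

```python
import re

# Hypothetical keyword subset (not a real ROM table). Phase 2 tries it in
# order and the first match wins, so ATN must come before its prefix AT.
KEYWORDS = ["FOR", "NEXT", "END", "IF", "THEN", "GOTO", "PRINT", "HIMEM:",
            "TO", "STEP", "ATN", "AT", "AND", "OR", "ABS",
            "=", "+", "-", "*", "/", "<", ">"]

# A statement starting with DATA or REM (spaces allowed between letters).
LITERAL_STMT = re.compile(r" *(D *A *T *A|R *E *M)")

# Name (letter, then letters/digits, optional $ or %), number, or any other
# single character, which phase 3 treats as a delimiter.
PIECE = re.compile(r"[A-Z][A-Z0-9]*[$%]?|[0-9.]+|.")


def data_end(line, i):
    """Index of the ':' ending a DATA body; colons inside quotes don't count."""
    quoted = False
    for j in range(i, len(line)):
        if line[j] == '"':
            quoted = not quoted
        elif line[j] == ":" and not quoted:
            return j
    return len(line)


def phase1(line):
    """Carve out strings and DATA/REM bodies with their spaces intact."""
    segs, code, i, at_start = [], [], 0, True

    def flush():
        if code:
            segs.append(("code", "".join(code)))
            code.clear()

    while i < len(line):
        m = LITERAL_STMT.match(line, i) if at_start else None
        at_start = False
        if m:
            flush()
            kw = m.group(1).replace(" ", "")
            end = len(line) if kw == "REM" else data_end(line, m.end())
            segs.extend([("kw", kw), ("lit", line[m.end():end])])
            i = end
        elif line[i] == '"':
            flush()
            j = line.find('"', i + 1)
            j = len(line) if j == -1 else j + 1
            segs.append(("lit", line[i:j]))
            i = j
        else:
            at_start = line[i] == ":"        # a colon starts a new statement
            code.append(line[i])
            i += 1
    flush()
    return segs


def phase2(segs):
    """Drop spaces in code segments, then match keywords (first match wins)."""
    out = []
    for kind, text in segs:
        if kind != "code":
            out.append((kind, text))
            continue
        s, i = text.replace(" ", ""), 0
        while i < len(s):
            kw = next((k for k in KEYWORDS if s.startswith(k, i)), None)
            if kw:
                out.append(("kw", kw))
                i += len(kw)
            else:
                out.append(("rest", s[i]))
                i += 1
    return out


def phase3(toks):
    """Group leftover characters into names, numbers and delimiters."""
    out, buf = [], ""
    for kind, text in toks + [("eol", "")]:
        if kind == "rest":
            buf += text
            continue
        for piece in PIECE.findall(buf):
            if piece[0].isalpha():
                out.append(("name", piece))
            elif piece[0].isdigit() or piece[0] == ".":
                out.append(("num", piece))
            else:
                out.append(("delim", piece))
        buf = ""
        if kind != "eol":
            out.append((kind, text))
    return out


def lex(line):
    return phase3(phase2(phase1(line)))
```

With this keyword list, `lex("X=SHIMEM:")` yields the name X, `=`, the name S and the keyword `HIMEM:`, which is presumably why a keyword-first Applesoft front end would reject that UK101 line.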

Ev. Drikos

#2493

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2020-03-20 07:06 -0400
Message-ID	<20-03-022@comp.compilers>
In reply to	#2488

Jerry Awanderin wrote:
> > Also, an online Applesoft simulator at calormen.com [3] accepts both
> > DIFF and FEND as valid variable names.
>
> It is doing things differently than actual Applesoft.
>
> > This issue seems to affect a design choice for the tokenization
> > phases of an Applesoft front end. Is the manual merely informative,
> > or does the online simulator not accept precisely this dialect?
>
> The latter; the simulator accepts a different dialect.

For situations like this, we added classes (and inheritance) to Yacc++
lexers and parsers, so that one could define a common subset and then
extend it to cover the cases that are unique to, or differ between,
dialects. However, one can do similar things with lexer states,
flag/switch variables, or a host of other techniques.  You just have to
decide how much mess you are willing to handle and how important
supporting the variations is.
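The same idea can be approximated with ordinary class inheritance. In this hypothetical Python sketch (an analogy, not Yacc++ syntax), a base `Dialect` holds a common keyword subset and each dialect extends it: `Applesoft` adds `HIMEM:` and `LOMEM:`, while `UK101` keeps the base list, in line with the remark later in the thread that HIMEM is not a UK101 keyword. The keyword lists are illustrative only and are not either dialect's full set.

```python
import re


class Dialect:
    """Common subset: a keyword list tried in order (first match wins)."""

    # Hypothetical shared keyword subset, for illustration only.
    keywords = ["FOR", "TO", "IF", "THEN", "END", "PRINT", "=", "+", "-"]

    def __init__(self):
        # Python's re tries alternatives left to right, so list order is
        # also match priority.
        self._kw = re.compile("|".join(re.escape(k) for k in self.keywords))

    def crunch(self, text):
        """Drop spaces, then split into keywords and single characters."""
        s, i, out = text.replace(" ", ""), 0, []
        while i < len(s):
            m = self._kw.match(s, i)
            if m:
                out.append(m.group())
                i = m.end()
            else:
                out.append(s[i])
                i += 1
        return out

    def is_legal_name(self, name):
        """Manual-style rule: a variable name may not contain a keyword."""
        return self._kw.search(name) is None


class UK101(Dialect):
    """Stand-in for UK101 BASIC: the common subset, no HIMEM keyword."""


class Applesoft(Dialect):
    """Extends the common subset with keywords the other dialect lacks."""
    keywords = Dialect.keywords + ["HIMEM:", "LOMEM:"]
```

Here `Applesoft().crunch("X=SHIMEM:")` gives `["X", "=", "S", "HIMEM:"]`, while `UK101().crunch("X=SHIMEM:")` leaves SHIMEM: as single characters. Note also that `Applesoft().is_legal_name("SHIMEM")` is true, because the keyword in this list includes the colon.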

There is a reason BASIC compilers that remove spaces often restrict
variables (as the original BASIC did) to a letter optionally followed
by a digit (and a dollar sign to indicate strings).  It makes the
lexing much simpler, even without spaces.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris

#2495

From	Martin Ward <martin@gkc.org.uk>
Date	2020-03-21 19:42 +0000
Message-ID	<20-03-024@comp.compilers>
In reply to	#2493

On 17/03/20 22:14, Ev. Drikos wrote:
> Another vague point, or simply one where I'm not sure that I am
> interpreting the manual correctly, concerns the reserved keywords before
> a certain delimiter. An Applesoft parser should likely reject this
> valid UK101 code:
>
> 10 X=SHIMEM:
> 20 END

For those less familiar with UK101 BASIC: this code is only
valid because "HIMEM" is not a keyword in UK101 BASIC.
UK101 does not allow keywords as part of a variable name.

> There is a reason BASIC compilers that remove spaces often restrict
> variables (as the original BASIC did) to a letter optionally
> followed by a digit (and a dollar sign to indicate strings).  It makes
> the lexing much simpler, even without spaces.

Lexing is not the issue: the real reason for the restriction
is to make the symbol table much simpler. Each variable in the table
takes up a fixed amount of space. For the UK101, only the first two
characters of a variable are significant, so only two characters
are needed in the symbol table. "Lexing" a statement just consists
of replacing all keywords outside strings with special characters.
When the interpreter parses it, each statement either starts
with a keyword or is an assignment and starts with a variable.
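The point about fixed-size entries can be illustrated with a table of fixed-length records searched linearly. The layout here (two name bytes plus an 8-byte float, packed with Python's `struct`) is a hypothetical choice for the example, not the actual UK101 or Applesoft format, and `SymbolTable` with its methods are illustrative names. Because only two characters are stored, SCORE and SCALE end up in the same record.

```python
import struct

# Hypothetical record layout: 2 name bytes + an 8-byte float, no padding.
RECORD = struct.Struct("<2sd")


class SymbolTable:
    """Fixed-size records in one buffer, keyed by two significant characters."""

    def __init__(self):
        self.data = bytearray()

    @staticmethod
    def _key(name):
        # Only the first two characters are kept; "X" is padded to "X ".
        return name[:2].ljust(2).encode("ascii")

    def _find(self, key):
        # Linear scan over equal-sized records, as a small interpreter might do.
        for off in range(0, len(self.data), RECORD.size):
            if self.data[off:off + 2] == key:
                return off
        return None

    def set(self, name, value):
        key = self._key(name)
        record = RECORD.pack(key, float(value))
        off = self._find(key)
        if off is None:
            self.data += record                        # append a new record
        else:
            self.data[off:off + RECORD.size] = record  # overwrite in place

    def get(self, name):
        off = self._find(self._key(name))
        # Unset numeric variables read as 0 here, a common BASIC convention.
        return 0.0 if off is None else RECORD.unpack_from(self.data, off)[1]


table = SymbolTable()
table.set("SCORE", 5)
table.set("SCALE", 7)          # same two significant characters: SC
print(table.get("SCORE"))      # 7.0
```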

--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
