
Ignore break line sometimes

Started by	Geovani de Souza <geovanisouza92@gmail.com>
First post	2012-02-11 06:56 -0800
Last post	2012-02-14 00:25 +0000
Articles	12 — 12 participants


  Ignore break line sometimes Geovani de Souza <geovanisouza92@gmail.com> - 2012-02-11 06:56 -0800
    Re: Ignore break line sometimes Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-02-11 17:28 +0100
    Re: Ignore break line sometimes George Neuner <gneuner2@comcast.net> - 2012-02-11 12:59 -0500
    RE: Ignore break line sometimes "Karsten Nyblad" <uu3kw29sb7@snkmail.com> - 2012-02-12 09:21 +0100
      Re: Ignore break line sometimes Kaz Kylheku <kaz@kylheku.com> - 2012-02-13 00:16 +0000
    Re: Ignore break line sometimes Stefan Monnier <monnier@iro.umontreal.ca> - 2012-02-12 10:48 -0500
    Re: Ignore break line sometimes Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-02-12 12:03 -0600
      Re: Ignore break line sometimes Gene Wirchenko <genew@ocis.net> - 2012-02-19 20:57 -0800
        Re: Ignore break line sometimes glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-02-20 08:09 +0000
          Re: Ignore break line sometimes arnold@skeeve.com (Aharon Robbins) - 2012-02-23 21:51 +0000
            Re: Ignore break line sometimes "Jonathan Thornburg" <jthorn@astro.indiana.edu> - 2012-02-27 03:49 +0000
    Re: Ignore break line sometimes "BartC" <bc@freeuk.com> - 2012-02-14 00:25 +0000

#450 — Ignore break line sometimes

From	Geovani de Souza <geovanisouza92@gmail.com>
Date	2012-02-11 06:56 -0800
Subject	Ignore break line sometimes
Message-ID	<12-02-010@comp.compilers>

Hi all!

I'm trying to write a parser for my compiler, and I'd like to ignore the line break (\n) in some places. E.g.:

if true then [\n]
  foo(); [\n]
end; [\n]

So the '\n' after 'then' on the first line is not significant, but on the second line the '\n' after "foo()" could take the place of the semicolon that ends the statement, and likewise after 'end'.

I would also like to ignore the '\n' on blank lines.

How can I do this?


#451

From	Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date	2012-02-11 17:28 +0100
Message-ID	<12-02-011@comp.compilers>
In reply to	#450

Geovani de Souza wrote:
> I'm trying to write a parser for my compiler, and I'd like to
> ignore the line break (\n) in some places. E.g.:
>
> if true then [\n]
>   foo(); [\n]
> end; [\n]
>
> So the '\n' after 'then' on the first line is not significant, but
> on the second line the '\n' after "foo()" could take the place of
> the semicolon that ends the statement, and likewise after 'end'.

That's why many (compiled) languages ignore line ends and other
whitespace, and require explicit statement termination, e.g. by a
semicolon. Interpreters instead often prefer the "one statement per
line" approach, with the option to concatenate statements by e.g. a colon.

IMO you should make a decision about the meaning of whitespace in
general, and of line endings in particular, in your language.

Please give an example that would compile differently when linefeeds are
removed, and then ask yourself whether this really makes sense.

DoDi


#452

From	George Neuner <gneuner2@comcast.net>
Date	2012-02-11 12:59 -0500
Message-ID	<12-02-012@comp.compilers>
In reply to	#450

On Sat, 11 Feb 2012 06:56:17 -0800 (PST), Geovani de Souza
<geovanisouza92@gmail.com> wrote:

>I'm trying to write a parser for my compiler, and I'd like
>to ignore the line break (\n) in some places. E.g.:
>
>if true then [\n]
>  foo(); [\n]
>end; [\n]
>
>So the '\n' after 'then' on the first line is not significant, but
>on the second line the '\n' after "foo()" could take the place of
>the semicolon that ends the statement, and likewise after 'end'.
>
>I would also like to ignore the '\n' on blank lines.
>
>How can I do this?

IMO making the newlines significant is a really bad idea ... but
leaving that aside, I believe the most effective way would be to have
your lexer return the same special "end-of-line" token for either a
semicolon or a newline, and to make that token optional in the grammar
wherever a terminator is not required.

You don't say whether your parser is handwritten or tool generated (or
which tools) ... so I can't really give an example.

George
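
A minimal sketch of the lexer half of this suggestion, in Python. The
token names (EOL, WORD, PUNCT) and the function lex() are made up for
illustration. Collapsing a run such as ";\n" into a single EOL is an
extra assumption, not something stated in the post; it keeps the grammar
from having to deal with repeated terminators.

```python
import re

# Hypothetical token kinds: EOL stands for "semicolon or newline".
TOKEN_RE = re.compile(
    r"(?P<EOL>[;\n])|(?P<WORD>[A-Za-z_]\w*)|(?P<PUNCT>[()])|(?P<SKIP>[ \t]+)"
)

def lex(src):
    """Return a list of (kind, text) pairs for a toy language."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "SKIP":
            continue  # blanks and tabs never reach the parser
        if kind == "EOL":
            # Assumption: drop leading EOLs and collapse runs like ";\n".
            if not tokens or tokens[-1][0] == "EOL":
                continue
            tokens.append(("EOL", ""))
        else:
            tokens.append((kind, m.group()))
    return tokens
```

With this lexer, "foo();" followed by a newline and "foo()" followed by
only a newline produce the same token stream. The grammar then decides
where EOL is required (after a statement) and where it is optional
(for instance directly after "then").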


#455

From	"Karsten Nyblad" <uu3kw29sb7@snkmail.com>
Date	2012-02-12 09:21 +0100
Message-ID	<12-02-015@comp.compilers>
In reply to	#450

> I'm trying to write a parser for my compiler, and I'd like to
> ignore the line break (\n) in some places. E.g.:
>
> if true then [\n]
>   foo(); [\n]
> end; [\n]

One option is to write a recursive descent parser and have two ways
of calling the lexer: one that returns line ends and one that does not.

Another option is to base your parser on a parser generator like
bison and modify the code that drives the automaton, so that when the
lexer returns a line feed token, you copy the stack of states and
simulate on the copy the actions that the parser would have taken.
When the simulation shifts the line feed, you throw away the copy and
resume parsing on the real stack with the line feed as the lookahead
token.  When the simulation encounters an error, you throw away the
simulation AND the line feed and call the lexer again.

If you choose the second option, it is important that you choose the
right parser generator, because some parser generators already generate
code that can help you.  Many LR parser generators, e.g., bison, include
facilities for generalised LR parsing, and many LL parser generators
include facilities for backtracking.  That might help you.

Karsten Nyblad
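
The first option can be sketched roughly as follows, in Python. The
class Lexer, its keep_newlines flag and the toy grammar (if/then/end and
calls without arguments) are all assumptions made for illustration. The
parser asks for newlines only at the one place where a statement may
end, and uses the newline-skipping mode everywhere else.

```python
import re

TOKEN_RE = re.compile(r"(?P<NL>\n)|(?P<WORD>[A-Za-z_]\w*)|(?P<PUNCT>[();])")

class Lexer:
    def __init__(self, src):
        self.src, self.pos = src, 0

    def _scan(self, pos, keep_newlines):
        """Return (kind, text, new_pos) for the token starting at pos."""
        while True:
            while pos < len(self.src) and self.src[pos] in " \t":
                pos += 1
            if pos >= len(self.src):
                return ("EOF", "", pos)
            m = TOKEN_RE.match(self.src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {self.src[pos]!r}")
            if m.lastgroup == "NL" and not keep_newlines:
                pos = m.end()  # this caller does not want line ends
                continue
            return (m.lastgroup, m.group(), m.end())

    def peek(self, keep_newlines=False):
        kind, text, _ = self._scan(self.pos, keep_newlines)
        return kind, text

    def next(self, keep_newlines=False):
        kind, text, self.pos = self._scan(self.pos, keep_newlines)
        return kind, text

def expect(lx, text):
    _, got = lx.next()  # newline-skipping mode
    if got != text:
        raise SyntaxError(f"expected {text!r}, got {got!r}")

def end_of_stmt(lx):
    # The only place where the parser asks the lexer for line ends.
    kind, text = lx.peek(keep_newlines=True)
    if kind == "NL" or text == ";":
        lx.next(keep_newlines=True)
    elif (kind, text) not in (("EOF", ""), ("WORD", "end")):
        raise SyntaxError(f"expected ';' or newline, got {text!r}")

def parse_stmt(lx):
    kind, text = lx.next()
    if kind != "WORD":
        raise SyntaxError(f"unexpected {text!r}")
    if text == "if":
        _, cond = lx.next()
        expect(lx, "then")
        body = parse_block(lx)
        expect(lx, "end")
        end_of_stmt(lx)
        return ("if", cond, body)
    expect(lx, "(")
    expect(lx, ")")
    end_of_stmt(lx)
    return ("call", text)

def parse_block(lx):
    stmts = []
    while lx.peek() not in (("EOF", ""), ("WORD", "end")):
        stmts.append(parse_stmt(lx))
    return stmts

def parse(src):
    lx = Lexer(src)
    tree = parse_block(lx)
    if lx.peek()[0] != "EOF":
        raise SyntaxError("unexpected 'end'")
    return tree
```

Because every call except the one in end_of_stmt() skips newlines, the
newline after "then" and blank lines between statements disappear
without any special handling in the grammar.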


#458

From	Kaz Kylheku <kaz@kylheku.com>
Date	2012-02-13 00:16 +0000
Message-ID	<12-02-018@comp.compilers>
In reply to	#455

On 2012-02-12, Karsten Nyblad <uu3kw29sb7@snkmail.com> wrote:
> that can help you.  Many LR parser generators, e.g., bison, include
> facilities for generalised LR parsing, and many LL parser generators
> include facilities for backtracking.  That might help you.

General LR and backtracking, just to make semicolons optional when
there are newlines?  LOL.


#456

From	Stefan Monnier <monnier@iro.umontreal.ca>
Date	2012-02-12 10:48 -0500
Message-ID	<12-02-016@comp.compilers>
In reply to	#450

> So the '\n' after 'then' on the first line is not significant, but on the
> second line the '\n' after "foo()" could take the place of the semicolon
> that ends the statement, and likewise after 'end'.

A simple approach is to treat every newline as a semi-colon, and then to
adapt your grammar so as to accept (and ignore) extra semi-colons.
I.e. accept "if true then; foo(); ; end; ;"


        Stefan
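
A small Python sketch of this idea, under stated assumptions: the
functions tokenize() and parse() and the toy grammar are invented for
illustration, and the tokenizer silently ignores characters it does not
know. Every newline is rewritten to ";" before tokenizing, and the block
rule accepts empty statements.

```python
import re

def tokenize(src):
    """Split into words and punctuation; every newline becomes ';'."""
    return re.findall(r"[A-Za-z_]\w*|[();]", src.replace("\n", ";"))

def parse(src):
    toks = tokenize(src) + ["<eof>"]
    pos = 0

    def peek():
        return toks[pos]

    def take(expected=None):
        nonlocal pos
        tok = toks[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def block():
        stmts = []
        while peek() not in ("end", "<eof>"):
            if peek() == ";":  # empty statement: accept and ignore
                take()
                continue
            stmts.append(stmt())
        return stmts

    def stmt():
        if peek() == "if":
            take("if")
            cond = take()
            take("then")
            body = block()
            take("end")
            node = ("if", cond, body)
        else:
            name = take()
            take("(")
            take(")")
            node = ("call", name)
        # A real statement still needs a terminator unless the block ends.
        if peek() not in ("end", "<eof>"):
            take(";")
        return node

    tree = block()
    take("<eof>")
    return tree
```

Both the original three-line example and the string
"if true then; foo(); ; end; ;" parse to the same tree here, because the
extra semicolons are consumed as empty statements.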


#457

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-02-12 12:03 -0600
Message-ID	<12-02-017@comp.compilers>
In reply to	#450

On 2/11/2012 8:56 AM, Geovani de Souza wrote:
> Hi all!
>
> I'm trying to write a parser for my compiler, and I'd like to
> ignore the line break (\n) in some places. E.g.:
>
> if true then [\n]
>   foo(); [\n]
> end; [\n]
>
> So the '\n' after 'then' on the first line is not significant, but
> on the second line the '\n' after "foo()" could take the place of
> the semicolon that ends the statement, and likewise after 'end'.

It sounds like you want something like ECMAScript's magic
you-don't-always-need-a-semicolon feature.
<http://bclary.com/2004/11/07/#a-7.9> describes how it works in detail.
The thrust of it is that "if you see an invalid token, but you saw a
newline before, automatically insert a semicolon to fix things."

There are more than a few people who believe that this feature should
not have been implemented.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
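
The "offending token after a newline" rule can be sketched in Python as
below. This is a heavily simplified illustration, not the ECMAScript
algorithm: the real rules have further cases (for example before "}"
and for restricted productions), and the names Tok, tokenize() and
split_statements() as well as the tiny expression grammar are
assumptions.

```python
import re
from typing import NamedTuple

class Tok(NamedTuple):
    text: str
    nl_before: bool  # True if a newline separates this token from the last

def tokenize(src):
    toks, nl = [], False
    for m in re.finditer(r"(\n)|[A-Za-z_]\w*|\d+|[-+=();]", src):
        if m.group(1):
            nl = True
        else:
            toks.append(Tok(m.group(), nl))
            nl = False
    toks.append(Tok("<eof>", nl))
    return toks

def split_statements(src):
    """Parse 'expr ;' statements, inserting a missing ';' after a newline."""
    toks = tokenize(src)
    pos = 0
    out = []

    def term():
        nonlocal pos
        t = toks[pos]
        if t.text == "(":
            pos += 1
            inner = expr()
            if toks[pos].text != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return "(" + inner + ")"
        if re.fullmatch(r"\w+", t.text):
            pos += 1
            return t.text
        raise SyntaxError(f"unexpected {t.text!r}")

    def expr():
        nonlocal pos
        left = term()
        while toks[pos].text in ("+", "-", "="):
            op = toks[pos].text
            pos += 1
            left = f"{left} {op} {term()}"
        return left

    while toks[pos].text != "<eof>":
        stmt = expr()
        t = toks[pos]
        if t.text == ";":
            pos += 1
        elif t.nl_before or t.text == "<eof>":
            pass  # offending token after a newline, or end of input: insert ';'
        else:
            raise SyntaxError(f"expected ';' before {t.text!r}")
        out.append(stmt)
    return out
```

Note that the semicolon is inserted only when the next token cannot
continue the expression. In "a = b" followed by "+ c" on the next line,
the "+" is a valid continuation, so the two lines form one statement.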


#463

From	Gene Wirchenko <genew@ocis.net>
Date	2012-02-19 20:57 -0800
Message-ID	<12-02-023@comp.compilers>
In reply to	#457

On Sun, 12 Feb 2012 12:03:13 -0600, Joshua Cranmer
<Pidgeot18@verizon.invalid> wrote:

[snip]

>It sounds like you want something like ECMAScript's magic
>you-don't-always-need-a-semicolon feature.

     But please do not go there.

><http://bclary.com/2004/11/07/#a-7.9> describes how it works in detail.
>The thrust of it is that "if you see an invalid token, but you saw a
>newline before, automatically insert a semicolon to fix things."
>
>There are more than a few people who believe that this feature should
>not have been implemented.

     There is a bit more to this.  As a result of this kludge, it is
illegal to have newlines at certain points in some statements.  For
example:
          return
           <expression which I decided to put all on its own line>;
is not legal.  It is not permitted to have a newline immediately after
"return".

Sincerely,

Gene Wirchenko
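
For reference, the ECMAScript rule behind this example is a "restricted
production". A line break directly after "return" is not rejected as
such; instead a semicolon is inserted at that point, so the expression
on the next line becomes a separate statement and the function returns
undefined. A minimal Python sketch of that one rule follows. The
function name and the RESTRICTED set are assumptions for illustration,
and only three of the restricted keywords are modelled.

```python
import re

# Keywords after which a line break ends the statement (subset only).
RESTRICTED = {"return", "break", "continue"}

def insert_after_restricted(src):
    """Return token texts, with ';' added after a restricted keyword
    that is immediately followed by a newline."""
    out = []
    for m in re.finditer(r"(\n)|[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", src):
        if m.group(1):
            if out and out[-1] in RESTRICTED:
                out.append(";")
        else:
            out.append(m.group())
    return out
```

For the input "return", newline, "a + b;" this yields the tokens of
"return ; a + b ;", which is why the expression has to start on the
same line as "return".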


#464

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2012-02-20 08:09 +0000
Message-ID	<12-02-024@comp.compilers>
In reply to	#463

Gene Wirchenko <genew@ocis.net> wrote:
(snip, someone wrote)
>> There are more than a few people who believe that this
>> feature should not have been implemented.

> There is a bit more to this.  As a result of this kludge, it is
> illegal to have newlines at certain points in some statements.
> For example:

>          return
>           <expression which I decided to put all on its own line>;
> is not legal.  It is not permitted to have a newline immediately after
> "return".

Sounds about like the way IBM's JCL from OS/360 and successors works.

You can split a statement after a comma in most cases, and continue
it on the next line, after the usual // and some spaces.

I believe the original (early) versions had a more conventional system,
with a continuation character in column 72 and the continuation then
starting in column 16. I presume it was found hard to get right, so
they changed it.

I believe that there are a few other languages with a similar
continuation method. That is, if you end a statement in a legal
end, no continuation is needed.

-- glen


#465

From	arnold@skeeve.com (Aharon Robbins)
Date	2012-02-23 21:51 +0000
Message-ID	<12-02-025@comp.compilers>
In reply to	#464

glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>I believe that there are a few other languages with a similar
>continuation method. That is, if you end a statement in a legal
>end, no continuation is needed.

Awk is like this. You can continue after a comma, && or ||.  Possibly
in other places too.  You can supply semicolons to separate statements
on the same line, if you want.

It tends to work fairly naturally in awk, I rarely use \ to continue
onto the next line.  :-)
--
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50 729-7545
D.N. Shimshon 99785	ISRAEL


#467

From	"Jonathan Thornburg" <jthorn@astro.indiana.edu>
Date	2012-02-27 03:49 +0000
Message-ID	<12-02-027@comp.compilers>
In reply to	#465

Aharon Robbins <arnold@skeeve.com> wrote:
> Awk is like this. You can continue after a comma, && or ||.  Possibly
> in other places too.  You can supply semicolons to separate statements
> on the same line, if you want.
>
> It tends to work fairly naturally in awk, I rarely use \ to continue
> onto the next line.  :-)

On the other hand pic (Kernighan's picture-drawing "little language")
is very finicky about where it accepts \ line-continuations, allowing
them in some places but forbidding them in others.  For example, the
pic code

for j = 2 to 6 by 2 do { \
  for i = 3 to 7 by 2 do { \
    fine_space_interp_point at grid_point(j,i) } }

does NOT allow a \ line-continuation between either "for" clause and
the "{" that follows it.  (Or more precisely, all my attempts at one
produced the usual unhelpful pic syntax-error messages.)  :(

--
-- Jonathan Thornburg
   Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
   "Washing one's hands of the conflict between the powerful and the
    powerless means to side with the powerful, not to be neutral."
                                      -- quote by Freire / poster by Oxfam


#460

From	"BartC" <bc@freeuk.com>
Date	2012-02-14 00:25 +0000
Message-ID	<12-02-020@comp.compilers>
In reply to	#450

"Geovani de Souza" <geovanisouza92@gmail.com> wrote

> I'm trying to write a parser for my compiler, and I'd like to ignore
> the line break (\n) in some places. E.g.:
>
> if true then [\n]
>  foo(); [\n]
> end; [\n]
>
> So the '\n' after 'then' on the first line is not significant, but on the
> second line the '\n' after "foo()" could take the place of the semicolon
> that ends the statement, and likewise after 'end'.
>
> I would also like to ignore the '\n' on blank lines.

I've tried a few schemes. One just converts a newline to a semicolon,
*unless* the last symbol was (for example) a comma.

This requires some sort of continuation symbol for when a semicolon would be
inappropriate.

And it helps if the grammar is tolerant of extra semicolons, otherwise the
source code could be full of continuation symbols! (After 'then' for
example.)

Whatever scheme you choose, you'll know it works well when you have
thousands of lines of code without a single semicolon and with hardly
any continuations, and the code is still perfectly clear to read.

--
Bartc
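
A rough Python sketch of such a scheme, with several assumptions: the
set CONTINUES of symbols after which a newline is ignored, the choice of
"\" as the explicit continuation symbol, and the function name are all
invented for illustration. Putting "then" into CONTINUES and skipping a
newline that directly follows a ";" are extra choices that reduce the
number of redundant semicolons the grammar has to tolerate.

```python
import re

# Assumed: a line ending in one of these symbols continues on the next line.
CONTINUES = {",", "+", "-", "*", "/", "=", "(", "and", "or", "then"}
CONTINUATION_MARK = "\\"  # assumed explicit continuation symbol

def tokens_with_semicolons(src):
    """Return token texts with newlines turned into ';' where appropriate."""
    out = []
    for m in re.finditer(r"(\n)|[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", src):
        if m.group(1) is None:
            out.append(m.group())  # ordinary token
        elif out and out[-1] == CONTINUATION_MARK:
            out.pop()  # explicit continuation: drop the mark and the newline
        elif out and out[-1] not in CONTINUES and out[-1] != ";":
            out.append(";")  # newline ends the statement
    return out
```

Here a call whose argument list is split after a comma stays one
statement, blank lines add nothing, and "x = 1 \" followed by "+ 2" on
the next line is joined into a single statement by the explicit mark.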

