

comp.compilers > #450

Ignore break line sometimes

Started by: Geovani de Souza <geovanisouza92@gmail.com>
First post: 2012-02-11 06:56 -0800
Last post: 2012-02-27 03:49 +0000
Articles: 12 (12 participants)



Contents

  Ignore break line sometimes Geovani de Souza <geovanisouza92@gmail.com> - 2012-02-11 06:56 -0800
    Re: Ignore break line sometimes Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-02-11 17:28 +0100
    Re: Ignore break line sometimes George Neuner <gneuner2@comcast.net> - 2012-02-11 12:59 -0500
    RE: Ignore break line sometimes "Karsten Nyblad" <uu3kw29sb7@snkmail.com> - 2012-02-12 09:21 +0100
      Re: Ignore break line sometimes Kaz Kylheku <kaz@kylheku.com> - 2012-02-13 00:16 +0000
    Re: Ignore break line sometimes Stefan Monnier <monnier@iro.umontreal.ca> - 2012-02-12 10:48 -0500
    Re: Ignore break line sometimes Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-02-12 12:03 -0600
      Re: Ignore break line sometimes Gene Wirchenko <genew@ocis.net> - 2012-02-19 20:57 -0800
        Re: Ignore break line sometimes glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-02-20 08:09 +0000
          Re: Ignore break line sometimes arnold@skeeve.com (Aharon Robbins) - 2012-02-23 21:51 +0000
            Re: Ignore break line sometimes "Jonathan Thornburg" <jthorn@astro.indiana.edu> - 2012-02-27 03:49 +0000
    Re: Ignore break line sometimes "BartC" <bc@freeuk.com> - 2012-02-14 00:25 +0000

#450 — Ignore break line sometimes

From: Geovani de Souza <geovanisouza92@gmail.com>
Date: 2012-02-11 06:56 -0800
Subject: Ignore break line sometimes
Message-ID: <12-02-010@comp.compilers>
Hi all!

I'm trying to write a parser for my compiler, and I'm interested in ignoring the line break (\n) sometimes. E.g.:

if true then [\n]
  foo(); [\n]
end; [\n]

So, in the first line, the '\n' after 'then' isn't important, but in the second line, "foo();", it could make the semicolon that concludes the statement unnecessary, and likewise after 'end'.

I'd also like to ignore '\n' on blank lines.

How can I do this?



#451

From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date: 2012-02-11 17:28 +0100
Message-ID: <12-02-011@comp.compilers>
In reply to: #450
Geovani de Souza wrote:
> I'm trying to write a parser for my compiler, and I'm interested in
> ignoring the line break (\n) sometimes. E.g.:
>
> if true then [\n] foo(); [\n] end; [\n]
>
> So, in the first line, the '\n' after 'then' isn't important, but in
> the second line, "foo();", it could make the semicolon that concludes
> the statement unnecessary, and likewise after 'end'.

That's why many (compiled) languages ignore line ends and other
whitespace, and require explicit statement termination, e.g. by a
semicolon. Interpreters instead often prefer the "one statement per
line" approach, with the option of joining several statements on one
line with a separator, e.g. a colon.

IMO you should decide on the meaning of whitespace in general, and of
line endings in particular, in your language.

Please give an example that would compile differently when the line
feeds are removed, and then ask yourself whether that difference really
makes sense.

DoDi



#452

From: George Neuner <gneuner2@comcast.net>
Date: 2012-02-11 12:59 -0500
Message-ID: <12-02-012@comp.compilers>
In reply to: #450
On Sat, 11 Feb 2012 06:56:17 -0800 (PST), Geovani de Souza
<geovanisouza92@gmail.com> wrote:

>I'm trying to write a parser for my compiler, and I'm interested
> in ignoring the line break (\n) sometimes. E.g.:
>
>if true then [\n]
>  foo(); [\n]
>end; [\n]
>
>So, in the first line, the '\n' after 'then' isn't important, but in
>the second line, "foo();", it could make the semicolon that concludes
>the statement unnecessary, and likewise after 'end'.
>
>I'd also like to ignore '\n' on blank lines.
>
>How can I do this?

IMO making the newlines significant is a really bad idea ... but
leaving that aside, I believe the most effective way would be to have
your lexer return the same special "end-of-line" token for both a
semicolon and a newline, and to make that token optional in the
grammar wherever it is not needed.

You don't say whether your parser is handwritten or tool generated (or
which tools) ... so I can't really give an example.

George
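Since the original question gives no parser details, here is a minimal Python sketch of George's suggestion. It assumes a hypothetical toy grammar that covers only the question's example: `if ID then ... end` blocks and `name()` calls. The `Token`, `lex` and `Parser` names are made up for illustration. The lexer reports both ';' and '\n' as the single kind `EOL`. The parser then skips EOLs only at the places where they carry no meaning, and requires one at the end of a statement.

```python
import re
from dataclasses import dataclass

KEYWORDS = {"if", "then", "end"}  # toy keyword set, assumed for this sketch


@dataclass
class Token:
    kind: str  # "EOL", "ID", "EOF", a keyword, "(" or ")"
    text: str


def lex(src):
    """Tokenize src; ';' and '\\n' both come back with the same kind, "EOL"."""
    toks = []
    for text in re.findall(r"\n|;|\w+|[()]", src):
        if text in ("\n", ";"):
            kind = "EOL"
        elif text in KEYWORDS or text in ("(", ")"):
            kind = text
        else:
            kind = "ID"
        toks.append(Token(kind, text))
    toks.append(Token("EOF", ""))
    return toks


class Parser:
    """Hypothetical recursive-descent parser for the question's example."""

    def __init__(self, src):
        self.toks = lex(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos].kind

    def expect(self, kind):
        tok = self.toks[self.pos]
        if tok.kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok.kind} {tok.text!r}")
        self.pos += 1
        return tok

    def optional_eols(self):
        # Positions where an end-of-line means nothing: blank lines, after 'then'.
        while self.peek() == "EOL":
            self.pos += 1

    def end_of_statement(self):
        # One EOL ends a statement; it may be omitted right before 'end' or EOF.
        if self.peek() not in ("end", "EOF"):
            self.expect("EOL")

    def block(self, stop):
        stmts = []
        self.optional_eols()
        while self.peek() != stop:
            stmts.append(self.statement())
            self.optional_eols()
        return stmts

    def statement(self):
        if self.peek() == "if":
            self.expect("if")
            cond = self.expect("ID").text
            self.expect("then")
            body = self.block("end")
            self.expect("end")
            self.end_of_statement()
            return ("if", cond, body)
        name = self.expect("ID").text
        self.expect("(")
        self.expect(")")
        self.end_of_statement()
        return ("call", name)


def parse(src):
    return Parser(src).block("EOF")
```

Because ';' and newline become indistinguishable, `if true then; foo()` is accepted as well. Whether that is acceptable is a language-design decision.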



#455

From: "Karsten Nyblad" <uu3kw29sb7@snkmail.com>
Date: 2012-02-12 09:21 +0100
Message-ID: <12-02-015@comp.compilers>
In reply to: #450
> I'm trying to write a parser for my compiler, and I'm interested in
> ignoring the line break (\n) sometimes. E.g.:
>
> if true then [\n]
>   foo(); [\n]
> end; [\n]

One option is to write a recursive descent parser and have two ways
of calling the lexer: one that returns line ends and one that does not.
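Here is a rough Python sketch of that first option, assuming the same hypothetical toy grammar as the question's example. The `Lexer` class and its `skip_newlines` flag are illustrative names, not taken from any tool. Almost every lexer call uses the newline-skipping mode. Only the statement-terminator check asks for line ends.

```python
import re


class Lexer:
    """Hypothetical token source with a newline-skipping and a newline-returning mode."""

    def __init__(self, src):
        self.toks = re.findall(r"\n|\w+|[();]", src) + ["<eof>"]
        self.pos = 0

    def peek(self, skip_newlines=True):
        if skip_newlines:
            while self.toks[self.pos] == "\n":
                self.pos += 1  # newlines are consumed silently in this mode
        return self.toks[self.pos]

    def next(self, skip_newlines=True):
        tok = self.peek(skip_newlines)
        if tok != "<eof>":
            self.pos += 1
        return tok


def expect(lx, want):
    tok = lx.next()
    if tok != want:
        raise SyntaxError(f"expected {want!r}, got {tok!r}")


def parse_terminator(lx):
    # The only place that asks the lexer for line ends.
    tok = lx.peek(skip_newlines=False)
    if tok in (";", "\n"):
        lx.next(skip_newlines=False)
    elif tok not in ("end", "<eof>"):
        raise SyntaxError(f"expected ';' or a newline, got {tok!r}")


def parse_block(lx, stop):
    stmts = []
    while lx.peek() != stop:  # newline-skipping: blank lines disappear here
        if lx.peek() == "<eof>":
            raise SyntaxError("unexpected end of input")
        stmts.append(parse_stmt(lx))
    return stmts


def parse_stmt(lx):
    tok = lx.next()
    if tok == "if":
        cond = lx.next()
        expect(lx, "then")
        body = parse_block(lx, "end")  # the newline after 'then' is skipped here
        expect(lx, "end")
        parse_terminator(lx)
        return ("if", cond, body)
    expect(lx, "(")
    expect(lx, ")")
    parse_terminator(lx)
    return ("call", tok)


def parse(src):
    return parse_block(Lexer(src), "<eof>")
```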

Another option is to base your parsing on a parser generator like
bison, and modify the code that drives the automaton.  That code is
modified such that when the lexer returns a line feed token, you copy
the stack of states, and on the copy you simulate the actions that the
parser would have taken.  When the simulation shifts the line feed onto
the stack, you throw away the copy and resume parsing on the real stack
with the line feed in the window.  When the simulation encounters an
error, you throw away the simulation AND the line feed and call the
lexer again.
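Karsten describes the trial run for bison's LR automaton, whose stack holds states. To keep the example short, the Python sketch below swaps in a small hand-written table-driven LL(1) predictive parser instead. Its stack holds grammar symbols, but it can be copied and simulated in the same way. The grammar, the `TABLE` and the token kinds are hypothetical, made up to fit the question's example. When a newline arrives, `parse` runs the parser on a copy of the stack. If matching the newline would fail there, the newline is discarded and the next token is read.

```python
import re

# Hypothetical LL(1) grammar for the question's example:
#   Program -> Stmts EOF
#   Stmts   -> Stmt Stmts | (empty)
#   Stmt    -> "if" ID "then" Stmts "end" Term  |  ID "(" ")" Term
#   Term    -> ";" | NL
NONTERMINALS = {"Program", "Stmts", "Stmt", "Term"}
TABLE = {
    ("Program", "if"): ["Stmts", "EOF"],
    ("Program", "ID"): ["Stmts", "EOF"],
    ("Program", "EOF"): ["Stmts", "EOF"],
    ("Stmts", "if"): ["Stmt", "Stmts"],
    ("Stmts", "ID"): ["Stmt", "Stmts"],
    ("Stmts", "end"): [],
    ("Stmts", "EOF"): [],
    ("Stmt", "if"): ["if", "ID", "then", "Stmts", "end", "Term"],
    ("Stmt", "ID"): ["ID", "(", ")", "Term"],
    ("Term", ";"): [";"],
    ("Term", "NL"): ["NL"],
}
KEYWORDS = {"if", "then", "end"}


def lex(src):
    kinds = []
    for text in re.findall(r"\n|\w+|[();]", src):
        if text == "\n":
            kinds.append("NL")
        elif text in KEYWORDS or text in ("(", ")", ";"):
            kinds.append(text)
        else:
            kinds.append("ID")
    return kinds + ["EOF"]


def step(stack, kind):
    """Expand nonterminals until `kind` is matched; False means a syntax error."""
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, kind))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
        else:
            return top == kind
    return False


def parse(src):
    """Parse src; return how many newlines were thrown away."""
    stack, dropped = ["Program"], 0
    for kind in lex(src):
        # Trial run on a copy of the stack: would the parser accept a newline here?
        if kind == "NL" and not step(list(stack), "NL"):
            dropped += 1  # it would be an error, so discard the newline
            continue
        if not step(stack, kind):
            raise SyntaxError(f"unexpected {kind}")
    return dropped
```

For the question's input, all three newlines after `then`, `foo();` and `end;` are discarded. If the semicolons are removed, only the one after `then` is discarded, and the others terminate statements.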

If you choose the second option, it is important that you choose the right
parser generator, because some parser generators already generate code
that can help you.  Many LR parser generators, e.g., bison, include
facilities for generalised LR parsing, and many LL parser generators
include facilities for backtracking.  That might help you.

Karsten Nyblad



#458

From: Kaz Kylheku <kaz@kylheku.com>
Date: 2012-02-13 00:16 +0000
Message-ID: <12-02-018@comp.compilers>
In reply to: #455
On 2012-02-12, Karsten Nyblad <uu3kw29sb7@snkmail.com> wrote:
> that can help you.  Many LR parser generators, e.g., bison, include
> facilities for generalised LR parsing, and many LL parser generators
> include facilities for backtracking.  That might help you.

General LR and backtracking, just to make semicolons optional when
there are newlines?  LOL.



#456

From: Stefan Monnier <monnier@iro.umontreal.ca>
Date: 2012-02-12 10:48 -0500
Message-ID: <12-02-016@comp.compilers>
In reply to: #450
> So, in the first line, the '\n' after 'then' isn't important, but in the
> second line, "foo();", it could make the semicolon that concludes the
> statement unnecessary, and likewise after 'end'.

A simple approach is to treat every newline as a semicolon, and then to
adapt your grammar so as to accept (and ignore) extra semicolons,
i.e. accept "if true then; foo(); ; end; ;".


        Stefan
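Here is a minimal Python sketch of Stefan's approach. It assumes the same hypothetical toy grammar as the question (`if ID then ... end` and `name()` calls), with every statement ended by ';'. The lexer turns each newline into ';' and adds one extra ';' at the end of the input, so the last line needs no terminator. The parser treats a lone ';' as an empty statement and skips it. That is the whole of the "accept and ignore extra semicolons" adaptation.

```python
import re


def lex(src):
    """Every newline becomes ';'; one more ';' is added before the end of input."""
    toks = [";" if t == "\n" else t for t in re.findall(r"\n|\w+|[();]", src)]
    return toks + [";", "<eof>"]


def parse(src):
    toks = lex(src)
    pos = 0

    def peek():
        return toks[pos]

    def take(want=None):
        nonlocal pos
        tok = toks[pos]
        if want is not None and tok != want:
            raise SyntaxError(f"expected {want!r}, got {tok!r}")
        pos += 1
        return tok

    def block(stop):
        stmts = []
        while peek() != stop:
            if peek() == "<eof>":
                raise SyntaxError("unexpected end of input")
            if peek() == ";":
                take()  # empty statement: extra semicolons are accepted and ignored
                continue
            stmts.append(statement())
        return stmts

    def statement():
        tok = take()
        if tok == "if":
            cond = take()
            take("then")
            body = block("end")
            take("end")
            take(";")
            return ("if", cond, body)
        take("(")
        take(")")
        take(";")
        return ("call", tok)

    return block("<eof>")
```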



#457

From: Joshua Cranmer <Pidgeot18@verizon.invalid>
Date: 2012-02-12 12:03 -0600
Message-ID: <12-02-017@comp.compilers>
In reply to: #450
On 2/11/2012 8:56 AM, Geovani de Souza wrote:
> Hi all!
>
> I'm trying to write a parser for my compiler, and I'm interested in
> ignoring the line break (\n) sometimes. E.g.:
>
> if true then [\n] foo(); [\n] end; [\n]
>
> So, in the first line, the '\n' after 'then' isn't important, but in
> the second line, "foo();", it could make the semicolon that concludes
> the statement unnecessary, and likewise after 'end'.

It sounds like you want something like ECMAScript's magic
you-don't-always-need-a-semicolon feature (automatic semicolon
insertion). <http://bclary.com/2004/11/07/#a-7.9> describes how it
works in detail.
The thrust of it is that "if you see an invalid token, but you saw a
newline before, automatically insert a semicolon to fix things."

There are more than a few people who believe that this feature should
not have been implemented.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
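Here is a much-simplified Python sketch of that rule. It drives a hypothetical LL(1) table for a tiny JavaScript-flavoured grammar in which a statement is an identifier, zero or more `()` calls, and a ';'. Newlines are not tokens; the lexer only records whether a line break preceded each token. The sketch models just two cases: an offending token after a line break, and the end of input. The full rules in section 7.9 of ECMA-262 also cover a closing `}` and restricted productions such as `return`. They also never insert a semicolon that would be parsed as an empty statement or as one of the two semicolons in a `for` header.

```python
import re

# Hypothetical LL(1) grammar (not JavaScript's real one):
#   Program -> Stmts EOF
#   Stmts   -> Stmt Stmts | (empty)
#   Stmt    -> ID Calls ";"
#   Calls   -> "(" ")" Calls | (empty)
NONTERMINALS = {"Program", "Stmts", "Stmt", "Calls"}
TABLE = {
    ("Program", "ID"): ["Stmts", "EOF"],
    ("Program", "EOF"): ["Stmts", "EOF"],
    ("Stmts", "ID"): ["Stmt", "Stmts"],
    ("Stmts", "EOF"): [],
    ("Stmt", "ID"): ["ID", "Calls", ";"],
    ("Calls", "("): ["(", ")", "Calls"],
    ("Calls", ";"): [],
}


def lex(src):
    """Return (kind, newline_before) pairs; a newline is only remembered as a flag."""
    toks, newline = [], False
    for text in re.findall(r"\n|\w+|[();]", src):
        if text == "\n":
            newline = True
            continue
        kind = "ID" if re.fullmatch(r"\w+", text) else text
        toks.append((kind, newline))
        newline = False
    toks.append(("EOF", newline))
    return toks


def step(stack, kind):
    """Expand nonterminals until `kind` is matched; False means a syntax error."""
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, kind))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
        else:
            return top == kind
    return False


def parse(src):
    """Parse src; return how many ';' tokens were inserted automatically."""
    stack, inserted = ["Program"], 0
    for kind, newline_before in lex(src):
        trial = list(stack)
        if step(trial, kind):
            stack = trial
            continue
        # Offending token: retry with ';' in front of it, but only after a
        # line break or at the end of the input.
        if (newline_before or kind == "EOF") and step(stack, ";") and step(stack, kind):
            inserted += 1
            continue
        raise SyntaxError(f"unexpected {kind}")
    return inserted
```

The input `a\n()` shows why the rule surprises people. A line break before `(` does not end the statement, because `(` is not an offending token at that point, so the two lines are read as the single call `a()`.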



#463

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-02-19 20:57 -0800
Message-ID: <12-02-023@comp.compilers>
In reply to: #457
On Sun, 12 Feb 2012 12:03:13 -0600, Joshua Cranmer
<Pidgeot18@verizon.invalid> wrote:

[snip]

>It sounds like you want something like ECMAScript's magic
>you-don't-always-need-a-semicolon feature.

     But please do not go there.

><http://bclary.com/2004/11/07/#a-7.9> describes how it works in detail.
>The thrust of it is that "if you see an invalid token, but you saw a
>newline before, automatically insert a semicolon to fix things."
>
>There are more than a few people who believe that this feature should
>not have been implemented.

     There is a bit more to this.  As a result of this kludge, it is
illegal to have newlines at certain points in some statements.  For
example:
          return
           <expression which I decided to put all on its own line>;
is not legal.  It is not permitted to have a newline immediately after
"return".

Sincerely,

Gene Wirchenko
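Strictly speaking, ECMAScript does not reject this example outright. Its grammar contains the restricted production `return [no LineTerminator here] Expression`. A line break right after `return` therefore makes the parser insert a ';' at that point, and the expression on the next line usually becomes a separate, unreachable expression statement. The Python sketch below models such a restricted production. It assumes a hypothetical toy language with only `return`, bare identifiers or numbers as expressions, and ';'. Semicolons are inserted only after a line break or at the end of input.

```python
import re


def lex(src):
    """Return (token, newline_before) pairs for identifiers, numbers and ';'."""
    toks, newline = [], False
    for text in re.findall(r"\n|\w+|;", src):
        if text == "\n":
            newline = True
        else:
            toks.append((text, newline))
            newline = False
    toks.append(("<eof>", True))  # end of input also permits an inserted ';'
    return toks


def parse(src):
    toks = lex(src)
    i = 0
    stmts = []

    def end_statement():
        nonlocal i
        tok, newline_before = toks[i]
        if tok == ";":
            i += 1  # explicit semicolon
        elif not newline_before:
            # an implicit ';' is only allowed after a line break (or at end of input)
            raise SyntaxError(f"unexpected {tok!r}")

    while toks[i][0] != "<eof>":
        tok = toks[i][0]
        if tok == ";":
            i += 1  # explicit empty statement
            continue
        if tok == "return":
            i += 1
            nxt, newline_before = toks[i]
            # Restricted production: return [no LineTerminator here] Expression
            if newline_before or nxt in (";", "<eof>"):
                stmts.append(("return", None))
            else:
                stmts.append(("return", nxt))
                i += 1
        else:
            stmts.append(("expr", tok))
            i += 1
        end_statement()
    return stmts
```

With this sketch, `return x;` yields one statement, while `return` followed by `x;` on the next line yields a bare `return` and a separate expression statement `x`.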



#464

From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: 2012-02-20 08:09 +0000
Message-ID: <12-02-024@comp.compilers>
In reply to: #463
Gene Wirchenko <genew@ocis.net> wrote:
(snip, someone wrote)
>> There are more than a few people who believe that this
>> feature should not have been implemented.

> There is a bit more to this.  As a result of this kludge, it is
> illegal to have newlines at certain points in some statements.
> For example:

>          return
>           <expression which I decided to put all on its own line>;
> is not legal.  It is not permitted to have a newline immediately after
> "return".

Sounds about like the way IBM's JCL from OS/360 and successors works.

You can split a statement after a comma in most cases, and continue
it on the next line, after the usual // and some spaces.

I believe the original (early) versions had a more usual system,
with a continuation character in column 72 and the continued text
starting in column 16 of the next line. I presume it was found hard
to get right, so they changed it.

I believe that there are a few other languages with a similar
continuation method. That is, if you end a statement in a legal
end, no continuation is needed.

-- glen



#465

From: arnold@skeeve.com (Aharon Robbins)
Date: 2012-02-23 21:51 +0000
Message-ID: <12-02-025@comp.compilers>
In reply to: #464
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>I believe that there are a few other languages with a similar
>continuation method. That is, if you end a statement in a legal
>end, no continuation is needed.

Awk is like this. You can continue after a comma, && or ||.  Possibly
in other places too.  You can supply semicolons to separate statements
on the same line, if you want.

It tends to work fairly naturally in awk; I rarely use \ to continue
onto the next line.  :-)
--
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50 729-7545
D.N. Shimshon 99785	ISRAEL



#467

From: "Jonathan Thornburg" <jthorn@astro.indiana.edu>
Date: 2012-02-27 03:49 +0000
Message-ID: <12-02-027@comp.compilers>
In reply to: #465
Aharon Robbins <arnold@skeeve.com> wrote:
> Awk is like this. You can continue after a comma, && or ||.  Possibly
> in other places too.  You can supply semicolons to separate statements
> on the same line, if you want.
>
> It tends to work fairly naturally in awk, I rarely use \ to continue
> onto the next line.  :-)

On the other hand pic (Kernighan's picture-drawing "little language")
is very finicky about where it accepts \ line-continuations, allowing
them in some places but forbidding them in others.  For example, the
pic code

for j = 2 to 6 by 2 do { \
  for i = 3 to 7 by 2 do { \
    fine_space_interp_point at grid_point(j,i) } }

does NOT allow a \ line-continuation between either "for" and the
following "{".  (Or more precisely, all my attempts to put one there
produced the usual unhelpful pic syntax-error messages.)  :(

--
Jonathan Thornburg
   Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
   "Washing one's hands of the conflict between the powerful and the
    powerless means to side with the powerful, not to be neutral."
                                      -- quote by Freire / poster by Oxfam



#460

From: "BartC" <bc@freeuk.com>
Date: 2012-02-14 00:25 +0000
Message-ID: <12-02-020@comp.compilers>
In reply to: #450
"Geovani de Souza" <geovanisouza92@gmail.com> wrote

> I'm trying to write a parser for my compiler, and I'm interested in
> ignoring the line break (\n) sometimes. E.g.:
>
> if true then [\n]
>  foo(); [\n]
> end; [\n]
>
> So, in the first line, the '\n' after 'then' isn't important, but in the
> second line, "foo();", it could make the semicolon that concludes the
> statement unnecessary, and likewise after 'end'.
>
> I'd also like to ignore '\n' on blank lines.

I've tried a few schemes. One just converts a newline to a semicolon,
*unless* the last symbol was (for example) a comma.

This requires some sort of continuation symbol for when a semicolon would be
inappropriate.

And it helps if the grammar is tolerant of extra semicolons, otherwise the
source code could be full of continuation symbols! (After 'then' for
example.)
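Here is a minimal Python sketch of the first scheme, written as a pass between the lexer and the parser. It assumes a simple regular-expression tokenizer and treats `,` as the only token after which a line continues. The explicit continuation symbol is a backslash at the end of the line, a hypothetical choice in the spirit of the '\' used in awk and pic. Blank lines still produce runs of ';', which a tolerant parser has to skip.

```python
import re

CONTINUES_AFTER = {","}  # a newline after one of these does not end the statement
CONTINUATION = "\\"      # hypothetical explicit continuation symbol


def tokens(src):
    """Tokenize src, turning each newline into ';' unless the line is continued."""
    out = []
    for text in re.findall(r"\n|\w+|[^\s\w]", src):
        if text != "\n":
            out.append(text)
        elif out and out[-1] == CONTINUATION:
            out.pop()        # explicit continuation: drop the mark and join the lines
        elif out and out[-1] in CONTINUES_AFTER:
            pass             # the line ends after ',', so the statement goes on
        else:
            out.append(";")  # any other newline ends the statement
    return out
```

For `if c then` followed by `foo()` on the next line, the newline after `then` still becomes ';'. This is why the grammar should tolerate extra semicolons.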

Whatever scheme you choose, you'll know it works well when you have
thousands of lines of code without a single semicolon and hardly any
continuations, and the code is still perfectly clear to read.

--
Bartc




