Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3153 > unrolled thread
| Started by | luser droog <luser.droog@gmail.com> |
|---|---|
| First post | 2022-09-07 09:47 -0700 |
| Last post | 2022-09-15 20:11 -0700 |
| Articles | 10 — 4 participants |
Back to article view | Back to comp.compilers
Wrestling with phase 1 of a C compiler luser droog <luser.droog@gmail.com> - 2022-09-07 09:47 -0700
Re: Wrestling with phase 1 of a C compiler luser droog <luser.droog@gmail.com> - 2022-09-09 20:47 -0700
Re: Wrestling with phase 1 of a C compiler luser droog <luser.droog@gmail.com> - 2022-09-11 20:15 -0700
Wrestling with phase 1 of a C compiler Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-09-12 21:45 +0300
Re: Wrestling with phase 1 of a C compiler gah4 <gah4@u.washington.edu> - 2022-09-12 13:01 -0700
Re: Wrestling with phase 1 of a C compiler Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-09-13 14:55 +0300
Re: Wrestling with phase 1 of a C compiler gah4 <gah4@u.washington.edu> - 2022-09-14 15:40 -0700
Re: source languages, was Wrestling with phase 1 of a C compiler George Neuner <gneuner2@comcast.net> - 2022-09-14 16:03 -0400
Re: Wrestling with phase 1 of a C compiler luser droog <luser.droog@gmail.com> - 2022-09-14 14:31 -0700
Re: Wrestling with phase 1 of a C compiler luser droog <luser.droog@gmail.com> - 2022-09-15 20:11 -0700
| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Date | 2022-09-07 09:47 -0700 |
| Subject | Wrestling with phase 1 of a C compiler |
| Message-ID | <22-09-001@comp.compilers> |
At my tedious glacial pace, I have rewritten my parser library
for the umpteen-plus-one'th time only to stall out at an earlier
step than where I stalled out the last time around.
I'm trying to do phase 1 of the C compilation, which is just recognizing
newlines in the input.
The input is modeled as a lazy list which calls fgetc() to produce
integers as needed. Right now I have a tiny parser to recognize
the possible line termination sequences and normalize them to
a single newline.
static parser
position_grammar( void ){
return either( bind( ANY( str("\r\n"),
chr('\r'),
chr('\n') ),
Operator( NIL_, new_line ) ),
item() );
}
static object
new_line( list env, object input ){
return Int('\n');
}
So, I can run this parser and peel out the integer from the result.
And then I'm wrapping the result with this function to couple
each byte with its (row,col) information.
static list
position( object item ){
static int row = 0,
col = 0;
if( valid( eq_int( '\n', item ) ) )
return cons( item, cons( Int( ++ row ), Int( col = 0 ) ) );
else
return cons( item, cons( Int( row ), Int( ++ col ) ) );
}
But ... I guess my problem is the lack of functional programming
tools in the C language, which I already knew, and is nobody's fault
but my own. But I'm not happy with the static variables for row and col.
I don't have monadic sequencing to help route state through my
function graphs.
But ... can I extract the "position counting" part out and do it by
zipping the input stream with an iota stream to provide counting?
This feels like the right direction, but I'm not sure how to reset the
column counter when a newline is recognized. Has anyone navigated
these weeds before and blazed any trails?
[toc] | [next] | [standalone]
| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Date | 2022-09-09 20:47 -0700 |
| Message-ID | <22-09-002@comp.compilers> |
| In reply to | #3153 |
On Wednesday, September 7, 2022 at 4:01:04 PM UTC-5, luser droog wrote: > At my tedious glacial pace, I have rewritten my parser library > for the umpteen-plus-one'th time only to stall out at an earlier > step than where I stalled out the last time around. Sorry for the noise. I'm prematurely optimizing, aren't I? The function that works just fine and is already reasonably short and readable is just fine for now. Folding and iotas will probably be a fun idea to try -- on the next rewrite. On to possibly important problems.... [It occurred to me that if you want to write in a functional language style, doing it in C is really painful. -John]
[toc] | [prev] | [next] | [standalone]
| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Date | 2022-09-11 20:15 -0700 |
| Message-ID | <22-09-003@comp.compilers> |
| In reply to | #3154 |
On Saturday, September 10, 2022 at 1:36:57 PM UTC-5, luser droog wrote:
> On Wednesday, September 7, 2022 at 4:01:04 PM UTC-5, luser droog wrote:
> > At my tedious glacial pace, I have rewritten my parser library
> > for the umpteen-plus-one'th time only to stall out at an earlier
> > step than where I stalled out the last time around.
> Sorry for the noise. I'm prematurely optimizing, aren't I? The function
> that works just fine and is already reasonably short and readable is
> just fine for now. Folding and iotas will probably be a fun idea to try --
> on the next rewrite. On to possibly important problems....
> [It occurred to me that if you want to write in a functional language style,
> doing it in C is really painful. -John]
That's an accurate assessment. But it's pain I invited upon myself.
I'm hoping that a hot fire will forge some strong steel, or that it's
worth doing because it is hard. But it all depends upon my ability
to emulate or simulate the functional stuff as if it were the C runtime
for a Lisp interpreter that happens not to have a read() or eval(),
just print() and constructors and functions for the objects.
The most recent review post links back to previous reviews all the way
back to an "object oriented" implementation.
https://codereview.stackexchange.com/questions/277769/commented-parser-combinators-in-lisp-style-c
For the present task of pulling the state out of the function, the
obvious solution was to rewrite it using pointers and handle the
state variables externally, letting the calling function decide where
they live.
static list
position( object item, int *row, int *col ){
if( valid( eq_int( '\n', item ) ) )
return cons( item, cons( Int( ++ *row ), Int( *col = 0 ) ) );
else
return cons( item, cons( Int( *row ), Int( ++ *col ) ) );
}
For the longer term, I know I want a Monad and maybe Monad Transformers,
too. But I don't understand how they're implemented yet. Maybe I can do some
kind of monadic comprehension or "do" notation with the C preprocessor....
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2022-09-12 21:45 +0300 |
| Message-ID | <22-09-004@comp.compilers> |
| In reply to | #3153 |
While I hate being extremely pessimistic, you are going in absolutely the wrong direction. If you want a functional language, use one. Don't try to turn C into a functional language. Even if you succeed, you will have made a mess that no one (probably not even you) will actually understand. I say that from experience. Early on in writing Yacc++, we did macros that allowed us to write object-oriented C, which had C++-like semantics (ala 1986, pre-templates, pre STL, etc.) with nice ABC (abstract base classes, plus some inheritance and polymorphism). It worked. However, it was a challenge to understand. Our wisest move was at 2.0 rewriting everything in C++ and not trying to have a "C" layer with similar functionality. The code was an order-of-magnitude simpler and even making a translation that did C# was relatively trivial from that code. C is a nice small imperative language. It's fine for expressing those kinds of semantics. The C preprocessor is both simple and powerful, but it doesn't change the nature of C. You cannot really do "compilation" in the C preprocessor. And, if you really want a parser combinator library, you want it to do some compilation. Otherwise, you have just masked the fact that you are doing hand-written recursive descent. And, if you want to do hand-written recursive descent, do it. Don't put lipstick on a pig and take it to the opera. There is a reason your attempts are failing. You don't have good closures or co-routines in C. You cannot get them easily. You cannot really interact well with the function stack in C. And, your efforts to put the state behind pointers, while necessary only get you part of the way there. So, if you are lucky, you will have something that looks like a parser combinator language, but which actually has either - broken semantics (i.e. cases that don't work properly and don't warn you that they don't) or - is very complicated. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-09-12 13:01 -0700 |
| Message-ID | <22-09-005@comp.compilers> |
| In reply to | #3156 |
On Monday, September 12, 2022 at 12:46:47 PM UTC-7, christoph...@compiler-resources.com wrote: (snip) > C is a nice small imperative language. It's fine for expressing those > kinds of semantics. The C preprocessor is both simple and powerful, but it > doesn't change the nature of C. You cannot really do "compilation" in the > C preprocessor. On of the earlier languages I knew, and maybe still favorite, is PL/I. PL/I does have a powerful preprocessor, though I don't know so many actually using its power. It even has preprocessor procedures, if you need them. The most use of the power I have seen, is one for unrolling DO loops, which unrolls smaller loops, but not larger ones, with either a preprocessor %DO, or a real DO. I do remember, though, first knowing abuot no preprocessor in Java, and the trend of PL/I to C to Java, in decreasing preprocessor power. [PL/I had its charms but I wouldn't want to write functional code in it, either. -John]
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2022-09-13 14:55 +0300 |
| Message-ID | <22-09-006@comp.compilers> |
| In reply to | #3157 |
I never used the PL/I preprocessor myself. However, at my first job (at SofTech), my mentor (Carl Martin) made some PL/I macros that *loosely* translated a subset of Jovial into Multics PL/I. It was still PL/I semantics, but Jovial syntax. That allowed us to write our Jovial compilers in that subset of Jovial and get it working before it could become self-hosting. But notice that these were mostly minor syntactic changes and no error checking, no semantic changes, etc. It was only used for one project and one team and two compilers (one targeting Multics and the other targeting the Interdata 8/32). So, the fact that it wasn't very robust wasn't an issue. This is very different than trying to make a functional library in an imperative language. ---------- On a related note, I have heard stories from C++ compiler implementors about the various template libraries that have been created which attempt to do "Turing machine" style (NP-complete) computations via types and parameters, where the users wonder why the compilation process takes much longer than running the resultant program. This is the kind of mess one makes when one sees a hammer and treats screws as nails. Yes, with enough force you can pound a screw into wood (or a wall) but the result is NOT a good fastener. Screws are threaded for a reason and nails are not for the complementary reason. Different semantics. I rarely create my own C++ templates nor Rust macros for that reason. It is too easy to create things that look clever but are essentially undecipherable. As the saying goes: "With great power comes great responsibility." Kind regards, Chris -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-09-14 15:40 -0700 |
| Message-ID | <22-09-009@comp.compilers> |
| In reply to | #3158 |
On Wednesday, September 14, 2022 at 1:25:39 PM UTC-7, christoph...@compiler-resources.com wrote: (snip) > On a related note, I have heard stories from C++ compiler > implementors about the various template libraries that have been created > which attempt to do "Turing machine" style (NP-complete) computations via > types and parameters, where the users wonder why the compilation process > takes much longer than running the resultant program. Many years ago, there was a story about a Fortran benchmark program, very complicated with lots of statement functions evaluating many complicated expressions. (I probably don't have to mention the problems with designing good benchmarks here. This was years ago.) Then it was run though the IBM OS/360 Fortran H compiler. Among others, it expands statement functions inline, and does constant expression evaluation. It did the whole thing at compile time, except printing out a single number. Fortran now requires many complicated constant expressions be evaluated at compile time, and some have come up with some, though not intentionally, that evaluate very slowly. (OK, maybe some were intentional.)
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2022-09-14 16:03 -0400 |
| Subject | Re: source languages, was Wrestling with phase 1 of a C compiler |
| Message-ID | <22-09-007@comp.compilers> |
| In reply to | #3157 |
On Mon, 12 Sep 2022 13:01:21 -0700 (PDT), gah4 <gah4@u.washington.edu> wrote: > : >PL/I does have a powerful preprocessor, though I don't know so many >actually using its power. It even has preprocessor procedures, if you >need them. > : Back in the day I would have reached for Lisp ... certainly for rapid prototyping and/or experimentation with new compilation techniques. The trouble with Lisp in the (distant) past was the high cost of a workstation capable of running it acceptably. That no longer is an issue, so Lisp can be an excellent choice for compiler development. For various reasons I prefer Scheme over Lisp, so for a modern "batteries-included" Scheme environment I would reach for Racket. Certainly mileage varies, but unless you are hell bent on maximum performance [how many people *really* derive benefit from being able to compile 10K lines/second/core?], in my opinion almost any modern HLL would be a better choice than C for writing a compiler. George
[toc] | [prev] | [next] | [standalone]
| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Date | 2022-09-14 14:31 -0700 |
| Message-ID | <22-09-008@comp.compilers> |
| In reply to | #3156 |
On Monday, September 12, 2022 at 2:46:47 PM UTC-5, christoph...@compiler-resources.com wrote: > While I hate being extremely pessimistic, you are going in absolutely the > wrong direction. > > If you want a functional language, use one. Don't try to turn C into a > functional language. Even if you succeed, you will have made a mess that > no one (probably not even you) will actually understand. > You may be right. But, but but ... I'm in too deep. And I've adopted a "larger scale methodology" where every time it becomes a mess, I stop. I go back to the papers by Hutton et al. I make new prototypes in my favorite dynamic language. And then I go back and merge the new ideas with the best parts of the old code and produce a new draft. It's getting better. And my last comment about using the preprocessor might have given a false clue about what I've been doing. I'm using a few preprocessor tricks (PP_NARG, X-Macros) but their usage has also evolved through the rewrites. And I daresay the usage of preprocessor in my code is fairly tame and readable. Particularly if I were to throw in a link to my SO post which explains exactly how I use the X-Macros: https://stackoverflow.com/questions/6635851/real-world-use-of-x-macros/6636596#6636596 > I say that from experience. Early on in writing Yacc++, we did macros that > allowed us to write object-oriented C, which had C++-like semantics (ala > 1986, pre-templates, pre STL, etc.) with nice ABC (abstract base classes, > plus some inheritance and polymorphism). It worked. However, it was a > challenge to understand. Our wisest move was at 2.0 rewriting everything > in C++ and not trying to have a "C" layer with similar functionality. The > code was an order-of-magnitude simpler and even making a translation that > did C# was relatively trivial from that code. > > C is a nice small imperative language. It's fine for expressing those > kinds of semantics. The C preprocessor is both simple and powerful, but it > doesn't change the nature of C. You cannot really do "compilation" in the > C preprocessor. And, if you really want a parser combinator library, you > want it to do some compilation. Otherwise, you have just masked the fact > that you are doing hand-written recursive descent. And, if you want to do > hand-written recursive descent, do it. Don't put lipstick on a pig and > take it to the opera. > You're right. I'm not planning on doing it in the preprocessor, but compilation is one of the big challenges I haven't yet conquered. I have the book by Burge which has a good description of the algorithms I'll need (I think). But my in-memory representation might be too clumsy to use directly. I may need to introduce a symbolic intermediate representation that's easier to do rewrites upon. You're right that (for now) it's just recursive descent in disguise. But already it's a pretty guise. The C interface is fairly lightweight and transparent, iff you can accept the practice of typedefing pointers and treating certain types as having reference semantics. > There is a reason your attempts are failing. Perhaps I'm in too deep to see it this way. But it seems to me that each draft has failed in a slightly different manner. >You don't have good closures or co-routines in C. I think my closures are pretty good. Serviceable, at least. It's an object that couples a function pointer with it's saved environment. I don't have co-routines, but I have lists where a node can be a "suspension" which can transparently (?) invoke the closure to produce elements of the list lazily. So I have a kind of continuation passing which serves many of the same uses that I would have for co-routines (unless I'm wrong). > You cannot get them easily. You cannot really > interact well with the function stack in C. That's true. But I can build up graphs of functions and then call the the thing to run all the doodads. >And, your efforts to put the > state behind pointers, while necessary only get you part of the way there. That's also true. For the present case, I'll also need to dynamically allocate integer objects and keep them in the environment for the calling function which is one of these closures, a suspension function that converts the first element of a stream and returns a list with a suspension in the cdr. That's the only way I'll get row and column counters to exist on a per-file basis. > So, if you are lucky, you will have something that looks like a parser > combinator language, but which actually has either > - broken semantics (i.e. cases that don't work properly and don't warn you > that they don't) > or > - is very complicated. > If it fails, it fails. I'll try to learn something from it and try something else. You may indeed be right. But I've steered past the island of the proprocessor sirens. It's a danger of different kind. Function spaghetti, or maybe "Function Orzo"(tm).
[toc] | [prev] | [next] | [standalone]
| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Date | 2022-09-15 20:11 -0700 |
| Message-ID | <22-09-010@comp.compilers> |
| In reply to | #3160 |
On Thursday, September 15, 2022 at 11:17:50 AM UTC-5, luser droog wrote:
> On Monday, September 12, 2022 at 2:46:47 PM UTC-5, christoph...@compiler-resources.com wrote:
> >And, your efforts to put the
> > state behind pointers, while necessary only get you part of the way there.
> That's also true. For the present case, I'll also need to dynamically allocate
> integer objects and keep them in the environment for the calling
> function which is one of these closures, a suspension function that converts
> the first element of a stream and returns a list with a suspension in the cdr.
>
> That's the only way I'll get row and column counters to exist on a per-file
> basis.
So here's the rest of it. This ought to do the whole phase 1 of the C compilation
process while also supplementing each byte with its row and column numbers.
And it holds the state in the local environment in the closure,
although it has to create a new environment for each iteration because I don't
have a function for updating definitions (--supposed to be "functional" after all,
as much as practical).
(the header file defines the names POS_ROW, POS_COL, and POS_INPUT in an enum.)
static fSuspension force_chars_with_positions;
static list position( object item, int *row, int *col );
static parser position_grammar( void );
static fOperator new_line;
list
chars_with_positions( list input ){
return Suspension( env( NIL_, 3,
Symbol(POS_ROW), Int( 0 ),
Symbol(POS_COL), Int( 0 ),
Symbol(POS_INPUT), input ),
force_chars_with_positions );
}
list
force_chars_with_positions( list ev ){
list input = assoc_symbol( POS_INPUT, ev );
integer row = assoc_symbol( POS_ROW, ev );
integer col = assoc_symbol( POS_COL, ev );
static parser position_parser;
if( ! position_parser ) position_parser = position_grammar();
object result = parse( position_parser, input );
if( not_ok( result ) ) return rest( rest( result ) );
object payload = rest( result );
list pos = position( first( payload ), &row->Int.i, &col->Int.i );
return cons( pos,
Suspension( env( NIL_, 3,
Symbol(POS_ROW), row,
Symbol(POS_COL), col,
Symbol(POS_INPUT), rest( payload ) ),
force_chars_with_positions ) );
}
static list
position( object item, int *row, int *col ){
if( valid( eq_int( '\n', item ) ) )
return cons( item, cons( Int( ++ *row ), Int( *col = 0 ) ) );
else
return cons( item, cons( Int( *row ), Int( ++ *col ) ) );
}
static parser
position_grammar( void ){
return either( bind( ANY( str("\r\n"),
chr('\r'),
chr('\n') ),
Operator( NIL_, new_line ) ),
item() );
}
static object
new_line( list env, object input ){
return Int('\n');
}
I think it's pretty nice and readable, while hiding some of the magic.
One big unsolved issue is how to use a GC in C without access to the
stack. My only solution is to only call the GC from the top level, or
otherwise carefully cultivating the root set and calling from near the
top level where the root set is easy to manage. But as this is off topic
here, I'd invite any thoughts about user space GC over in comp.lang.c.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web