From: luser droog
Newsgroups: comp.compilers
Subject: Wrestling with phase 1 of a C compiler
Date: Wed, 7 Sep 2022 09:47:29 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-09-001@comp.compilers>
Keywords: C, parse, question

At my tedious glacial pace, I have rewritten my parser library for the
umpteen-plus-one'th time only to stall out at an earlier step than
where I stalled out the last time around.

I'm trying to do phase 1 of the C compilation, which is just
recognizing newlines in the input. The input is modeled as a lazy list
which calls fgetc() to produce integers as needed.

Right now I have a tiny parser to recognize the possible line
termination sequences and normalize them to a single newline.

static parser
position_grammar( void ){
  return  either( bind( ANY( str("\r\n"),
                             chr('\r'),
                             chr('\n') ),
                        Operator( NIL_, new_line ) ),
                  item() );
}

static object
new_line( list env, object input ){
  return  Int('\n');
}

So, I can run this parser and peel out the integer from the result.
And then I'm wrapping the result with this function to couple each
byte with its (row,col) information.

static list
position( object item ){
  static int row = 0,
             col = 0;
  if(  valid( eq_int( '\n', item ) )  )
    return  cons( item, cons( Int( ++ row ), Int( col = 0 ) ) );
  else
    return  cons( item, cons( Int( row ), Int( ++ col ) ) );
}

But ...
I guess my problem is the lack of functional programming tools in the
C language, which I already knew about, and which is nobody's fault but
my own. But I'm not happy with the static variables for row and col. I
don't have monadic sequencing to help route state through my function
graphs.

But ... can I extract the "position counting" part out and do it by
zipping the input stream with an iota stream to provide counting? This
feels like the right direction, but I'm not sure how to reset the
column counter when a newline is recognized.

Has anyone navigated these weeds before and blazed any trails?
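
One possible shape for the reset problem, offered as a sketch rather
than a fix for the library above: a zip with an iota stream can only
supply a counter that keeps climbing, because the counter never sees
the bytes. Resetting the column needs a step whose next state depends
on the current element, which is a scan (what Haskell calls mapAccumL)
rather than a zip. In C that state can live in a struct owned by the
stream instead of in static variables. Every name below (pos_state,
pos_item, pos_stream, source_fn, string_source and so on) is
hypothetical and not part of the parser library in this post, and the
stream is a plain pull function rather than a lazy list. The (row,col)
convention is assumed to match position() above.

```c
#include <stdio.h>

/* All names here are hypothetical illustrations, not the post's library. */

typedef struct { int row, col; } pos_state;       /* the threaded state   */
typedef struct { int ch, row, col; } pos_item;    /* byte plus position   */
typedef int (*source_fn)(void *ctx);              /* next byte or EOF     */

typedef struct {
    source_fn next;      /* where raw bytes come from                     */
    void *ctx;           /* opaque state for the source                   */
    int has_pending;     /* nonzero if one byte of lookahead is buffered  */
    int pending;         /* the buffered byte (may be EOF)                */
    pos_state pos;       /* position of the last byte handed out          */
} pos_stream;

pos_stream pos_stream_open(source_fn next, void *ctx) {
    pos_stream s = { next, ctx, 0, 0, { 0, 0 } };
    return s;
}

/* Pull one raw byte, draining the lookahead slot first. */
static int raw_next(pos_stream *s) {
    if (s->has_pending) {
        s->has_pending = 0;
        return s->pending;
    }
    return s->next(s->ctx);
}

/* Normalize "\r\n", lone '\r' and '\n' to a single '\n'. */
static int normalized_next(pos_stream *s) {
    int c = raw_next(s);
    if (c == '\r') {
        int d = raw_next(s);
        if (d != '\n') {          /* lone CR: keep d for the next call */
            s->pending = d;
            s->has_pending = 1;
        }
        return '\n';
    }
    return c;
}

/* The pure step of the scan: old state and a byte give the new state. */
pos_state pos_step(pos_state p, int c) {
    if (c == '\n') { p.row += 1; p.col = 0; }
    else           { p.col += 1; }
    return p;
}

/* Produce the next byte together with its position; ch is EOF at the end. */
pos_item pos_next(pos_stream *s) {
    pos_item it;
    it.ch = normalized_next(s);
    if (it.ch != EOF)
        s->pos = pos_step(s->pos, it.ch);
    it.row = s->pos.row;
    it.col = s->pos.col;
    return it;
}

/* Example source reading from a NUL-terminated string; ctx is a
   pointer to a const char * cursor that gets advanced. */
int string_source(void *ctx) {
    const char **p = ctx;
    return **p ? (unsigned char)*(*p)++ : EOF;
}
```

A source wrapping fgetc() would fit the same source_fn signature, with
the FILE pointer passed as ctx. Since pos_step is a pure function of
(state, byte), it could presumably also be lifted into a lazy-list
combinator that carries the state in each suspended tail, which is
about as close to monadic sequencing as plain C gets.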