Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED!nerds-end From: Hans-Peter Diettrich Newsgroups: comp.compilers Subject: Re: lexer speed, was Bison Date: Tue, 21 Aug 2012 07:40:25 +0100 Organization: Compilers Central Lines: 49 Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <12-08-012@comp.compilers> References: <12-08-005@comp.compilers> <12-08-006@comp.compilers> <12-08-008@comp.compilers> <12-08-011@comp.compilers> NNTP-Posting-Host: news.iecc.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: leila.iecc.com 1345597039 14621 64.57.183.58 (22 Aug 2012 00:57:19 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Wed, 22 Aug 2012 00:57:19 +0000 (UTC) Keywords: performance Posted-Date: 21 Aug 2012 20:57:19 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:729 BGB schrieb: > 1: actually, my C parser uses a pretty big hash for type-name lookup, > namely 16k entry IIRC, whereas for most other things 1024 or 4096 entry > is plenty sufficient (for a chain-hash). The global (top level) symbol table can become very big, in detail in C compilers. A separate table for type names is not required. > the main reason for this is the > large number of typedefs in headers like "windows.h", which can easily > saturate a 1024 or 4096 entry hash. Right. AFAIR I added capabilities to skip already processed header files, at least when a guard is detected. Such (not fully conforming) behaviour can influence the time spent with loading the same files moreoften. > I have used bigger hash tables though, namely for things like interning > strings (64k) and dynamic+interface method-dispatch (256k, but this one > is an open-address hash). > > >> DoDi >> [The benchmarks I did were a while ago, but they showed a large >> fraction of time in the lexer. I wouldn't disagree that building the >> symbol table is slow, but figure out some estimate of the ratio of >> the number of characters in a source file to the number of tokens, >> and that is a rough estimate of how much slower the lexer will be >> than the parser. I agree that some current analyses would be useful. >> -John] >> > > yep. > > can't compare exactly, as my parsers tend to be recursive-descent and > build ASTs directly. The same for my compilers. My C to Pascal converter uses an LL(1) parser, with only one exception where LL(2) lookahead is required. Expressions are parsed differently, using an table-driven parser for operator precedence and associativity. The preprocessor is implemented in a couple of filter stages between the lexer and parser. P.S.: When the time for loading source files can be separated from other lexer tasks, I'd expect that this operation takes most of the time. DoDi