Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED!nerds-end
From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Newsgroups: comp.compilers
Subject: Re: lexer speed, was Bison
Date: Mon, 20 Aug 2012 01:01:53 +0100
Organization: Compilers Central
Lines: 24
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <12-08-008@comp.compilers>
References: <12-08-005@comp.compilers> <12-08-006@comp.compilers>
NNTP-Posting-Host: news.iecc.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: leila.iecc.com 1345419801 60318 64.57.183.58 (19 Aug 2012 23:43:21 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Sun, 19 Aug 2012 23:43:21 +0000 (UTC)
Keywords: parse, performance
Posted-Date: 19 Aug 2012 19:43:21 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:725

> [Compilers spend a lot of time in the lexer, because that's the only
> phase that has to look at the input one character at a time. -John]

When the source code resides in a memory buffer, the time spent reading,
e.g., the characters of an identifier (in the lexer) is negligible
compared to the time spent looking up the identifier and entering it
into a symbol table (in the parser).
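
One way to test this claim is to time the two phases separately. The
sketch below is only an illustration under stated assumptions: the
function names are hypothetical, identifiers follow a C-like
[A-Za-z_][A-Za-z0-9_]* rule, and a Python dict stands in for the
symbol table. Python timings will not match a compiled lexer's
absolute speed; the point is just to show how the character-scanning
cost and the lookup/enter cost can be measured apart.

```python
import time


def scan_identifiers(buf: str) -> list[str]:
    """Walk an in-memory buffer one character at a time, collecting identifiers.

    Hypothetical C-like rule: a letter or '_' starts an identifier, and
    letters, digits and '_' continue it. Everything else is skipped.
    """
    idents = []
    i, n = 0, len(buf)
    while i < n:
        c = buf[i]
        if c.isalpha() or c == "_":
            start = i
            i += 1
            while i < n and (buf[i].isalnum() or buf[i] == "_"):
                i += 1
            idents.append(buf[start:i])
        else:
            i += 1  # whitespace, digits, punctuation: not part of an identifier
    return idents


def enter_symbols(idents: list[str]) -> dict[str, int]:
    """Look up each identifier; enter it with a fresh index if it is new."""
    symtab: dict[str, int] = {}
    for name in idents:
        if name not in symtab:
            symtab[name] = len(symtab)
    return symtab


def time_phases(buf: str, repeat: int = 10) -> tuple[float, float]:
    """Return (seconds spent scanning, seconds spent on symbol-table work)."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    t0 = time.perf_counter()
    for _ in range(repeat):
        idents = scan_identifiers(buf)
    t1 = time.perf_counter()
    for _ in range(repeat):
        enter_symbols(idents)  # fresh table each time, so every run does real inserts
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

For example, `time_phases("int count = count + delta_1;\n" * 10000)`
returns the two elapsed times; a realistic test would use real source
files and a compiled lexer.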

Even if a lexer reads single characters from a file, most OSs maintain
their own file buffer, so little overhead is added compared to the
program-buffered solution.
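
The two reading strategies can be compared directly. In this sketch
(function names hypothetical), `chars_from_file` pulls one character at
a time with `read(1)`, while `chars_from_memory` reads the whole file
into memory first. Note that Python's `open()` returns a buffered
stream by default, much like stdio's `getc` in C, so `read(1)` is
served from an in-process buffer on top of whatever the OS caches; how
small the remaining overhead is would have to be measured.

```python
import time
from collections.abc import Callable, Iterator


def chars_from_file(path: str) -> Iterator[str]:
    """Yield characters one at a time; open() buffers the underlying reads."""
    with open(path, encoding="utf-8") as f:
        while True:
            c = f.read(1)
            if not c:  # empty string means end of file
                return
            yield c


def chars_from_memory(path: str) -> Iterator[str]:
    """Read the whole file into memory once, then iterate over the buffer."""
    with open(path, encoding="utf-8") as f:
        buf = f.read()
    yield from buf


def time_reader(reader: Callable[[str], Iterator[str]], path: str) -> tuple[int, float]:
    """Return (number of characters seen, elapsed seconds) for one reader."""
    t0 = time.perf_counter()
    count = sum(1 for _ in reader(path))
    return count, time.perf_counter() - t0
```

Calling `time_reader(chars_from_file, "example.c")` and
`time_reader(chars_from_memory, "example.c")` on the same (hypothetical)
file gives the same character count and two timings to compare.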

I would really like to see some recent benchmarks of the behaviour
of current compilers and systems.

DoDi
[The benchmarks I did were a while ago, but they showed a large
fraction of time in the lexer.  I wouldn't disagree that building the
symbol table is slow, but figure out some estimate of the ratio of
the number of characters in a source file to the number of tokens,
and that is a rough estimate of how much slower the lexer will be
than the parser. I agree that some current analyses would be useful.
-John]
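
The characters-to-tokens ratio suggested above can be estimated with a
small tokenizer. The sketch below assumes a hypothetical token set for
a C-like language (identifiers, numbers, string literals, common
operators); whitespace and comments are consumed, because the lexer
still has to look at every one of those characters, but they are not
counted as tokens. The ratio is only the rough proxy described in the
note, on the assumption that the lexer's work grows with characters and
the parser's with tokens.

```python
import re

# Hypothetical token rules for a C-like language. Order matters: comments
# are tried before the '/' operator, and multi-character operators before
# single-character ones.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|//[^\n]*|/\*.*?\*/)          # whitespace and comments
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<op>==|!=|<=|>=|&&|\|\||\+\+|--|->|[-+*/%=<>!&|^~?:;,.(){}\[\]])
""", re.VERBOSE | re.DOTALL)


def chars_per_token(source: str) -> float:
    """Return len(source) divided by the number of non-skip tokens."""
    tokens = 0
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            tokens += 1
        pos = m.end()  # every alternative consumes at least one character
    return len(source) / tokens if tokens else float("inf")
```

For a real file (name hypothetical):
`with open("example.c") as f: print(chars_per_token(f.read()))`.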