From: Kaz Kylheku <kaz@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: Expected Token Density in Random Stream
Date: Sun, 11 Dec 2011 17:56:42 +0000 (UTC)
Organization: A noiseless patient Spider
Lines: 11
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <11-12-016@comp.compilers>
References: <11-12-015@comp.compilers>
NNTP-Posting-Host: news.iecc.com
X-Trace: leila.iecc.com 1323639755 69024 64.57.183.58 (11 Dec 2011 21:42:35 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Sun, 11 Dec 2011 21:42:35 +0000 (UTC)
Keywords: parse
Posted-Date: 11 Dec 2011 16:42:34 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

On 2011-12-07, Andrew Tomazos <andrew@tomazos.com> wrote:
> Summary: We want to find out how often a given token appears in a
> random stream formed by concatenating randomly chosen strings from a
> given set of strings.

> (Note hits can overlap each other)

But tokens do not overlap, so you're not actually extracting tokens. Using C
tokens as an example, >>= is a single C token: one hit, not four. The
longest-match rule calls for extracting all three characters as one token and
moving on.
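Here is a minimal Python sketch of the difference between counting overlapping
hits and tokenizing by longest match ("maximal munch"). It assumes a small,
hypothetical subset of C punctuators; C_PUNCTUATORS, overlapping_hits and
maximal_munch are illustrative names, not part of any library.

```python
# Hypothetical subset of C punctuators, for illustration only;
# the real C token set is much larger.
C_PUNCTUATORS = {">", ">>", ">=", ">>=", "=", "==", "<", "<<", "<=", "<<="}


def overlapping_hits(text, tokens):
    """Return every (position, token) pair where a token occurs in text.
    Occurrences may overlap, as in the original question."""
    hits = []
    for i in range(len(text)):
        for tok in tokens:
            if text.startswith(tok, i):
                hits.append((i, tok))
    return hits


def maximal_munch(text, tokens):
    """Split text using the longest-match rule: at each position take the
    longest token that matches, then continue after it, so the resulting
    tokens never overlap."""
    longest = max(len(t) for t in tokens)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, shrinking until one matches.
        for n in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in tokens:
                out.append(candidate)
                i += n
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i]!r}")
    return out


print(maximal_munch(">>=", C_PUNCTUATORS))   # ['>>=']
print(maximal_munch(">>==", C_PUNCTUATORS))  # ['>>=', '=']
print(sorted(overlapping_hits(">>=", C_PUNCTUATORS)))
```

With longest match, >>= comes out as exactly one token, while the overlapping
count also reports >, >>, >= and so on at every position where they occur.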