From: Kaz Kylheku
Newsgroups: comp.compilers
Subject: Re: Expected Token Density in Random Stream
Date: Sun, 11 Dec 2011 17:56:42 +0000 (UTC)
Message-ID: <11-12-016@comp.compilers>
References: <11-12-015@comp.compilers>

On 2011-12-07, Andrew Tomazos wrote:
> Summary: We want to find out how often a given token appears in a
> random stream formed by concatenating randomly chosen strings from a
> given set of strings.
> (Note hits can overlap each other)

But tokens do not overlap, so you're not actually extracting tokens.

Using C tokens as an example, the C token >>= is one hit, not four.
The longest-match rule calls for extracting three characters and
moving on.
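A short Python sketch makes the difference between counting overlapping hits and extracting tokens concrete. It is an illustration only, not code from the thread. The token set TOKENS, which is a small hypothetical subset of C punctuators, and the function names overlapping_hits and maximal_munch are assumptions made for this example. overlapping_hits counts every position where some token occurs as a substring. maximal_munch applies the longest-match rule: at each position it takes the longest token that matches and resumes scanning right after it.

```python
# Hypothetical token set: a small subset of C punctuators, for illustration only.
TOKENS = [">", ">>", ">=", ">>=", "="]


def overlapping_hits(text, tokens):
    """Count every (position, token) pair where the token occurs in text.

    Hits may overlap, so one lexical token can contribute several hits.
    """
    count = 0
    for i in range(len(text)):
        for tok in tokens:
            if text.startswith(tok, i):
                count += 1
    return count


def maximal_munch(text, tokens):
    """Split text into tokens using the longest-match rule.

    At each position, take the longest token that starts there and resume
    scanning immediately after it. Characters that start no token are skipped
    (a simplification; a real lexer would also recognize identifiers, etc.).
    """
    result = []
    i = 0
    while i < len(text):
        candidates = [t for t in tokens if text.startswith(t, i)]
        best = max(candidates, key=len, default=None)
        if best is None:
            i += 1  # no token starts here; skip one character
        else:
            result.append(best)
            i += len(best)  # consume the whole match; no overlap
    return result


if __name__ == "__main__":
    print(overlapping_hits(">>=", TOKENS))  # 6 overlapping substring hits
    print(maximal_munch(">>=", TOKENS))     # ['>>='], a single token
    print(maximal_munch(">>>=", TOKENS))    # ['>>', '>=']
```

With this token set, counting substring hits finds six overlapping hits inside >>=, while the longest-match scan produces the single token >>=. The input >>>= comes out as >> followed by >=, because the scanner commits to the longest match at each position and never backtracks to look for a different split.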