From: Andy <borucki.andrzej@gmail.com>
Newsgroups: comp.compilers
Subject: Fragments
Date: Sat, 21 Dec 2019 01:52:26 -0800 (PST)
Organization: Compilers Central
Lines: 10
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <19-12-013@comp.compilers>
Keywords: lex
Posted-Date: 21 Dec 2019 13:10:24 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2394

Examples usually use a very small alphabet of 3 to 5 letters, but a
lexical analyser has to deal not only with ASCII but with many
thousands of Unicode characters.
Many characters are grouped by the same action, for example
digits -> a, letters -> b, whitespace -> c.
We can use "fragments" such as [A-Za-z] and [0-9] instead of single
letters.
The problem is that fragments are not always disjoint: digits and all
characters, letters and the letter 'a', etc.

How should non-disjoint fragments be handled? As input we get a regular
expression in POSIX syntax, and we want to build a DFA with as few
transitions as possible.
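
To make the question concrete, here is a minimal sketch of one possible
approach, not necessarily the best one: cut the code point space at every
fragment boundary, so that each resulting piece lies entirely inside or
entirely outside every fragment. Pieces covered by the same set of
fragments can then share one DFA input symbol. The function name
split_into_disjoint, the fragment names and the representation (inclusive
code point ranges) are all assumptions made for illustration.

```python
def split_into_disjoint(fragments):
    """Split possibly overlapping fragments into disjoint pieces.

    fragments: dict mapping a (hypothetical) fragment name to a list of
               inclusive (lo, hi) code point ranges.
    Returns a sorted list of (lo, hi, members), where members is the
    frozenset of fragment names that cover the whole piece lo..hi.
    """
    # Every range start, and every position just past a range end,
    # is a point where the set of covering fragments may change.
    cuts = set()
    for ranges in fragments.values():
        for lo, hi in ranges:
            cuts.add(lo)
            cuts.add(hi + 1)
    cuts = sorted(cuts)

    pieces = []
    for lo, nxt in zip(cuts, cuts[1:]):
        hi = nxt - 1
        members = frozenset(
            name
            for name, ranges in fragments.items()
            if any(a <= lo and hi <= b for a, b in ranges)
        )
        if not members:
            continue  # gap not covered by any fragment
        # Merge with the previous piece if adjacent and covered identically.
        if pieces and pieces[-1][2] == members and pieces[-1][1] + 1 == lo:
            pieces[-1] = (pieces[-1][0], hi, members)
        else:
            pieces.append((lo, hi, members))
    return pieces


if __name__ == "__main__":
    example = {
        "letter": [(ord("A"), ord("Z")), (ord("a"), ord("z"))],
        "a": [(ord("a"), ord("a"))],
        "digit": [(ord("0"), ord("9"))],
    }
    for lo, hi, members in split_into_disjoint(example):
        print(chr(lo), chr(hi), sorted(members))
```

With this input, [A-Za-z] and 'a' overlap, so the letters come out as three
pieces: A-Z, the single character a (covered by both fragments), and b-z.
A-Z and b-z have the same member set, so they could be mapped to the same
input symbol, which keeps the number of DFA transitions small.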