From: Andy
Newsgroups: comp.compilers
Subject: Fragments
Date: Sat, 21 Dec 2019 01:52:26 -0800 (PST)
Organization: Compilers Central
Message-ID: <19-12-013@comp.compilers>
Keywords: lex

Examples usually use a very small alphabet of 3 to 5 letters, but in lexical analysis the input is not only ASCII but many thousands of Unicode characters. Many characters are grouped by the same action, for example digits -> a, letters -> b, whitespace -> c. We can use "fragments" such as [A-Za-z] and [0-9] instead of individual letters.

The problem is that fragments are not always disjoint: digits and all characters, letters and the letter 'a', etc. How should non-disjoint fragments be handled?

As input we get a regular expression in POSIX syntax, and we want to build a DFA with few transitions.
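
To make the question about non-disjoint fragments concrete, here is a minimal sketch of one common approach: cut the code point space at every fragment boundary, then group the resulting pieces by which fragments contain them. Every group is a disjoint class that can serve as a single DFA input symbol. The function name `split_fragments`, the fragment names, and the representation of a fragment as a list of inclusive `(lo, hi)` code point ranges are all assumptions made for illustration; nothing here comes from the post or from a particular lexer generator.

```python
from collections import defaultdict

def split_fragments(fragments):
    """Split possibly overlapping fragments into disjoint character classes.

    fragments: dict mapping a (hypothetical) fragment name to a list of
               inclusive (lo, hi) code point ranges.
    Returns a dict mapping frozenset(fragment names) -> list of (lo, hi)
    ranges. Characters in the same class belong to exactly the same
    fragments, so a DFA needs one transition per class, not per character.
    """
    # Each range start, and each position just past a range end, is a cut.
    cuts = set()
    for ranges in fragments.values():
        for lo, hi in ranges:
            cuts.add(lo)
            cuts.add(hi + 1)
    cuts = sorted(cuts)

    classes = defaultdict(list)
    for lo, nxt in zip(cuts, cuts[1:]):
        hi = nxt - 1
        # Membership cannot change inside [lo, hi], so testing lo is enough.
        members = frozenset(
            name
            for name, ranges in fragments.items()
            if any(a <= lo <= b for a, b in ranges)
        )
        if members:  # skip gaps that no fragment covers
            classes[members].append((lo, hi))
    return dict(classes)

# The overlapping fragments mentioned in the post: digits, letters,
# the single letter 'a', and "all characters".
example = {
    "digit": [(ord("0"), ord("9"))],
    "letter": [(ord("A"), ord("Z")), (ord("a"), ord("z"))],
    "a": [(ord("a"), ord("a"))],
    "any": [(0, 0x10FFFF)],
}
example_classes = split_fragments(example)
```

For these four fragments the result has four disjoint classes: digits, the letter 'a' alone, all other letters, and everything else. A regular expression that mentions `letter` is then rewritten as the union of the two classes that lie inside it, and the DFA is built over four symbols regardless of how large the underlying Unicode alphabet is.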