Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3047 > unrolled thread
| Started by | Roger L Costello <costello@mitre.org> |
|---|---|
| First post | 2022-06-06 10:48 +0000 |
| Last post | 2022-06-06 12:25 -0700 |
| Articles | 3 — 2 participants |
Back to article view | Back to comp.compilers
Re: State-of-the-art algorithms for lexical analysis? Roger L Costello <costello@mitre.org> - 2022-06-06 10:48 +0000
Re: State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-06 10:03 -0700
Re: State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-06 12:25 -0700
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2022-06-06 10:48 +0000 |
| Subject | Re: State-of-the-art algorithms for lexical analysis? |
| Message-ID | <22-06-009@comp.compilers> |
gah4 wrote: > Pattern Specification Language (PSL) is > much more powerful than the usual > regular expression. Neat! > I suspect that if regexes hadn't previously > been defined, we might come up with > something different today. Wow! That is a remarkable statement. I will look into PSL. There are algorithms for converting regexes to DFA and then using the DFA to tokenize the input. Are there algorithms for converting PSL to (what?) and then using the (what?) to tokenize the input? /Roger
[toc] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-06-06 10:03 -0700 |
| Message-ID | <22-06-011@comp.compilers> |
| In reply to | #3047 |
On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote: (snip) > I will look into PSL. There are algorithms for converting regexes to DFA > and then using the DFA to tokenize the input. Are there algorithms for > converting PSL to (what?) and then using the (what?) to tokenize the input? The approximate searches are done using dynamic programming. The penalty is 1 for insertion, deletion, or substitution and the score is in 3 bits, so up to six spelling errors. The whole query is then compiled into code for a systolic array, which then runs as fast as the data comes off disk. FDF2 is a 9U VME board that runs in a VME based Sun system. FDF3 connects directly to a SCSI disk, and also to a Sun workstation. In searching, it transfers directly from the disk. To load data into the disk, the disk is accessed indirectly through the FDF3. It is a desktop box, about the size of a large external SCSI disk. Some of it is described here: https://aclanthology.org/X93-1011.pdf along with its use for searching Japanese text, and: https://trec.nist.gov/pubs/trec3/papers/paper.ps.gz
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-06-06 12:25 -0700 |
| Message-ID | <22-06-014@comp.compilers> |
| In reply to | #3047 |
On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote: (snip, I wrote) > > I suspect that if regexes hadn't previously > > been defined, we might come up with > > something different today. > Wow! That is a remarkable statement. Well, mostly, regex were defined based on what was reasonable to do on computers at the time. It seems reasonable, then, with the more powerful computers of today, to expect that more features would have been added. Some of that was done in the later ERE, Extended Regular Expression. But there is a strong tendency not to break backward compatibility, and so not add new features later. [See my note about DFAs a few messages back. EREs are just syntactic sugar on regular REs so sure. PCREs are swell but they are a lot slower since backreferences mean you need to be able to back up. -John]
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web