Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Johann 'Myrkraverk' Oskarsson Newsgroups: comp.compilers Subject: Spell checking identifiers Date: Wed, 24 Jun 2020 01:38:11 +0800 Organization: Easynews - www.easynews.com Lines: 41 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <20-06-010@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="10774"; mail-complaints-to="abuse@iecc.com" Keywords: lex, errors, question, comment Posted-Date: 23 Jun 2020 14:40:37 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Content-Language: en-GB Xref: csiph.com comp.compilers:2531 Dear c.compilers, While experimenting with Rust, I came across this suggestion. --> foo.rs:5:9 | 5 | return j; // the variable, not the type. | ^ help: a local variable with a similar name exists: `i` Here it is suggesting i where I typed j. This is the same problem as spell checking identifiers with fuzzy matching, so apologies for a po- tentially misleading subject. So, without going through the source of rustc to find out, I'm curious about what general techniques people use to make this work? In particu- lar the Damerau–Levenshtein distance algorithm is not appropriate for dictionary lookups, as far as I know. I've come across a survey of fuzzy matching algorithms, some of which work with dictionaries but I have no idea which data structures would be appropriate in a compiler, nor do I know what criteria I'd use to choose an appropriate algorithm from such a survey. As an added bonus, the same technique can of course be used to spell check identifiers against a natural language dictionary. But since such a dictionary is more static than the list of identifiers in the current source file, a precomputed database will work, and a more expensive indexing method can be used. Is there an indexing method that works for this, but would not be appropriate for fuzzy matching against identifiers? [Apologies for not responding to my other topic yet, I should be able to reply soon.] -- Johann | email: invalid -> com | www.myrkraverk.com/blog/ I'm not from the Internet, I just work there. | twitter: @myrkraverk [There's a vast amount of work on edit distance. My guess is they use something like Levenshtein, but rather than use a constant distance of 1 between different letters, the distance varies depending on how different the letters look. -John]