From: Johann 'Myrkraverk' Oskarsson
Newsgroups: comp.compilers
Subject: Re: Spell checking identifiers
Date: Thu, 25 Jun 2020 22:33:37 +0800
Message-ID: <20-06-019@comp.compilers>
References: <20-06-010@comp.compilers> <20-06-011@comp.compilers> <20-06-012@comp.compilers>
In-Reply-To: <20-06-012@comp.compilers>
Keywords: lex, errors

On 24/06/2020 7:51 am, gah4@u.washington.edu wrote:
> On Tuesday, June 23, 2020 at 12:59:35 PM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
>
> (snip)
>
>> This clang blog specifically mentions Levenshtein,
>> http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
>
>> and it looks like what people do is to go through the entire symbol
>> table and compute it against the individual erroneous identifier.
>
>> I thought that'd be a bit on the expensive side,
>
> With either constant weighting or character dependent weighting
> it is easy to do with dynamic programming. The time is then O(m n)
> where m and n are the two lengths.

Are you talking about doing this one by one through the entire symbol
table?
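
To make the question concrete, here is a minimal sketch of what I
understand the "one by one" approach to be: the usual unit-cost dynamic
programming for Levenshtein distance, run against every name in the
table. The names levenshtein, suggest and max_distance are my own,
purely for illustration; I am not claiming this is how clang or any
other compiler actually implements it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, O(len(a) * len(b)) time."""
    # prev[j] is the distance between the previous prefix of a and b[:j];
    # initially that prefix is empty, so the distance is just j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(unknown: str, symbols, max_distance: int = 2):
    """Scan every known name; return the closest within max_distance, else None."""
    best, best_dist = None, max_distance + 1
    for name in symbols:
        d = levenshtein(unknown, name)
        if d < best_dist:  # ties keep the earliest name
            best, best_dist = name, d
    return best
```

With s names in the table, the scan costs roughly s times O(m n) per
misspelled identifier, which is the expense I was wondering about.
The max_distance cutoff of 2 is an arbitrary assumption on my part.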

> It seems most obvious to do only variable that are in the appropriate
> scope to be misspelled, but I suspect catching variables used out
> of scope is also worth doing. Well, in the latter case, you could
> hope that they at least spell them the same.

Depending on context, one would also want to do this for type names (as
per the blog above). Depending on the language* and culture**, there
can be thousands of type names in scope.

> I think you should turn it off for one character names, though,
> even though I suspect those are more likely. Too many false
> positives!

rustc evidently does this for one-character names, at least in the
case of i and j. I don't know whether it's useful to compare a and k.

* C++ and Java come to mind.

** Programming culture; some have a name, such as Agile and eXtreme
Programming; others don't.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk