From: Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups: comp.compilers
Subject: Re: Spell checking identifiers
Date: Thu, 25 Jun 2020 22:33:37 +0800
Organization: Easynews - www.easynews.com
Lines: 45
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-06-019@comp.compilers>
References: <20-06-010@comp.compilers> <20-06-011@comp.compilers> <20-06-012@comp.compilers>
Keywords: lex, errors
Posted-Date: 25 Jun 2020 11:54:47 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <20-06-012@comp.compilers>
Content-Language: en-GB

On 24/06/2020 7:51 am, gah4@u.washington.edu wrote:
> On Tuesday, June 23, 2020 at 12:59:35 PM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
>
> (snip)
>
>> This clang blog specifically mentions Levenshtein,
>
>> http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
>
>> and it looks like what people do is to go through the entire symbol
>> table and compute it against the individual erroneous identifier.
>
>> I thought that'd be a bit on the expensive side,
>
> With either constant weighting or character-dependent weighting,
> it is easy to do with dynamic programming. The time is then O(m n)
> where m and n are the two lengths.

Are you talking about doing this one by one through the entire symbol
table?
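
To make the question concrete, here is a minimal sketch of what "one by
one" would look like: the textbook dynamic-programming edit distance
with unit costs, followed by a plain linear scan. The names
levenshtein, closest_symbol and symbol_table are made up for
illustration; I am not claiming this is how clang or any other compiler
implements it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs; O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = cur
    return prev[-1]


def closest_symbol(unknown, symbol_table):
    """Hypothetical helper: one distance computation per known name."""
    best, best_dist = None, None
    for name in symbol_table:
        d = levenshtein(unknown, name)
        if best_dist is None or d < best_dist:
            best, best_dist = name, d
    return best, best_dist
```

With k names in the table the scan costs k distance computations, each
O(m n), which is the cost I was wondering about.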

> It seems most obvious to do only variables that are in the appropriate
> scope to be misspelled, but I suspect catching variables used out
> of scope is also worth doing.  Well, in the latter case, you could
> hope that they at least spell them the same.

Depending on context, one would also want to do this for type names (as
per the blog above).  Depending on the language* and culture**, there
can be thousands of type names in scope.
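
With thousands of candidates, one cheap way to cut the work is to note
that the difference in length between two names is a lower bound on
their edit distance, so most names can be rejected without running the
dynamic program at all. The sketch below assumes a cutoff of one third
of the identifier's length; that number, and the name suggest, are my
own assumptions for illustration and not taken from any compiler.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs; O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def suggest(unknown, names, max_dist=None):
    """Hypothetical helper: best name within max_dist edits, else None."""
    if max_dist is None:
        # Assumed cutoff: allow roughly one edit per three characters.
        max_dist = max(1, len(unknown) // 3)
    best, best_dist = None, max_dist + 1
    for name in names:
        # The length gap is a lower bound on the distance, so a name
        # whose gap already reaches best_dist cannot improve on it.
        if abs(len(name) - len(unknown)) >= best_dist:
            continue
        d = levenshtein(unknown, name)
        if d < best_dist:
            best, best_dist = name, d
    return best
```

This is still a linear pass over the table, but the expensive part
runs only for names of similar length, and the bound tightens as
better candidates are found.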

> I think you should turn it off for one-character names, though,
> even though I suspect those are more likely. Too many false
> positives!

rustc evidently does this for one-character names, at least in the
case of i and j.  I don't know whether it's useful to compare a and k.

* C++ and Java come to mind.

** Programming culture; some have a name, such as Agile and eXtreme
Programming; others don't.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk