Spell checking identifiers

From Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups comp.compilers
Subject Spell checking identifiers
Date 2020-06-24 01:38 +0800
Organization Easynews - www.easynews.com
Message-ID <20-06-010@comp.compilers>

Dear c.compilers,

While experimenting with Rust, I came across this suggestion.

  --> foo.rs:5:9
   |
5 |     return j; // the variable, not the type.
   |            ^ help: a local variable with a similar name exists: `i`

Here it is suggesting i where I typed j.  This is the same problem as
spell checking identifiers with fuzzy matching, so apologies for a
potentially misleading subject.

So, without going through the source of rustc to find out, I'm curious
what general techniques people use to make this work.  In particular,
the Damerau–Levenshtein distance algorithm is not appropriate for
dictionary lookups, as far as I know.
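
To make the question concrete, here is a minimal sketch of the simplest
approach: compute a plain Levenshtein distance against every name in
scope and keep the closest one.  It assumes the candidate set is small
(the locals of one function), so a brute-force scan is affordable.  It
is not taken from rustc; `levenshtein`, `suggest` and the `max_dist`
threshold are names made up for illustration.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the previous prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: deleting all i characters of a[..i] yields the empty string.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let subst = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = subst.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns the in-scope name closest to `typo`, if one lies within `max_dist`
/// edits. Exact matches (distance 0) are skipped, and ties go to the name
/// that appears first in `in_scope`.
fn suggest<'a>(typo: &str, in_scope: &[&'a str], max_dist: usize) -> Option<&'a str> {
    in_scope
        .iter()
        .map(|&name| (levenshtein(typo, name), name))
        .filter(|&(d, _)| d > 0 && d <= max_dist)
        .min_by_key(|&(d, _)| d)
        .map(|(_, name)| name)
}
```

With the names from the diagnostic above, `suggest("j", &["i"], 1)`
yields `Some("i")`.  The cost is one distance computation per candidate,
which is why this only looks reasonable for short lists.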

I've come across a survey of fuzzy matching algorithms, some of which
work with dictionaries, but I have no idea which data structures would
be appropriate in a compiler, nor do I know what criteria I'd use to
choose an appropriate algorithm from such a survey.

As an added bonus, the same technique can of course be used to spell
check identifiers against a natural language dictionary.  But since
such a dictionary is more static than the list of identifiers in the
current source file, a precomputed database will work, and a more
expensive indexing method can be used.  Is there an indexing method
that works for this, but would not be appropriate for fuzzy matching
against identifiers?
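
As one illustration of what a "more expensive" precomputed index could
look like, here is a sketch of a deletion index: every dictionary word
is stored under itself and under each string obtained by deleting one
character, and a query is looked up the same way.  This is only an
assumed example of the trade-off, not an answer from the thread; the
names `deletion_variants`, `DeletionIndex`, `build` and `candidates`
are hypothetical.  The index is several times larger than the word
list, which may be acceptable for a static dictionary and less so for
a symbol table that changes as scopes open and close.

```rust
use std::collections::{BTreeSet, HashMap};

/// The word itself plus every string obtained by deleting exactly one character.
fn deletion_variants(word: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out = vec![word.to_string()];
    for skip in 0..chars.len() {
        out.push(
            chars
                .iter()
                .enumerate()
                .filter(|&(i, _)| i != skip)
                .map(|(_, &c)| c)
                .collect(),
        );
    }
    out
}

/// Precomputed map from a deletion variant to the dictionary words producing it.
struct DeletionIndex {
    by_variant: HashMap<String, Vec<String>>,
}

impl DeletionIndex {
    /// Built once, up front; this is the expensive part.
    fn build(dictionary: &[&str]) -> Self {
        let mut by_variant: HashMap<String, Vec<String>> = HashMap::new();
        for &word in dictionary {
            for variant in deletion_variants(word) {
                by_variant.entry(variant).or_default().push(word.to_string());
            }
        }
        DeletionIndex { by_variant }
    }

    /// Dictionary words that share at least one deletion variant with `query`.
    /// This is a candidate set only: it can include words two edits away, so
    /// a caller would still verify each hit with a real edit distance.
    fn candidates(&self, query: &str) -> BTreeSet<String> {
        let mut found = BTreeSet::new();
        for variant in deletion_variants(query) {
            if let Some(words) = self.by_variant.get(&variant) {
                found.extend(words.iter().cloned());
            }
        }
        found
    }
}
```

Built from the words "length", "width" and "height", the index returns
"length" as the only candidate for the misspelling "lenght", using a
handful of hash lookups instead of a scan over the whole dictionary.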

[Apologies for not responding to my other topic yet, I should be able
to reply soon.]

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
[There's a vast amount of work on edit distance.  My guess is they
use something like Levenshtein, but rather than use a constant
distance of 1 between different letters, the distance varies depending
on how different the letters look. -John]
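
The moderator's guess can be sketched as a Levenshtein variant in which
the substitution cost depends on the pair of letters.  Everything
specific here is an assumption for illustration: the look-alike table
in `subst_cost`, the cost values (1 for a look-alike substitution, 2
for any other edit), and the function names.  Nothing in it is claimed
to match what rustc does.

```rust
/// Hypothetical substitution cost: characters that are easy to confuse by eye
/// are cheaper to swap than unrelated ones. The pairs listed are illustrative.
fn subst_cost(a: char, b: char) -> u32 {
    const LOOKALIKES: &[(char, char)] =
        &[('i', 'j'), ('l', '1'), ('O', '0'), ('u', 'v'), ('m', 'n')];
    if a == b {
        0
    } else if LOOKALIKES
        .iter()
        .any(|&(x, y)| (a, b) == (x, y) || (a, b) == (y, x))
    {
        1
    } else {
        2
    }
}

/// Levenshtein distance with per-pair substitution costs.
/// Insertions and deletions cost INDEL each.
fn weighted_distance(a: &str, b: &str) -> u32 {
    const INDEL: u32 = 2;
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Row for the empty prefix of `a`: j insertions build b[..j].
    let mut prev: Vec<u32> = (0..=b.len() as u32).map(|j| j * INDEL).collect();
    for i in 1..=a.len() {
        // cur[0]: i deletions turn a[..i] into the empty string.
        let mut cur = vec![i as u32 * INDEL; b.len() + 1];
        for j in 1..=b.len() {
            let subst = prev[j - 1] + subst_cost(a[i - 1], b[j - 1]);
            cur[j] = subst.min(prev[j] + INDEL).min(cur[j - 1] + INDEL);
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Under these assumed costs, `j` is at distance 1 from `i` but at
distance 2 from `k`, so a ranking based on this measure would prefer
`i` where a uniform cost would leave the two tied.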
