Re: Spell checking identifiers

From Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups comp.compilers
Subject Re: Spell checking identifiers
Date 2020-06-24 03:56 +0800
Organization Easynews - www.easynews.com
Message-ID <20-06-011@comp.compilers> (permalink)
References <20-06-010@comp.compilers>

> [There's a vast amount of work on edit distance.  My guess is they
> use something like Levenshtein, but rather than use a constant
> distance of 1 between different letters, the distance varies depending
> on how different the letters look. -John]
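Here is a minimal Python sketch of the weighted variant John describes. Insertions and deletions cost 1, and a substitution costs less when the two characters look alike. The LOOKALIKE_GROUPS table, the 0.5 cost, and the function names are illustrative assumptions. They are not taken from Clang or any other compiler.

```python
# Hypothetical groups of visually similar characters (illustrative only).
LOOKALIKE_GROUPS = ["l1I", "oO0", "S5", "B8"]


def substitution_cost(a: str, b: str) -> float:
    """0 if identical, 0.5 if in the same lookalike group, else 1."""
    if a == b:
        return 0.0
    for group in LOOKALIKE_GROUPS:
        if a in group and b in group:
            return 0.5
    return 1.0


def weighted_levenshtein(s: str, t: str) -> float:
    """Levenshtein distance with unit insert/delete and weighted substitution."""
    # prev[j] is the distance between the previous prefix of s and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,      # delete s[i-1]
                cur[j - 1] + 1.0,   # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = cur
    return prev[len(t)]
```

With these weights, `weighted_levenshtein("c0unt", "count")` is 0.5. A non-lookalike typo such as `"cxunt"` still costs 1.0.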

This Clang blog post explicitly mentions Levenshtein distance:

http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker

and the usual approach appears to be a walk over the entire symbol
table, computing the distance between each entry and the misspelled
identifier.
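In Python, the brute-force approach scans every symbol and keeps the nearest one. This is a sketch under assumptions. The helper names `levenshtein` and `suggest` are hypothetical. The default cut-off of roughly one edit per three characters is an illustrative choice, not a documented Clang rule.

```python
def levenshtein(s: str, t: str) -> int:
    """Classic Levenshtein distance using two rolling DP rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete
                           cur[j - 1] + 1,            # insert
                           prev[j - 1] + (cs != ct))) # substitute (0 if equal)
        prev = cur
    return prev[-1]


def suggest(misspelled, symbols, max_distance=None):
    """Return the closest symbol within max_distance, or None."""
    if max_distance is None:
        # Hypothetical cut-off: about one edit per three characters.
        max_distance = max(1, len(misspelled) // 3)
    best, best_dist = None, max_distance + 1
    for name in symbols:  # one full distance computation per symbol
        d = levenshtein(misspelled, name)
        if d < best_dist:
            best, best_dist = name, d
    return best
```

For example, `suggest("lenth", ["length", "width", "height"])` returns `"length"`, and a string that is far from every symbol returns `None`.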

I thought that would be rather expensive, because a C++ file can
expand to 100k+ (perhaps millions of) lines after preprocessing, so a
single translation unit could plausibly contain a million identifiers.
[I don't know whether that actually happens, but I don't think it's
safe to assume it doesn't.]
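The scan stays linear in the number of symbols, but the cost per symbol can be cut sharply with a bounded edit distance. The idea is to skip any symbol whose length differs from the target by more than the bound. The dynamic-programming table can also be abandoned once an entire row exceeds the bound, because row minima never decrease. The sketch below assumes this common trick. `bounded_levenshtein` and `suggest_bounded` are hypothetical names, and nothing here claims to describe what Clang or rustc actually do.

```python
from typing import Optional


def bounded_levenshtein(s: str, t: str, k: int) -> Optional[int]:
    """Return the Levenshtein distance if it is <= k, else None."""
    # Length filter: the distance is at least the difference in lengths.
    if abs(len(s) - len(t)) > k:
        return None
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        # Row minima never decrease, so once a row exceeds k we can stop.
        if min(cur) > k:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= k else None


def suggest_bounded(misspelled, symbols, k):
    """Nearest symbol within k edits, tightening the bound as we go."""
    best, best_dist = None, k + 1
    for name in symbols:
        d = bounded_levenshtein(misspelled, name, best_dist - 1)
        if d is not None and d < best_dist:
            best, best_dist = name, d
    return best
```

Passing `best_dist - 1` as the bound tightens the search whenever a better candidate turns up. As a result, most entries in a large symbol table are rejected by the length check or within the first few rows.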

In the ten years since that post, people may have moved away from
standard Levenshtein, as you suggest.

But then, maybe compilation speed for erroneous input isn't really
important.  rustc is slow on a short input file whether or not it
contains errors [which could just be startup cost].

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
