| Path | csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> |
| Newsgroups | comp.compilers |
| Subject | Re: Spell checking identifiers |
| Date | Wed, 24 Jun 2020 03:56:56 +0800 |
| Organization | Easynews - www.easynews.com |
| Lines | 29 |
| Sender | news@iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <20-06-011@comp.compilers> |
| References | <20-06-010@comp.compilers> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=utf-8; format=flowed |
| Content-Transfer-Encoding | 8bit |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42091"; mail-complaints-to="abuse@iecc.com" |
| Keywords | lex, errors |
| Posted-Date | 23 Jun 2020 15:59:33 EDT |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| In-Reply-To | <20-06-010@comp.compilers> |
| Content-Language | en-GB |
| Xref | csiph.com comp.compilers:2532 |
> [There's a vast amount of work on edit distance. My guess is they
> use something like Levenshtein, but rather than use a constant
> distance of 1 between different letters, the distance varies depending
> on how different the letters look. -John]

This clang blog specifically mentions Levenshtein,

http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker

and it looks like what people do is go through the entire symbol table and compute the distance between each entry and the erroneous identifier.

I thought that'd be a bit on the expensive side, because C++ files can have 100k+ (or millions?) of lines after preprocessing, so one translation unit really can go up to a million identifiers in practice. [I don't know if that actually happens but I don't think it's safe to assume it doesn't.]

In the 10 years since, people may have moved on from standard Levenshtein, as you mention.

But then, maybe compilation speed for erroneous input isn't really important. rustc is slow for a short input file in both cases [which could be the startup cost.]

-- 
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
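For illustration, the brute-force approach described in this post (walk the whole symbol table and compute the Levenshtein distance against the misspelled identifier) could look roughly like the sketch below. This is not clang's actual code; the names `levenshtein` and `suggest`, the `max_distance` cutoff of 2, and the length-difference shortcut are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(unknown: str, symbols, max_distance: int = 2):
    """Return the closest known identifier, or None if nothing is close.

    `symbols` stands in for the symbol table (any iterable of names);
    `max_distance` is a hypothetical cutoff, not a value from any compiler.
    """
    best, best_distance = None, max_distance + 1
    for name in symbols:
        # The length difference is a lower bound on the edit distance,
        # so many entries can be skipped without running the DP at all.
        if abs(len(name) - len(unknown)) >= best_distance:
            continue
        d = levenshtein(unknown, name)
        if d < best_distance:
            best, best_distance = name, d
    return best


if __name__ == "__main__":
    table = ["printf", "puts", "fprintf", "main"]
    print(suggest("prinft", table))
```

Each distance computation costs time proportional to the product of the two identifier lengths, and the scan is linear in the size of the table, which is where the worry about very large translation units comes from. The length check only trims that cost; it does not change the linear scan.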
Spell checking identifiers Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-24 01:38 +0800
Re: Spell checking identifiers Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-24 03:56 +0800
Re: Spell checking identifiers gah4@u.washington.edu - 2020-06-23 16:51 -0700
Re: Spell checking identifiers Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 22:33 +0800
Re: Spell checking identifiers "Derek M. Jones" <derek@_NOSPAM_knosof.co.uk.invalid> - 2020-06-24 11:02 +0100
Re: Spell checking identifiers gah4@u.washington.edu - 2020-06-24 18:28 -0700
Re: Spell checking identifiers mac <acolvin@efunct.com> - 2020-07-09 16:07 +0000
Re: Spell checking identifiers Thomas Koenig <tkoenig@netcologne.de> - 2020-07-10 07:12 +0000
Re: Spell checking identifiers gah4@u.washington.edu - 2020-07-10 13:17 -0700
Re: Spell checking identifiers Kaz Kylheku <937-053-0959@kylheku.com> - 2020-06-24 18:12 +0000
Re: Spell checking identifiers Thomas Koenig <tkoenig@netcologne.de> - 2020-06-24 20:08 +0000
Re: Spell checking identifiers Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> - 2020-06-25 21:44 +0800
Re: Spell checking identifiers gautier_niouzes@hotmail.com - 2020-06-24 13:08 -0700