| From | Ben Bacarisse <ben@bsb.me.uk> |
|---|---|
| Newsgroups | comp.lang.awk |
| Subject | Re: (Long post) Metaphone Algorithm In AWK |
| Date | 2024-08-19 00:46 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <878qwts8bd.fsf@bsb.me.uk> |
| References | <v9qbgh$1u7qe$1@dont-email.me> |
porkchop@invalid.foo (Mike Sanders) writes:
> Hi folks, hope you all are doing well.
>
> Please excuse long post, wanted to share this, some might find
> it handy given a certain context. Must run, I'm very behind in
> my work (hey I'm always running behind!)
Using a word list, I found some odd matches. For example:
$ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
drunkeness
indigestion
Are these really metaphone matches for "texas"? It's possible (I don't
know the algorithm at all well) but I found it surprising.
> # metaphone.awk: Michael Sanders - 2024
> #
> # example invocation:
> #
> # echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
> #
> # notes:
> #
> # ever notice when you search for (say):
> #
> # 'i went to the zu'
> #
> # & your chosen search engine suggests something like:
> #
> # 'did you mean i went to the zoo'
> #
> # the metaphone algorithm handles such cases pretty well actually...
> #
> # Metaphone is a phonetic algorithm, published by Lawrence Philips in
> # 1990, for indexing words by their English pronunciation. It
> # fundamentally improves on the Soundex algorithm by using information
> # about variations and inconsistencies in English spelling and
> # pronunciation to produce a more accurate encoding, which does a
> # better job of matching words and names which sound similar.
> # https://en.wikipedia.org/wiki/Metaphone
> #
> # english only (sorry)
> #
> # not extensively tested, nevertheless a solid start, if you
> # improve this code please share your results
> #
> # other implementations...
> #
> # gist: https://gist.github.com/Rostepher/b688f709587ac145a0b3
> #
> # BASIC: http://aspell.net/metaphone/metaphone.basic
> #
> # C: http://aspell.net/metaphone/metaphone-kuhn.txt
I wanted a "reference" implementation I could try, but this is not a
useful C program. It's in an odd dialect (it uses void but has K&R
function definitions) and has loads of undefined behaviours (strcpy of
overlapping strings, use of uninitialised variables, etc.).
> # check if a character is a vowel
> function isvowel(c, is_vowel) {
>     is_vowel = c ~ /[AEIOU]/
>     return is_vowel
> }
I was not going to comment on the code, but this hit me just before I
posted. Given the odd way AWK functions have to define locals, I tend
to use them only when really needed. Here I think I would just write:
function isvowel(c) {
    return c ~ /[AEIOU]/
}
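A small, self-contained check of the point above, written as a shell script that runs awk inline (the same way the thread invokes metaphone.awk). It compares the original version, where is_vowel is declared as a local via an extra parameter, with the two-line version. It also adds isvowel_anchored, a hypothetical variant that comes from neither post. Because ~ matches anywhere in the string and the bracket expression lists only uppercase vowels, isvowel("XA") is true and isvowel("e") is false. That is harmless if callers only ever pass single uppercase characters, which this sketch assumes but cannot confirm from the excerpt. The helper name vowel_table is also made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo helper: prints "input local-version simple-version anchored-version"
vowel_table() {
    awk '
    # Original style: AWK has no local declarations, so extra parameters
    # act as locals. Callers pass fewer arguments; the extra spaces are
    # a common convention marking where the real parameters end.
    function isvowel_local(c,    is_vowel) {
        is_vowel = c ~ /[AEIOU]/
        return is_vowel
    }

    # Simplified form: return the match result (1 or 0) directly.
    function isvowel(c) {
        return c ~ /[AEIOU]/
    }

    # Hypothetical stricter variant (an assumption, not from the thread):
    # true only when c is exactly one uppercase vowel.
    function isvowel_anchored(c) {
        return c ~ /^[AEIOU]$/
    }

    BEGIN {
        n = split("A B e XA", t, " ")
        for (i = 1; i <= n; i++)
            printf "%s %d %d %d\n", t[i], isvowel_local(t[i]), isvowel(t[i]), isvowel_anchored(t[i])
    }'
}

vowel_table
```

The first two columns always agree, which is the point of the simplification. The third column only differs for multi-character input such as "XA".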
--
Ben.