| From | porkchop@invalid.foo (Mike Sanders) |
|---|---|
| Newsgroups | comp.lang.awk |
| Subject | Soundex Algorithm in AWK |
| Date | 2024-08-23 05:11 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <va95m5$q367$1@dont-email.me> |
# Soundex Algorithm in AWK: Michael Sanders 2024
# example usage: awk -f soundex.awk < words.txt
# see also: https://en.wikipedia.org/wiki/Soundex
{ print $0 " : " soundex($0) }
function soundex(word, i, code, c, firstLetter, lastCode, buf) {
    word = toupper(word) # convert word to uppercase
    firstLetter = substr(word, 1, 1)
    buf = lastCode = ""
    # map of letters to soundex digits; i = 1 only seeds lastCode
    for (i = 1; i <= length(word); i++) {
        c = substr(word, i, 1)
        if (c ~ /[BFPV]/) code = "1"
        else if (c ~ /[CGJKQSXZ]/) code = "2"
        else if (c ~ /[DT]/) code = "3"
        else if (c ~ /[L]/) code = "4"
        else if (c ~ /[MN]/) code = "5"
        else if (c ~ /[R]/) code = "6"
        else code = "" # no digit for A, E, I, O, U, H, W, Y
        # ignore consecutive identical codes (including the 1st letter's)
        if (i > 1 && code != lastCode && code != "") buf = buf code
        # H and W are transparent; any other character updates lastCode,
        # so a vowel between two equal codes lets the second one through
        if (c !~ /[HW]/) lastCode = code
    }
    # combine 1st letter with buf, pad with zeros or truncate to 4 characters
    return substr(firstLetter buf "000", 1, 4)
}
# eof
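For anyone who wants to sanity-check a Soundex routine against the reference names on the Wikipedia page linked in the header, here is a rough, table-driven sketch. It follows the rules described there: the first letter's own digit suppresses an immediately following duplicate, H and W do not separate duplicates, and vowels do. The names soundex_words, addGroup and digitOf are made up for this illustration, and the shell wrapper is only there so the AWK program can be run in one piece.

```shell
#!/bin/sh
# Sketch only: soundex_words, addGroup and digitOf are hypothetical names.
# Reads one word per line on stdin and prints "word : CODE" for each line.
soundex_words() {
    awk '
    BEGIN {
        # build the letter -> digit table once instead of testing regexes
        addGroup("BFPV", "1"); addGroup("CGJKQSXZ", "2"); addGroup("DT", "3")
        addGroup("L", "4");    addGroup("MN", "5");       addGroup("R", "6")
    }

    { print $0 " : " soundex($0) }

    function addGroup(letters, digit,    i) {
        for (i = 1; i <= length(letters); i++)
            digitOf[substr(letters, i, 1)] = digit
    }

    function soundex(word,    i, c, code, last, buf) {
        word = toupper(word)
        buf = last = ""
        for (i = 1; i <= length(word); i++) {
            c = substr(word, i, 1)
            code = (c in digitOf) ? digitOf[c] : ""
            # the first letter only seeds "last"; it is kept as a letter
            if (i > 1 && code != "" && code != last) buf = buf code
            # H and W are transparent; any other character updates "last"
            if (c != "H" && c != "W") last = code
        }
        # first letter plus digits, zero-padded or truncated to 4 characters
        return substr(substr(word, 1, 1) buf "000", 1, 4)
    }
    '
}

printf "%s\n" Robert Rupert Ashcraft Tymczak Pfister Honeyman | soundex_words
```

Run as-is, this should print R163 for both Robert and Rupert, then A261, T522, P236 and H555 for the remaining four names. The lookup table keeps the letter groups in one place, which may be easier to adjust than a chain of regex tests.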
--
:wq
Mike Sanders