| From | porkchop@invalid.foo (Mike Sanders) |
|---|---|
| Newsgroups | comp.lang.awk |
| Subject | Re: (Long post) Metaphone Algorithm In AWK |
| Date | 2024-08-23 06:13 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <va999t$qg9k$1@dont-email.me> |
| References | <v9qbgh$1u7qe$1@dont-email.me> |
Final iteration (for me). Please post any improvements in this group.
# Metaphone Algorithm in AWK v6: Michael Sanders - 2024
# usage example: echo poluphloisboiotatotic | awk -f metaphone.awk
# expected output (the input line is echoed as typed): poluphloisboiotatotic : PLFLSBTTTK
# see also: https://en.wikipedia.org/wiki/Metaphone
{ print $0 " : " metaphone($0) }
function isvowel(char) { return char ~ /[AEIOU]/ }
function metaphone(w, m, c, n, z, i) {
w = toupper(w)
# strip non-alphabetic characters
gsub(/[^A-Z]/, "", w)
z = length(w)
# handle initial letters
if (substr(w, 1, 2) ~ /^(KN|GN|PN|WR|PS)/) {
w = substr(w, 2)
z--
}
for (i = 1; i <= z; i++) {
c = substr(w, i, 1)
n = (i < z) ? substr(w, i + 1, 1) : ""
# skip duplicate letters except for 'C'
if (i > 1 && c == substr(w, i - 1, 1) && c != "C") continue
# handle vowels: retain only if it's the 1st letter
if (isvowel(c)) {
if (i == 1) m = m c
}
# consonants...
else if (c == "B") {
if (!(i == z && substr(w, i - 1, 1) == "M")) m = m "B"
}
else if (c == "C") {
if (substr(w, i, 2) == "CH") {
m = m "X"
i++
} else if (substr(w, i, 2) ~ /^(CI|CE|CY)/) {
m = m "S"
} else {
m = m "K"
}
}
else if (c == "D") {
if (substr(w, i, 3) ~ /^DG[EIY]/) {
m = m "J"
i += 2
} else {
m = m "T"
}
}
else if (c == "F") { # Handling for 'F'
m = m "F"
}
else if (c == "G") {
if (substr(w, i, 2) == "GH" && (i == 1 || !isvowel(substr(w, i - 1, 1)))) {
i++
} else if (substr(w, i, 2) == "GN" || (i == z && c == "G")) {
continue
} else if (substr(w, i, 3) ~ /^(GIA|GIE|GEY)/) {
m = m "J"
} else {
m = m "K"
}
}
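else if (c == "H") {
# H after C, S, P, T or G is silent here;
# otherwise drop it only when it follows a vowel and no vowel follows
if (i == 1 || substr(w, i - 1, 1) !~ /[CSPTG]/) {
if (i == 1 || !isvowel(substr(w, i - 1, 1)) || isvowel(n)) {
m = m "H"
}
}
}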
else if (c == "K") {
if (i == 1 || substr(w, i - 1, 1) != "C") m = m "K"
}
else if (c == "P") {
if (substr(w, i, 2) == "PH") {
m = m "F"
i++
} else {
m = m "P"
}
}
else if (c == "Q") {
m = m "K"
}
else if (c == "R") {
if (i == 1 || isvowel(substr(w, i - 1, 1))) {
m = m "R"
}
}
else if (c == "S") {
if (substr(w, i, 2) == "SH") {
m = m "X"
i++
} else if (substr(w, i, 3) == "SIA" || substr(w, i, 3) == "SIO") {
m = m "X"
i += 2
} else {
m = m "S"
}
}
else if (c == "T") {
if (substr(w, i, 2) == "TH") {
m = m "0" # add '0' for 'TH' digraph to distinguish from regular 'T'
i++
} else if (substr(w, i, 3) == "TIA" || substr(w, i, 3) == "TIO") {
m = m "X"
i += 2
} else {
m = m "T"
}
}
else if (c == "V") {
m = m "F"
}
else if (c == "W" || c == "Y") {
if (i < z && isvowel(n)) m = m c
}
else if (c == "X") {
m = m "KS"
}
else if (c == "Z") {
m = m "S"
}
# M, N, L and J are kept as-is (every G is handled in its own branch above)
else if (c == "M" || c == "N" || c == "L" || c == "J") {
m = m c
}
}
return m
}
# eof
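Before `metaphone()` emits any codes, it does two things. It normalizes the word, and it skips repeated letters other than C. Both steps can be tried in isolation. The sketch below is a hypothetical test harness and is not part of the posted script. It wraps a small AWK program in a POSIX shell function, and the names `prep`, `normalize` and `dedupe` are invented for illustration. `dedupe` applies the duplicate-letter rule as a separate pass, whereas the posted script applies it inside its main loop.

```sh
#!/bin/sh
# Hypothetical harness, not part of metaphone.awk: isolates the
# preprocessing that metaphone() performs before its main loop.
prep() {
    printf '%s\n' "$1" | awk '
    # uppercase, keep letters only, drop a silent first letter
    function normalize(w) {
        w = toupper(w)
        gsub(/[^A-Z]/, "", w)
        if (w ~ /^(KN|GN|PN|WR|PS)/)
            w = substr(w, 2)
        return w
    }
    # skip a letter equal to the one before it, except C
    function dedupe(w,    out, i, c) {
        out = ""
        for (i = 1; i <= length(w); i++) {
            c = substr(w, i, 1)
            if (i > 1 && c == substr(w, i - 1, 1) && c != "C") continue
            out = out c
        }
        return out
    }
    { print dedupe(normalize($0)) }'
}

prep "Knight's"     # NIGHTS
prep "Mississippi"  # MISISIPI
prep "accent"       # ACCENT (the doubled C is kept)
```

Each step is its own function, so a rule can be changed and re-checked without running the whole encoder. For example, removing `PS` from the regex changes only what `normalize` returns for words such as "psalm".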
--
:wq
Mike Sanders