Groups | Search | Server Info | Login | Register
Groups > comp.lang.awk > #9820
| Path | csiph.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail |
|---|---|
| From | porkchop@invalid.foo (Mike Sanders) |
| Newsgroups | comp.lang.awk |
| Subject | Longest Common Substring |
| Date | Fri, 23 Aug 2024 04:39:59 -0000 (UTC) |
| Organization | A noiseless patient Spider |
| Lines | 57 |
| Sender | Mike Sanders <busybox@sdf.org> |
| Message-ID | <va93qv$pome$1@dont-email.me> (permalink) |
| Injection-Date | Fri, 23 Aug 2024 06:39:59 +0200 (CEST) |
| Injection-Info | dont-email.me; posting-host="b537a019881267c51ad5afadf5b95d49"; logging-data="844494"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+DS7a8XEF+lFOAZtVW1z8k" |
| User-Agent | tin/2.6.2-20221225 ("Pittyvaich") (NetBSD/9.3 (amd64)) |
| Cancel-Lock | sha1:Pf32RaD6eKe8ijP8H0RYqPAb9cY= |
| Xref | csiph.com comp.lang.awk:9820 |
Show key headers only | View raw
# largest q-gram (longest common substring) in AWK: Michael Sanders - 2024
# A q-gram is a contiguous sequence of characters of fixed length
# see also: https://en.wikipedia.org/wiki/N-gram
BEGIN {
str1 = "MAWK"
str2 = "gawk"
q = largest_qgram(str1, str2)
l = length(q)
print str1 " : " str2
if (length(q) > 0) {
printf("largest q-gram (longest common substring): \"%s\" size: %d\n", q, l)
} else {
printf("no common substring found\n")
}
}
# find largest q-gram (longest common substring) between 2 words
function largest_qgram(str1, str2, i, j, maxlen, largest, tmp) {
maxlen = 0
largest = ""
# for convenience's sake
str1 = tolower(str1)
str2 = tolower(str2)
# loop over each character of 1st string
for (i = 1; i <= length(str1); i++) {
# loop over each character of 2nd string
for (j = 1; j <= length(str2); j++) {
tmp = ""
# check for matching substrings at positions i and j
while (substr(str1, i + length(tmp), 1) == substr(str2,
j + length(tmp), 1) && i + length(tmp) <=
length(str1) && j + length(tmp) <= length(str2)) {
tmp = tmp substr(str1, i + length(tmp), 1)
}
# if matching substring is longer than previous maximum
if (length(tmp) > maxlen) {
maxlen = length(tmp)
largest = tmp
}
}
}
return largest
}
# eof
--
:wq
Mike Sanders
Back to comp.lang.awk | Previous | Next | Find similar
Longest Common Substring porkchop@invalid.foo (Mike Sanders) - 2024-08-23 04:39 +0000
csiph-web