Path: csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail From: Koichi Murase Newsgroups: gnu.bash.bug Subject: Re: [PATCH] Implement rehashing for associative arrays (Re: speeding up hash_search?) Date: Mon, 20 Apr 2020 18:48:44 +0900 Lines: 92 Approved: bug-bash@gnu.org Message-ID: References: NNTP-Posting-Host: lists.gnu.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="000000000000b2977505a3b5ce97" X-Trace: usenet.stanford.edu 1587376156 25277 209.51.188.17 (20 Apr 2020 09:49:16 GMT) X-Complaints-To: action@cs.stanford.edu Cc: bug-bash@gnu.org To: George Jones Envelope-to: bug-bash@gnu.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=rybR0ri+P2tFVBSkwjZVLgqFb9RdCUjDbyQl57LZekQ=; b=jgAdJuf9QkkZ10qFVJFfFeuCWFTwyGZw1/4Yu9A+6KsMitO5ukTwFusg8a0OUryuVb O44slEqHjCadyPsNK/wbu2SS8RclRlcjsJ3pVsGS39h53i4l8mQeEJTTRjTXMVFXzg4i 1MRb1WFrJ2MfY/mLhwuvT7uu78gSPLHVDuXOChcRHxpz8Ql+Ct4kDWkvP/BLrbkSwMXn D6oSgXZ9Tg1yHm9sM82R9mcEJvWxlaGCPp4FJoCnA06GM0DGJrGmIi9+cew+bbjnHlJP 86k7WVkIgVu9G6p6yiuAETa3l8+q5tgUnxfaNmeE7VLywVcGrLobDDOY5ehsPG/yJ0T+ SWbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=rybR0ri+P2tFVBSkwjZVLgqFb9RdCUjDbyQl57LZekQ=; b=OpOl/R6ENffDRY+ZxpPW0YGUFjstC1HNIRIArV0lWdZiVrUuv3ZYNGGWOvAVQnV4hH ghso7xJDI0qWkpaGic3QKGnrPrwOCbGMXVf4/s9HIO3wjQHXNkG9Mp5Ik3CYFF+zJADD ivGwsfVjYZy9aXQfJVykJ82PRxVsGLlFzf+IJTKcXtHMUVa3tJH4jfqukzK1AMh6Il4J 2Ksvewa92f0cYSwW3It5jH0p3MbVUUiJKJXlsEbagS7jnnJEA2wpc7Xa/8UAZ0t6X8lM ThauQF1jm8y3P2RIyMQ3uJtbIIo2kt+IkjpW6GekblIGouDH/Eq2HMZHgUDgtAxqaor2 CQ4g== X-Gm-Message-State: AGi0Puar3RlpKjNVGrFwRA6JlEGxW7Xf86WiDPKHRtyUDN39Y1qLhHNJ VMDGaGhX7/gjB2nft9Vc1mZX8CUco+9lWhUKB68= X-Google-Smtp-Source: APiQypK5VmhjH8jfvGpLhkjwF5bWBmEC7tNv6EQdtpz60J865J3fcdCxuCu0XDLGYmVW9t3gBOY8z8atYkcuvtUpCqY= X-Received: by 2002:a2e:95d2:: with SMTP id y18mr5219656ljh.65.1587376135749; Mon, 20 Apr 2020 02:48:55 -0700 (PDT) In-Reply-To: Received-SPF: pass client-ip=2a00:1450:4864:20::242; envelope-from=myoga.murase@gmail.com; helo=mail-lj1-x242.google.com X-detected-operating-system: by eggs1p.gnu.org: Error: [-] PROGRAM ABORT : Malformed IPv6 address (bad octet value). Location : parse_addr6(), p0f-client.c:67 X-Received-From: 2a00:1450:4864:20::242 X-BeenThere: bug-bash@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Bug reports for the GNU Bourne Again SHell List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Mailman-Original-Message-ID: X-Mailman-Original-References: Xref: csiph.com gnu.bash.bug:16184 --000000000000b2977505a3b5ce97 Content-Type: text/plain; charset="UTF-8" 2020-04-20 10:00 George Jones : > Thank you. Patch applied and (performance) tested with come tests > I was working on > https://github.com/eludom/snippits/tree/master/bash/tests > .... bottom line: Thank you for the measurements. Also, I am sorry that I disturbed your plan for contributing to Bash. I actually initially doubted that the insertion with the current implementation is O(N), so I created the test first and then found that it is an easy fix rather than reimplementing it by B-tree or other data structures. I couldn't stop my interest in how much it is improved by the easy fix. Nevertheless, I have not tuned the parameters of rehashing. Actually it is a tradeoff between the memory consumption and the computational time, so it is a matter of preference to some extent. I attach an updated patch which exposes some parameters. If you have an interest, you can play by changing the value of the parameters `HASH_REHASH_MULTIPLIER' and `HASH_MAX_LOADFACTOR' defined in `hashlib.h'. -- Koichi --000000000000b2977505a3b5ce97 Content-Type: application/octet-stream; name="0001-hashlib-Implement-rehash.v2.patch" Content-Disposition: attachment; filename="0001-hashlib-Implement-rehash.v2.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_k98ao48w0 RnJvbSBmYzNjZDE5YTM4Mjk4YmE2NzMxZGZjNTNjZWNjNzkxMzA0MmM4MDgzIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBLb2ljaGkgTXVyYXNlIDxteW9nYS5tdXJhc2VAZ21haWwuY29t PgpEYXRlOiBNb24sIDIwIEFwciAyMDIwIDE4OjM3OjAyICswOTAwClN1YmplY3Q6IFtQQVRDSF0g aGFzaGxpYjogSW1wbGVtZW50IHJlaGFzaAoKLS0tCiBoYXNobGliLmMgfCA0MSArKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKwogaGFzaGxpYi5oIHwgIDIgKysKIDIgZmls ZXMgY2hhbmdlZCwgNDMgaW5zZXJ0aW9ucygrKQoKZGlmZiAtLWdpdCBhL2hhc2hsaWIuYyBiL2hh c2hsaWIuYwppbmRleCBmOGUzYjA5YS4uMzgzNGYzZDIgMTAwNjQ0Ci0tLSBhL2hhc2hsaWIuYwor KysgYi9oYXNobGliLmMKQEAgLTM5LDYgKzM5LDcgQEAKICNkZWZpbmUgSEFTSF9CVUNLRVQocywg dCwgaCkgKCgoaCkgPSBoYXNoX3N0cmluZyAocykpICYgKCh0KS0+bmJ1Y2tldHMgLSAxKSkKIAog c3RhdGljIEJVQ0tFVF9DT05URU5UUyAqY29weV9idWNrZXRfYXJyYXkgX19QKChCVUNLRVRfQ09O VEVOVFMgKiwgc2hfc3RyaW5nX2Z1bmNfdCAqKSk7CitzdGF0aWMgdm9pZCBoYXNoX3JlaGFzaCBf X1AoKEhBU0hfVEFCTEUgKikpOwogCiAvKiBNYWtlIGEgbmV3IGhhc2ggdGFibGUgd2l0aCBCVUNL RVRTIG51bWJlciBvZiBidWNrZXRzLiAgSW5pdGlhbGl6ZQogICAgZWFjaCBzbG90IGluIHRoZSB0 YWJsZSB0byBOVUxMLiAqLwpAQCAtMTA1LDYgKzEwNiw0MCBAQCBjb3B5X2J1Y2tldF9hcnJheSAo YmEsIGNwZGF0YSkKICAgcmV0dXJuIG5ld19idWNrZXQ7ICAKIH0KIAorc3RhdGljIHZvaWQKK2hh c2hfcmVoYXNoICh0YWJsZSkKKyAgICAgSEFTSF9UQUJMRSAqdGFibGU7Cit7CisgIGludCBvbGRf bmJ1Y2tldHMsIGksIGo7CisgIEJVQ0tFVF9DT05URU5UUyAqKm9sZF9idWNrZXRfYXJyYXksICpp dGVtLCAqbmV4dDsKKyAgdW5zaWduZWQgaW50IGh2OworCisgIGlmICh0YWJsZSA9PSBOVUxMIHx8 IHRhYmxlLT5uZW50cmllcyA+IElOVF9NQVggLyBIQVNIX1JFSEFTSF9NVUxUSVBMSUVSKSByZXR1 cm47CisKKyAgb2xkX25idWNrZXRzID0gdGFibGUtPm5idWNrZXRzOworICBvbGRfYnVja2V0X2Fy cmF5ID0gdGFibGUtPmJ1Y2tldF9hcnJheTsKKworICB0YWJsZS0+bmJ1Y2tldHMgPSB0YWJsZS0+ bmJ1Y2tldHMgKiBIQVNIX1JFSEFTSF9NVUxUSVBMSUVSOworICB0YWJsZS0+YnVja2V0X2FycmF5 ID0KKyAgICAoQlVDS0VUX0NPTlRFTlRTICoqKXhtYWxsb2MgKHRhYmxlLT5uYnVja2V0cyAqIHNp emVvZiAoQlVDS0VUX0NPTlRFTlRTICopKTsKKyAgZm9yIChpID0gMDsgaSA8IHRhYmxlLT5uYnVj a2V0czsgaSsrKQorICAgIHRhYmxlLT5idWNrZXRfYXJyYXlbaV0gPSAoQlVDS0VUX0NPTlRFTlRT ICopTlVMTDsKKworICBmb3IgKGogPSAwOyBqIDwgb2xkX25idWNrZXRzOyBqKyspCisgICAgewor ICAgICAgZm9yIChpdGVtID0gb2xkX2J1Y2tldF9hcnJheVtqXTsgaXRlbTsgaXRlbSA9IG5leHQp CisJeworCSAgbmV4dCA9IGl0ZW0tPm5leHQ7CisJICBpID0gSEFTSF9CVUNLRVQgKGl0ZW0tPmtl eSwgdGFibGUsIGh2KTsKKwkgIGl0ZW0tPmtoYXNoID0gaHY7CisJICBpdGVtLT5uZXh0ID0gdGFi bGUtPmJ1Y2tldF9hcnJheVtpXTsKKwkgIHRhYmxlLT5idWNrZXRfYXJyYXlbaV0gPSBpdGVtOwor CX0KKyAgICB9CisKKyAgZnJlZSAob2xkX2J1Y2tldF9hcnJheSk7Cit9CisKIEhBU0hfVEFCTEUg KgogaGFzaF9jb3B5ICh0YWJsZSwgY3BkYXRhKQogICAgICBIQVNIX1RBQkxFICp0YWJsZTsKQEAg LTE5OCw2ICsyMzMsOSBAQCBoYXNoX3NlYXJjaCAoc3RyaW5nLCB0YWJsZSwgZmxhZ3MpCiAKICAg aWYgKGZsYWdzICYgSEFTSF9DUkVBVEUpCiAgICAgeworICAgICAgaWYgKHRhYmxlLT5uZW50cmll cyA+PSB0YWJsZS0+bmJ1Y2tldHMgKiBIQVNIX01BWF9MT0FERkFDVE9SKQorCWhhc2hfcmVoYXNo ICh0YWJsZSk7CisKICAgICAgIGxpc3QgPSAoQlVDS0VUX0NPTlRFTlRTICopeG1hbGxvYyAoc2l6 ZW9mIChCVUNLRVRfQ09OVEVOVFMpKTsKICAgICAgIGxpc3QtPm5leHQgPSB0YWJsZS0+YnVja2V0 X2FycmF5W2J1Y2tldF07CiAgICAgICB0YWJsZS0+YnVja2V0X2FycmF5W2J1Y2tldF0gPSBsaXN0 OwpAQCAtMjY5LDYgKzMwNyw5IEBAIGhhc2hfaW5zZXJ0IChzdHJpbmcsIHRhYmxlLCBmbGFncykK IAogICBpZiAoaXRlbSA9PSAwKQogICAgIHsKKyAgICAgIGlmICh0YWJsZS0+bmVudHJpZXMgPj0g dGFibGUtPm5idWNrZXRzICogSEFTSF9NQVhfTE9BREZBQ1RPUikKKwloYXNoX3JlaGFzaCAodGFi bGUpOworCiAgICAgICBidWNrZXQgPSBIQVNIX0JVQ0tFVCAoc3RyaW5nLCB0YWJsZSwgaHYpOwog CiAgICAgICBpdGVtID0gKEJVQ0tFVF9DT05URU5UUyAqKXhtYWxsb2MgKHNpemVvZiAoQlVDS0VU X0NPTlRFTlRTKSk7CmRpZmYgLS1naXQgYS9oYXNobGliLmggYi9oYXNobGliLmgKaW5kZXggODhl YTc3OGYuLjFhZGNkZDdjIDEwMDY0NAotLS0gYS9oYXNobGliLmgKKysrIGIvaGFzaGxpYi5oCkBA IC03NCw2ICs3NCw4IEBAIGV4dGVybiB1bnNpZ25lZCBpbnQgaGFzaF9zdHJpbmcgX19QKChjb25z dCBjaGFyICopKTsKIAogLyogRGVmYXVsdCBudW1iZXIgb2YgYnVja2V0cyBpbiB0aGUgaGFzaCB0 YWJsZS4gKi8KICNkZWZpbmUgREVGQVVMVF9IQVNIX0JVQ0tFVFMgMTI4CS8qIG11c3QgYmUgcG93 ZXIgb2YgdHdvICovCisjZGVmaW5lIEhBU0hfUkVIQVNIX01VTFRJUExJRVIgNAkJLyogbXVzdCBi ZSBwb3dlciBvZiB0d28gKi8KKyNkZWZpbmUgSEFTSF9NQVhfTE9BREZBQ1RPUiAxLjAKIAogI2Rl ZmluZSBIQVNIX0VOVFJJRVMoaHQpCSgoaHQpID8gKGh0KS0+bmVudHJpZXMgOiAwKQogCi0tIAoy LjIxLjEKCg== --000000000000b2977505a3b5ce97--