Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > gnu.bash.bug > #16181

Re: [PATCH] Implement rehashing for associative arrays (Re: speeding up hash_search?)

Path csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail
From George Jones <fooologist@gmail.com>
Newsgroups gnu.bash.bug
Subject Re: [PATCH] Implement rehashing for associative arrays (Re: speeding up hash_search?)
Date Sun, 19 Apr 2020 21:00:08 -0400
Lines 102
Approved bug-bash@gnu.org
Message-ID <mailman.755.1587344426.3066.bug-bash@gnu.org> (permalink)
References <CALv3B7aiTbhpyUL17Eg5prH39EgMP8uSDdD554OdND3LFETAQg@mail.gmail.com> <CAFLRLk8ydO6ev8wRcJ35kzzN3Yt2_9sjsc6L+C0CC6dCxud_NA@mail.gmail.com> <CALv3B7bzh3degKPCe5c_avUc90L+bE8jqKKcaVo13zwqEyB5-A@mail.gmail.com>
NNTP-Posting-Host lists.gnu.org
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
X-Trace usenet.stanford.edu 1587344426 6098 209.51.188.17 (20 Apr 2020 01:00:26 GMT)
X-Complaints-To action@cs.stanford.edu
Cc bug-bash@gnu.org
To Koichi Murase <myoga.murase@gmail.com>
Envelope-to bug-bash@gnu.org
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=OoCrE/aJ6as0h3tocQ18IpGRjiL+dqVuxCh9gUJ4yYk=; b=MYqN8hjNFlFdcDdLKD1F2ZP9LfLFB22v+am7jjP7f2KJc4UviwT5VJh1kL6MjU9/DM Inf00yOWiIoEW4aFO7eKaiFH03KqI3x1eC3czV5a6z3+TIykOzFwi1QecwPEK8KzD+0u uQ6XvMe+RIsvpi/awpClrtRPyyACOuPoZVJZ01ODNcpV6Da/QTKXu3oFt3buZ65Vsqq6 pviSvUIfEeBTeYwTvj7iU3vU2iVQMP6mUYdJC5lA6eF+SXAvQgU+ZFvt/0lqoPiKo+7g MoNoRxTCnZG3LOeo8NmeWZNMTU2nCtggCH4yQH5z4bu2u8SO5KDfH4LxluQ3JjE9hHPF dPsw==
X-Google-DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=OoCrE/aJ6as0h3tocQ18IpGRjiL+dqVuxCh9gUJ4yYk=; b=uRTYuYqQ4qgczfkb16Pp/ikCEmm66ECqOTfm6CVPaTLXpCIxjCtarOjMdpBWEQVYui RrQdc4XRHmOrJPKJoSo277MziRCmfqZ8zOKdAcU3COnd4+wWBYsRAJOQpoEz+WjjImdU FG21TnDn3dcZtCc93Caw8/Hg372QJWyuS+VFDJH8Z48iVmJiDpxVsUK/Y41vjYla3KCv Zt7RM+MdGuPO2yYzytA7i/lPG/ygfSvmXnrrMFXlK3ewr5YNhqW173wxTHYXt/dPCLIt 8J91yCV3aIm7ipHgSTdlKXfNZzzVNGIjpI7AF+LrahPW+z2i2dmDzofgYkWBz+SsnAlT qODg==
X-Gm-Message-State AGi0PuY+ybOuEwJ332d2Cu4+JizY5EOr6LDToXV3W14eV93f23j1eNH4 zCO1uu0Rkpb28s2npD+bNmu7yfI9KOKUaGGv8wU=
X-Google-Smtp-Source APiQypIJnrGIIqPuHVwq6RnJq5lrFm0YlWxR4YbZOx5HD9NrPfgR6yr47ra4Tia4CFQzRsoOjTEc7kG6zgoX+/MpDss=
X-Received by 2002:aca:f50e:: with SMTP id t14mr8825515oih.156.1587344419133; Sun, 19 Apr 2020 18:00:19 -0700 (PDT)
In-Reply-To <CAFLRLk8ydO6ev8wRcJ35kzzN3Yt2_9sjsc6L+C0CC6dCxud_NA@mail.gmail.com>
Received-SPF pass client-ip=2607:f8b0:4864:20::244; envelope-from=fooologist@gmail.com; helo=mail-oi1-x244.google.com
X-detected-operating-system by eggs1p.gnu.org: Genre and OS details not recognized.
X-Received-From 2607:f8b0:4864:20::244
X-Content-Filtered-By Mailman/MimeDel 2.1.23
X-BeenThere bug-bash@gnu.org
X-Mailman-Version 2.1.23
Precedence list
List-Id Bug reports for the GNU Bourne Again SHell <bug-bash.gnu.org>
List-Unsubscribe <https://lists.gnu.org/mailman/options/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=unsubscribe>
List-Archive <https://lists.gnu.org/archive/html/bug-bash>
List-Post <mailto:bug-bash@gnu.org>
List-Help <mailto:bug-bash-request@gnu.org?subject=help>
List-Subscribe <https://lists.gnu.org/mailman/listinfo/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=subscribe>
X-Mailman-Original-Message-ID <CALv3B7bzh3degKPCe5c_avUc90L+bE8jqKKcaVo13zwqEyB5-A@mail.gmail.com>
X-Mailman-Original-References <CALv3B7aiTbhpyUL17Eg5prH39EgMP8uSDdD554OdND3LFETAQg@mail.gmail.com> <CAFLRLk8ydO6ev8wRcJ35kzzN3Yt2_9sjsc6L+C0CC6dCxud_NA@mail.gmail.com>
Xref csiph.com gnu.bash.bug:16181

Show key headers only | View raw


Thank you.   Patch applied and (performance) tested with come tests I was
working on
https://github.com/eludom/snippits/tree/master/bash/tests .... bottom line:

Before:
...

lines_per_sec 993.37 LINES 2700000 ELAPSED 2718
lines_per_sec 955.95 LINES 2800000 ELAPSED 2929
lines_per_sec 921.51 LINES 2900000 ELAPSED 3147
lines_per_sec 889.41 LINES 3000000 ELAPSED 3373
lines_per_sec 859.43 LINES 3100000 ELAPSED 3607


After:

...

lines_per_sec 50000.00 LINES 2600000 ELAPSED 52
lines_per_sec 50000.00 LINES 2700000 ELAPSED 54
lines_per_sec 50000.00 LINES 2800000 ELAPSED 56
lines_per_sec 50000.00 LINES 2900000 ELAPSED 58
lines_per_sec 50000.00 LINES 3000000 ELAPSED 60
lines_per_sec 50000.00 LINES 3100000 ELAPSED 62


---George Jones


On Sun, Apr 19, 2020 at 3:52 PM Koichi Murase <myoga.murase@gmail.com>
wrote:

> 2020-04-19 23:54 George Jones <fooologist@gmail.com>:
> > It looks like hash_search just does a linear walk if array entries
> > to find elements in a list.
>
> https://eludom.github.io/blog/20200418/
> > and there it is, the linear search walking the list in hash_search()
> >
> > ```
> > [...]
> >
> >   bucket = HASH_BUCKET (string, table, hv);
> >
> >   for (list = table->bucket_array ? table->bucket_array[bucket] : 0;
> list; list = list->next)
> >     {
> > [...]
> > ```
>
> The associative arrays in `hashlib.c' are implemented by hash tables
> as is clear from its name.  The main lookup of hash table algorithm is
> done by the following line
>
>   bucket = HASH_BUCKET (string, table, hv);
>
> but not by the subsequent linear search.  The linear search is just a
> workaround for the collision of hashes.  As far as the load factor of
> the hash table is maintained properly, the linear search is O(1)
> because the length of the list is O(1).
>
> 2020-04-19 23:54 George Jones <fooologist@gmail.com>:
> > This slows down (order N?) new inserts when the number of entries
> > gets large.
>
> I looked into `hashlib.c' and found that rehashing is actually not
> implemented.  I just added a function to perform rehashing, and
> the performance has been improved.
>
> > Would there be any interest in merging a patch to add an option for
> > making this faster (maybe using b-trees?)
>
> https://eludom.github.io/blog/20200418/
> > TODO Look for appropriate in-memory hash insert/lookup functions
> > -  btrees ?
>
> The B-tree is not the most appropriate option here.  The hash table is
> more appropriate.  The B-tree can be used in the case that we want to
> keep the ordering of the keys of associative arrays (e.g. we want to
> enumerate items in ascending/descending order).  Bash associative
> arrays do not ensure the ordering of the items, so the hash table can
> be used as a more efficient choice and is already implemented in
> `hashlib.c'.  We can simply add rehashing.
>
> ------------------------------------------------------------------------
>
> I attach a patch `0001-hashlib-Implement-rehash.patch' which
> implements the rehashing.
>
> I also tested the performance.  The attached `test1.png' shows the
> computational time of insertion of items before and after the fix.
> The lines are fitted by the function `Time = C Size^alpha' where alpha
> is the exponent.  Before the fix, there are two regimes depending on
> the number of items: linear regime O(N) (alpha ~ 1) and quadratic
> regime O(N^2) (alpha ~ 2).  The quadratic regime signals the linear
> scaling of a single insertion.  After the fix, the prformance has been
> improved, and the computational time scales linearly in entire region.
> I also attach a script `test1.sh' which was used to measure the time.
>
> --
> Koichi
>

Back to gnu.bash.bug | Previous | Next | Find similar


Thread

Re: [PATCH] Implement rehashing for associative arrays (Re: speeding up hash_search?) George Jones <fooologist@gmail.com> - 2020-04-19 21:00 -0400

csiph-web