Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.perl.modules > #225 > unrolled thread

a new Tree::Interval-like module on top of Tree::RB?

Started byIvan Shmakov <oneingray@gmail.com>
First post2013-06-13 17:13 +0000
Last post2013-07-02 19:37 +0000
Articles 9 — 2 participants

Back to article view | Back to comp.lang.perl.modules


Contents

  a new Tree::Interval-like module on top of Tree::RB? Ivan Shmakov <oneingray@gmail.com> - 2013-06-13 17:13 +0000
    Tree::Range 0.1 released Ivan Shmakov <oneingray@gmail.com> - 2013-06-27 05:27 +0000
      Re: Tree::Range 0.1 released Ben Morrow <ben@morrow.me.uk> - 2013-06-27 07:47 +0100
        Re: Tree::Range 0.1 released Ivan Shmakov <oneingray@gmail.com> - 2013-06-27 08:12 +0000
          Re: Tree::Range 0.1 released Ben Morrow <ben@morrow.me.uk> - 2013-06-27 11:24 +0100
            Re: Tree::Range 0.1 released Ivan Shmakov <oneingray@gmail.com> - 2013-06-27 11:05 +0000
              Re: Tree::Range 0.1 released Ben Morrow <ben@morrow.me.uk> - 2013-06-27 12:50 +0100
                Re: Tree::Range 0.1 released Ivan Shmakov <oneingray@gmail.com> - 2013-06-27 13:11 +0000
                  Benchmark-ing Tree::Range::RB? Ivan Shmakov <oneingray@gmail.com> - 2013-07-02 19:37 +0000

#225 — a new Tree::Interval-like module on top of Tree::RB?

FromIvan Shmakov <oneingray@gmail.com>
Date2013-06-13 17:13 +0000
Subjecta new Tree::Interval-like module on top of Tree::RB?
Message-ID<87ip1igchz.fsf@violet.siamics.net>
	So, I've crafted the base for a Tree::Interval-like module,
	implemented on top of Tree::RB, and I'm open to suggestions of
	what'd be a suitable name for it?

	The obvious candidates are Tree::RB::Interval and
	Tree::Interval::RB, but I'm uncertain if it'd be nice to
	infiltrate someone's else CPAN namespace in this case.

	FWIW, Tree::Interval::RB seems more appropriate, as the code as
	it is requires only the compare, delete, insert and lookup
	methods from the "backend," and thus could probably be not that
	hard to generalize.  (Although the lookup method is expected to
	allow for "imprecise" matching, and then it demands a way to
	iterate onwards from the node found...)

	A nice feature of this code is that it relies on the tree's own
	"compare" function, which could be rather arbitrary.  Thus, it's
	possible to, say:

    my $map
        = XXX->new (...);
    $map->set ("from", "to", "value");
    my $a
        = $map->lookup ("fuzzy");
    ## => "value"

	(provided that the tree is built around the "cmp" function.)

	TIA.

-- 
FSF associate member #7257	np. faceday.mod

[toc] | [next] | [standalone]


#226 — Tree::Range 0.1 released

FromIvan Shmakov <oneingray@gmail.com>
Date2013-06-27 05:27 +0000
SubjectTree::Range 0.1 released
Message-ID<87a9mc9l80.fsf@violet.siamics.net>
In reply to#225
>>>>> Ivan Shmakov <oneingray@gmail.com> writes:

 > So, I've crafted the base for a Tree::Interval-like module,
 > implemented on top of Tree::RB, and I'm open to suggestions of what'd
 > be a suitable name for it?

[...]

 > FWIW, Tree::Interval::RB seems more appropriate, as the code as it is
 > requires only the compare, delete, insert and lookup methods from the
 > "backend," and thus could probably be not that hard to generalize.

	So, I've ended up with Tree::Range::RB, and Tree::Range::base
	for the ->get_range () and ->range_set () methods.  Both are now
	released as Tree::Range 0.1.  The CPAN page for the version is:

http://search.cpan.org/~onegray/Tree-Range-0.1/

[...]

PS.  The namespace registration request is filed and pending approval.

-- 
FSF associate member #7257

[toc] | [prev] | [next] | [standalone]


#227 — Re: Tree::Range 0.1 released

FromBen Morrow <ben@morrow.me.uk>
Date2013-06-27 07:47 +0100
SubjectRe: Tree::Range 0.1 released
Message-ID<oo3t9a-pe72.ln1@anubis.morrow.me.uk>
In reply to#226
Quoth Ivan Shmakov <oneingray@gmail.com>:
> 
> 	So, I've ended up with Tree::Range::RB, and Tree::Range::base
> 	for the ->get_range () and ->range_set () methods.  Both are now
> 	released as Tree::Range 0.1.  The CPAN page for the version is:
> 
> http://search.cpan.org/~onegray/Tree-Range-0.1/
> 
> [...]
> 
> PS.  The namespace registration request is filed and pending approval.

The namespace registration process is pretty-much defunct at this point.
Both Tree::Range::RB and Tree::Range::base are already as 'properly
registered' as they need to be; that is, they are listed in
02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
will find your distributions for those package names.

However, it goes against convention to have a distribution called
Tree-Range that does not contain a module Tree::Range. You should
consider either renaming Tree::Range::base or at least providing a
documentation-only Tree::Range explaining why there isn't a module
there.

Ben

[toc] | [prev] | [next] | [standalone]


#228 — Re: Tree::Range 0.1 released

FromIvan Shmakov <oneingray@gmail.com>
Date2013-06-27 08:12 +0000
SubjectRe: Tree::Range 0.1 released
Message-ID<87ppv87z1j.fsf@violet.siamics.net>
In reply to#227
>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:

 >> So, I've ended up with Tree::Range::RB, and Tree::Range::base for
 >> the ->get_range () and ->range_set () methods.  Both are now
 >> released as Tree::Range 0.1.  The CPAN page for the version is:

 >> http://search.cpan.org/~onegray/Tree-Range-0.1/

 >> PS.  The namespace registration request is filed and pending
 >> approval.

 > The namespace registration process is pretty-much defunct at this
 > point.  Both Tree::Range::RB and Tree::Range::base are already as
 > 'properly registered' as they need to be; that is, they are listed in
 > 02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
 > will find your distributions for those package names.

	Yes.

	Still, it's recommended by [1] (though feel free to file a bug
	report if you think it shouldn't be):

    [...] A registration is not a prerequisite for uploading.  It is
    just recommended for better searchability of the CPAN and to avoid
    namespace clashes.  [...]

	Besides, it'll add some useful bits of information to [2].

[1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod
[2] http://search.cpan.org/~onegray/

 > However, it goes against convention to have a distribution called
 > Tree-Range that does not contain a module Tree::Range.  You should
 > consider either renaming Tree::Range::base

	Won't this go against the example set by Digest::base? (which I
	tend to think as a good enough to stick to.)

 > or at least providing a documentation-only Tree::Range explaining why
 > there isn't a module there.

	I guess I'd consider the latter for 0.2.  (Which I hope'll be
	the release fixing only the documentation and the metadata.)

	The other option would be to have Tree::Range->new () behave
	like Digest->new ().

-- 
FSF associate member #7257

[toc] | [prev] | [next] | [standalone]


#229 — Re: Tree::Range 0.1 released

FromBen Morrow <ben@morrow.me.uk>
Date2013-06-27 11:24 +0100
SubjectRe: Tree::Range 0.1 released
Message-ID<3ggt9a-22b2.ln1@anubis.morrow.me.uk>
In reply to#228
Quoth Ivan Shmakov <oneingray@gmail.com>:
> >>>>> Ben Morrow <ben@morrow.me.uk> writes:
> 
>  > The namespace registration process is pretty-much defunct at this
>  > point.  Both Tree::Range::RB and Tree::Range::base are already as
>  > 'properly registered' as they need to be; that is, they are listed in
>  > 02packages.details.txt.gz on CPAN, so clients like CPAN.pm and cpanm
>  > will find your distributions for those package names.
> 
> 	Yes.
> 
> 	Still, it's recommended by [1] (though feel free to file a bug
> 	report if you think it shouldn't be):
> 
>     [...] A registration is not a prerequisite for uploading.  It is
>     just recommended for better searchability of the CPAN and to avoid
>     namespace clashes.  [...]

See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/master/
lancaster-consensus.md#module-registration> (and indeed the rest of that
document). While this is a 'new' agreement, in the Perl world agreements
and specs usually come after implementation, so it is describing the
way things currently work in practice.

> 	Besides, it'll add some useful bits of information to [2].

...which noone but you will ever look at :). 

>  > However, it goes against convention to have a distribution called
>  > Tree-Range that does not contain a module Tree::Range.  You should
>  > consider either renaming Tree::Range::base
> 
> 	Won't this go against the example set by Digest::base? (which I
> 	tend to think as a good enough to stick to.)

It's not a convention I've seen before, but there's nothing wrong with
it. (It would be more usual to be ::Base, at least; the analogy with
base.pm is not really relevant any more.)

>  > or at least providing a documentation-only Tree::Range explaining why
>  > there isn't a module there.
> 
> 	I guess I'd consider the latter for 0.2.  (Which I hope'll be
> 	the release fixing only the documentation and the metadata.)
> 
> 	The other option would be to have Tree::Range->new () behave
> 	like Digest->new ().

Yeah, perhaps. I haven't really got a sense of what this module's useful
for, so I can't comment.

Ben

[toc] | [prev] | [next] | [standalone]


#230 — Re: Tree::Range 0.1 released

FromIvan Shmakov <oneingray@gmail.com>
Date2013-06-27 11:05 +0000
SubjectRe: Tree::Range 0.1 released
Message-ID<87a9mb95l0.fsf@violet.siamics.net>
In reply to#229
>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
>>>>> Ben Morrow <ben@morrow.me.uk> writes:

[...]

 >> Still, it's recommended by [1] (though feel free to file a bug
 >> report if you think it shouldn't be):

 >> [...] A registration is not a prerequisite for uploading.  It is
 >> just recommended for better searchability of the CPAN and to avoid
 >> namespace clashes.  [...]

 > See <https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/
 > master/lancaster-consensus.md#module-registration> (and indeed the
 > rest of that document).  While this is a 'new' agreement, in the Perl
 > world agreements and specs usually come after implementation, so it
 > is describing the way things currently work in practice.

	So, I guess [1] is going to be updated within a few months from
	now.  Thanks.

 >> [1] https://pause.perl.org/pause/authenquery?ACTION=apply_mod

[...]

 >>> However, it goes against convention to have a distribution called
 >>> Tree-Range that does not contain a module Tree::Range.  You should
 >>> consider either renaming Tree::Range::base

 >> Won't this go against the example set by Digest::base? (which I tend
 >> to think as a good enough to stick to.)

 > It's not a convention I've seen before, but there's nothing wrong
 > with it.  (It would be more usual to be ::Base, at least; the analogy
 > with base.pm is not really relevant any more.)

	AIUI, the "capital-case" names under Digest:: are reserved for
	the actual message digest algorithms.  For instance,
	Digest->new ("SHA-1") ends up in a call to Digest::SHA->new (1).

 >>> or at least providing a documentation-only Tree::Range explaining
 >>> why there isn't a module there.

 >> I guess I'd consider [it] for 0.2.  (Which I hope'll be the release
 >> fixing only the documentation and the metadata.)

 >> The other option would be to have Tree::Range->new () behave like
 >> Digest->new ().

 > Yeah, perhaps.  I haven't really got a sense of what this module's
 > useful for, so I can't comment.

	One of the possible uses of Tree::Range::RB specifically is to
	operate on .newsrc data.  Consider, e. g.:

### gxmj7tsetytsnbnp85yhxyjror.perl  -*- Perl -*-

### Ivan Shmakov, 2013

### Code:

use common::sense;

require Tree::Range::RB;

sub ranges_parse {
    my ($in, @ranges) = @_;
    foreach my $range (@ranges) {
        my ($a, $b)
            = ($range =~ /^([0-9]+)(?:-([0-9]+))?$/)
            or die ($range, ": is not a vaild range");
        $in->range_set ($a, 1 + ($b // $a), 1);
    }
    ## .
    $in;
}

my @comp_lang_perl_misc_s = qw {
    1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
    27638-27996
    27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
    28036-28093
};

my $comp_lang_perl_misc
    = Tree::Range::RB->new ({ "cmp"     => sub { $_[0] <=> $_[1]; },
                              "equal-p" => sub { $_[0] eq  $_[1]; } });
ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);

{
    local ($,, $\)
        = (", ", "\n");

    ## check if 28007 was read? (i. e., what range 28007 belongs to?)
    print ($comp_lang_perl_misc->get_range (28007));
    ## => 1, 28006, 28009

    ## now the user has marked some more articles as read
    ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

    ## check if 28011 was read? (i. e., what range 28011 belongs to?)
    print ($comp_lang_perl_misc->get_range (28011));
    ## => 1, 28004, 28014

    ## note how the range was extended as the gaps were filled
}

### gxmj7tsetytsnbnp85yhxyjror.perl ends here

	Similarly, by utilizing the Tree::RB's own iterator, a simple
	and efficient serializer for this kind of data can be built.

-- 
FSF associate member #7257

[toc] | [prev] | [next] | [standalone]


#231 — Re: Tree::Range 0.1 released

FromBen Morrow <ben@morrow.me.uk>
Date2013-06-27 12:50 +0100
SubjectRe: Tree::Range 0.1 released
Message-ID<ohlt9a-upb2.ln1@anubis.morrow.me.uk>
In reply to#230
Quoth Ivan Shmakov <oneingray@gmail.com>:
> 
> 	One of the possible uses of Tree::Range::RB specifically is to
> 	operate on .newsrc data.  Consider, e. g.:
> 
> 
> my @comp_lang_perl_misc_s = qw {
>     1-27551 27567 27575 27602-27606 27627-27629 27632 27634-27636
>     27638-27996
>     27998-28002 28004 28006-28008 28011-28013 28015-28027 28029 28032
>     28036-28093
> };
> 
> my $comp_lang_perl_misc
>     = Tree::Range::RB->new ({ "cmp"     => sub { $_[0] <=> $_[1]; },
>                               "equal-p" => sub { $_[0] eq  $_[1]; } });
> ranges_parse ($comp_lang_perl_misc, @comp_lang_perl_misc_s);
> 
> {
>     local ($,, $\)
>         = (", ", "\n");
> 
>     ## check if 28007 was read? (i. e., what range 28007 belongs to?)
>     print ($comp_lang_perl_misc->get_range (28007));
>     ## => 1, 28006, 28009
> 
>     ## now the user has marked some more articles as read
>     ranges_parse ($comp_lang_perl_misc, qw (28005 28009 28010));

Is this importantly different from Set::IntSpan/::Fast/::XS?

Ben

[toc] | [prev] | [next] | [standalone]


#232 — Re: Tree::Range 0.1 released

FromIvan Shmakov <oneingray@gmail.com>
Date2013-06-27 13:11 +0000
SubjectRe: Tree::Range 0.1 released
Message-ID<871u7n8zrc.fsf@violet.siamics.net>
In reply to#231
>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:

 >> One of the possible uses of Tree::Range::RB specifically is to
 >> operate on .newsrc data.  Consider, e. g.:

[...]

 >> my $comp_lang_perl_misc
 >>     = Tree::Range::RB->new ({ "cmp"     => sub { $_[0] <=> $_[1]; },
 >>                               "equal-p" => sub { $_[0] eq  $_[1]; } });

[...]

 >>     print ($comp_lang_perl_misc->get_range (28007));
 >>     ## => 1, 28006, 28009

[...]

 > Is this importantly different from Set::IntSpan/::Fast/::XS?

	Yes, although not in the case of .newsrc.

	First of all, each range (span, interval, etc.) may have an
	associated value.  It's "1" (or "undef", for the "complementary"
	ranges) in the example above, but it needs not to be.

	Moreover, unlike Set::IntSpan, Tree::Range::RB is applicable to
	an arbitrary ordered set.  (Note the "cmp" option above.)

	Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
	documentation) implemented on top of an ordered list seems to
	imply the insertion and deletion times of O (N), while for a
	red-black tree they're both O (log N).  Though ::XS may still
	outperform a pure-Perl tree-based implementation on reasonable
	use cases.

-- 
FSF associate member #7257

[toc] | [prev] | [next] | [standalone]


#234 — Benchmark-ing Tree::Range::RB?

FromIvan Shmakov <oneingray@gmail.com>
Date2013-07-02 19:37 +0000
SubjectBenchmark-ing Tree::Range::RB?
Message-ID<87r4fg3g8u.fsf_-_@violet.siamics.net>
In reply to#232
>>>>> Ivan Shmakov <oneingray@gmail.com> writes:

[...]

 > Also, having Set::IntSpan (and Set::IntSpan::Fast, as per the
 > documentation) implemented on top of an ordered list seems to imply
 > the insertion and deletion times of O (N), while for a red-black tree
 > they're both O (log N).  Though ::XS may still outperform a pure-Perl
 > tree-based implementation on reasonable use cases.

	So, I've ran a simplistic benchmark over the three, which is,
	roughly (the exact code is as shown below):

   # my ($size, $iter) = (4194304, 5);
   my @vec
       = (map { int (rand ($size)); } (1 .. $size));

   my $set
       = Set::IntSpan::Fast->new ();
   $set->add ($_)
       foreach (@vec);

	It took awhile to complete, but the results seem to suggest that
	Tree::Range::RB is indeed faster than both Set::IntSpan::Fast
	and Set::IntSpan, should the problem size be sufficiently large.

Size: 4194304 Iterations: 5 (negative is time in seconds)
Tree::Range::RB: 8003 wallclock secs (7944.34 usr  5.88 sys +  0.00 cusr  0.00 csys = 7950.22 CPU) @  0.00/s (n=5)
Set::IntSpan: 17759 wallclock secs (17556.78 usr 10.12 sys +  0.00 cusr  0.00 csys = 17566.90 CPU) @  0.00/s (n=5)
Set::IntSpan::Fast: 19318 wallclock secs (19060.33 usr  9.25 sys +  0.00 cusr  0.00 csys = 19069.58 CPU) @  0.00/s (n=5)
                   s/iter Set::IntSpan::Fast      Set::IntSpan   Tree::Range::RB
Set::IntSpan::Fast   3814                 --               -8%              -58%
Set::IntSpan         3513                 9%                --              -55%
Tree::Range::RB      1590               140%              121%                --
44565.74user 26.12system 12:31:28elapsed 98%CPU (0avgtext+0avgdata 972212maxresident)k
224inputs+0outputs (1major+316600minor)pagefaults 0swaps

	Or did I make some silly mistake with that?

	TIA.

### Code:

use common::sense;

use Benchmark qw (cmpthese timethis);
use Scalar::Util qw (looks_like_number);

require Set::IntSpan;
require Set::IntSpan::Fast;
require Tree::Range::RB;

my ($size, $iter)
    = (map { (/^0/ ? oct ($_) : $_); } (@ARGV));
$size
    //= 0x100000;
$iter
    //= -521;
looks_like_number ($size)
    or die ($size, ": Size given is not a number\n");
looks_like_number ($iter)
    or die ($iter, ": Iterations count given is not a number\n");
warn ("Size: ", 0 + $size, " Iterations: ", 0 + $iter,
      " (negative is time in seconds)\n");
my @vec
    = (map { int (rand ($size)); } (1 .. $size));

sub xcmp {
    ## .
    $_[0] <=> $_[1];
}

sub xeq {
    ## .
    $_[0] eq $_[1];
}

my $trr
    = timethis ($iter,
                sub {
                    my $rat_opt = {
                        "cmp"     => \&xcmp,
                        "equal-p" => \&xeq
                    };
                    my $value
                        = [ "*value*" ];
                    my $rat
                        = Tree::Range::RB->new ($rat_opt);
                    $rat->range_set ($_, 1 + $_, $value)
                        foreach (@vec);
                },
                "Tree::Range::RB", "all");
my $sis
    = timethis ($iter,
                sub {
                    my $set
                        = Set::IntSpan->new ();
                    $set->insert ($_)
                        foreach (@vec);
                },
                "Set::IntSpan", "all");
my $sisf
    = timethis ($iter,
                sub {
                    my $set
                        = Set::IntSpan::Fast->new ();
                    $set->add ($_)
                        foreach (@vec);
                },
                "Set::IntSpan::Fast", "all");

cmpthese ({
              "Set::IntSpan"    => $sis,
              "Set::IntSpan::Fast"  => $sisf,
              "Tree::Range::RB" => $trr
          },
          "all");

-- 
FSF associate member #7257	np. ybjsaf.it

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.perl.modules


csiph-web