Groups > comp.lang.python > #12652 > unrolled thread

Algorithms Library - Asking for Pointers

Started by	Travis Parks <jehugaleahsa@gmail.com>
First post	2011-09-02 09:59 -0700
Last post	2011-09-03 20:35 -0700
Articles	7 — 4 participants

Back to article view | Back to comp.lang.python

  Algorithms Library - Asking for Pointers Travis Parks <jehugaleahsa@gmail.com> - 2011-09-02 09:59 -0700
    Re: Algorithms Library - Asking for Pointers Ian Kelly <ian.g.kelly@gmail.com> - 2011-09-02 14:09 -0600
      Re: Algorithms Library - Asking for Pointers Travis Parks <jehugaleahsa@gmail.com> - 2011-09-02 15:49 -0700
        Re: Algorithms Library - Asking for Pointers Travis Parks <jehugaleahsa@gmail.com> - 2011-09-02 18:39 -0700
          Re: Algorithms Library - Asking for Pointers Chris Rebert <clp2@rebertia.com> - 2011-09-02 19:23 -0700
          Re: Algorithms Library - Asking for Pointers Chris Torek <nospam@torek.net> - 2011-09-03 04:35 +0000
            Re: Algorithms Library - Asking for Pointers Travis Parks <jehugaleahsa@gmail.com> - 2011-09-03 20:35 -0700

#12652 — Algorithms Library - Asking for Pointers

From	Travis Parks <jehugaleahsa@gmail.com>
Date	2011-09-02 09:59 -0700
Subject	Algorithms Library - Asking for Pointers
Message-ID	<beb607c3-1642-4663-a0f7-626d0814f0bd@dl2g2000vbb.googlegroups.com>

Hello:

I am working on an algorithms library. It provides LINQ like
functionality to Python iterators. Eventually, I plan on having
feaures that work against sequences and mappings.

I have the code up at http://code.google.com/p/py-compass.

This is my first project in Python, so I'd like some feedback. I want
to know if I am following conventions (overall style and quality of
code).

Thanks,
Travis Parks

[toc] | [next] | [standalone]

#12670

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2011-09-02 14:09 -0600
Message-ID	<mailman.718.1314994196.27778.python-list@python.org>
In reply to	#12652

On Fri, Sep 2, 2011 at 10:59 AM, Travis Parks <jehugaleahsa@gmail.com> wrote:
> Hello:
>
> I am working on an algorithms library. It provides LINQ like
> functionality to Python iterators. Eventually, I plan on having
> feaures that work against sequences and mappings.
>
> I have the code up at http://code.google.com/p/py-compass.
>
> This is my first project in Python, so I'd like some feedback. I want
> to know if I am following conventions (overall style and quality of
> code).

Sure, here are my comments.

In the "forever" and "__forever" functions, your use of the term
"generator" is confusing.  "__forever" is a generator function,
because it has a yield statement.  Its argument, called "generator",
appears to be a callable, not a generator or even necessarily a
generator function.  Also, note that __forever(lambda: value) is
functionally equivalent to the more efficient itertools.repeat(value).

The staticmethod __next(iterator) accesses the class it is defined in,
which suggests that it might be better written as a classmethod
__next(cls, iterator).

Each of the LINQ-style methods is divided into two parts: the public
method that contains the docstring and some argument checks, and a
private staticmethod that contains the implementation.  I'm not
certain what the purpose of that is.  If it's to facilitate overriding
the implementation in subclasses, then you need to change the names of
the private methods to start with only one _ character so that they
won't be mangled by the compiler.

The comments before each method that only contain the name of the
immediately following method are redundant.

aggregate: the default aggregator is unintuitive to me.  I would make
it a required field and add a separate method called sum that calls
aggregate with the operator.add aggregator.
Also, the implementation doesn't look correct.  Instead of passing in
each item to the aggregator, you're passing in the number of items
seen so far?  The LINQ Aggregate method is basically reduce, so rather
than reinvent the wheel I would suggest this:

# MISSING is a unique object solely defined to represent missing arguments.
# Unlike None we can safely assume it will never be passed as actual data.
MISSING = object()

def aggregate(self, aggregator, seed=MISSING):
    if seed is self.MISSING:
        return reduce(aggregator, self._iterable)
    else:
        return reduce(aggregator, self._iterable, seed)

Note for compatibility that in Python 3 the reduce function has been
demoted from a builtin to a member of the functools module.

any: the name of this method could cause some confusion with the "any"
builtin that does something a bit different.

compare: the loop would more DRY as a for loop:

def __compare(first, second, comparison):
    for firstval, secondval in itertools.izip_longest(first, second,
fillvalue=self.MISSING):
        if firstval is self.MISSING:
            return -1
        elif secondval is self.MISSING:
            return 1
        else:
            result = comparison(firstval, secondval)
            if result != 0:
                return result
    return 0

concatenate: again, no need to reinvent the wheel.  This should be
more efficient:

def concatenate(self, other):
    return extend(itertools.chain(self.__iterable, other))

equals: could be just "return self.compare(other, comparison) == 0"

__last: the loop could be a for loop:

        # assume we're looking at the last item and try moving to the next
        item = result.Value
        for item in iterator: pass
        return item

lastOrDefault: there's a lot of repeated logic here.  This could just be:

def lastOrDefault(self, default=None):
    try:
        return self.last()
    except ValueError:
        return default

map / forEach: .NET has to separate these into separate methods due to
static typing.  It seems a bit silly to have both of them in Python.
Also, map would be more efficient as "return itertools.imap(mapper,
self.__iterable)"

max / min: it would be more efficient to use the builtin:
def max(self, key):
    return max(self.__iterable, key=key)
If somebody really needs to pass a comparison function instead of a
key function, they can use functools.cmp_to_key.

randomSamples: a more canonical way to pass the RNG would be to pass
an instance of random.Random rather than an arbitrary function. Then
to get a random integer you can just call generator.randrange(total).
Note that for the default you can just use the random module itself,
which provides default implementations of all the Random methods.
Also, for loops:

    def __randomSamples(iterable, sampleCount, generator):
        samples = []
        iterator = iter(iterable)
        # fill up the samples with the items from the beginning of the iterable
        for i, value in zip(xrange(sampleCount), iterator):
            samples.append(value)
        # replace items if the generated number is less than the total
        total = len(samples)
        for value in iterator:
            total += 1
            index = generator.randrange(total)
            if index < len(samples):
                samples[index] = result
        return samples

__reverse: you could just "return reversed(list(iterable))"

__rotateLeft:
def __rotateLeft(iterable, shift):
    iterator = iter(iterable)
    front = list(itertools.islice(iterator, shift))
    return itertools.chain(iterator, front)

skipWhile: suggest using itertools.dropwhile
take: suggest using itertools.islice
takeWhile: suggest using itertools.takewhile
__toList: "return list(iterable)"
__toSet: "return set(iterable)"
__toTuple: "return tuple(iterable)".  Note that as currently written
this actually returns a generator, not a tuple.
__where: suggest using itertools.ifilter
__zip: suggest using a for loop over itertools.izip(first, second)

Lookup: is inconsistent.  The overridden __iter__ method returns an
iterator over the values of the groups, but all the inherited methods
are going to iterate over the keys.  Probably you want to pass
groups.values() to the superclass __init__ method.

Cheers,
Ian

[toc] | [prev] | [next] | [standalone]

#12679

From	Travis Parks <jehugaleahsa@gmail.com>
Date	2011-09-02 15:49 -0700
Message-ID	<8f893753-abbc-47df-9d90-77263e96be1b@p10g2000yqi.googlegroups.com>
In reply to	#12670

On Sep 2, 4:09 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Fri, Sep 2, 2011 at 10:59 AM, Travis Parks <jehugalea...@gmail.com> wrote:
> > Hello:
>
> > I am working on an algorithms library. It provides LINQ like
> > functionality to Python iterators. Eventually, I plan on having
> > feaures that work against sequences and mappings.
>
> > I have the code up athttp://code.google.com/p/py-compass.
>
> > This is my first project in Python, so I'd like some feedback. I want
> > to know if I am following conventions (overall style and quality of
> > code).
>
> Sure, here are my comments.
>
> In the "forever" and "__forever" functions, your use of the term
> "generator" is confusing.  "__forever" is a generator function,
> because it has a yield statement.  Its argument, called "generator",
> appears to be a callable, not a generator or even necessarily a
> generator function.  Also, note that __forever(lambda: value) is
> functionally equivalent to the more efficient itertools.repeat(value).
>
> The staticmethod __next(iterator) accesses the class it is defined in,
> which suggests that it might be better written as a classmethod
> __next(cls, iterator).
>
> Each of the LINQ-style methods is divided into two parts: the public
> method that contains the docstring and some argument checks, and a
> private staticmethod that contains the implementation.  I'm not
> certain what the purpose of that is.  If it's to facilitate overriding
> the implementation in subclasses, then you need to change the names of
> the private methods to start with only one _ character so that they
> won't be mangled by the compiler.
>
> The comments before each method that only contain the name of the
> immediately following method are redundant.
>
> aggregate: the default aggregator is unintuitive to me.  I would make
> it a required field and add a separate method called sum that calls
> aggregate with the operator.add aggregator.
> Also, the implementation doesn't look correct.  Instead of passing in
> each item to the aggregator, you're passing in the number of items
> seen so far?  The LINQ Aggregate method is basically reduce, so rather
> than reinvent the wheel I would suggest this:
>
> # MISSING is a unique object solely defined to represent missing arguments.
> # Unlike None we can safely assume it will never be passed as actual data.
> MISSING = object()
>
> def aggregate(self, aggregator, seed=MISSING):
>     if seed is self.MISSING:
>         return reduce(aggregator, self._iterable)
>     else:
>         return reduce(aggregator, self._iterable, seed)
>
> Note for compatibility that in Python 3 the reduce function has been
> demoted from a builtin to a member of the functools module.
>
> any: the name of this method could cause some confusion with the "any"
> builtin that does something a bit different.
>
> compare: the loop would more DRY as a for loop:
>
> def __compare(first, second, comparison):
>     for firstval, secondval in itertools.izip_longest(first, second,
> fillvalue=self.MISSING):
>         if firstval is self.MISSING:
>             return -1
>         elif secondval is self.MISSING:
>             return 1
>         else:
>             result = comparison(firstval, secondval)
>             if result != 0:
>                 return result
>     return 0
>
> concatenate: again, no need to reinvent the wheel.  This should be
> more efficient:
>
> def concatenate(self, other):
>     return extend(itertools.chain(self.__iterable, other))
>
> equals: could be just "return self.compare(other, comparison) == 0"
>
> __last: the loop could be a for loop:
>
>         # assume we're looking at the last item and try moving to the next
>         item = result.Value
>         for item in iterator: pass
>         return item
>
> lastOrDefault: there's a lot of repeated logic here.  This could just be:
>
> def lastOrDefault(self, default=None):
>     try:
>         return self.last()
>     except ValueError:
>         return default
>
> map / forEach: .NET has to separate these into separate methods due to
> static typing.  It seems a bit silly to have both of them in Python.
> Also, map would be more efficient as "return itertools.imap(mapper,
> self.__iterable)"
>
> max / min: it would be more efficient to use the builtin:
> def max(self, key):
>     return max(self.__iterable, key=key)
> If somebody really needs to pass a comparison function instead of a
> key function, they can use functools.cmp_to_key.
>
> randomSamples: a more canonical way to pass the RNG would be to pass
> an instance of random.Random rather than an arbitrary function. Then
> to get a random integer you can just call generator.randrange(total).
> Note that for the default you can just use the random module itself,
> which provides default implementations of all the Random methods.
> Also, for loops:
>
>     def __randomSamples(iterable, sampleCount, generator):
>         samples = []
>         iterator = iter(iterable)
>         # fill up the samples with the items from the beginning of the iterable
>         for i, value in zip(xrange(sampleCount), iterator):
>             samples.append(value)
>         # replace items if the generated number is less than the total
>         total = len(samples)
>         for value in iterator:
>             total += 1
>             index = generator.randrange(total)
>             if index < len(samples):
>                 samples[index] = result
>         return samples
>
> __reverse: you could just "return reversed(list(iterable))"
>
> __rotateLeft:
> def __rotateLeft(iterable, shift):
>     iterator = iter(iterable)
>     front = list(itertools.islice(iterator, shift))
>     return itertools.chain(iterator, front)
>
> skipWhile: suggest using itertools.dropwhile
> take: suggest using itertools.islice
> takeWhile: suggest using itertools.takewhile
> __toList: "return list(iterable)"
> __toSet: "return set(iterable)"
> __toTuple: "return tuple(iterable)".  Note that as currently written
> this actually returns a generator, not a tuple.
> __where: suggest using itertools.ifilter
> __zip: suggest using a for loop over itertools.izip(first, second)
>
> Lookup: is inconsistent.  The overridden __iter__ method returns an
> iterator over the values of the groups, but all the inherited methods
> are going to iterate over the keys.  Probably you want to pass
> groups.values() to the superclass __init__ method.
>
> Cheers,
> Ian

Awesome tips. I appreciate the time you spent commenting on just about
every function. I really like your suggestions about using itertools
more, and for loops. I was feeling like some things were becoming way
too complicated.

I also noted the bugs you discovered. I will incorporate your
suggestions.

Thanks again!

[toc] | [prev] | [next] | [standalone]

#12685

From	Travis Parks <jehugaleahsa@gmail.com>
Date	2011-09-02 18:39 -0700
Message-ID	<18fe4afd-569b-4580-a629-50f6c74829e8@c29g2000yqd.googlegroups.com>
In reply to	#12679

On Sep 2, 6:49 pm, Travis Parks <jehugalea...@gmail.com> wrote:
> On Sep 2, 4:09 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>
>
>
>
>
> > On Fri, Sep 2, 2011 at 10:59 AM, Travis Parks <jehugalea...@gmail.com> wrote:
> > > Hello:
>
> > > I am working on an algorithms library. It provides LINQ like
> > > functionality to Python iterators. Eventually, I plan on having
> > > feaures that work against sequences and mappings.
>
> > > I have the code up athttp://code.google.com/p/py-compass.
>
> > > This is my first project in Python, so I'd like some feedback. I want
> > > to know if I am following conventions (overall style and quality of
> > > code).
>
> > Sure, here are my comments.
>
> > In the "forever" and "__forever" functions, your use of the term
> > "generator" is confusing.  "__forever" is a generator function,
> > because it has a yield statement.  Its argument, called "generator",
> > appears to be a callable, not a generator or even necessarily a
> > generator function.  Also, note that __forever(lambda: value) is
> > functionally equivalent to the more efficient itertools.repeat(value).
>
> > The staticmethod __next(iterator) accesses the class it is defined in,
> > which suggests that it might be better written as a classmethod
> > __next(cls, iterator).
>
> > Each of the LINQ-style methods is divided into two parts: the public
> > method that contains the docstring and some argument checks, and a
> > private staticmethod that contains the implementation.  I'm not
> > certain what the purpose of that is.  If it's to facilitate overriding
> > the implementation in subclasses, then you need to change the names of
> > the private methods to start with only one _ character so that they
> > won't be mangled by the compiler.
>
> > The comments before each method that only contain the name of the
> > immediately following method are redundant.
>
> > aggregate: the default aggregator is unintuitive to me.  I would make
> > it a required field and add a separate method called sum that calls
> > aggregate with the operator.add aggregator.
> > Also, the implementation doesn't look correct.  Instead of passing in
> > each item to the aggregator, you're passing in the number of items
> > seen so far?  The LINQ Aggregate method is basically reduce, so rather
> > than reinvent the wheel I would suggest this:
>
> > # MISSING is a unique object solely defined to represent missing arguments.
> > # Unlike None we can safely assume it will never be passed as actual data.
> > MISSING = object()
>
> > def aggregate(self, aggregator, seed=MISSING):
> >     if seed is self.MISSING:
> >         return reduce(aggregator, self._iterable)
> >     else:
> >         return reduce(aggregator, self._iterable, seed)
>
> > Note for compatibility that in Python 3 the reduce function has been
> > demoted from a builtin to a member of the functools module.
>
> > any: the name of this method could cause some confusion with the "any"
> > builtin that does something a bit different.
>
> > compare: the loop would more DRY as a for loop:
>
> > def __compare(first, second, comparison):
> >     for firstval, secondval in itertools.izip_longest(first, second,
> > fillvalue=self.MISSING):
> >         if firstval is self.MISSING:
> >             return -1
> >         elif secondval is self.MISSING:
> >             return 1
> >         else:
> >             result = comparison(firstval, secondval)
> >             if result != 0:
> >                 return result
> >     return 0
>
> > concatenate: again, no need to reinvent the wheel.  This should be
> > more efficient:
>
> > def concatenate(self, other):
> >     return extend(itertools.chain(self.__iterable, other))
>
> > equals: could be just "return self.compare(other, comparison) == 0"
>
> > __last: the loop could be a for loop:
>
> >         # assume we're looking at the last item and try moving to the next
> >         item = result.Value
> >         for item in iterator: pass
> >         return item
>
> > lastOrDefault: there's a lot of repeated logic here.  This could just be:
>
> > def lastOrDefault(self, default=None):
> >     try:
> >         return self.last()
> >     except ValueError:
> >         return default
>
> > map / forEach: .NET has to separate these into separate methods due to
> > static typing.  It seems a bit silly to have both of them in Python.
> > Also, map would be more efficient as "return itertools.imap(mapper,
> > self.__iterable)"
>
> > max / min: it would be more efficient to use the builtin:
> > def max(self, key):
> >     return max(self.__iterable, key=key)
> > If somebody really needs to pass a comparison function instead of a
> > key function, they can use functools.cmp_to_key.
>
> > randomSamples: a more canonical way to pass the RNG would be to pass
> > an instance of random.Random rather than an arbitrary function. Then
> > to get a random integer you can just call generator.randrange(total).
> > Note that for the default you can just use the random module itself,
> > which provides default implementations of all the Random methods.
> > Also, for loops:
>
> >     def __randomSamples(iterable, sampleCount, generator):
> >         samples = []
> >         iterator = iter(iterable)
> >         # fill up the samples with the items from the beginning of the iterable
> >         for i, value in zip(xrange(sampleCount), iterator):
> >             samples.append(value)
> >         # replace items if the generated number is less than the total
> >         total = len(samples)
> >         for value in iterator:
> >             total += 1
> >             index = generator.randrange(total)
> >             if index < len(samples):
> >                 samples[index] = result
> >         return samples
>
> > __reverse: you could just "return reversed(list(iterable))"
>
> > __rotateLeft:
> > def __rotateLeft(iterable, shift):
> >     iterator = iter(iterable)
> >     front = list(itertools.islice(iterator, shift))
> >     return itertools.chain(iterator, front)
>
> > skipWhile: suggest using itertools.dropwhile
> > take: suggest using itertools.islice
> > takeWhile: suggest using itertools.takewhile
> > __toList: "return list(iterable)"
> > __toSet: "return set(iterable)"
> > __toTuple: "return tuple(iterable)".  Note that as currently written
> > this actually returns a generator, not a tuple.
> > __where: suggest using itertools.ifilter
> > __zip: suggest using a for loop over itertools.izip(first, second)
>
> > Lookup: is inconsistent.  The overridden __iter__ method returns an
> > iterator over the values of the groups, but all the inherited methods
> > are going to iterate over the keys.  Probably you want to pass
> > groups.values() to the superclass __init__ method.
>
> > Cheers,
> > Ian
>
> Awesome tips. I appreciate the time you spent commenting on just about
> every function. I really like your suggestions about using itertools
> more, and for loops. I was feeling like some things were becoming way
>too complicated.
>
> I also noted the bugs you discovered. I will incorporate your
> suggestions.
>
> Thanks again!- Hide quoted text -
>
> - Show quoted text -

You commented that the itertools algorithms will perform faster than
the hand-written ones. Are these algorithms optimized internally?

[toc] | [prev] | [next] | [standalone]

#12687

From	Chris Rebert <clp2@rebertia.com>
Date	2011-09-02 19:23 -0700
Message-ID	<mailman.723.1315016594.27778.python-list@python.org>
In reply to	#12685

On Fri, Sep 2, 2011 at 6:39 PM, Travis Parks <jehugaleahsa@gmail.com> wrote:
<snip>
> You commented that the itertools algorithms will perform faster than
> the hand-written ones. Are these algorithms optimized internally?

For one thing, they are written in C.

Cheers,
Chris

[toc] | [prev] | [next] | [standalone]

#12690

From	Chris Torek <nospam@torek.net>
Date	2011-09-03 04:35 +0000
Message-ID	<j3sapp01jeh@news1.newsguy.com>
In reply to	#12685

In article <18fe4afd-569b-4580-a629-50f6c74829e8@c29g2000yqd.googlegroups.com>
Travis Parks  <jehugaleahsa@gmail.com> wrote:
>[Someone] commented that the itertools algorithms will perform
>faster than the hand-written ones. Are these algorithms optimized
>internally?

They are written in C, so avoid a lot of CPython interpreter
overhead.  Mileage in Jython, etc., may vary...
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html

[toc] | [prev] | [next] | [standalone]

#12734

From	Travis Parks <jehugaleahsa@gmail.com>
Date	2011-09-03 20:35 -0700
Message-ID	<112f5d1d-fbf7-44da-a0dd-cbbeb3a59c6d@dq7g2000vbb.googlegroups.com>
In reply to	#12690

On Sep 3, 12:35 am, Chris Torek <nos...@torek.net> wrote:
> In article <18fe4afd-569b-4580-a629-50f6c7482...@c29g2000yqd.googlegroups.com>
> Travis Parks  <jehugalea...@gmail.com> wrote:
>
> >[Someone] commented that the itertools algorithms will perform
> >faster than the hand-written ones. Are these algorithms optimized
> >internally?
>
> They are written in C, so avoid a lot of CPython interpreter
> overhead.  Mileage in Jython, etc., may vary...
> --
> In-Real-Life: Chris Torek, Wind River Systems
> Intel require I note that my opinions are not those of WRS or Intel
> Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
> email: gmail (figure it out)      http://web.torek.net/torek/index.html

I thought I would point out that many of the itertools functions
change between 2.x and 3.x versions. Since 2.7 is supposed to be the
last 2.x language, I suppose I will wait until 3.2 becomes the norm
before I incorporate some of these changes. In the mean time, I will
starting working on algorithms that work against Sequences.

I think a really important lesson is that Python really doesn't need
an algorithms library, like many others do. A lot of the common
algorithms are supported by the syntax itself. All my library did was
allow for easier function composition.

[toc] | [prev] | [standalone]

csiph-web

Algorithms Library - Asking for Pointers

Contents

#12652 — Algorithms Library - Asking for Pointers

#12670

#12679

#12685

#12687

#12690

#12734