Bash-like pipes in Python

Started by: Steven D'Aprano <steve@pearwood.info>
First post: 2016-03-17 01:57 +1100
Last post: 2016-03-17 19:20 +0200
Articles: 16 (8 participants)


Contents

  Bash-like pipes in Python Steven D'Aprano <steve@pearwood.info> - 2016-03-17 01:57 +1100
    Re: Bash-like pipes in Python Joel Goldstick <joel.goldstick@gmail.com> - 2016-03-16 11:09 -0400
      Re: Bash-like pipes in Python Christian Gollwitzer <auriocus@gmx.de> - 2016-03-16 16:16 +0100
    Re: Bash-like pipes in Python Random832 <random832@fastmail.com> - 2016-03-16 11:20 -0400
      Re: Bash-like pipes in Python Steven D'Aprano <steve@pearwood.info> - 2016-03-17 21:58 +1100
        Re: Bash-like pipes in Python Marko Rauhamaa <marko@pacujo.net> - 2016-03-17 13:49 +0200
          Re: Bash-like pipes in Python Sivan Greenberg <sivan@vitakka.co> - 2016-03-17 14:10 +0200
            Re: Bash-like pipes in Python Marko Rauhamaa <marko@pacujo.net> - 2016-03-17 14:44 +0200
    Re: Bash-like pipes in Python "Sven R. Kunze" <srkunze@mail.de> - 2016-03-16 16:20 +0100
    Re: Bash-like pipes in Python Random832 <random832@fastmail.com> - 2016-03-16 11:21 -0400
    Re: Bash-like pipes in Python Steven D'Aprano <steve@pearwood.info> - 2016-03-17 02:39 +1100
    Re: Bash-like pipes in Python Marko Rauhamaa <marko@pacujo.net> - 2016-03-16 19:04 +0200
      Re: Bash-like pipes in Python Steven D'Aprano <steve@pearwood.info> - 2016-03-18 01:10 +1100
        Re: Bash-like pipes in Python Chris Angelico <rosuav@gmail.com> - 2016-03-18 01:36 +1100
        Re: Bash-like pipes in Python Random832 <random832@fastmail.com> - 2016-03-17 12:31 -0400
        Re: Bash-like pipes in Python Sivan Greenberg <sivan@vitakka.co> - 2016-03-17 19:20 +0200

#105041 — Bash-like pipes in Python

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-03-17 01:57 +1100
Subject: Bash-like pipes in Python
Message-ID: <56e97459$0$1600$c3e8da3$5496439d@news.astraweb.com>

There's a powerful technique used in shell-scripting languages like bash:
pipes. The output of one function is piped in to become the input to the
next function.

According to Martin Fowler, this was also used extensively in Smalltalk:

http://martinfowler.com/articles/collection-pipeline/

and can also be done in Ruby, using method chaining.

Here is a way to do functional-programming-like pipelines to collect and
transform values from an iterable:

https://code.activestate.com/recipes/580625-collection-pipeline-in-python/

For instance, we can take a string, extract all the digits, convert them to
ints, and finally multiply the digits to give a final result:

py> from operator import mul
py> "abcd12345xyz" | Filter(str.isdigit) | Map(int) | Reduce(mul)
120


(For the definitions of Filter, Map and Reduce, see the code at the
ActiveState recipe, linked above). In my opinion, this is much nicer
looking than the standard Python `filter`, `map` and `reduce`:

py> reduce(mul, map(int, filter(str.isdigit, "abcd12345xyz")))
120

as this requires the operations to be written in the opposite order to the
order that they are applied.



-- 
Steven



#105044

From: Joel Goldstick <joel.goldstick@gmail.com>
Date: 2016-03-16 11:09 -0400
Message-ID: <mailman.214.1458140946.12893.python-list@python.org>
In reply to: #105041

On Wed, Mar 16, 2016 at 10:57 AM, Steven D'Aprano <steve@pearwood.info>
wrote:

> There's a powerful technique used in shell-scripting languages like bash:
> pipes. The output of one function is piped in to become the input to the
> next function.
>
> According to Martin Fowler, this was also used extensively in Smalltalk:
>
> http://martinfowler.com/articles/collection-pipeline/
>
> and can also be done in Ruby, using method chaining.
>
> Here is a way to do functional-programming-like pipelines to collect and
> transform values from an iterable:
>
> https://code.activestate.com/recipes/580625-collection-pipeline-in-python/
>
> For instance, we can take a string, extract all the digits, convert them to
> ints, and finally multiply the digits to give a final result:
>
> py> from operator import mul
> py> "abcd12345xyz" | Filter(str.isdigit) | Map(int) | Reduce(mul)
> 120
>
>
> (For the definitions of Filter, Map and Reduce, see the code at the
> ActiveState recipe, linked above). In my opinion, this is much nicer
> looking than the standard Python `filter`, `map` and `reduce`:
>
> py> reduce(mul, map(int, filter(str.isdigit, "abcd12345xyz")))
> 120
>
> as this requires the operations to be written in the opposite order to the
> order that they are applied.
>
>

This is interesting, but the part I'm missing is the use of the pipe
symbol '|' in Python. Can you elaborate?

>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
Joel Goldstick
http://joelgoldstick.com/ <http://joelgoldstick.com/stats/birthdays>
http://cc-baseballstats.info/



#105046

From: Christian Gollwitzer <auriocus@gmx.de>
Date: 2016-03-16 16:16 +0100
Message-ID: <ncbt66$jo7$1@dont-email.me>
In reply to: #105044

Am 16.03.16 um 16:09 schrieb Joel Goldstick:
> On Wed, Mar 16, 2016 at 10:57 AM, Steven D'Aprano <steve@pearwood.info>
> wrote:
>> py> from operator import mul
>> py> "abcd12345xyz" | Filter(str.isdigit) | Map(int) | Reduce(mul)
>> 120
>>
> This is interesting, but the part I'm missing is the use of the Pipe
> symbol '|' in python.  Can you elaborate


It's an overloaded "or" operator

	def __ror__(self, iterable): ....
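
To make that concrete, here is a minimal sketch of how such classes might look. It is not the code from the ActiveState recipe; the class bodies below are assumptions written only to illustrate the hook. Because `str` (and the `filter` and `map` objects) do not handle `|` themselves, Python falls back to the right-hand operand's `__ror__` method.

```python
from functools import reduce
from operator import mul


class Filter:
    """Illustrative stage: keep items for which predicate(item) is true."""
    def __init__(self, predicate):
        self.predicate = predicate

    def __ror__(self, iterable):
        # Called for `iterable | Filter(...)` when the left operand
        # does not handle the `|` operator itself.
        return filter(self.predicate, iterable)


class Map:
    """Illustrative stage: apply func to every item."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        return map(self.func, iterable)


class Reduce:
    """Illustrative stage: fold the items into a single value."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        return reduce(self.func, iterable)


result = "abcd12345xyz" | Filter(str.isdigit) | Map(int) | Reduce(mul)
print(result)  # 120
```

Since `|` is left-associative, the stages run in the order they are written: filter first, then map, then reduce.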

	
Christian



#105048

From: Random832 <random832@fastmail.com>
Date: 2016-03-16 11:20 -0400
Message-ID: <mailman.217.1458141621.12893.python-list@python.org>
In reply to: #105041

On Wed, Mar 16, 2016, at 10:57, Steven D'Aprano wrote:
> For instance, we can take a string, extract all the digits, convert them
> to ints, and finally multiply the digits to give a final result:
> 
> py> from operator import mul
> py> "abcd12345xyz" | Filter(str.isdigit) | Map(int) | Reduce(mul)
> 120

How about:

from functools import partial, reduce
from operator import mul
def rcall(arg, func): return func(arg)
def fpipe(*args): return reduce(rcall, args)
pfilter = partial(partial, filter)
pmap = partial(partial, map)
preduce = partial(partial, reduce)

fpipe("abcd12345xyz", pfilter(str.isdigit), pmap(int), preduce(mul))
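
The double `partial` can be hard to read at first, so here is a hedged step-by-step expansion of what the fpipe call above does. The names `keep_digits`, `step1`, `step2` and `total` are introduced only for this illustration.

```python
from functools import partial, reduce
from operator import mul

# partial(partial, filter) is a factory: calling it with a function
# returns partial(filter, function), which still awaits the iterable.
pfilter = partial(partial, filter)
keep_digits = pfilter(str.isdigit)   # same as partial(filter, str.isdigit)
digits = list(keep_digits("abcd12345xyz"))

# reduce(rcall, args) threads the first argument through each stage in
# turn, so the pipeline is equivalent to these three explicit steps:
step1 = filter(str.isdigit, "abcd12345xyz")
step2 = map(int, step1)
total = reduce(mul, step2)

print(digits)  # ['1', '2', '3', '4', '5']
print(total)   # 120
```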

> (For the definitions of Filter, Map and Reduce, see the code at the
> ActiveState recipe, linked above). In my opinion, this is much nicer
> looking than the standard Python `filter`, `map` and `reduce`:
> 
> py> reduce(mul, map(int, filter(str.isdigit, "abcd12345xyz")))
> 120
> 
> as this requires the operations to be written in the opposite order to
> the order that they are applied.
> 
> 
> 
> -- 
> Steven
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list



#105083

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-03-17 21:58 +1100
Message-ID: <56ea8ded$0$1614$c3e8da3$5496439d@news.astraweb.com>
In reply to: #105048

On Thu, 17 Mar 2016 02:20 am, Random832 wrote:

> How about:
> 
> from functools import partial, reduce
> from operator import mul
> def rcall(arg, func): return func(arg)
> def fpipe(*args): return reduce(rcall, args)
> pfilter = partial(partial, filter)
> pmap = partial(partial, map)
> preduce = partial(partial, reduce)
> 
> fpipe("abcd12345xyz", pfilter(str.isdigit), pmap(int), preduce(mul))

Intriguing! Thank you for the suggestion.


-- 
Steven



#105085

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2016-03-17 13:49 +0200
Message-ID: <87mvpxb9zi.fsf@elektro.pacujo.net>
In reply to: #105083

Steven D'Aprano <steve@pearwood.info>:

> On Thu, 17 Mar 2016 02:20 am, Random832 wrote:
>> fpipe("abcd12345xyz", pfilter(str.isdigit), pmap(int), preduce(mul))
>
> Intriguing! Thank you for the suggestion.

Still want the pipeline syntax!


Marko



#105089

From: Sivan Greenberg <sivan@vitakka.co>
Date: 2016-03-17 14:10 +0200
Message-ID: <mailman.271.1458216615.12893.python-list@python.org>
In reply to: #105085

If I understand correctly, the binary right or overloading that's seen here
can be applied to any other computational objects.

I could also think of implementing it for input / output pipes overloading
the __ror__ method with .communicate() method of the Popen object [0].

-Sivan

[0]: https://docs.python.org/2/library/subprocess.html#popen-objects
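
A rough sketch of that idea, assuming a hypothetical `Cmd` wrapper class (not part of the recipe or of the subprocess module). It uses only `subprocess.Popen` and its `communicate()` method, and runs the current Python interpreter as a portable stand-in for a shell tool.

```python
import subprocess
import sys


class Cmd:
    """Hypothetical stage that feeds text to an external command."""
    def __init__(self, argv):
        self.argv = argv

    def __ror__(self, text):
        # communicate() writes the text to the child's stdin, closes it,
        # and reads the child's stdout until end of file.
        proc = subprocess.Popen(
            self.argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # exchange str rather than bytes
        )
        out, _ = proc.communicate(text)
        return out


# The child process upper-cases whatever arrives on its stdin.
upper = Cmd([sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"])
result = "hello pipes\n" | upper
print(result.strip())  # HELLO PIPES
```

Note that `communicate()` buffers the whole input and output in memory, so this sketch does not stream data between stages the way a shell pipe does.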

On Thu, Mar 17, 2016 at 1:49 PM, Marko Rauhamaa <marko@pacujo.net> wrote:

> Steven D'Aprano <steve@pearwood.info>:
>
> > On Thu, 17 Mar 2016 02:20 am, Random832 wrote:
> >> fpipe("abcd12345xyz", pfilter(str.isdigit), pmap(int), preduce(mul))
> >
> > Intriguing! Thank you for the suggestion.
>
> Still want the pipeline syntax!
>
>
> Marko
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
Sivan Greenberg
Co founder & CTO
Vitakka Consulting



#105090

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2016-03-17 14:44 +0200
Message-ID: <87d1qtb7fr.fsf@elektro.pacujo.net>
In reply to: #105089

Sivan Greenberg <sivan@vitakka.co>:

> If I understand correctly, the binary right or overloading that's seen
> here can be applied to any other computational objects.
>
> I could also think of implementing it for input / output pipes
> overloading the __ror__ method with .communicate() method of the Popen
> object [0].

I'm thinking it's all of the above and more!

At the bottom is the generic pipelining of generators.

Then there's the special case of integrating external commands with the
generic framework:

    cmd('cat /etc/passwd', stdin=None) | List

(Is "stdin=None" needed?)

Then there's the enriching of data formats. The line-based delineation
is old-school and unsafe, and should be considered for legacy only. The
internal data format is arbitrary Python object sequences, but
externally, JSON should be preferred. Thus, we'll need converters
from/to JSON.

What is still missing is the true generator lambdas:

    produce_data | (
        lambda name, value:
            yield value) | \
        consume_values

as in bash:

    produce_data |
    while read name value; do
        echo $value
    done |
    consume_values
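
For comparison, here is a sketch of what can be written in Python today without generator lambdas: a named generator function wrapped in a stage object. The `Stage` class, `values_only` and the sample data are all hypothetical names introduced for this illustration.

```python
class Stage:
    """Hypothetical wrapper turning a generator function into a `|` stage."""
    def __init__(self, genfunc):
        self.genfunc = genfunc

    def __ror__(self, iterable):
        # iterable | stage: hand the upstream iterable to the generator.
        return self.genfunc(iterable)


@Stage
def values_only(pairs):
    # Named stand-in for the multi-line "generator lambda".
    for name, value in pairs:
        yield value


def produce_data():
    yield ("alice", 1)
    yield ("bob", 2)


result = list(produce_data() | values_only)
print(result)  # [1, 2]
```

When the body is a single expression, a generator expression such as `(value for name, value in produce_data())` covers the same ground; the named function is only needed for multi-statement bodies.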


Marko



#105049

From: "Sven R. Kunze" <srkunze@mail.de>
Date: 2016-03-16 16:20 +0100
Message-ID: <mailman.218.1458141657.12893.python-list@python.org>
In reply to: #105041

On 16.03.2016 16:09, Joel Goldstick wrote:
> symbol '|' in python.  Can you elaborate

bitwise or



#105050

From: Random832 <random832@fastmail.com>
Date: 2016-03-16 11:21 -0400
Message-ID: <mailman.219.1458141706.12893.python-list@python.org>
In reply to: #105041

On Wed, Mar 16, 2016, at 11:09, Joel Goldstick wrote:
> This is interesting, but the part I'm missing is the use of the Pipe
> symbol '|' in python.  Can you elaborate

His "Filter", "Map", and "Reduce" are classes which define __ror__
methods, obviously.



#105051

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-03-17 02:39 +1100
Message-ID: <56e97e46$0$1591$c3e8da3$5496439d@news.astraweb.com>
In reply to: #105041

On Thu, 17 Mar 2016 02:22 am, Omar Abou Mrad wrote:

> Would be nice if this was possible:
> 
> >>> get_digits = Filter(str.isdigit) | Map(int)
> >>> 'kjkjsdf399834' | get_digits


Yes it would. I'll work on that.
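
One possible way to get there, sketched under the assumption that every stage shares a common base class: define `__or__` on the stage so that `stage | stage` builds a new combined stage, while `__ror__` keeps handling `data | stage`. The `Stage` class and the function-style `Filter` and `Map` below are hypothetical and are not how the recipe is currently written.

```python
class Stage:
    """Hypothetical composable stage; `transform` maps iterable -> iterable."""
    def __init__(self, transform):
        self.transform = transform

    def __ror__(self, iterable):
        # data | stage: apply the stage to the data.
        return self.transform(iterable)

    def __or__(self, other):
        # stage | stage: build a new stage that runs both in order.
        if not isinstance(other, Stage):
            return NotImplemented
        return Stage(lambda iterable: other.transform(self.transform(iterable)))


def Filter(predicate):
    return Stage(lambda iterable: filter(predicate, iterable))


def Map(func):
    return Stage(lambda iterable: map(func, iterable))


get_digits = Filter(str.isdigit) | Map(int)
result = list('kjkjsdf399834' | get_digits)
print(result)  # [3, 9, 9, 8, 3, 4]
```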


> Also, how about using '>>' instead of '|' for "Forward chaining"

Any particular reason you prefer >> over | as the operator?




-- 
Steven



#105053

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2016-03-16 19:04 +0200
Message-ID: <87egbacq2j.fsf@elektro.pacujo.net>
In reply to: #105041

Steven D'Aprano <steve@pearwood.info>:

> Here is a way to do functional-programming-like pipelines to collect
> and transform values from an iterable:
>
> https://code.activestate.com/recipes/580625-collection-pipeline-in-python/

Nice. The other day we talked about Python replacing bash. Pipelining is
a big step in that direction.

Note also the Scheme Shell (scsh):
<URL: https://scsh.net/docu/html/man-Z-H-3.html>.

Question: Could the generators define __repr__ so you wouldn't need to
terminate the pipeline with "List" in interactive use?


Marko



#105092

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-03-18 01:10 +1100
Message-ID: <56eabae8$0$1599$c3e8da3$5496439d@news.astraweb.com>
In reply to: #105053

On Thu, 17 Mar 2016 04:04 am, Marko Rauhamaa wrote:

> Steven D'Aprano <steve@pearwood.info>:
> 
>> Here is a way to do functional-programming-like pipelines to collect
>> and transform values from an iterable:
>>
>>
>> https://code.activestate.com/recipes/580625-collection-pipeline-in-python/
> 
> Nice. The other day we talked about Python replacing bash. Pipelining is
> a big step in that direction.
> 
> Note also the Scheme Shell (scsh):
> <URL: https://scsh.net/docu/html/man-Z-H-3.html>.
> 
> Question: Could the generators define __repr__ so you wouldn't need to
> terminate the pipeline with "List" in interactive use?


Short answer: no.

Long answer: well, technically it could be possible, but not the way it is
written at the moment.

At the moment, the data being processed by the Map, Filter, etc. are
ordinary lists or iterators. In order to give them a custom __repr__, I
would have to change the Map and Filter __ror__ method to return some
custom type which behaves as an iterable but has the appropriate __repr__.
I don't want to do that: I want the pipeline functions to return ordinary
lists or iterators, whichever is appropriate.

There's also the problem that __repr__ shouldn't mutate an object. Suppose
we did give iterators a __repr__ that displays their content. That would
exhaust the iterator, and you would have something like this:


it = iter([1, 2, 3])
repr(it)
=> prints "[1, 2, 3]"
repr(it)  # iterator is now exhausted
=> prints "[]"


I don't think this is a good idea.
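
The exhaustion problem is easy to demonstrate with `list()` standing in for the hypothetical content-displaying `__repr__`; the real `repr()` of an iterator leaves it untouched.

```python
it = iter([1, 2, 3])

first = list(it)    # consumes every item
second = list(it)   # nothing left: the iterator is exhausted

print(first)   # [1, 2, 3]
print(second)  # []

# The real repr() shows only the type and address and consumes nothing.
fresh = iter([1, 2, 3])
repr(fresh)
remaining = list(fresh)
print(remaining)  # [1, 2, 3]
```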


-- 
Steven



#105094

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-03-18 01:36 +1100
Message-ID: <mailman.272.1458225402.12893.python-list@python.org>
In reply to: #105092

On Fri, Mar 18, 2016 at 1:10 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> At the moment, the data being processed by the Map, Filter, etc. are
> ordinary lists or iterators. In order to give them a custom __repr__, I
> would have to change the Map and Filter __ror__ method to return some
> custom type which behaves as an iterable but has the appropriate __repr__.
> I don't want to do that: I want the pipeline functions to return ordinary
> lists or iterators, whichever is appropriate.

They don't have to be iterators, just iterables, right? They could
return one of these:

class QuantumList:
    def __init__(self, iterable):
        self.iter = iter(iterable)
        self.list = []
    def __iter__(self):
        yield from self.list
        while "moar stuff":
            try: val = next(self.iter)
            except StopIteration: return
            self.list.append(val)
            yield val
    def __repr__(self):
        self.list.extend(self.iter)
        return repr(self.list)

This object has a generator/list duality, but if you observe it, it
collapses to a list. When used interactively, it'd be pretty much the
same as calling list() as the last step, but in a script, they'd
operate lazily.

Quantum computing is here already!

ChrisA



#105114

From: Random832 <random832@fastmail.com>
Date: 2016-03-17 12:31 -0400
Message-ID: <mailman.277.1458232294.12893.python-list@python.org>
In reply to: #105092

On Thu, Mar 17, 2016, at 10:36, Chris Angelico wrote:
> This object has a generator/list duality, but if you observe it, it
> collapses to a list. When used interactively, it'd be pretty much the
> same as calling list() as the last step, but in a script, they'd
> operate lazily.
> 
> Quantum computing is here already!

Might as well add the sequence protocol while we're at it.
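
A sketch of what that might look like, assuming a variant class named `QuantumSequence` (a made-up name) with a hypothetical `_fill` helper. It keeps the lazy caching idea from the class quoted above and adds `__len__` and `__getitem__`; non-negative integer indexes consume only as much of the source as they need.

```python
from itertools import islice


class QuantumSequence:
    """Lazy iterable that caches items and supports len() and indexing."""

    def __init__(self, iterable):
        self.iter = iter(iterable)
        self.list = []

    def _fill(self, count=None):
        # Pull up to `count` more items into the cache (all of them if None).
        self.list.extend(islice(self.iter, count))

    def __iter__(self):
        # Index-based, so items cached by other consumers are picked up too.
        i = 0
        while True:
            if i >= len(self.list):
                self._fill(1)
                if i >= len(self.list):
                    return
            yield self.list[i]
            i += 1

    def __len__(self):
        self._fill()  # observing the length collapses everything
        return len(self.list)

    def __getitem__(self, index):
        if isinstance(index, int) and index >= 0:
            # Consume only as much as needed to reach this index.
            self._fill(max(0, index + 1 - len(self.list)))
        else:
            self._fill()  # negative indexes and slices need the full list
        return self.list[index]

    def __repr__(self):
        self._fill()
        return repr(self.list)


q = QuantumSequence(x * x for x in range(5))
third = q[2]           # consumes only the first three squares
cached = list(q.list)  # snapshot of the cache at this point
size = len(q)          # forces the rest
print(third, cached, size, q)  # 4 [0, 1, 4] 5 [0, 1, 4, 9, 16]
```

One trade-off worth noting: once `__len__` exists, anything that asks for the length, including truth testing with `bool(q)`, forces the whole source to be consumed.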



#105123

From: Sivan Greenberg <sivan@vitakka.co>
Date: 2016-03-17 19:20 +0200
Message-ID: <mailman.279.1458235266.12893.python-list@python.org>
In reply to: #105092

++1 !

On Thu, Mar 17, 2016 at 6:31 PM, Random832 <random832@fastmail.com> wrote:

>
>
> On Thu, Mar 17, 2016, at 10:36, Chris Angelico wrote:
> > This object has a generator/list duality, but if you observe it, it
> > collapses to a list. When used interactively, it'd be pretty much the
> > same as calling list() as the last step, but in a script, they'd
> > operate lazily.
> >
> > Quantum computing is here already!
>
> Might as well add the sequence protocol while we're at it.
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
Sivan Greenberg
Co founder & CTO
Vitakka Consulting
